Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19406
I read the original paper at
http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf and
I think there may be two bugs in the implementation here that could actually
be causing the problem.
First, targetError should not be rounded up; the algorithm in section 2.2.1
of the paper does not do that.
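To make the rounding concern concrete, here is a minimal Python sketch. The helper `within_tolerance` is hypothetical, not Spark's code. It assumes that ranks are integers and that the acceptance test compares a rank gap against the error bound eps * N. With eps = 0.01 and N = 150, eps * N = 1.5, so a rank gap of 2 should be rejected. Rounding the target error up to 2 lets that gap through, which loosens the bound by up to one rank.

```python
import math


def within_tolerance(rank_gap: int, relative_error: float, count: int, round_up: bool) -> bool:
    """Check an integer rank gap against the error bound eps * N.

    Hypothetical helper for illustration. round_up=True mimics rounding
    targetError up; round_up=False compares against eps * N directly.
    """
    target_error = relative_error * count
    if round_up:
        target_error = math.ceil(target_error)
    return rank_gap <= target_error


# eps * N = 1.5: a gap of 2 fails the unrounded bound but passes once it is rounded to 2.
print(within_tolerance(2, 0.01, 150, round_up=False))
print(within_tolerance(2, 0.01, 150, round_up=True))
```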
Second, this starts by considering sampled(1), not sampled(0), but both the
implementation and the algorithm in the paper appear to be 0-based.
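Here is a minimal sketch of a 0-based query scan, written in Python rather than Spark's Scala. It assumes the usual Greenwald-Khanna tuples (value, g, delta), where the running sum of g gives rmin and rmin + delta gives rmax. The names `Stats` and `query` and the `start` parameter are hypothetical. Passing `start=1` only mimics beginning the scan at sampled(1) as described above. The target error is deliberately not rounded up, and clamping the target rank to at least 1 is another assumption of this sketch.

```python
import math
from typing import List, NamedTuple, Optional


class Stats(NamedTuple):
    value: float
    g: int      # rmin(v_i) - rmin(v_{i-1})
    delta: int  # rmax(v_i) - rmin(v_i)


def query(sampled: List[Stats], count: int, quantile: float,
          relative_error: float, start: int = 0) -> Optional[float]:
    """Hypothetical GK-style quantile lookup; start=1 mimics skipping sampled(0)."""
    if not sampled:
        return None
    # Ranks are 1-based; clamp so that quantile 0 maps to the minimum (assumption).
    rank = max(1, math.ceil(quantile * count))
    target_error = relative_error * count  # not rounded up
    min_rank = 0
    for s in sampled[start:]:
        min_rank += s.g  # running sum of g is rmin of this sample
        max_rank = min_rank + s.delta
        if rank - min_rank <= target_error and max_rank - rank <= target_error:
            return s.value
    return sampled[-1].value


# Exact summary of 1..10: every g = 1, delta = 0, eps = 0.
exact = [Stats(float(v), 1, 0) for v in range(1, 11)]
print(query(exact, 10, 0.1, 0.0))           # 0-based scan
print(query(exact, 10, 0.1, 0.0, start=1))  # scan starting at index 1
```

On this exact summary, the 0.1 quantile comes out as 1.0 with the 0-based scan but 2.0 when the scan starts at index 1, because skipping sampled(0) drops its g from the running minimum rank, so every rank is off by one.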
I would try it myself, but for some reason I'm getting null from your query.
I'll try again on my end, but what do you think of this?