GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/15990
[SPARK-18559] [SQL] Restrict the lower bound of relativeSD in HLL++
## What changes were proposed in this pull request?
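In `HyperLogLogPlusPlus`, the arrays `THRESHOLDS`, `RAW_ESTIMATE_DATA` and
`BIAS_DATA` all have length 15 and are indexed by `p - 4`, so we need to
guarantee `0 <= p - 4 <= 14` (that is, `4 <= p <= 18`). Otherwise the lookup
throws an `ArrayIndexOutOfBoundsException`. Because `p` grows as `relativeSD`
shrinks, the upper bound on `p` translates into a lower bound on `relativeSD`.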
This PR also fixes the upper bound stated in the message passed to `require()`.
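
As a rough illustration of the index constraint described above, here is a minimal, self-contained sketch. It is not the code from this patch: the object `HllPrecisionBounds`, its method names, and the error message are hypothetical, and the formula that derives `p` from `relativeSD` is an assumption about how the precision is computed. Only the table length of 15 and the `p - 4` indexing come from the description.

```scala
object HllPrecisionBounds {
  // From the PR description: each lookup table has 15 entries, indexed by p - 4.
  val TableLength = 15
  val MinP = 4
  val MaxP = MinP + TableLength - 1 // 18

  // Assumption (not stated in the description): p is derived from relativeSD
  // as ceil(2 * log2(1.106 / relativeSD)).
  def precisionFor(relativeSD: Double): Int =
    math.ceil(2.0 * math.log(1.106 / relativeSD) / math.log(2.0)).toInt

  // Hypothetical helper: validate p before it is used as a table index,
  // so a bad relativeSD fails fast instead of running off the end of an array.
  def tableIndex(relativeSD: Double): Int = {
    val p = precisionFor(relativeSD)
    require(p >= MinP && p <= MaxP,
      s"relativeSD = $relativeSD gives p = $p, outside the supported range [$MinP, $MaxP]")
    p - MinP
  }
}
```

Under these assumptions, a `relativeSD` of 0.05 maps to `p = 9`, while a very small value such as 0.001 pushes `p` past 18 and is rejected by `require` with an `IllegalArgumentException` rather than failing later on the array access.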
## How was this patch tested?
Added a test case that checks the validity of the `relativeSD` parameter.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark hllppRsd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15990.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15990
----
commit 020fb2a4e85e26422f012c4ce1c497d797511619
Author: wangzhenhua <[email protected]>
Date: 2016-11-23T08:32:01Z
restrict the lower bound of relativeSD
----