GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/21750
[SPARK-24754][ML] Minhash integer overflow
## What changes were proposed in this pull request?
Use `Long` arithmetic when calculating the MinHash values, so that `Int` overflow in the intermediate product no longer biases the resulting hashes.
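For illustration only, here is a minimal sketch of the kind of overflow being described. It is written in Java because `int` and `long` behave the same way as in Scala on the JVM. The class name, method names, and the formula `((1 + elem) * a + b) % HASH_PRIME` are assumptions made for this example and are not copied from the patch. The modulus value is likewise an assumed constant.

```java
public class MinHashOverflowSketch {
    // Assumed modulus for illustration; any large value below Integer.MAX_VALUE shows the effect.
    static final int HASH_PRIME = 2038074743;

    // Hypothetical int-only hash: the product (1 + elem) * a can wrap around
    // past Integer.MAX_VALUE, so the result may even be negative.
    static int hashWithInt(int elem, int a, int b) {
        return ((1 + elem) * a + b) % HASH_PRIME;
    }

    // Same formula, but the 1L literal promotes the whole expression to long,
    // so the intermediate product cannot overflow for int inputs.
    static long hashWithLong(int elem, int a, int b) {
        return ((1L + elem) * a + b) % HASH_PRIME;
    }

    public static void main(String[] args) {
        int elem = 1;
        int a = 1_500_000_000; // hypothetical random coefficient
        int b = 0;             // hypothetical random offset

        // 2 * 1_500_000_000 = 3_000_000_000 does not fit in an int.
        System.out.println("int arithmetic:  " + hashWithInt(elem, a, b));  // -1294967296
        System.out.println("long arithmetic: " + hashWithLong(elem, a, b)); // 961925257
    }
}
```

In this sketch the wrapped `int` result is negative. A negative value will always win a `min` over the non-negative values the formula is meant to produce, which is one way overflow could skew the minimum.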
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-24754
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21750.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21750
----
commit aed50d7eac6e44273312f147807b4a85454f832d
Author: Sean Owen <srowen@...>
Date: 2018-07-11T19:01:30Z
Use longs in calculating min hash to avoid bias due to int overflow
----