[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88160/testReport)** for PR 20793 at commit [`177afcc`](https://github.com/apache/spark/commit/177afcc4277b604b783aef40d86d93d6a9add6fc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20793 The question is whether the existing output of pseudo-random generation and sampling (random/sample) is guaranteed by the public API. It seems it isn't.
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20793 At least some tests expect particular values as the result of sample/random: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L550-L564
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20793 Ah, the results differ because the number of operations differs. It may be an issue like #20630. I am curious why tests fail when the seed changes. Of course, I understand that the sequence of rand must be reproducible for a given seed value within a package or implementation.
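A minimal sketch of the point above, under stated assumptions: a seeded generator replays the same sequence for the same seed, but if the seed-scrambling step changes (here, hashing the seed in a 64-byte versus an 8-byte buffer, modeled on the `hashSeed` code discussed in this thread), the generator ends up seeded differently and any hard-coded expected values downstream change with it. `scramble` and `draws` are hypothetical helpers, and `java.util.Random` is only a stand-in for Spark's own generator.

```scala
import java.nio.ByteBuffer
import scala.util.hashing.MurmurHash3

object SeedReproducibility {
  // Hypothetical helper modeled on the hashSeed snippet in this thread:
  // hashes the seed written into a zero-filled buffer of `size` bytes.
  def scramble(seed: Long, size: Int): Long = {
    require(size >= java.lang.Long.BYTES, s"buffer of $size bytes cannot hold a Long")
    val bytes = ByteBuffer.allocate(size).putLong(seed).array()
    val lowBits = MurmurHash3.bytesHash(bytes)
    val highBits = MurmurHash3.bytesHash(bytes, lowBits)
    (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
  }

  // First `n` doubles from a java.util.Random (a stand-in, not Spark's generator)
  // seeded with the scrambled seed.
  def draws(seed: Long, size: Int, n: Int): List[Double] = {
    val rng = new java.util.Random(scramble(seed, size))
    List.fill(n)(rng.nextDouble())
  }

  def main(args: Array[String]): Unit = {
    // Same seed, same scrambling: the sequence replays exactly.
    println(draws(100L, 8, 3) == draws(100L, 8, 3))
    // Same seed, different buffer size: the generator is seeded differently.
    println(draws(100L, 64, 3) == draws(100L, 8, 3))
  }
}
```

In other words, a test that pins concrete sampled values is implicitly pinning the scrambling step as well, whereas a test that only checks "same seed, same output" survives a change like this one.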
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88156/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)** for PR 20793 at commit [`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20793 Does the `hashSeed` method produce the same hash value after this change?

```scala
scala> def hashSeed(seed: Long): Long = {
     |   val bytes = ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array()
     |   val lowBits = MurmurHash3.bytesHash(bytes)
     |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
     |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
     | }
hashSeed: (seed: Long)Long

scala> hashSeed(100)
res3: Long = 852394178374189935

scala> def hashSeed2(seed: Long): Long = {
     |   val bytes = ByteBuffer.allocate(java.lang.Long.BYTES).putLong(seed).array()
     |   val lowBits = MurmurHash3.bytesHash(bytes)
     |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
     |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
     | }
hashSeed2: (seed: Long)Long

scala> hashSeed2(100)
res7: Long = 1088402058313200430
```
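Why the two REPL results differ can be seen from the buffers themselves. `java.lang.Long.SIZE` is the width of a `long` in bits (64), while `java.lang.Long.BYTES` (Java 8+) is its width in bytes (8), so the original code allocates 64 bytes, writes the seed into the first 8 and hashes 56 trailing zero bytes as well. A minimal sketch that checks this layout, using a hypothetical `encode` helper; since Scala's `MurmurHash3.bytesHash` also mixes the input length into its final step, hashing the 8-byte buffer should not be expected to reproduce the old value.

```scala
import java.nio.ByteBuffer

object SeedBufferLayout {
  // Hypothetical helper: writes `seed` big-endian at the start of a zero-filled buffer of `size` bytes.
  def encode(seed: Long, size: Int): Array[Byte] = {
    require(size >= java.lang.Long.BYTES, s"buffer of $size bytes cannot hold a Long")
    ByteBuffer.allocate(size).putLong(seed).array()
  }

  def main(args: Array[String]): Unit = {
    val oversized = encode(100L, java.lang.Long.SIZE) // 64 bytes, as in hashSeed
    val exact = encode(100L, java.lang.Long.BYTES)    // 8 bytes, as in hashSeed2
    println(s"sizes: ${oversized.length} vs ${exact.length}") // prints "sizes: 64 vs 8"
    // The first 8 bytes are identical...
    println(oversized.take(exact.length).sameElements(exact)) // prints "true"
    // ...and everything after them is zero padding that the old hashSeed also hashed.
    println(oversized.drop(exact.length).forall(_ == 0)) // prints "true"
  }
}
```

The seed bytes themselves are unchanged; only the extra padding (and the length fed into the hash) differ, which is why the hash output, and everything seeded from it, changes.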
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20793 **[Test build #88156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)** for PR 20793 at commit [`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326).
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20793 Jenkins, ok to test
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20793 Good catch, LGTM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Can one of the admins verify this patch?