[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6763 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112414995 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112414900 [Test build #34978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34978/console) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread SlavikBaranov
Github user SlavikBaranov commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32522805 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -223,6 +224,8 @@ class OpenHashSet[@specialized(Long, Int) T:

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32493399 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -223,6 +224,8 @@ class OpenHashSet[@specialized(Long, Int) T:

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112385110 Merged build started.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112385100 Merged build triggered.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread SlavikBaranov
Github user SlavikBaranov commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32510295 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -223,6 +224,8 @@ class OpenHashSet[@specialized(Long, Int) T:

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112385423 [Test build #34978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34978/consoleFull) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32511985 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -223,6 +224,8 @@ class OpenHashSet[@specialized(Long, Int) T:

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112001537 [Test build #34934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34934/consoleFull) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112000891 Merged build started.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112000863 Merged build triggered.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112030078 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112030001 [Test build #34934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34934/console) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread SlavikBaranov
Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111964817 @zsxwing are you talking about changing the condition in `rehashIfNeeded` to something like this: if (_size > _growThreshold && _capacity < MAX_CAPACITY) {
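
A minimal sketch of the kind of guard being asked about, assuming the stripped operators in the quoted condition were `>`, `&&` and `<`. The object `RehashGuardSketch` and the helper `nextCapacity` are hypothetical names for illustration, not Spark's code; the 0.7 load factor and the 2^30 cap are the figures mentioned elsewhere in this thread.

```scala
object RehashGuardSketch {
  // Cap taken from the thread's "2^30 elements" error message (assumed to be the limit).
  final val MaxCapacity: Int = 1 << 30
  // Load factor as mentioned in the thread; assumed, not read from Spark's source.
  final val LoadFactor: Double = 0.7

  /** Hypothetical helper: capacity to use once the table holds `size` elements. */
  def nextCapacity(size: Int, capacity: Int): Int = {
    val growThreshold = (LoadFactor * capacity).toInt
    // Grow only while below the cap; at the cap the table stays as it is.
    if (size > growThreshold && capacity < MaxCapacity) capacity * 2
    else capacity
  }

  def main(args: Array[String]): Unit = {
    println(nextCapacity(46, 64))            // above the threshold, so it doubles
    println(nextCapacity(1 << 30, 1 << 30))  // already at the cap, so it stays put
  }
}
```

With a guard like this a table at the cap silently stops growing and simply keeps filling, which is the safety concern raised in the 2015-06-14 reply.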

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32396635 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val INVALID_POS =

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111965952 I agree with the concern about the worst-case scenario. Maybe the error message should be improved. `Can't make capacity bigger than 2^30 elements` will be confusing if

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32396907 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val INVALID_POS =

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-15 Thread SlavikBaranov
Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-112229759 @zsxwing Is the updated error message ok?

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-14 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32377739 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val INVALID_POS =

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-14 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111801656 @zsxwing I'm not sure that's entirely safe, since the code appears to rely on rehash making more space. If it just does nothing when already at the max size, eventually

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-13 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111709951 @SlavikBaranov Could you check how `BytesToBytesMap.putNewKey` grows the capacity? I think you can use a similar approach to increase the max capacity from `0.7 * (1

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-13 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32370463 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val INVALID_POS =

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111602521 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111602504 [Test build #34783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34783/console) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111611162 /cc @zsxwing

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111604237 LGTM and an important fix, potentially. Let me leave it for a short while for review.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111538075 Merged build started.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111538515 [Test build #34778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34778/consoleFull) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111537998 Merged build triggered.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SlavikBaranov
Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111540708 Sean, I've updated the request. I've verified that the OpenHashMap works fine with 2^30 capacity. I didn't make a test for it, since it requires `-Xmx16g`

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111563561 [Test build #34778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34778/console) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111563599 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SlavikBaranov
Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111580363 @srowen Fixed, sorry about that. I wonder how I could have missed it.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111580441 Merged build triggered.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111580475 Merged build started.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111581008 [Test build #34783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34783/consoleFull) for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32338908 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -45,7 +45,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-12 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111578197 Oh, hah, you'll have to change this test in `OpenHashMapSuite` now: ``` intercept[IllegalArgumentException] { new OpenHashMap[String, Int](1

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111270809 **[Test build #34701 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34701/console)** for PR 6763 at commit

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111270841 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread SlavikBaranov
Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111279484 I can't figure out how this failure could be related to my fix. The test I've added takes only a few seconds to complete:

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32269759 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val INVALID_POS =

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread SlavikBaranov
Github user SlavikBaranov commented on a diff in the pull request: https://github.com/apache/spark/pull/6763#discussion_r32276177 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -278,7 +279,7 @@ object OpenHashSet { val

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111359746 Yes, 2^31 is not possible at all. There are caveats to the actual max array size, yes, but this is really an orthogonal issue. I think it's best to not assert about the

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread SlavikBaranov
GitHub user SlavikBaranov opened a pull request: https://github.com/apache/spark/pull/6763 [SPARK-8309] [CORE] Support for more than 12M items in OpenHashMap The problem occurs because the position mask `0xEFFFFFF` is incorrect. Its 25th bit is zero, so when capacity grows beyond
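
To make the bit arithmetic concrete, here is a small sketch, taking the mask to be the seven-digit constant `0xEFFFFFF`, whose 25th bit (bit 24, counting from zero) is clear; that value is an assumption where the quoted description is cut short. The names `PositionMaskSketch`, `FullMask` and `slot` are hypothetical and are not taken from the patch, and `FullMask` is only an illustrative all-ones mask, not necessarily the one the PR adopted.

```scala
object PositionMaskSketch {
  // Mask as described in the PR text: binary 1110 followed by 24 ones, so bit 24 is zero.
  val BrokenMask: Int = 0xEFFFFFF
  // Hypothetical comparison mask with all 28 low bits set.
  val FullMask: Int = 0xFFFFFFF

  /** Extracts a slot index from a packed position value using the given mask. */
  def slot(pos: Int, mask: Int): Int = pos & mask

  // Size at which a table of capacity 2^24 passes a 0.7 load factor (load factor assumed
  // from the thread): roughly the "12M items" in the PR title.
  val GrowThresholdAt2Pow24: Long = (0.7 * (1 << 24)).toLong

  def main(args: Array[String]): Unit = {
    val pos = (1 << 24) + 5
    println(slot(pos, BrokenMask)) // bit 24 is dropped, so this collides with slot 5
    println(slot(pos, FullMask))   // the position survives intact
    println(GrowThresholdAt2Pow24)
  }
}
```

Any position at or above 2^24 loses its 25th bit under the first mask and aliases a lower slot, which is consistent with the failure appearing only once the map grows past about 12M entries.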

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-77564 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111215734 Jenkins, this is ok to test.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111217656 Merged build triggered.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111217682 Merged build started.

[GitHub] spark pull request: [SPARK-8309] [CORE] Support for more than 12M ...

2015-06-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6763#issuecomment-111217897 [Test build #34701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34701/consoleFull) for PR 6763 at commit