[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49585327
QA results for PR 1371:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16906/consoleFull
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49629961 Jenkins, test this please
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49630747 QA tests have started for PR 1371. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16914/consoleFull
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49643677
QA results for PR 1371:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16914/consoleFull
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49646830 The JVM forks one Python daemon (daemon.py), then the daemon forks all the workers.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1371
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49650014 Ah right, that makes sense. I've merged this in now.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49574961 Are you sure about that? They're forked from Java, not from the Python process. If this is the case, please suggest another way to test this. We can't add a bug fix without a test.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49577963 Jenkins, test this please
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49577994 Actually I see there are some doctests that I missed earlier, maybe that's okay. Though last time it failed Jenkins...
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49578184 QA tests have started for PR 1371. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16906/consoleFull
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1371#discussion_r15148140
--- Diff: python/pyspark/rdd.py ---
@@ -48,6 +48,35 @@ __all__ = [RDD]
+# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable randomized
+# hash for string
+def portable_hash(x):
+
+This function returns consistant hash code for builtin types, especially
+for None and tuple with None.
+
+The algrithm is similar to that one used by CPython 2.7
--- End diff --
My comment from before was deleted, but please add a link to where the implementation is from, or a reference to the Python source code for this.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1371#discussion_r15148144
--- Diff: python/pyspark/rdd.py ---
@@ -48,6 +48,35 @@ __all__ = [RDD]
+# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable randomized
+# hash for string
+def portable_hash(x):
+
+This function returns consistant hash code for builtin types, especially
+for None and tuple with None.
+
+The algrithm is similar to that one used by CPython 2.7
--- End diff --
Also explain what "consistent hash code" means; this comment doesn't say anything about the hash code of None being different across machines by default.
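For readers following the review comments on the diff, here is a minimal sketch of what a "consistent" hash for None and for tuples containing None could look like. It is not the code from the pull request: it is a Python 3 adaptation of the idea described in the docstring, with a tuple mix loosely modeled on CPython 2.7. `MASK` and `partition_for` are hypothetical names introduced only for this illustration, and `MASK` assumes a 64-bit integer range.

```python
# Sketch only: a Python 3 adaptation of the idea behind portable_hash.
# MASK is an assumption standing in for a 64-bit sys.maxint from Python 2.
MASK = (1 << 63) - 1


def portable_hash(x):
    """Return a hash for None and tuples that is the same in every process."""
    if x is None:
        # Fixed value, so the result no longer depends on the interpreter's
        # default hash for None.
        return 0
    if isinstance(x, tuple):
        # Combine element hashes in the style of CPython 2.7's tuple hash
        # (simplified), recursing so nested None values are handled too.
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)
            h *= 1000003
            h &= MASK
        h ^= len(x)
        if h == -1:
            h = -2
        return h
    # Other types fall back to the builtin hash. On Python 3.3+ string hashes
    # are randomized per process unless PYTHONHASHSEED is fixed.
    return hash(x)


def partition_for(key, num_partitions):
    """Hypothetical helper: choose a partition index from the portable hash."""
    return portable_hash(key) % num_partitions
```

With a hash like this, every worker that sees the key `(None, 1)` computes the same partition index, which is the property the grouping jobs in this thread rely on.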
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49540402 Hey @davies apart from the small comments above, please add a test in `tests.py`. Jobs similar to the ones Matt posted would be great. Otherwise this might break again in the future.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49562833 @Matei, our tests only run in local mode, but this issue can only be reproduced in a multi-node cluster. Do we still need it?
On Sun, Jul 20, 2014 at 1:26 AM, Matei Zaharia notificati...@github.com wrote:
> Hey @davies apart from the small comments above, please add a test in tests.py. Jobs similar to the ones Matt posted would be great. Otherwise this might break again in the future.
- Davies
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49563398 Even in local mode, we launch multiple Python processes, one per core. Just set the master to local[4] or something like that. Some of our other tests do that.
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49567522 Even with multiple processes, the hash of None is the same in each, because they are all forked from the same process.
On Sun, Jul 20, 2014 at 4:33 PM, Matei Zaharia notificati...@github.com wrote:
> Even in local mode, we launch multiple Python processes, one per core. Just set the master to local[4] or something like that. Some of our other tests do that.
- Davies
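The forking argument can be illustrated with a small POSIX-only sketch. It does not use Spark at all: `Marker` and `hash_in_forked_child` are hypothetical names for this example, and a plain object stands in for None because its default hash in CPython is derived from its memory address, as the hash of None was in CPython 2.7. Children forked from one parent inherit the same address space layout, so they all report the parent's hash value.

```python
import os


class Marker(object):
    """Stand-in object whose default hash is derived from its address."""


def hash_in_forked_child(obj):
    """Fork a child, compute hash(obj) there, and return it to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report the hash through the pipe and exit at once.
        os.close(read_fd)
        os.write(write_fd, str(hash(obj)).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent process: read until the child closes its end of the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


MARKER = Marker()
child_hashes = [hash_in_forked_child(MARKER) for _ in range(4)]
# Every forked child sees the same address, hence the same hash as the parent.
all_same = all(h == hash(MARKER) for h in child_hashes)
```

This is why a test that only spawns workers forked from a single daemon on one machine would not see mismatched hashes, while separately started interpreters on different machines can.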
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49435527 i've confirmed that this patch addresses the reported issue...
```
(
    len(sc.parallelize([((None, 1), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([(((None, 1), 1), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([((1, None), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([(((None, 1), None), 1),] * 100, 100).groupByKey(10).collect()) == 1,
)
# = (True, True, True, True)
```
[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49451658 @mattf, Thanks!