[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49585327
  
QA results for PR 1371:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16906/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49629961
  
Jenkins, test this please


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49630747
  
QA tests have started for PR 1371. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16914/consoleFull


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49643677
  
QA results for PR 1371:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16914/consoleFull


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49646830
  
The JVM forks one Python daemon (daemon.py), and then the daemon forks all 
the workers.
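
A minimal sketch of why workers forked from a single daemon agree on anything derived from the address of `None`. It assumes a POSIX system where `os.fork` is available, and the helper name `ids_seen_by_forked_workers` is hypothetical. It uses `id(None)` as a stand-in, on the understanding that CPython 2.7 derives `hash(None)` from the object's address; it is not the code in daemon.py.

```python
import os


def ids_seen_by_forked_workers(n_workers):
    """Fork n_workers children and return the id(None) each one observes."""
    results = []
    for _ in range(n_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the address-derived id of None, then exit at once.
            os.close(read_fd)
            os.write(write_fd, str(id(None)).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: read the child's report and reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)
    return results


worker_ids = ids_seen_by_forked_workers(4)
# Forked children inherit the parent's memory image, so every worker
# sees the same address for None as the process that forked it.
same_as_parent = all(i == id(None) for i in worker_ids)
```

Because every child starts from a copy of the parent's memory, a single-machine run cannot show workers disagreeing about `None`; the disagreement needs separately started interpreters, as on different nodes.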


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1371


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49650014
  
Ah right, that makes sense. I've merged this in now.


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49574961
  
Are you sure about that? They're forked from Java, not from the Python 
process.

If this is the case, please suggest another way to test this. We can't add 
a bug fix without a test.


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49577963
  
Jenkins, test this please


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49577994
  
Actually, I see there are some doctests that I missed earlier, so maybe 
that's okay. Though last time it failed Jenkins...


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49578184
  
QA tests have started for PR 1371. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16906/consoleFull


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1371#discussion_r15148140
  
--- Diff: python/pyspark/rdd.py ---
@@ -48,6 +48,35 @@
 __all__ = [RDD]
 
 
+# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable randomized
+# hash for string
+def portable_hash(x):
+
+This function returns consistant hash code for builtin types, especially
+for None and tuple with None.
+
+The algrithm is similar to that one used by CPython 2.7
--- End diff --

My comment from before was deleted, but please add a link to where the 
implementation is from, or a reference to the Python source code for this.
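
For readers following the diff, here is a rough sketch of what a hash along these lines could look like. It is an illustration under stated assumptions, not necessarily the code in the patch: it fixes the hash of `None` at 0 and combines tuple elements with the seed 0x345678 and multiplier 1000003 that CPython 2.7's tuple hash starts from, while leaving out the other steps of that algorithm.

```python
import sys


def portable_hash(x):
    """Return a hash that does not depend on the address of None.

    Sketch only: None maps to a fixed value, tuples are combined
    element by element, and everything else falls back to hash().
    """
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)
            h *= 1000003
            h &= sys.maxsize  # keep the value within a machine word
        h ^= len(x)
        return h
    # Assumption: the built-in hash is already stable for this type.
    # As the TODO in the diff notes, that is not true for strings on
    # Python 3.3+ unless PYTHONHASHSEED is fixed.
    return hash(x)


nested_key = ((None, 1), None)
nested_hash = portable_hash(nested_key)
```

The recursion matters because a key such as `((None, 1), None)` only hashes consistently if every nested `None` is replaced, not just a top-level one.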


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1371#discussion_r15148144
  
--- Diff: python/pyspark/rdd.py ---
@@ -48,6 +48,35 @@
 __all__ = [RDD]
 
 
+# TODO: for Python 3.3+, PYTHONHASHSEED should be reset to disable randomized
+# hash for string
+def portable_hash(x):
+
+This function returns consistant hash code for builtin types, especially
+for None and tuple with None.
+
+The algrithm is similar to that one used by CPython 2.7
--- End diff --

Also explain what "consistent hash code" means. This comment doesn't say 
anything about the hash code of None being different across machines by default.
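
To make the concern concrete, here is a small simulation of hash partitioning when two machines disagree on the hash of `None`. The partitioning rule (hash of the key modulo the number of partitions) is the usual scheme, and the per-machine hash values 17 and 42 are made up for illustration; none of these function names come from the patch.

```python
def partition_for(key, num_partitions, hash_func):
    """Pick a partition the way a hash partitioner would."""
    return hash_func(key) % num_partitions


def hash_on_machine_a(key):
    # Hypothetical: on this machine None happens to hash to 17.
    return 17 if key is None else hash(key)


def hash_on_machine_b(key):
    # Hypothetical: on this machine None happens to hash to 42.
    return 42 if key is None else hash(key)


def consistent_hash(key):
    # A fixed value for None gives the same answer on every machine.
    return 0 if key is None else hash(key)


NUM_PARTITIONS = 10
part_a = partition_for(None, NUM_PARTITIONS, hash_on_machine_a)
part_b = partition_for(None, NUM_PARTITIONS, hash_on_machine_b)
part_fixed = partition_for(None, NUM_PARTITIONS, consistent_hash)
```

With these made-up values the same key lands in partition 7 on one machine and partition 2 on the other, so a groupByKey would report it as two separate groups; with a fixed hash for `None` both machines choose partition 0.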


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49540402
  
Hey @davies apart from the small comments above, please add a test in 
`tests.py`. Jobs similar to the ones Matt posted would be great. Otherwise this 
might break again in the future.


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49562833
  
@Matei, our tests only run in local mode, but this issue can only be
reproduced on a multi-node cluster. Do we still need a test?


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49563398
  
Even in local mode, we launch multiple Python processes, one per core. Just 
set the master to local[4] or something like that. Some of our other tests do 
that.


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49567522
  
Even with multiple processes, the hash of None is the same in each of them, 
because they are all forked from the same process.


[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-18 Thread mattf
Github user mattf commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49435527
  
I've confirmed that this patch addresses the reported issue...

```
# Run in a PySpark shell, where `sc` is the SparkContext.
(
    len(sc.parallelize([((None, 1), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([(((None, 1), 1), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([((1, None), 1),] * 100, 100).groupByKey(10).collect()) == 1,
    len(sc.parallelize([(((None, 1), None), 1),] * 100, 100).groupByKey(10).collect()) == 1,
) == (True, True, True, True)
```



[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-18 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/1371#issuecomment-49451658
  
@mattf, Thanks!

