[GitHub] spark pull request: [SPARK-1161] Add saveAsPickleFile and SparkCon...

2014-06-12 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45974068 Dug a little deeper and found the following. 1) `saveAsPickleFile` calls `saveAsObjectFile`, which does its own grouping by a factor of 10, although
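
A minimal, pure-Python sketch of the two levels of grouping described in this comment. It assumes that the Python side pickles every batchSize items into one object and that `saveAsObjectFile` then packs up to 10 of those objects into each stored record. The helpers `grouped`, `records_per_partition` and `items_in_record` are hypothetical illustrations, not PySpark APIs, and no Spark is involved.

```python
import pickle
from itertools import islice


def grouped(iterable, n):
    """Yield successive lists of up to n items (similar to Scala's Iterator.grouped)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def records_per_partition(items, batch_size, object_group=10):
    """Hypothetical model of one partition written through saveAsPickleFile.

    Level 1: the Python side pickles every `batch_size` items into one bytes object.
    Level 2: saveAsObjectFile groups those objects `object_group` at a time per record.
    """
    pickled_batches = (pickle.dumps(batch) for batch in grouped(items, batch_size))
    return list(grouped(pickled_batches, object_group))


def items_in_record(record):
    """Count the original Python items stored in one record."""
    return sum(len(pickle.loads(blob)) for blob in record)


records = records_per_partition(range(250), batch_size=10)
# 250 items / 10 per batch = 25 pickled batches; 25 batches / 10 per record = 3 records
print(len(records))                           # 3
print([items_in_record(r) for r in records])  # [100, 100, 50]
```

In this model the two factors multiply: a single record can hold up to batch_size * 10 Python items.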

2014-06-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45054683 BTW I forgot to add, but we should also add this to the PySpark programming guide. Opened https://issues.apache.org/jira/browse/SPARK-2013 to track it. --- If your

2014-06-04 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45105819 @mateiz Sure, please assign it to me.

2014-06-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45015147 Hey, sorry @kanzhang, one thing is still missing: make the default batch size 10 instead of 1024, and add an optional batchSize parameter to saveAsPickleFile.
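
The requested signature change can be sketched with a local, file-based stand-in: an optional batch-size keyword that defaults to 10. The functions `save_as_pickle_file` and `pickle_file` below are hypothetical and only mimic the idea of writing pickled batches and reading them back; they are not the PySpark `saveAsPickleFile` / `pickleFile` implementations.

```python
import os
import pickle
import tempfile


def save_as_pickle_file(items, path, batch_size=10):
    """Hypothetical stand-in: write `items` to `path` as pickled batches.

    `batch_size` is optional and defaults to 10, mirroring the requested change.
    Returns the number of batches written.
    """
    count = 0
    batch = []
    with open(path, "wb") as f:
        for item in items:
            batch.append(item)
            if len(batch) == batch_size:
                pickle.dump(batch, f)
                count += 1
                batch = []
        if batch:  # flush the final, possibly smaller, batch
            pickle.dump(batch, f)
            count += 1
    return count


def pickle_file(path):
    """Hypothetical stand-in: read every batch back and flatten into one list."""
    result = []
    with open(path, "rb") as f:
        while True:
            try:
                result.extend(pickle.load(f))
            except EOFError:  # no more pickled batches in the file
                return result


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")
    n_small = save_as_pickle_file(range(25), path, batch_size=5)  # 5 batches of 5
    n_default = save_as_pickle_file(range(25), path)              # batches of 10, 10, 5
    restored = pickle_file(path)

print(n_small, n_default, restored == list(range(25)))  # 5 3 True
```

Because the parameter is a keyword with a default, existing callers keep working and only callers who want a different batch size need to pass it.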

2014-06-03 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45015378 Yes, I was about to update it.

2014-06-03 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45023323 @mateiz Made the suggested changes and added a doctest.

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45023558 Merged build started.

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45023544 Merged build triggered.

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45027222 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15406/

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45027221 Merged build finished.

2014-06-03 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45031538 The tests here are failing because the output of `collect` is not guaranteed to be in any particular order. Other PySpark tests just sort the output, e.g.
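
The fix described here is to make the assertion independent of element order. A minimal pure-Python sketch, assuming a hypothetical `collect_in_some_order` that concatenates chunks of a dataset in a seeded random order to stand in for an order that is not guaranteed; it is not a PySpark function.

```python
import random


def collect_in_some_order(partitions, seed):
    """Hypothetical stand-in for a collect() whose element order is not guaranteed.

    The partitions are concatenated in a shuffled order chosen by `seed`.
    """
    order = list(range(len(partitions)))
    random.Random(seed).shuffle(order)
    return [x for i in order for x in partitions[i]]


partitions = [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# Comparing the raw output with list(range(10)) may pass or fail depending on the order;
# sorting first makes the check deterministic, as the other PySpark tests do.
for seed in range(20):
    assert sorted(collect_in_some_order(partitions, seed)) == list(range(10))
```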

2014-06-03 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45032688 @pwendell Thanks. I dropped the sort step in the most recent update after I saw that the tests on `keys()` and `values()` didn't call sort (they call `collect()`

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45034383 Merged build started.

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45034358 Merged build triggered.

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45038749 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15417/

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45038745 Merged build finished. All automated tests passed.

2014-06-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45041788 Looks good, thanks -- going to merge it.

2014-06-03 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/755

2014-06-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-45042088 Regarding the test on keys() and values(), it might be because the SparkContext was created with batchSize=2, so only one partition there has data. Not 100% sure though.

2014-06-01 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-4478 Ah sorry, I missed those. Then it's probably fine, though it would be cool to show one on SparkContext.pickleFile too, just as documentation.

2014-05-30 Thread kanzhang
Github user kanzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/755#discussion_r13241708
--- Diff: python/pyspark/context.py ---
@@ -51,6 +51,7 @@ class SparkContext(object):
    _active_spark_context = None
    _lock = Lock()

2014-05-30 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44685162 @mateiz I already have some tests on the saveAsPickleFile method, which use both pickleFile and saveAsPickleFile. What more test cases do you have in mind? I could add one

2014-05-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/755#discussion_r13165029
--- Diff: python/pyspark/context.py ---
@@ -51,6 +51,7 @@ class SparkContext(object):
    _active_spark_context = None
    _lock = Lock()

2014-05-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44484366 Hey @kanzhang, can you also add some tests for this? The easiest way is to add doctests in `context.py`. Look at how we create temp files in the tests for
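
A sketch of the doctest approach suggested here, run against hypothetical local stand-ins rather than a real SparkContext: `save_as_pickle_file`, `pickle_file` and `run_doctests` are illustrative names, not PySpark functions. It assumes a temp-file idiom of creating a `NamedTemporaryFile`, closing it so the file is deleted, and reusing only its unique name as the output path; the result is sorted so the check does not depend on order.

```python
import doctest
import pickle


def save_as_pickle_file(items, path, batch_size=10):
    """Hypothetical stand-in: write `items` to `path` as pickled batches."""
    items = list(items)
    with open(path, "wb") as f:
        for start in range(0, len(items), batch_size):
            pickle.dump(items[start:start + batch_size], f)


def pickle_file(path):
    """Hypothetical stand-in: read back the batches written by save_as_pickle_file.

    >>> import os
    >>> from tempfile import NamedTemporaryFile
    >>> tmpFile = NamedTemporaryFile(delete=True)
    >>> tmpFile.close()  # deletes the file; only its unique name is reused
    >>> save_as_pickle_file(range(10), tmpFile.name, batch_size=5)
    >>> sorted(pickle_file(tmpFile.name))
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> os.remove(tmpFile.name)
    """
    result = []
    with open(path, "rb") as f:
        while True:
            try:
                result.extend(pickle.load(f))
            except EOFError:  # no more pickled batches in the file
                return result


def run_doctests():
    """Run the docstring examples of pickle_file and return the number of failures."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner()
    failed = 0
    for test in finder.find(pickle_file, globs=dict(globals())):
        failed += runner.run(test).failed
    return failed
```

Here `run_doctests` executes the docstring examples with the standard `doctest` module and returns the failure count, so a round trip through both functions doubles as documentation of how they are used together.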

2014-05-23 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44049046 Rebased to the latest master.

2014-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44049086 Merged build triggered.

2014-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44049093 Merged build started.

2014-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44052480 Merged build finished. All automated tests passed.

2014-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-44052481 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15171/

2014-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43403660 Merged build triggered.

2014-05-17 Thread kanzhang
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43403735 Resubmitted the patch to fix a typo in the commit log (block size -> batch size).

2014-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43403661 Merged build started.

2014-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43404331 Merged build finished. All automated tests passed.

2014-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43404332 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15067/

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43373144 Merged build finished. All automated tests passed.

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/755#issuecomment-43373145 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15052/