Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45974068
Dug a little deeper and found the following.
1) ```saveAsPickleFile``` calls ```saveAsObjectFile```, which does its own
grouping by a factor of 10, although
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45054683
BTW I forgot to add, but we should also add this to the PySpark programming
guide. Opened https://issues.apache.org/jira/browse/SPARK-2013 to track it.
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45105819
@mateiz sure, pls assign it to me.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45015147
Hey sorry @kanzhang, one thing is still missing -- make the default batch
size 10 instead of 1024 and add an optional batchSize parameter to
saveAsPickleFile.
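To make the batch size concrete, here is a minimal sketch in plain Python of what grouping items into pickled batches looks like. It is an illustration only, not PySpark's implementation: `batched_pickles` and `unbatch` are hypothetical helper names, and the default of 10 simply mirrors the value suggested above.

```python
import pickle
from itertools import islice


def batched_pickles(items, batch_size=10):
    """Yield pickled byte strings, each holding up to batch_size items.

    Hypothetical helper for illustration; not part of PySpark.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield pickle.dumps(batch)


def unbatch(blobs):
    """Reverse batched_pickles: yield the original items one by one."""
    for blob in blobs:
        for item in pickle.loads(blob):
            yield item


# 25 items are split into batches of 10, 10 and 5.
small = list(batched_pickles(range(25)))
# With a batch size of 1024, the same 25 items fit in a single record.
large = list(batched_pickles(range(25), batch_size=1024))
print(len(small), len(large))
```

A smaller batch size produces more, smaller serialized records; a larger one produces fewer, bigger records, which matters if another layer groups records again on top.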
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45015378
Yes, I was about to update it shortly.
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45023323
@mateiz made suggested changes and added doc test.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45023558
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45023544
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45027222
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15406/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45027221
Merged build finished.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45031538
The tests here are failing because the output of `collect` is not
guaranteed to have a sort order. In other pyspark tests they just sort the
output.
e.g.
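To illustrate the ordering point in plain Python (a sketch only: `collect_unordered` is a hypothetical stand-in for `collect`, not a PySpark function), sorting before comparing makes the check independent of the order in which results arrive:

```python
import random


def collect_unordered(partitions, seed=None):
    """Hypothetical stand-in for collect(): concatenates the partitions
    in an arbitrary order to mimic results arriving without a guaranteed order.
    """
    rng = random.Random(seed)
    shuffled = list(partitions)
    rng.shuffle(shuffled)
    return [x for part in shuffled for x in part]


partitions = [[1, 2], [3, 4], [5]]
result = collect_unordered(partitions)

# Fragile: `result == [1, 2, 3, 4, 5]` depends on the arrival order.
# Robust: sort first, then compare.
assert sorted(result) == [1, 2, 3, 4, 5]
```

The same idea applies in a doctest: wrap the collected list in `sorted(...)` so the expected output is stable.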
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45032688
@pwendell thanks. I dropped the sort step in the most recent update after I
saw tests on ```keys()``` and ```values()``` didn't call sort (they call
```collect()```
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45034383
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45034358
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45038749
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15417/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45038745
Merged build finished. All automated tests passed.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45041788
Looks good, thanks -- going to merge it.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/755
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-45042088
Regarding the test on keys() and values(), it might be because the
SparkContext was created with batchSize=2, so only one partition there has
data. Not 100% sure though.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-4478
Ah sorry, I missed those. Then it's probably fine, though it would be cool
to show one on SparkContext.pickleFile too just as documentation.
Github user kanzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/755#discussion_r13241708
--- Diff: python/pyspark/context.py ---
@@ -51,6 +51,7 @@ class SparkContext(object):
_active_spark_context = None
_lock = Lock()
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44685162
@mateiz I already have some tests on the saveAsPickleFile method, which use
both pickleFile and saveAsPickleFile. What more test cases do you have in mind? I
could add one
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/755#discussion_r13165029
--- Diff: python/pyspark/context.py ---
@@ -51,6 +51,7 @@ class SparkContext(object):
_active_spark_context = None
_lock = Lock()
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44484366
Hey @kanzhang, can you also add some tests for this? The easiest way is to
add doctests in `context.py`. Look at how we create temp files in the tests for
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44049046
rebased to latest master
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44049086
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44049093
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44052480
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-44052481
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15171/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43403660
Merged build triggered.
Github user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43403735
Resubmitted the patch to fix a typo in the commit log (block size -> batch size).
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43403661
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43404331
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43404332
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15067/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43373144
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/755#issuecomment-43373145
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15052/