[GitHub] spark pull request: SPARK-12416 Jms receiver

2015-12-21 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/10367#issuecomment-166349414 @willb ptal

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2015-01-05 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2313

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59191797

> Well, good luck with adding something like that to HDFS...

that is not the responsibility of filesystems. just so we're on the same page, i'm not advocating ...

[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...

2014-10-13 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-58884300 @JoshRosen @pwendell any further comment on this?

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-13 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-58885115

> @mattf I understand what you're trying to say, but think about it in context.

As I said above, the "when to poll the file system" code is the most trivial part ...

[GitHub] spark pull request: [SPARK-3867][PySpark] ./python/run-tests faile...

2014-10-11 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2759#issuecomment-58746696 +1 lgtm

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-02 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57629838

> @mattf don't know what you mean by "functionality that is already provided by the system". I'm not aware of HDFS having any way to automatically do housekeeping of old ...

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-02 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57676552

> Really, this is not an expensive process that will bring down the HDFS server

i'm not concerned about bringing down HDFS. the operation run from spark or a log ...

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-10-02 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57687603

> Then what's your argument? Which is the system tool that this code is replacing?

in this case you can separate the timer code and the function that does ...

[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...

2014-10-01 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2554#issuecomment-57507588 much nicer. you could even remove the doc note about backward compatibility. +1 lgtm

[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...

2014-09-30 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2554#issuecomment-57304443 thanks for identifying this issue and doing the analysis. the whole business of having a separate IPYTHON env variable complicates the situation. what about

[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...

2014-09-30 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2554#issuecomment-57304596 also, `test $IPYTHON = 1` should be written as `test -n $IPYTHON`; requiring the value to be 1 isn't very shell-ish

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18148709 --- Diff: examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java --- @@ -31,7 +31,6 @@ * Usage: JavaSparkPi [slices] */ public

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18148726 --- Diff: examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java --- @@ -61,5 +60,7 @@ public Integer call(Integer integer, Integer integer2

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18148735 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java --- @@ -61,7 +61,8 @@ public static void main(String[] args) throws Exception

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18148749 --- Diff: examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala --- @@ -44,11 +44,11 @@ object GroupByTest { arr1(i

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18152078 --- Diff: examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala --- @@ -44,11 +44,11 @@ object GroupByTest { arr1(i

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2575#discussion_r18152125 --- Diff: examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java --- @@ -61,5 +60,7 @@ public Integer call(Integer integer, Integer integer2

[GitHub] spark pull request: SPARK-2626 [DOCS] Stop SparkContext in all exa...

2014-09-29 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2575#issuecomment-57157929 +1, lgtm

[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...

2014-09-29 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-57258363 @JoshRosen a partition itself doesn't have much in the way of a user api. it wouldn't be difficult to wrap the java objects in a python Partition. we should then start

[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...

2014-09-25 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-56805152 RDD._jrdd is very heavy for PipelinedRDD, but getNumPartitions() could be optimized for PipelinedRDD to avoid the creation of _jrdd (could be rdd.prev.getNumPartitions
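A minimal sketch of the optimization suggested above, assuming PipelinedRDD exposes its parent as `prev` (as the comment implies) and that the count is otherwise fetched from the JVM through `_jrdd`; the helper name is hypothetical:

```python
def num_partitions(rdd):
    # A PipelinedRDD does not change its parent's partitioning, so walk
    # up the prev chain instead of forcing the expensive _jrdd wrapper
    # to be materialized.
    while getattr(rdd, "prev", None) is not None:
        rdd = rdd.prev
    # Base case: ask the JVM-side RDD for its partition list.
    return rdd._jrdd.partitions().size()
```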

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-25 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r18027788 --- Diff: docs/spark-standalone.md --- @@ -62,7 +62,12 @@ Finally, the following configuration options can be passed to the master and wor # Cluster

[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...

2014-09-21 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2478 [SPARK-3580] add 'partitions' property to PySpark RDD 'rdd.partitions' is available in scala/java, primarily used for its size() method to get the number of partitions. pyspark instead has ...
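For reference, the two routes to this information that already exist in PySpark, which a `partitions` property would tidy up (a sketch; `_jrdd` is PySpark's handle on the JVM-side RDD, and the property itself is not shown since the final API may differ):

```python
from pyspark import SparkContext

sc = SparkContext("local", "partitions-demo")
rdd = sc.parallelize(range(100), 4)

# Existing public API for the partition count.
print(rdd.getNumPartitions())         # -> 4

# JVM-side partition list, the object a 'partitions' property would
# expose; this mirrors scala/java's rdd.partitions.size().
print(rdd._jrdd.partitions().size())  # -> 4

sc.stop()
```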

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-21 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56307508 fyi - re passing warnings to driver: https://issues.apache.org/jira/browse/SPARK-516 and https://issues.apache.org/jira/browse/SPARK-593

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-09-20 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-56284715 i strongly suggest against duplicating functionality that is already provided by the system where these logs are written. however, if you proceed

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2444#issuecomment-56172067 +1 lgtm

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17782052 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then -export HOSTLIST ...

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56197446 for some additional input, @pwendell - do you think requiring numpy for core would be acceptable?

[GitHub] spark pull request: [SPARK-1701] [PySpark] remove slice terminolog...

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2304#issuecomment-56240307 oh, i deleted this. use https://github.com/apache/spark/pull/2465 instead

[GitHub] spark pull request: [PySpark] remove slice terminology from python...

2014-09-19 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2465 [PySpark] remove slice terminology from python examples

[GitHub] spark pull request: [PySpark] remove slice terminology from python...

2014-09-19 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2465

[GitHub] spark pull request: [PySpark] remove unnecessary use of numSlices ...

2014-09-19 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2467 [PySpark] remove unnecessary use of numSlices from pyspark tests

[GitHub] spark pull request: [PySpark] remove unnecessary use of numSlices ...

2014-09-19 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2467#issuecomment-56245090 @joshrosen here's another quick one

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-19 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17815117 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then -export HOSTLIST ...

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-18 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17730762 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then -export HOSTLIST ...

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-18 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17730829 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then -export HOSTLIST ...

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56123348 @JoshRosen it looks like @davies and i are on the same page. how would you like to proceed?

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56125812 thanks for the feedback. i've changed the language to be more in line with your suggestion.

[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-56127248 +1 lgtm. fyi, i checked, deleteOnExit isn't an option because it cannot recursively delete

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56129891 that's a very good point, especially about how it's an unsolved problem in general, at least on our existing operating systems. iirc, systems like plan9 tried to address

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56130009

> This patch fails unit tests.

i'm getting HTTP 503 from jenkins, but i'm gonna go out on a limb and say this doc change didn't break the unit tests.

[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-16 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55781471 +1 @srowen, using signal is too heavy handed. i'm skeptical that the jvm can guarantee dir removal on failure (say kill -9 or a jvm segv). those cases are hopefully ...

[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55582578 thanks, +1, lgtm

[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark

2014-09-15 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17541864 --- Diff: python/pyspark/tests.py --- @@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self): self.assertEquals(partitions[0], [(0

[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark

2014-09-15 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17542165 --- Diff: python/pyspark/rdd.py --- @@ -353,7 +353,7 @@ def func(iterator): return ifilter(f, iterator) return self.mapPartitions

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55601210 thanks @erikerlandson. @davies @JoshRosen how would you guys like to proceed?

[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark

2014-09-15 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17556524 --- Diff: python/pyspark/tests.py --- @@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self): self.assertEquals(partitions[0], [(0

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55635108 @davies that makes sense to me. the current message is: "NumPy does not appear to be installed. Falling back to default random generator for sampling ..."

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55636896 ok, i see. detect numpy in driver, record fact, if in driver and not on worker raise the warning, otherwise be silent.
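A minimal sketch of that flow, exploiting the fact that a sampler's constructor runs on the driver while its methods run on workers, so a module-level probe evaluates differently on each node (class and attribute names here are illustrative, not the PR's code):

```python
import random
import warnings

try:
    import numpy
    _has_numpy = True   # evaluated per-process: on the driver and on each worker
except ImportError:
    _has_numpy = False


class SamplerSketch(object):
    def __init__(self):
        # Runs on the driver; the recorded fact travels to workers when
        # the instance is pickled into the task closure.
        self.driver_has_numpy = _has_numpy

    def rng(self, seed):
        # Runs on a worker, at time of use.
        if self.driver_has_numpy and _has_numpy:
            return numpy.random.RandomState(seed)
        if self.driver_has_numpy and not _has_numpy:
            warnings.warn("NumPy missing on this worker; falling back "
                          "to the default random generator")
        return random.Random(seed)
```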

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55641879 here are the cases. i decided that if the driver doesn't have numpy then the workers shouldn't try to use it. this is consistent w/ the old code and maintains

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55644374 @davies i'm still digging, but maybe you know off the top of your head. what output path in the worker can be used to report the warning to the driver?

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55650680 i've not found a path, so i'm happy to leave this PR as is and log a JIRA for enhanced worker -> driver log communication. deal?

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-15 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55667347 I filed SPARK-3538 to cover the improvement/enhancement

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-14 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55523684 @erikerlandson i know you've been doing some serious work w/ sampling, what's your take on this?

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-14 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2301#issuecomment-55523986 @mateiz @pwendell pls take another look

[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

2014-09-14 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55540972 where's the message that was sent to the pyrolite folks? it looks like SPARK-2378 is targeted for 1.2, so it has a bit of time

[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

2014-09-14 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55540988 if you do end up merging, what do you think about logging an issue for fixing up the workaround once pyrolite is updated?

[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....

2014-09-13 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55495366

> Maybe we should wait a couple of days to hear back from the Pyrolite folks and see if they will cut a new release.

+1 unless there's agreed upon urgency

[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark

2014-09-13 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2383 [SPARK-3519] add distinct(n) to PySpark Added missing rdd.distinct(numPartitions) and associated tests
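For context, PySpark's `distinct` is a `map`/`reduceByKey`/`map` pipeline, so the change is mostly about threading the partition count through; a sketch of the shape of the method (close to how the RDD API composes, though not necessarily the exact patch):

```python
def distinct(self, numPartitions=None):
    # Key every element, collapse duplicate keys, then drop the dummy
    # values; numPartitions controls the reduce-side parallelism.
    return self.map(lambda x: (x, None)) \
               .reduceByKey(lambda x, _: x, numPartitions) \
               .map(lambda kv: kv[0])
```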

[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark

2014-09-13 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17515406 --- Diff: python/pyspark/tests.py --- @@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self): self.assertEquals(partitions[0], [(0

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-11 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-55256738 @JoshRosen will you take a look at this?

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-11 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55256857 @davies @JoshRosen new thoughts on the topic of non-determinism?

[GitHub] spark pull request: SPARK-3470 [CORE] [STREAMING] Add Closeable / ...

2014-09-11 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2346#issuecomment-55258618 would like a unit test, but lgtm

[GitHub] spark pull request: SPARK-3470 [CORE] [STREAMING] Add Closeable / ...

2014-09-10 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2346#issuecomment-55149328 since spark targets java 7 & 8, why not just use the correct AutoCloseable?

[GitHub] spark pull request: SPARK-3470 [CORE] [STREAMING] Add Closeable / ...

2014-09-10 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2346#issuecomment-55152257

> It doesn't target Java 7: https://github.com/apache/spark/blob/master/pom.xml#L113

you're right. i hope it becomes java 7+ and we can move to AutoCloseable

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2335 [SPARK-3458] enable python `with` statements for SparkContext allow for best practice code,

```python
try:
    sc = SparkContext()
    app(sc)
finally:
    sc.stop()
```
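With the PR in place, that try/finally collapses into a context manager; usage would look like this (assuming `SparkContext` gains `__enter__`/`__exit__`, which is what the title describes):

```python
from pyspark import SparkContext

# __exit__ calls sc.stop() even if the body raises, replacing the
# try/finally boilerplate above.
with SparkContext("local", "with-demo") as sc:
    print(sc.parallelize(range(10)).sum())
```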

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2335#issuecomment-54995675 @davies i'd appreciate your input on this

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2335#discussion_r17314305 --- Diff: python/pyspark/tests.py --- @@ -1254,6 +1254,35 @@ def test_single_script_on_cluster(self): self.assertIn([2, 4, 6], out

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2335#discussion_r17314582 --- Diff: python/pyspark/tests.py --- @@ -1254,6 +1254,35 @@ def test_single_script_on_cluster(self): self.assertIn([2, 4, 6], out

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2335#issuecomment-55008148

> This patch fails unit tests.

failures are unrelated to this patch

[GitHub] spark pull request: [SPARK-3458] enable python with statements f...

2014-09-09 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2335#issuecomment-55026585

> This patch fails unit tests.

still not this patch's fault

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-08 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-54804358

> What's the behavior if some slaves have numpy but others do not have?

Yeah, this was my original concern. Do both implementations give equal results ...

[GitHub] spark pull request: Provide a default PYSPARK_PYTHON for python/ru...

2014-09-08 Thread mattf
GitHub user mattf reopened a pull request: https://github.com/apache/spark/pull/2300 Provide a default PYSPARK_PYTHON for python/run_tests Without this the version of python used in the test is not recorded. The error is, Testing with Python version: ./run

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-08 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2301#discussion_r17234099 --- Diff: bin/spark-class --- @@ -105,7 +105,7 @@ else exit 1 fi fi -JAVA_VERSION=$("$RUNNER" -version 2>&1 | sed 's/java version ...

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-08 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2301#issuecomment-54868841 rebased on new master. jenkins, retest this please

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-07 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2313 [SPARK-927] detect numpy at time of use it is possible for numpy to be installed on the driver node but not on worker nodes. in such a case, using the rddsampler's constructor to detect numpy

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-07 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-54752766

> This patch fails unit tests.

the test suite failures appear to be from hive & flume and unrelated to this patch

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-07 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2301#discussion_r17219036 --- Diff: bin/spark-class --- @@ -105,7 +105,7 @@ else exit 1 fi fi -JAVA_VERSION=$("$RUNNER" -version 2>&1 | sed 's/java version ...

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2299 [SPARK-1701] Clarify slice vs partition in the programming guide This is a partial solution to SPARK-1701, only addressing the documentation confusion. Additional work can be to actually
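The confusion at issue: the programming guide and parts of the Python API said "slices" where the rest of Spark says "partitions", though both name the same thing. A quick illustration (the second argument to `parallelize` was long documented as `numSlices`):

```python
from pyspark import SparkContext

sc = SparkContext("local", "slice-vs-partition")

# "10 slices" in the old wording; the result is simply an RDD with 10
# partitions, the same concept as scala's rdd.partitions.size.
rdd = sc.parallelize(range(100), 10)
print(rdd.getNumPartitions())  # -> 10

sc.stop()
```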

[GitHub] spark pull request: Provide a default PYSPARK_PYTHON for python/ru...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2300 Provide a default PYSPARK_PYTHON for python/run_tests Without this the version of python used in the test is not recorded. The error is, Testing with Python version: ./run-tests

[GitHub] spark pull request: [SPARK-1701] [PySpark] deprecated numSlices fo...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2302 [SPARK-1701] [PySpark] deprecated numSlices for numPartitions

[GitHub] spark pull request: [SPARK-1701] remove unnecessary use of numSlic...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2303 [SPARK-1701] remove unnecessary use of numSlices from pyspark tests

[GitHub] spark pull request: [SPARK-1701] [PySpark] remove slice terminolog...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2304 [SPARK-1701] [PySpark] remove slice terminology from python examples

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-06 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2299#issuecomment-54714295

> This patch fails unit tests.

the failures are in the kafka tests and unrelated to this patch

[GitHub] spark pull request: [SPARK-1701] remove unnecessary use of numSlic...

2014-09-06 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2303#issuecomment-54716879 @JoshRosen only #2299 is truly for SPARK-1701, the others are tangentially related so i tagged them along with 1701, but they can all stand alone. my hope

[GitHub] spark pull request: [SPARK-1701] remove unnecessary use of numSlic...

2014-09-06 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2303

[GitHub] spark pull request: [SPARK-1701] [PySpark] deprecated numSlices fo...

2014-09-06 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2302

[GitHub] spark pull request: [SPARK-1701] [PySpark] remove slice terminolog...

2014-09-06 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2304

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-06 Thread mattf
Github user mattf closed the pull request at: https://github.com/apache/spark/pull/2299

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-06 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2305 [SPARK-1701] Clarify slice vs partition in the programming guide This is a partial solution to SPARK-1701, only addressing the documentation confusion. Additional work can be to actually

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-06 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2299#issuecomment-54723659 oops, i didn't realize renaming my branch would close this PR. i'll open another.

[GitHub] spark pull request: [SPARK-3353] parent stage should have lower st...

2014-09-06 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2273#issuecomment-54730702 nice, lgtm

[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-04 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2144#issuecomment-54566230

> So you guys should figure out a way to run this so that it doesn't get stale. For example it's fine to add some code to the script that runs all the tests except ...

[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

2014-09-01 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2183#issuecomment-54061965

> What's the problem without this patch? I remember that the JVM will shutdown itself after shell exited.

davies, i went back and tried to reproduce the shell ...

[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

2014-09-01 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2183#issuecomment-54062440

> Is it better to put atexit.register() in context.py? So all the pyspark jobs can have this.

i think it's a question of who owns the context. the owner is whomever ...
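A minimal sketch of the hook under the ownership argument above: whoever creates the context registers its cleanup, rather than context.py doing it for every job:

```python
import atexit

from pyspark import SparkContext

sc = SparkContext("local", "shutdown-demo")

# The owner of the context registers its cleanup; atexit runs sc.stop()
# when the interpreter exits, shutting down the backing JVM as well.
atexit.register(sc.stop)
```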

[GitHub] spark pull request: [SPARK-3285] [examples] Using values.sum is ea...

2014-08-28 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2182#issuecomment-53719188 +1 nice catch, the simpler the examples the easier they'll be to consume by their intended audience: folks who aren't experts yet

[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...

2014-08-28 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53719576 is the testing captured somewhere so this change can be evaluated in the future, maybe against other strategies?

[GitHub] spark pull request: [SPARK-3273]The spark version in the welcome m...

2014-08-28 Thread mattf
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2175#discussion_r16838727 --- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoopInit.scala --- @@ -26,9 +28,9 @@ trait SparkILoopInit

[GitHub] spark pull request: [SPARK-3264] Allow users to set executor Spark...

2014-08-28 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2166#issuecomment-53720190 lgtm, nice idea. i've been using rpm-installed spark, which provides a single version and location on all nodes. however, this will make for a clear path to running ...

[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

2014-08-28 Thread mattf
GitHub user mattf opened a pull request: https://github.com/apache/spark/pull/2183 [SPARK-2435] Add shutdown hook to pyspark
