[GitHub] spark pull request: SPARK-3337 Paranoid quoting in shell to allow ...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2229#discussion_r17158127 --- Diff: sbt/sbt-launch-lib.bash --- @@ -180,7 +180,7 @@ run() { ${SBT_OPTS:-$default_sbt_opts} \ $(get_mem_opts $sbt_mem) \ ${java_opts} \ -${java_args[@]} \ +"${java_args[@]}" \ --- End diff -- Ah yes, getting rid of them (and the like) as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
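The quoting concern in the diff above can be shown with a few lines of standalone bash (a hypothetical demo, not sbt-launch-lib.bash itself): an unquoted `${arr[@]}` expansion re-splits array elements on whitespace, while `"${arr[@]}"` passes each element through intact.

```shell
# Hypothetical demo (not sbt-launch-lib.bash itself) of why "paranoid
# quoting" of array expansions matters: an element containing a space
# survives only the quoted form.
java_args=("-Dfoo=hello world" "-Xmx2g")

count_args() { echo $#; }

unquoted=$(count_args ${java_args[@]})    # word-split into 3 arguments
quoted=$(count_args "${java_args[@]}")    # preserved as 2 arguments

echo "$unquoted"   # 3
echo "$quoted"     # 2
```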
[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2245#issuecomment-54588876 It seems I'm unlucky enough to be tripped up by a different test suite on every run. @marmbrus @rxin Can you give the patch a retest?
[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2245#issuecomment-54588910 Jenkins is down right now ...
[GitHub] spark pull request: Optimize the schedule procedure in Master
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54588914 The JIRA is: https://issues.apache.org/jira/browse/SPARK-3411. Because the filter creates a copy of each worker, I changed the way of filtering. The shuffle creates copies too; could we change its approach as well?
[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2245#issuecomment-54589163 All right...
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54589212 This change makes this shutdown hook's priority lower than `FileSystem`'s, whereas it used to be higher. Also, does this compile for `yarn-alpha` too? Given the time it went in, it probably works with all supported Hadoop versions, but it's worth checking.
[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54589602 Jenkins, test this please
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54589820 @srowen It's confusing, but a lower value means higher priority.
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54589954 Ah, I see. That's fine; I just wasn't sure what the intent was, since I think the original description is missing a word.
[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2260#issuecomment-54590041 Ok merging this (and removed io1 for now).
[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...
Github user pdeyhim commented on the pull request: https://github.com/apache/spark/pull/2260#issuecomment-54590139 And what happens when the additional EBS volumes get added? We probably want to configure spark-env.sh and spark_local_dir with the new volumes, correct? The place this happens is here: https://github.com/rxin/spark/blob/ec2-ebs-vol/ec2/spark_ec2.py#L674-L678 but that snippet only configures local disks in spark-env.sh, not the new EBS volumes.
[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2260#issuecomment-54590209 EBS volumes are not great for shuffle (bad small-write performance). Let's hold off on that for now.
[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2260
[GitHub] spark pull request: [Docs] fix minor MLlib case typo
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2278#issuecomment-54590246 Merged into master and branch-1.1. Thanks!
[GitHub] spark pull request: [Docs] fix minor MLlib case typo
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2278
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54590354 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19800/consoleFull) for PR 2283 at commit [`717aba2`](https://github.com/apache/spark/commit/717aba2221fe974f218f6ecbffab77162c4c94ea). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AddJar(path: String) extends LeafNode with Command `
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54590502 Ah, sorry, my mistake. I checked the logic of ShutdownHookManager, and a higher value is actually higher priority.
[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...
Github user pdeyhim commented on the pull request: https://github.com/apache/spark/pull/2260#issuecomment-54590541 @rxin OK, that's correct for smaller instance types. But FYI, EBS on larger instances (and EBS-optimized instances) should perform well for shuffle read/write.
[GitHub] spark pull request: [SPARK-3086] [SPARK-3043] [SPARK-3156] [mllib]...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2125#issuecomment-54590611 retest this please
[GitHub] spark pull request: [SPARK-3409][SQL] Avoid pulling in Exchange op...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2282#issuecomment-54594495 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/2284 [SPARK-3412] [SQL] Add 3 missing types for Row API `BinaryType`, `DecimalType` and `TimestampType` are missing in the Row API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark missing_types_in_row Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2284.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2284 commit 3644ffa46ac06adb0096df4f13bc03d0f3904eab Author: Cheng Hao hao.ch...@intel.com Date: 2014-09-05T07:45:57Z Add 3 missing types for Row API
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54595905 Jenkins, test this please.
[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54596140 @nchammas I'm guessing your OOM issue is unrelated to this one.

```
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y, sc.defaultParallelism).take(1)
```

After the reduceByKey above, you'd have 24000 partitions and only 2 entries in them: (4, 'NickJohn') and (3, 'Bob'). This bug manifests when you have an empty partition 0 and many remaining partitions, each with a large amount of data. The .take(n) gets up to the first n elements from each remaining partition and then takes the first n from the concatenation of those arrays. For this bug to take effect in your situation you'd have to have an empty first partition (a good 23998/24000 chance). The driver would then bring into memory 23998 empty arrays and 2 arrays of size 1 (or maybe 1 array of size 2), which I can't imagine would OOM the driver. So I don't think this is your bug. The other evidence is that you observed a regression (at least in the perf numbers later in your bug report), while this behavior has been the same for quite some time. The current behavior was implemented in commit 42571d30d0d518e69eecf468075e4c5a823a2ae8 and was first released in version 0.9:

```
aash@aash-mbp ~/git/spark$ git log origin/branch-1.0 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
commit 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$ git log origin/branch-0.9 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
commit 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$ git log origin/branch-0.8 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$
```
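The partition-scan pattern described above can be sketched in a few lines of plain Python (a hypothetical model of the behavior, not Spark's actual implementation): when the first partition doesn't satisfy the request, a naive driver fetches up to n elements from every remaining partition at once, buffering one result array per partition.

```python
# Hypothetical sketch (plain Python, not Spark's code) of the .take(n)
# behavior described above: an empty partition 0 forces the driver to
# fetch and buffer a result array from every remaining partition.
def naive_take(partitions, n):
    first = partitions[0][:n]
    if len(first) >= n:
        return first
    # First partition didn't satisfy the request: scan all the rest,
    # buffering one array per partition on the driver.
    buffered = [p[:n] for p in partitions[1:]]
    result = first
    for chunk in buffered:
        result.extend(chunk)
    return result[:n]

# An empty partition 0 followed by many partitions triggers the worst
# case: every partition is fetched even though one would have sufficed.
parts = [[]] + [[i] * 1000 for i in range(100)]
print(naive_take(parts, 1))  # [0]
```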
[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54596256 Regarding the merge, I'm guessing this is too late to land in the Spark 1.1 release. Is it a candidate for a backport to a 1.1.x?
[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54596774 What's the implication here for other client code of the Spark API? It looks like there are mutability concerns around whether you can save a reference to the object you get back from the iterator in mapPartitions.
[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54596928 The correct assumption is to not reuse objects. However, in Spark SQL we exploited the implementation of the old shuffle behavior (which serializes each row object immediately without buffering them) to avoid allocating objects.
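The hazard rxin describes can be illustrated with a small standalone sketch (plain Python; Spark SQL's rows are Scala objects): an iterator that reuses one mutable row is safe when each row is consumed immediately, but breaks as soon as a downstream operator buffers the raw references.

```python
# Hypothetical illustration of the object-reuse hazard described above:
# one shared mutable "row" works if each element is copied/serialized
# immediately, but a buffering consumer ends up with N references to
# the same final row.
def rows_with_reuse(values):
    row = [None]          # one shared, mutable row object
    for v in values:
        row[0] = v
        yield row         # yields the SAME object every time

immediate = [r[0] for r in rows_with_reuse([1, 2, 3])]  # consume eagerly
buffered = list(rows_with_reuse([1, 2, 3]))             # keep references

print(immediate)                 # [1, 2, 3]
print([r[0] for r in buffered])  # [3, 3, 3] - every entry is the same row
```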
[GitHub] spark pull request: Fix for false positives reported by mima on PR...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/2285 Fix for false positives reported by mima on PR 2194. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 mima-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2285.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2285 commit c050d1b363b01aed0df0706fae87ae4f86631067 Author: Prashant Sharma prashan...@imaginea.com Date: 2014-09-05T08:26:48Z Fix for false positives reported by mima on PR 2194.
[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54597321 I don't see that contract documented in the Scaladoc for the method:

```
/**
 * Return a new RDD by applying a function to each partition of this RDD.
 *
 * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
 * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
 */
def mapPartitions[U: ClassTag](
    f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U] = {
  val func = (context: TaskContext, index: Int, iter: Iterator[T]) => f(iter)
  new MapPartitionsRDD(this, sc.clean(func), preservesPartitioning)
}
```

Should I send a PR documenting it?
[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2194#issuecomment-54598665 @rxin There is a reason for this, and a (workaround-style) fix in #2285.
[GitHub] spark pull request: Fix for false positives reported by mima on PR...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2285#discussion_r17161813 --- Diff: dev/mima --- @@ -25,12 +25,16 @@ FWDIR=$(cd `dirname $0`/..; pwd) cd $FWDIR echo -e q\n | sbt/sbt oldDeps/update +rm -f .generated-mima* + +./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new --- End diff -- run with just new jars first.
[GitHub] spark pull request: Fix for false positives reported by mima on PR...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2285#discussion_r17161831 --- Diff: dev/mima --- @@ -25,12 +25,16 @@ FWDIR=$(cd `dirname $0`/..; pwd) cd $FWDIR echo -e q\n | sbt/sbt oldDeps/update +rm -f .generated-mima* + +./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new export SPARK_CLASSPATH=`find lib_managed \( -name '*spark*jar' -a -type f \) | tr \\n :` echo SPARK_CLASSPATH=$SPARK_CLASSPATH -./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore -echo -e q\n | sbt/sbt mima-report-binary-issues | grep -v -e info.*Resolving +./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore old --- End diff -- Run with old jars ahead of new ones (since the new ones can't be eliminated; the tools project needs them anyway).
[GitHub] spark pull request: Tests meant to demonstrate the bug in SPARK-26...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/1588#issuecomment-54598213 Yep good to close -- we can refer to the ticket in the future if it comes back up
[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2284#issuecomment-54597712 test this please.
[GitHub] spark pull request: Fix for false positives reported by mima on PR...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2285#discussion_r17162225 --- Diff: dev/mima --- @@ -25,11 +25,15 @@ FWDIR=$(cd `dirname $0`/..; pwd) cd $FWDIR echo -e q\n | sbt/sbt oldDeps/update +rm -f .generated-mima* + +./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new export SPARK_CLASSPATH=`find lib_managed \( -name '*spark*jar' -a -type f \) | tr \\n :` echo SPARK_CLASSPATH=$SPARK_CLASSPATH -./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore +./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore old --- End diff -- ...and then old, too.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-54600822 @JoshRosen @andrewor14 I use `url.hashCode + timestamp` as `cachedFileName`. I believe it is impossible for both a `url.hashCode` collision and a `timestamp` collision to occur at the same time.
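The naming argument above can be sketched in a few lines (hypothetical plain Python; the actual code is Scala): combining the URL's hash with the timestamp means a filename collision requires both components to collide simultaneously.

```python
# Hypothetical sketch (plain Python, not the Scala code) of the cache
# filename scheme described above: hash-of-URL concatenated with a
# timestamp, so a collision needs BOTH parts to collide at once.
def cached_file_name(url: str, timestamp: int) -> str:
    return f"{hash(url)}{timestamp}"

# Same URL fetched at different times gets distinct cache entries,
# and the mapping is deterministic within a run.
a = cached_file_name("http://example.com/a.jar", 1409900000)
b = cached_file_name("http://example.com/a.jar", 1409900001)
print(a != b)  # True
```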
[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54601247 @marmbrus Seems the Hive parser will pass something like a.b.c... to `LogicalPlan`, so I have to roll back (and I changed `dotExpressionHeader` to `ident . ident {. ident}`). And I have done some work on `GetField` to let it support not just StructType, but also array of struct, array of array of struct, and so on. The idea is simple: if you want `a.b` to work, then `a` must be some level of nested array of struct (level 0 means just a StructType), and the result of `a.b` is the same level of nested array of b-type. In this way, we can handle nested arrays of struct and simple structs in the same process.
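The nesting rule described above can be modeled in a short standalone sketch (plain Python with dicts for structs and lists for arrays — hypothetical, not Catalyst's `GetField`): resolving `a.b` where `a` is a struct nested inside k levels of arrays yields b's value nested inside the same k levels of arrays.

```python
# Hypothetical model (plain Python, not Catalyst's GetField) of the rule
# above: dicts stand in for structs, lists for arrays. Field access maps
# over array levels, so the result keeps the same nesting depth.
def get_field(value, name):
    if isinstance(value, list):              # array: recurse into elements
        return [get_field(v, name) for v in value]
    return value[name]                       # struct: direct field access

print(get_field({"b": 1}, "b"))                        # 1  (level 0)
print(get_field([{"b": 1}, {"b": 2}], "b"))            # [1, 2]  (level 1)
print(get_field([[{"b": 1}], [{"b": 2}]], "b"))        # [[1], [2]]  (level 2)
```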
[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/2230#issuecomment-54601682 I'm not sure how to modify `lazy val resolved` in `GetField`, since it now handles more than just StructType. Currently I just removed the type check. What do you think? @marmbrus
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54602306 test this please.
[GitHub] spark pull request: Don't include the empty string as a default...
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/2286 Don't include the empty string as a defaultAclUser

Changes logging from

```
14/09/05 02:01:08 INFO SecurityManager: Changing view acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: Changing modify acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash, ); users with modify permissions: Set(aash, )
```

to

```
14/09/05 02:28:28 INFO SecurityManager: Changing view acls to: aash
14/09/05 02:28:28 INFO SecurityManager: Changing modify acls to: aash
14/09/05 02:28:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash); users with modify permissions: Set(aash)
```

Note that the first set of logs has a Set of size 2 containing aash and the empty string. cc @tgravescs

You can merge this pull request into a Git repository by running: $ git pull https://github.com/ash211/spark empty-default-acl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2286.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2286

commit cf973a1b8f202cd7fe70cf60c701c62c51d2e702 Author: Andrew Ash and...@andrewash.com Date: 2014-09-05T09:30:33Z Don't include the empty string as a defaultAclUser
[GitHub] spark pull request: [BUILD] Fix for false positives reported by mi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2285#issuecomment-54604364 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19803/consoleFull) for PR 2285 at commit [`24f3381`](https://github.com/apache/spark/commit/24f338120c33d353136c056544fe59ade7696af7). * This patch merges cleanly.
[GitHub] spark pull request: Don't include the empty string as a default...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2286#issuecomment-54604796 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19804/consoleFull) for PR 2286 at commit [`cf973a1`](https://github.com/apache/spark/commit/cf973a1b8f202cd7fe70cf60c701c62c51d2e702). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...
Github user mubarak commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54606563 @tdas Can you please review? Thanks ![screen shot 2014-09-05 at 1 42 28 am](https://cloud.githubusercontent.com/assets/668134/4163160/b9b9b538-34e3-11e4-9fae-0e70f3ba1693.png)
[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...
Github user mubarak commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54606650 Jenkins, this is ok to test.
[GitHub] spark pull request: [BUILD] Fix for false positives reported by mi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2285#issuecomment-54609772 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19803/consoleFull) for PR 2285 at commit [`24f3381`](https://github.com/apache/spark/commit/24f338120c33d353136c056544fe59ade7696af7). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AddJar(path: String) extends LeafNode with Command `
[GitHub] spark pull request: Don't include the empty string as a default...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2286#issuecomment-54610134 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19804/consoleFull) for PR 2286 at commit [`cf973a1`](https://github.com/apache/spark/commit/cf973a1b8f202cd7fe70cf60c701c62c51d2e702). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
GitHub user wardviaene opened a pull request: https://github.com/apache/spark/pull/2287 [SPARK-3415] [PySpark] removes SerializingAdapter code This code removes the SerializingAdapter code that was copied from PiCloud You can merge this pull request into a Git repository by running: $ git pull https://github.com/wardviaene/spark feature/pythonsys Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2287 commit e263bf557148ab878e656f3138f6f7cb2cd003fb Author: Ward Viaene ward.via...@bigdatapartnership.com Date: 2014-09-05T13:12:03Z SPARK-3415: removes legacy SerializingAdapter code
[GitHub] spark pull request: Don't include the empty string as a default...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2286#discussion_r17174246

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -162,7 +162,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
   // always add the current user and SPARK_USER to the viewAcls
   private val defaultAclUsers = Set[String](System.getProperty("user.name", ""),
-    Option(System.getenv("SPARK_USER")).getOrElse(""))
+    Option(System.getenv("SPARK_USER")).getOrElse("")).filter(_ != "")
--- End diff --

can you change this to be `!_.isEmpty`
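The effect of the change under review can be shown with a small Python analogue (hedged: the real code is the Scala diff above; the function name here is illustrative): the `getOrElse("")` fallback can put an empty string into the user set, and the added filter drops it.

```python
def default_acl_users(current_user, env):
    # Mirror of the Scala logic above: the current user plus SPARK_USER,
    # where a missing SPARK_USER falls back to the empty string...
    users = {current_user, env.get("SPARK_USER", "")}
    # ...which the filter (Scala: .filter(_ != "") / !_.isEmpty) removes.
    return {u for u in users if u}
```

With `SPARK_USER` unset, the result is `{"aash"}` rather than `{"aash", ""}`, matching the cleaned-up log output in the PR description.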
[GitHub] spark pull request: Don't include the empty string as a default...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2286#issuecomment-54630202 Thanks for working on this; I've been meaning to fix this for a while. Could you also please file a JIRA and link them? The header of the PR should include the JIRA number, like [SPARK-].
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54631454 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54634193 @ash211 Thank you for explaining that.
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-54634308 I don't think this is really necessary, as I see the value of the FileSystem one as a public API now and changing its value would break compatibility, but I'm OK with it. Yes, yarn-alpha has this defined. Higher value is higher priority. I would rather leave it at value 30, or at least keep some priorities in between, so I would rather see + 20. 30 is also what MapReduce uses, so if Hadoop were to add others in, we would be better off imitating MR.
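The priority semantics being discussed (a higher value runs earlier, e.g. an application hook at 30 + 20 firing before the filesystem hook at 30) can be sketched as follows. This is an illustrative Python model, not the Hadoop ShutdownHookManager API; the class and values are assumptions for the sketch.

```python
class ShutdownHooks:
    """Toy model of priority-ordered shutdown hooks (illustrative only)."""

    def __init__(self):
        self._hooks = []

    def add(self, priority, fn):
        self._hooks.append((priority, fn))

    def run(self):
        # Higher priority value runs first, matching the convention
        # above (e.g. a hook at 50 runs before one at 30).
        for _, fn in sorted(self._hooks, key=lambda h: h[0], reverse=True):
            fn()
```

Registering the application hook at `FS_PRIORITY + 20` thus guarantees it completes before the filesystem is torn down, while leaving room for intermediate priorities.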
[GitHub] spark pull request: [SPARK-2140] Updating heap memory calculation ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2253#issuecomment-54634933 Jenkins, test this please
[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class
GitHub user mrocklin opened a pull request: https://github.com/apache/spark/pull/2288 pyspark.sql.SQLContext is new-style class

Tiny PR making SQLContext a new-style class. This allows various type logic to work more effectively

```Python
In [1]: import pyspark

In [2]: pyspark.sql.SQLContext.mro()
Out[2]: [pyspark.sql.SQLContext, object]
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/mrocklin/spark sqlcontext-new-style-class Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2288.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2288

commit a2dc02fabf940c4714cbcf9f5da35c79e0795150 Author: Matthew Rocklin mrock...@gmail.com Date: 2014-09-05T14:51:25Z pyspark.sql.SQLContext is new-style class
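For readers unfamiliar with the distinction: in Python 2 a class must inherit from `object` to be "new-style" and gain features like `mro()`; in Python 3 every class is new-style. A minimal illustration (class names are made up for the example):

```python
class OldStyleInPy2:
    # Under Python 2 this is an old-style class with no mro();
    # under Python 3 it is new-style anyway.
    pass

class NewStyle(object):
    # Explicitly new-style under both Python 2 and Python 3.
    pass

# mro() works on new-style classes, which is what the PR enables
# for SQLContext:
assert NewStyle.mro() == [NewStyle, object]
```

This is why the one-character change (`class SQLContext:` to `class SQLContext(object):`) lets "various type logic work more effectively" on Python 2.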
[GitHub] spark pull request: [SPARK-3260] yarn - pass acls along with execu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2185
[GitHub] spark pull request: [SPARK-3375] spark on yarn container allocatio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2275
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2287#issuecomment-54636076 Hi @wardviaene,

Do you have an example program that reproduces this bug? We should probably add it as a regression test (see `python/pyspark/tests.py` for examples of how to do this).

(For other reviewers: you can browse SerializingAdapter's code at http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/) It looks like this code is designed to handle the pickling of file() objects. The Dill developers have recently been discussing how to pickle file handles: https://github.com/uqfoundation/dill/issues/57

It looks like `SerializingAdapter.max_transmit_data` acts as an upper-limit on the sizes of closures that PiCloud would send to their service. Unlike PiCloud, we don't have limits on closure sizes (there are warnings, but these are detected / enforced inside the JVM). Therefore, I wonder if we should just remove this limit and allow the whole file to be read rather than adding an obscure configuration option.
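To make the limit under discussion concrete, here is a hedged sketch of what a `max_transmit_data`-style cap amounts to (illustrative Python; not PiCloud's or Spark's actual code): serialization fails once the pickled payload exceeds a fixed size.

```python
import pickle

def dumps_capped(obj, max_bytes):
    # Serialize, then enforce an upper bound on the payload size --
    # the kind of service-side limit PiCloud needed and that the
    # comment above suggests Spark could simply drop.
    data = pickle.dumps(obj)
    if len(data) > max_bytes:
        raise ValueError("serialized payload exceeds %d bytes" % max_bytes)
    return data
```

Removing the cap corresponds to always taking the `return data` path, letting closures of any size through and leaving size warnings to the JVM side.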
[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54636484 Good catch! While you're at it, are there any other old-style classes in PySpark that should be made into new-style ones?
[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54636685 Also, do you mind opening a JIRA ticket on https://issues.apache.org/jira/browse/SPARK and editing the title of your pull request to reference it, e.g. `[SPARK-] Use new-style classes in PySpark`?
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user bbejeck commented on the pull request: https://github.com/apache/spark/pull/2227#issuecomment-54637031 Have any of the admins had a chance to check it out? Let me know if you want me to modify anything in it.
[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class
Github user mrocklin commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54638388 Sure. Next time I find a few free minutes.
[GitHub] spark pull request: [SPARK-3361] Expand PEP 8 checks to include EC...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2277#issuecomment-54638477 Jenkins, could you test this please?
[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class
Github user mrocklin commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54639788

```
mrocklin@notebook:~/workspace/spark$ git grep "^class \w*:"
mrocklin@notebook:~/workspace/spark$
```
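The shell one-liner above (which returned no matches, i.e. no remaining old-style classes) can also be expressed as a small Python scan, a hedged equivalent using the same `^class \w*:` pattern: it flags class statements declared without a base-class list.

```python
import re

# Same pattern as the `git grep` above: a class statement with no
# base-class list, i.e. an old-style class under Python 2.
OLD_STYLE = re.compile(r"^class \w*:", re.MULTILINE)

source = '''
class SQLContext(object):
    pass

class Helper:
    pass
'''

matches = OLD_STYLE.findall(source)  # only the base-less class matches
```

Here `matches` contains only `class Helper:`; `class SQLContext(object):` is skipped because its parenthesized base list breaks the pattern, which is exactly why the grep came back empty after the fix.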
[GitHub] spark pull request: [SPARK-3417] -Use of old-style classes in pysp...
Github user mrocklin commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54639853 Done
[GitHub] spark pull request: Spark-3406 add a default storage level to pyth...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2280#issuecomment-54641296 It looks like `sql.py` overrides the default `persist()`, so you might want to update it there, too. LGTM otherwise.
[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2276#discussion_r17180863

--- Diff: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala ---
@@ -96,7 +96,7 @@ private class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMC
   // Users can then monitor stderr/stdout on that node if required.
   appMasterRequest.setHost(Utils.localHostName())
   appMasterRequest.setRpcPort(0)
-  appMasterRequest.setTrackingUrl(uiAddress)
+  appMasterRequest.setTrackingUrl(uiAddress.replaceAll("^http(\\w)*://", ""))
--- End diff --

I would rather have this done with something more reliable, like the URI class, and just remove the scheme if it has one. Also, can you add a comment that we are removing it because Hadoop doesn't handle the scheme?
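The reviewer's suggestion, parse the URL and drop the scheme rather than hand-rolling a regex, can be sketched in Python with `urllib.parse` (a hedged analogue of using Java's `URI` class in the Scala code; the function name is made up for the example):

```python
from urllib.parse import urlsplit

def strip_scheme(ui_address):
    # Parse the tracking URL; if it carries a scheme, reassemble
    # everything after "scheme://". Otherwise return it unchanged.
    parts = urlsplit(ui_address)
    if not parts.scheme:
        return ui_address
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    return rest
```

A parser handles `https`, ports, paths, and query strings uniformly, where the `^http(\\w)*://` regex only anticipates `http`-prefixed schemes.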
[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-54648499 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2270#discussion_r17182845

--- Diff: bin/pyspark ---
@@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS
 # For pyspark tests
 if [[ -n $SPARK_TESTING ]]; then
+  unset YARN_CONF_DIR
+  unset HADOOP_CONF_DIR
--- End diff --

If this problem only happens during testing, could we put these in python/run-tests? pyspark will often be used as a Python shell.
[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/899#issuecomment-54652280 @zeodtr does this compile with anything hadoop 2.4? If it doesn't, this is a no-go.
[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...
Github user benoyantony commented on the pull request: https://github.com/apache/spark/pull/2276#issuecomment-54652578 Sure, I'll do both. Does alpha correspond to Hadoop versions before YARN-1203? As you know, before YARN-1203 we cannot pass AM URLs with a scheme.
[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2276#issuecomment-54653232 No, alpha means pre-branch-2 Hadoop (I think; Hadoop branching is not exactly an exact science). Anyway, there are stable releases without YARN-1203, so that probably should be handled. If there isn't an API to figure out the YARN version, I'd use reflection to detect a method that was added after YARN-1203 (preferably around this API), and only apply the fix if the method is available.
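The reflection idea, probe for a capability instead of parsing a version string, looks like this in Python, where `hasattr` plays the role of Java reflection. The class and method names here are purely illustrative, not real YARN API names.

```python
class OldResourceManager:
    # Stand-in for a pre-YARN-1203 API: no scheme support.
    def set_tracking_url(self, url):
        self.url = url

class NewResourceManager(OldResourceManager):
    # Stand-in for a post-YARN-1203 API: a method added alongside the fix.
    def supports_scheme(self):
        return True

def tracking_url_for(rm, url):
    # Detect the newer method and only keep the scheme when the API
    # is known to handle it; otherwise strip it off.
    if hasattr(rm, "supports_scheme"):
        return url
    return url.split("://", 1)[-1]
```

Probing the API directly is more robust than version checks because it keeps working across vendor backports, where version numbers and capabilities do not line up.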
[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2270#discussion_r17184086

--- Diff: bin/pyspark ---
@@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS
 # For pyspark tests
 if [[ -n $SPARK_TESTING ]]; then
+  unset YARN_CONF_DIR
+  unset HADOOP_CONF_DIR
--- End diff --

Thanks for your comment. As I mentioned in the JIRA, YARN_CONF_DIR and HADOOP_CONF_DIR are loaded in the pyspark script, and some tests like rdd.py are kicked off by pyspark in python/run-tests, so it doesn't make sense to put the unset in python/run-tests.
[GitHub] spark pull request: [SPARK-3361] Expand PEP 8 checks to include EC...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2277#issuecomment-54654607 Jenkins, retest this please. (Not sure if Jenkins is programmed to listen to @nchammas or not...)
[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2144#issuecomment-54654606 @mateiz @JoshRosen @mattf run-tests will try to run the tests for Spark core and SQL with PyPy. One known issue is that serialization of array in PyPy is similar to Python 2.6, which is not supported by Pyrolite, so one test case has been skipped for them. I added another one which does not depend on serialization of array. I also refactored cloudpickle to do things in a more portable way (an approach also used by dill).
[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2144#issuecomment-54654638 Jenkins, test this please.
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2227#issuecomment-54654828 Jenkins, this is ok to test.
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2227#issuecomment-54655129 Feels to me like it would be better to fix this in `Utils.memoryStringToMb`. That way all code using it benefits. As for the behavior of that method, maybe it should throw an exception if there is no suffix and the value is < 1MB?
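The stricter parsing vanzin suggests can be sketched like this. Spark's actual `Utils.memoryStringToMb` is Scala; this is an illustrative Python analog, and the exact semantics (which suffixes are accepted, the error type) are assumptions:

```python
def memory_string_to_mb(s: str) -> int:
    # Require an explicit unit suffix so that a bare "1" fails loudly
    # instead of being silently truncated to 0 MB.
    units = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
    s = s.strip().lower()
    if not s or s[-1] not in units:
        raise ValueError(f"memory size must end in k, m, g, or t: {s!r}")
    return int(float(s[:-1]) * units[s[-1]])
```

With this behavior, `SPARK_WORKER_MEMORY=1` would raise immediately at startup rather than configuring a worker with 0 MB.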
[GitHub] spark pull request: [SPARK-3375] spark on yarn container allocatio...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2275#issuecomment-54655288 Oops. Thanks for fixing it.
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2227#discussion_r17184945 --- Diff: core/src/test/scala/org/apache/spark/deploy/worker/WorkerArgumentsTest.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.spark.deploy.worker + +import org.apache.spark.SparkConf +import org.scalatest.FunSuite + + +class WorkerArgumentsTest extends FunSuite { + + test("Memory can't be set to 0 when cmd line args leave off M or G") { +val conf = new SparkConf +val args = Array("-m", "1", "spark://localhost:") +intercept[IllegalStateException] { + new WorkerArguments(args, conf) +} + } + + +/* For this test an environment property for SPARK_WORKER_MEMORY was set --- End diff -- In #2002, I added a mechanism that allows environment variables to be mocked in tests. Take a look at that PR, `SparkConf.getEnv` in particular. By using a custom SparkConf subclass, you can mock environment variables on a per-test basis: https://github.com/apache/spark/pull/2002/files#diff-e9fb6be5f96766cce96c4d60aea2fc59R45 If we find ourselves doing this in multiple places (my PR, here, ...) it might be nice to add some test helper classes for doing this more generically. 
That refactoring can happen in a separate PR, though, so for now it's probably fine to just copy my code snippet here.
[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2270#discussion_r17184963 --- Diff: bin/pyspark --- @@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS # For pyspark tests if [[ -n $SPARK_TESTING ]]; then + unset YARN_CONF_DIR + unset HADOOP_CONF_DIR --- End diff -- Thanks, I get it.
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2227#discussion_r17185052 --- Diff: core/src/test/scala/org/apache/spark/deploy/worker/WorkerArgumentsTest.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.spark.deploy.worker + +import org.apache.spark.SparkConf +import org.scalatest.FunSuite + + +class WorkerArgumentsTest extends FunSuite { + + test("Memory can't be set to 0 when cmd line args leave off M or G") { +val conf = new SparkConf +val args = Array("-m", "1", "spark://localhost:") +intercept[IllegalStateException] { + new WorkerArguments(args, conf) +} + } + + +/* For this test an environment property for SPARK_WORKER_MEMORY was set --- End diff -- Oh, to be more specific: you'll have to change the code that reads the environment variable to use `SparkConf.getEnv` instead of `System.getEnv`; I only changed this for the environment variables used in my specific test because I didn't want to make a big cross-cutting change across the codebase (plus it would probably get broken by subsequent PRs; we should add a style-checker rule that complains about `System.getEnv` uses if we plan on doing this change globally). 
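The per-test environment mocking that `SparkConf.getEnv` enables has a close Python analog. A minimal sketch — the `worker_memory` helper and variable usage are invented for illustration, not PySpark's code:

```python
import os
from unittest import mock

def worker_memory() -> str:
    # Code under test reads the environment through one seam, so a
    # test can substitute values without mutating the real process env.
    return os.environ.get("SPARK_WORKER_MEMORY", "1g")

# patch.dict restores the original environment when the block exits,
# so each test gets an isolated view of the variables it cares about.
with mock.patch.dict(os.environ, {"SPARK_WORKER_MEMORY": "2g"}):
    assert worker_memory() == "2g"
```

The key design point is the same in both languages: route all environment reads through one indirection so tests can override them per-case instead of mutating global state.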
[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2270#issuecomment-54657139 This patch looks good to me. @JoshRosen could you take another look?
[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2259#issuecomment-54657265 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2270#issuecomment-54660424 Looks good to me, too. Thanks for fixing this!
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1616#discussion_r17188055 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -313,14 +313,74 @@ private[spark] object Utils extends Logging { } /** + * Download a file requested by the executor. Supports fetching the file in a variety of ways, + * including HTTP, HDFS and files on a standard filesystem, based on the URL parameter. + * + * If `useCache` is true, first attempts to fetch the file from a local cache that's shared across + * executors running the same application. + * + * Throws SparkException if the target file already exists and has different contents than + * the requested file. + */ + def fetchFile( + url: String, + targetDir: File, + conf: SparkConf, + securityMgr: SecurityManager, + hadoopConf: Configuration, + timestamp: Long, + useCache: Boolean) { +val fileName = url.split("/").last +val targetFile = new File(targetDir, fileName) +if (useCache) { + val cachedFileName = url.hashCode + timestamp + "_cach" --- End diff -- _cache
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1616#discussion_r17188080 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -313,14 +313,74 @@ private[spark] object Utils extends Logging { } /** + * Download a file requested by the executor. Supports fetching the file in a variety of ways, + * including HTTP, HDFS and files on a standard filesystem, based on the URL parameter. + * + * If `useCache` is true, first attempts to fetch the file from a local cache that's shared across + * executors running the same application. + * + * Throws SparkException if the target file already exists and has different contents than + * the requested file. + */ + def fetchFile( + url: String, + targetDir: File, + conf: SparkConf, + securityMgr: SecurityManager, + hadoopConf: Configuration, + timestamp: Long, + useCache: Boolean) { +val fileName = url.split("/").last +val targetFile = new File(targetDir, fileName) +if (useCache) { + val cachedFileName = url.hashCode + timestamp + "_cach" + val lockFileName = url.hashCode + timestamp + "_lock" + val localDir = new File(getLocalDir(conf)) + val lockFile = new File(localDir, lockFileName) --- End diff -- Why do we need a lock file? This seems a little expensive.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1616#discussion_r17188168 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -313,14 +313,74 @@ private[spark] object Utils extends Logging { } /** + * Download a file requested by the executor. Supports fetching the file in a variety of ways, + * including HTTP, HDFS and files on a standard filesystem, based on the URL parameter. + * + * If `useCache` is true, first attempts to fetch the file from a local cache that's shared across + * executors running the same application. + * + * Throws SparkException if the target file already exists and has different contents than + * the requested file. + */ + def fetchFile( + url: String, + targetDir: File, + conf: SparkConf, + securityMgr: SecurityManager, + hadoopConf: Configuration, + timestamp: Long, + useCache: Boolean) { +val fileName = url.split("/").last +val targetFile = new File(targetDir, fileName) +if (useCache) { + val cachedFileName = url.hashCode + timestamp + "_cach" + val lockFileName = url.hashCode + timestamp + "_lock" + val localDir = new File(getLocalDir(conf)) + val lockFile = new File(localDir, lockFileName) --- End diff -- I think the idea here is that multiple executor JVMs are running on the same machine and we only want to download one copy of the file to the shared cache, so we use a lock file as a form of interprocess synchronization.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-54661638 Do we need to clean up the new cache files we created? Or is that handled automatically somewhere?
[GitHub] spark pull request: SPARK-3337 Paranoid quoting in shell to allow ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2229#issuecomment-54661731 retest this please
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user wardviaene commented on the pull request: https://github.com/apache/spark/pull/2287#issuecomment-54661969 Hi @JoshRosen, I added a test script in this pull request. Using sys.stderr in a class triggers the bug.
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2287#discussion_r17188526 --- Diff: python/pyspark/tests.py --- @@ -180,6 +180,22 @@ def tearDown(self): self.sc.stop() sys.path = self._old_sys_path +class CloudPickleTestCase(PySparkTestCase): +def SetUp(self): --- End diff -- This is capitalized (`SetUp`) so it won't be called by `unittest`. Also, we should just end up inheriting the proper setup and teardown methods from PySparkTestCase, so you don't need these methods.
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user bbejeck commented on the pull request: https://github.com/apache/spark/pull/2227#issuecomment-54662304 Josh, Thanks for the heads up on testing with environment variables. I will look at the PR and make the required changes to the test.
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2287#discussion_r17188600 --- Diff: python/pyspark/tests.py --- @@ -180,6 +180,22 @@ def tearDown(self): self.sc.stop() sys.path = self._old_sys_path +class CloudPickleTestCase(PySparkTestCase): +def SetUp(self): +PySparkTestCase.setUp(self) +def tearDown(self): +PySparkTestCase.tearDown(self) +def test_CloudPickle(self): --- End diff -- I'd probably go with `test_cloudpickle` without the camel-case / capitalization.
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2287#discussion_r17188799 --- Diff: python/pyspark/tests.py --- @@ -180,6 +180,22 @@ def tearDown(self): self.sc.stop() sys.path = self._old_sys_path +class CloudPickleTestCase(PySparkTestCase): +def SetUp(self): +PySparkTestCase.setUp(self) +def tearDown(self): +PySparkTestCase.tearDown(self) +def test_CloudPickle(self): --- End diff -- Also, `test_cloudpickle` isn't a very descriptive name; it will be hard for people that come along and read this later to figure out what this is supposed to be testing. A better name would be `test_pickling_file_handles` (and maybe add a comment saying that it's a regression test for SPARK-3415).
[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...
Github user bbejeck commented on the pull request: https://github.com/apache/spark/pull/2227#issuecomment-54662763 "Feels to me like it would be better to fix this in Utils.memoryStringToMb. That way all code using it benefits." I thought the same thing, but I was not sure about making a change that would be cross-cutting, so I confined my change to the WorkerArguments class.
[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2287#discussion_r17188918 --- Diff: python/pyspark/tests.py --- @@ -180,6 +180,22 @@ def tearDown(self): self.sc.stop() sys.path = self._old_sys_path +class CloudPickleTestCase(PySparkTestCase): +def SetUp(self): +PySparkTestCase.setUp(self) +def tearDown(self): +PySparkTestCase.tearDown(self) +def test_CloudPickle(self): +self.t = self.CloudPickleTestClass() +a = [ 1 , 2, 3, 4, 5 ] +b = self.sc.parallelize(a) +c = b.map(self.t.getOk) +self.assertEquals('ok', c.first()) +class CloudPickleTestClass(object): --- End diff -- Do you need to define a separate class to test this? Maybe a simpler reproduction would be to directly instantiate CloudPickleSerializer and attempt to dump `sys.stderr` directly (or a function that references `sys.stderr` in its closure).
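A minimal repro along the lines JoshRosen suggests, showing why serializing `sys.stderr` needs special treatment. This sketch uses the stdlib pickler, which rejects file handles outright, rather than PySpark's cloudpickle (which patches around this):

```python
import pickle
import sys

# The standard pickler refuses file handles, which is why cloudpickle
# needs dedicated handling for objects like sys.stderr referenced from
# a class or a function closure.
try:
    pickle.dumps(sys.stderr)
    handle_pickled = True
except TypeError:
    handle_pickled = False
```

A direct dump of the handle (or of a function closing over it) is a much smaller reproduction than spinning up a SparkContext and running a job.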
[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql
Github user xinyunh commented on the pull request: https://github.com/apache/spark/pull/2099#issuecomment-54663384 Sorry, I forgot.
[GitHub] spark pull request: TEST ONLY DO NOT MERGE
GitHub user shaneknapp opened a pull request: https://github.com/apache/spark/pull/2289 TEST ONLY DO NOT MERGE TEST ONLY DO NOT MERGE You can merge this pull request into a Git repository by running: $ git pull https://github.com/shaneknapp/spark sknapptest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2289 commit 7973946433208fffbb7d7ac244f4e18af6e883ab Author: shane knapp <incompl...@gmail.com> Date: 2014-09-05T18:10:57Z TEST ONLY DO NOT MERGE
[GitHub] spark pull request: [EC2] don't duplicate default values
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/2290 [EC2] don't duplicate default values This PR makes two minor changes to the `spark-ec2` script: 1. The script's input parameter default values are duplicated into the help text. This is unnecessary. This PR replaces the duplicated info with the appropriate `optparse` placeholder. 2. The default Spark version currently needs to be updated by hand during each release, which is an error-prone process. This PR places that default value in an easy-to-spot place. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nchammas/spark spark-ec2-default-version Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2290.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2290 commit 0c6d3bbe90b81dc433791a82d26ddc695cacf1d7 Author: Nicholas Chammas <nicholas.cham...@gmail.com> Date: 2014-09-05T18:33:09Z don't duplicate default values
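The `optparse` placeholder the PR refers to is `%default`, which the library expands into the option's default value when rendering help text. A small sketch (the option name mirrors spark-ec2's style, but the version string is illustrative, not the script's real default):

```python
from optparse import OptionParser

# Defining the default once, in an easy-to-spot place, means release
# managers update a single constant rather than every help string.
DEFAULT_SPARK_VERSION = "1.1.0"  # illustrative value

parser = OptionParser()
parser.add_option(
    "-v", "--spark-version",
    default=DEFAULT_SPARK_VERSION,
    # optparse substitutes %default with str() of the option's default.
    help="Version of Spark to use (default: %default)",
)
help_text = parser.format_help()
```

Printing `help_text` shows the default inlined into the description, with no duplicated literal to drift out of date.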
[GitHub] spark pull request: [SPARK-2491]: Fix When an fatal error is throw...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1482#issuecomment-54664325 This seems reasonable to me. /cc @andrewor14 for another pair of eyes. To recap [some discussion on the JIRA](https://issues.apache.org/jira/browse/SPARK-2491), the issue that this addresses is a scenario where the Executor JVM is in the process of exiting due to an uncaught exception and other shutdown hooks might have deleted files or otherwise performed cleanup that causes other still-running tasks to fail. These additional failures/errors are confusing when they appear in the log and make it hard to find the real failure that caused the executor JVM to exit. @witgo If I understand correctly, the problem here is that confusing messages appear in the logs, not that the executor doesn't stop or doesn't perform cleanup? If that's the case, can we edit the PR's title to "[SPARK-2491] Don't handle uncaught exceptions from tasks that fail during executor shutdown"?
[GitHub] spark pull request: [EC2] don't duplicate default values
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2290#issuecomment-54664634 Woah, I didn't know optparse had `%default`. Cool!