[GitHub] spark pull request: SPARK-3337 Paranoid quoting in shell to allow ...

2014-09-05 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2229#discussion_r17158127
  
--- Diff: sbt/sbt-launch-lib.bash ---
@@ -180,7 +180,7 @@ run() {
 ${SBT_OPTS:-$default_sbt_opts} \
 $(get_mem_opts $sbt_mem) \
 ${java_opts} \
-${java_args[@]} \
+"${java_args[@]}" \
--- End diff --

Ah yes, getting rid of them (and the like) as well.





[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve

2014-09-05 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/2245#issuecomment-54588876
  
Seems so unlucky to be trapped by a different test suite on every run.
@marmbrus @rxin Can you give the patch a retest?





[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve

2014-09-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2245#issuecomment-54588910
  
Jenkins is down right now ...





[GitHub] spark pull request: Optimize the schedule procedure in Master

2014-09-05 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/1106#issuecomment-54588914
  
The JIRA is: https://issues.apache.org/jira/browse/SPARK-3411.
Because the filter creates a copy of the worker list, I changed the way the filtering is done.
The shuffle creates copies too; could we change its approach as well?






[GitHub] spark pull request: [SPARK-3362][SQL] bug in casewhen resolve

2014-09-05 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/2245#issuecomment-54589163
  
All right... 





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54589212
  
This change makes this shutdown hook's priority lower than `FileSystem`'s, whereas it used to be higher. Also, does this compile for `yarn-alpha` too? Given when it went in, it probably works with all supported Hadoop versions, but it's worth checking.





[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2117#issuecomment-54589602
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54589820
  
@srowen It's confusing, but a lower value means higher priority.






[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54589954
  
Ah, I see. That's fine; I just wasn't sure what the intent was, since I think the original description is missing a word.





[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...

2014-09-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2260#issuecomment-54590041
  
OK, merging this (and removing io1 for now).






[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...

2014-09-05 Thread pdeyhim
Github user pdeyhim commented on the pull request:

https://github.com/apache/spark/pull/2260#issuecomment-54590139
  
And what happens when the additional EBS volumes get added? We probably want to configure spark-env.sh and spark_local_dir with the new volumes, correct? The place this happens is here: https://github.com/rxin/spark/blob/ec2-ebs-vol/ec2/spark_ec2.py#L674-L678, but that snippet only configures local disks in spark-env.sh, not the new EBS volumes.





[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...

2014-09-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2260#issuecomment-54590209
  
EBS volumes are not great for shuffle (poor small-write performance). Let's hold off on that for now.






[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...

2014-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2260





[GitHub] spark pull request: [Docs] fix minor MLlib case typo

2014-09-05 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2278#issuecomment-54590246
  
Merged into master and branch-1.1. Thanks!





[GitHub] spark pull request: [Docs] fix minor MLlib case typo

2014-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2278





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54590354
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19800/consoleFull) for PR 2283 at commit [`717aba2`](https://github.com/apache/spark/commit/717aba2221fe974f218f6ecbffab77162c4c94ea).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class AddJar(path: String) extends LeafNode with Command`






[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54590502
  
Ah, sorry, my mistake. I checked the logic of ShutdownHookManager, and a higher value means higher priority.





[GitHub] spark pull request: [SPARK-3391][EC2] Support attaching up to 8 EB...

2014-09-05 Thread pdeyhim
Github user pdeyhim commented on the pull request:

https://github.com/apache/spark/pull/2260#issuecomment-54590541
  
@rxin OK, that's correct for smaller instance types. But FYI, EBS on larger instances (and EBS-optimized instances) should perform well on shuffle read/write.





[GitHub] spark pull request: [SPARK-3086] [SPARK-3043] [SPARK-3156] [mllib]...

2014-09-05 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2125#issuecomment-54590611
  
retest this please





[GitHub] spark pull request: [SPARK-3409][SQL] Avoid pulling in Exchange op...

2014-09-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2282#issuecomment-54594495
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-05 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request:

https://github.com/apache/spark/pull/2284

[SPARK-3412] [SQL] Add 3 missing types for Row API

`BinaryType`, `DecimalType` and `TimestampType` are missing in the Row API.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenghao-intel/spark missing_types_in_row

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2284


commit 3644ffa46ac06adb0096df4f13bc03d0f3904eab
Author: Cheng Hao hao.ch...@intel.com
Date:   2014-09-05T07:45:57Z

Add 3 missing types for Row API







[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54595905
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/2117#issuecomment-54596140
  
@nchammas I'm guessing your OOM issue is unrelated to this one.

```
a = sc.parallelize(["Nick", "John", "Bob"])
a = a.repartition(24000)
a.keyBy(lambda x: len(x)).reduceByKey(lambda x, y: x + y, sc.defaultParallelism).take(1)
```

After the reduceByKey above, you'd have 24000 partitions and only 2 entries in them: (4, "NickJohn") and (3, "Bob").  This bug manifests when you have an empty partition 0 and many remaining partitions, each with a large amount of data.  The .take(n) gets up to the first n elements from each remaining partition and then takes the first n from the concatenation of those arrays.
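
To make the failure mode concrete, here is a minimal standalone sketch of that scan-up behavior (illustrative only, not Spark's actual `RDD.take` implementation; `naiveTake` and the `Seq[Seq[T]]` stand-in for partition data are hypothetical):

```scala
// Sketch: if the first partition can't satisfy n, the fallback pass pulls
// up to n elements from EVERY remaining partition at once, so the driver
// materializes one (possibly huge, possibly empty) array per partition.
def naiveTake[T](partitions: Seq[Seq[T]], n: Int): Seq[T] = {
  val first = partitions.head.take(n)
  if (first.size >= n) {
    first
  } else {
    // one array per remaining partition lands on the driver here
    val rest = partitions.tail.map(_.take(n - first.size))
    (first ++ rest.flatten).take(n)
  }
}
```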

For this bug to affect your situation, you'd need an empty first partition (a good 23998/24000 chance).  The driver would then bring into memory 23998 empty arrays and 2 arrays of size 1 (or maybe 1 array of size 2), which I can't imagine would OOM the driver.  So I don't think this is your bug.

The other evidence is that you observed a regression (at least in the perf numbers later in your bug report), while this behavior has been the same for quite some time.  The current behavior was implemented in commit 42571d30d0d518e69eecf468075e4c5a823a2ae8 and was first released in version 0.9:

```
aash@aash-mbp ~/git/spark$ git log origin/branch-1.0 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
commit 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$ git log origin/branch-0.9 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
commit 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$ git log origin/branch-0.8 | grep 42571d30d0d518e69eecf468075e4c5a823a2ae8
aash@aash-mbp ~/git/spark$
```





[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/2117#issuecomment-54596256
  
Regarding the merge, I'm guessing this is too late to land in the Spark 1.1 release.  Is it a candidate for a backport to 1.1.x?





[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/2281#issuecomment-54596774
  
What's the implication here for other client code of the Spark API?  It looks like there are mutability concerns around whether you can save a reference to the object you get back from the iterator in mapPartitions.





[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2281#issuecomment-54596928
  
The correct assumption is not to reuse objects. However, in Spark SQL we exploited the implementation of the old shuffle behavior (which serializes each row object immediately, without buffering) to avoid allocating objects.
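
Here is a hedged sketch of that hazard; `ReusedRow` is hypothetical and merely stands in for an operator that hands back the same mutable row instance on every `next()` call:

```scala
final class ReusedRow { var value: Int = 0 }

// One instance, mutated in place and handed out n times.
def reusingIterator(n: Int): Iterator[ReusedRow] = {
  val row = new ReusedRow
  Iterator.tabulate(n) { i => row.value = i; row }
}

// Saving references across next() calls (as a naive Limit might)
// yields n aliases of the LAST row, not n distinct rows:
val saved = reusingIterator(3).toArray
assert(saved.map(_.value).toSeq == Seq(2, 2, 2)) // not Seq(0, 1, 2)
```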





[GitHub] spark pull request: Fix for false positives reported by mima on PR...

2014-09-05 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/2285

Fix for false positives reported by mima on PR 2194.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 mima-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2285


commit c050d1b363b01aed0df0706fae87ae4f86631067
Author: Prashant Sharma prashan...@imaginea.com
Date:   2014-09-05T08:26:48Z

Fix for false positives reported by mima on PR 2194.







[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/2281#issuecomment-54597321
  
I don't see that contract in the API documented in the Scaladoc for the 
method:

```
/**
 * Return a new RDD by applying a function to each partition of this RDD.
 *
 * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
 * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
 */
def mapPartitions[U: ClassTag](
    f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U] = {
  val func = (context: TaskContext, index: Int, iter: Iterator[T]) => f(iter)
  new MapPartitionsRDD(this, sc.clean(func), preservesPartitioning)
}
```

Should I send a PR documenting it?





[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

2014-09-05 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/2194#issuecomment-54598665
  
@rxin There is a reason for this, and a (workaround-type) fix is in #2285.





[GitHub] spark pull request: Fix for false positives reported by mima on PR...

2014-09-05 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2285#discussion_r17161813
  
--- Diff: dev/mima ---
@@ -25,12 +25,16 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
 cd "$FWDIR"
 
 echo -e "q\n" | sbt/sbt oldDeps/update
+rm -f .generated-mima*
+
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new
--- End diff --

Run with just the new jars first.





[GitHub] spark pull request: Fix for false positives reported by mima on PR...

2014-09-05 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2285#discussion_r17161831
  
--- Diff: dev/mima ---
@@ -25,12 +25,16 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
 cd "$FWDIR"
 
 echo -e "q\n" | sbt/sbt oldDeps/update
+rm -f .generated-mima*
+
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new
 
 export SPARK_CLASSPATH=`find lib_managed \( -name '*spark*jar' -a -type f \) | tr "\\n" ":"`
 echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"
 
-./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore
-echo -e "q\n" | sbt/sbt mima-report-binary-issues | grep -v -e "info.*Resolving"
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore old
--- End diff --

Run with the old jars ahead of the new ones (since the new ones can't be eliminated; the tools project needs them anyway).





[GitHub] spark pull request: Tests meant to demonstrate the bug in SPARK-26...

2014-09-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/1588#issuecomment-54598213
  
Yep, good to close -- we can refer to the ticket in the future if it comes back up.





[GitHub] spark pull request: [SPARK-3412] [SQL] Add 3 missing types for Row...

2014-09-05 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2284#issuecomment-54597712
  
test this please.





[GitHub] spark pull request: Fix for false positives reported by mima on PR...

2014-09-05 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2285#discussion_r17162225
  
--- Diff: dev/mima ---
@@ -25,11 +25,15 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
 cd "$FWDIR"
 
 echo -e "q\n" | sbt/sbt oldDeps/update
+rm -f .generated-mima*
+
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore new
 
 export SPARK_CLASSPATH=`find lib_managed \( -name '*spark*jar' -a -type f \) | tr "\\n" ":"`
 echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"
 
-./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore old
--- End diff --

...and then with the old ones too.





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-05 Thread li-zhihui
Github user li-zhihui commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-54600822
  
@JoshRosen @andrewor14
I use `url.hashCode + timestamp` as the `cachedFileName`; I believe it is impossible for an existing `url.hashCode` collision and a `timestamp` collision to occur at the same time.
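
For illustration, a minimal sketch of that naming scheme (the helper below is hypothetical, not the exact code in the PR):

```scala
// A clash requires BOTH a url.hashCode collision AND an identical
// timestamp, since both values are baked into the file name.
def cachedFileName(url: String, timestamp: Long): String =
  s"${url.hashCode}${timestamp}_cache"
```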





[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/2230#issuecomment-54601247
  
@marmbrus It seems the Hive parser will pass something like a.b.c... to `LogicalPlan`, so I have to roll back (and I changed `dotExpressionHeader` to `ident . ident {. ident}`). I have also done some work on `GetField` to let it support not just StructType but also arrays of structs, arrays of arrays of structs, and so on.
The idea is simple: if you want `a.b` to work, then `a` must be some level of nested array of struct (level 0 means just a StructType), and the result of `a.b` is the same level of nested array of b's type. This way, we can handle nested arrays of structs and plain structs with the same process.
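
An illustrative sketch of that "same level of nesting" idea (not Spark's actual `GetField`; a `Map` stands in for a struct purely for illustration):

```scala
// Extract field `name` from either a struct (level 0) or an arbitrarily
// nested Seq of structs, preserving the nesting level in the result.
def getField(value: Any, name: String): Any = value match {
  case struct: Map[_, _] => struct.asInstanceOf[Map[String, Any]](name)
  case arr: Seq[_]       => arr.map(getField(_, name))
}

// getField(Map("b" -> 1), "b")                     == 1
// getField(Seq(Map("b" -> 1), Map("b" -> 2)), "b") == Seq(1, 2)
// getField(Seq(Seq(Map("b" -> 1))), "b")           == Seq(Seq(1))
```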





[GitHub] spark pull request: [SPARK-2096][SQL] Correctly parse dot notation...

2014-09-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/2230#issuecomment-54601682
  
I'm not sure how to modify `lazy val resolved` in `GetField`, since it now handles more than just StructType. Currently I have just removed the type check. What do you think? @marmbrus





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54602306
  
test this please.





[GitHub] spark pull request: Don't include the empty string as a default...

2014-09-05 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/2286

Don't include the empty string "" as a defaultAclUser

Changes logging from

```
14/09/05 02:01:08 INFO SecurityManager: Changing view acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: Changing modify acls to: aash,
14/09/05 02:01:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash, ); users with modify permissions: Set(aash, )
```
to
```
14/09/05 02:28:28 INFO SecurityManager: Changing view acls to: aash
14/09/05 02:28:28 INFO SecurityManager: Changing modify acls to: aash
14/09/05 02:28:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aash); users with modify permissions: Set(aash)
```

Note that the first set of logs has a Set of size 2 containing aash and the empty string "".

cc @tgravescs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark empty-default-acl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2286


commit cf973a1b8f202cd7fe70cf60c701c62c51d2e702
Author: Andrew Ash and...@andrewash.com
Date:   2014-09-05T09:30:33Z

Don't include the empty string  as a defaultAclUser







[GitHub] spark pull request: [BUILD] Fix for false positives reported by mi...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2285#issuecomment-54604364
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19803/consoleFull) for PR 2285 at commit [`24f3381`](https://github.com/apache/spark/commit/24f338120c33d353136c056544fe59ade7696af7).
 * This patch merges cleanly.





[GitHub] spark pull request: Don't include the empty string as a default...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2286#issuecomment-54604796
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19804/consoleFull) for PR 2286 at commit [`cf973a1`](https://github.com/apache/spark/commit/cf973a1b8f202cd7fe70cf60c701c62c51d2e702).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread mubarak
Github user mubarak commented on the pull request:

https://github.com/apache/spark/pull/1723#issuecomment-54606563
  
@tdas
Can you please review? Thanks!
![screen shot 2014-09-05 at 1 42 28 am](https://cloud.githubusercontent.com/assets/668134/4163160/b9b9b538-34e3-11e4-9fae-0e70f3ba1693.png)






[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread mubarak
Github user mubarak commented on the pull request:

https://github.com/apache/spark/pull/1723#issuecomment-54606650
  
Jenkins, this is ok to test.





[GitHub] spark pull request: [BUILD] Fix for false positives reported by mi...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2285#issuecomment-54609772
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19803/consoleFull) for PR 2285 at commit [`24f3381`](https://github.com/apache/spark/commit/24f338120c33d353136c056544fe59ade7696af7).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class AddJar(path: String) extends LeafNode with Command`






[GitHub] spark pull request: Don't include the empty string as a default...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2286#issuecomment-54610134
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19804/consoleFull) for PR 2286 at commit [`cf973a1`](https://github.com/apache/spark/commit/cf973a1b8f202cd7fe70cf60c701c62c51d2e702).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread wardviaene
GitHub user wardviaene opened a pull request:

https://github.com/apache/spark/pull/2287

[SPARK-3415] [PySpark] removes SerializingAdapter code

This PR removes the SerializingAdapter code that was copied from PiCloud.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wardviaene/spark feature/pythonsys

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2287


commit e263bf557148ab878e656f3138f6f7cb2cd003fb
Author: Ward Viaene ward.via...@bigdatapartnership.com
Date:   2014-09-05T13:12:03Z

SPARK-3415: removes legacy SerializingAdapter code







[GitHub] spark pull request: Don't include the empty string as a default...

2014-09-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2286#discussion_r17174246
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -162,7 +162,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
 
   // always add the current user and SPARK_USER to the viewAcls
   private val defaultAclUsers = Set[String](System.getProperty("user.name", ""),
-    Option(System.getenv("SPARK_USER")).getOrElse(""))
+    Option(System.getenv("SPARK_USER")).getOrElse("")).filter(_ != "")
--- End diff --

Can you change this to `!_.isEmpty`?
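
A tiny illustration of the suggestion (values made up; both forms drop the empty string, `!_.isEmpty` just reads better):

```scala
val users = Set("aash", "")
assert(users.filter(_ != "") == Set("aash"))
assert(users.filter(!_.isEmpty) == Set("aash"))
```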





[GitHub] spark pull request: Don't include the empty string as a default...

2014-09-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2286#issuecomment-54630202
  
Thanks for working on this; I've been meaning to fix it for a while.

Could you also please file a JIRA and link the two? The header of the PR should include the JIRA number, like [SPARK-].





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54631454
  
Jenkins, retest this please.





[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2117#issuecomment-54634193
  
@ash211 Thank you for explaining that.





[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...

2014-09-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2283#issuecomment-54634308
  
I don't think this is really necessary, as I see the value of the FileSystem one as a public API now, and changing its value would break compatibility, but I'm OK with it. Yes, yarn-alpha has this defined.

A higher value is higher priority. I would rather leave it at value 30, or at least leave some priorities in between, so I would rather see + 20. 30 is also what MapReduce uses, so if Hadoop were to add others, we would be better off imitating MR.
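
For reference, a minimal sketch of registering a hook relative to `FileSystem`'s priority via Hadoop's `ShutdownHookManager` (the wrapper object is hypothetical; the `+ 20` offset is the suggestion above):

```scala
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.util.ShutdownHookManager

object SparkShutdownHookExample {
  // A higher priority value runs earlier, so this hook fires BEFORE
  // FileSystem's own shutdown hook closes the file systems.
  def register(cleanup: () => Unit): Unit = {
    ShutdownHookManager.get().addShutdownHook(
      new Runnable { override def run(): Unit = cleanup() },
      FileSystem.SHUTDOWN_HOOK_PRIORITY + 20)
  }
}
```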





[GitHub] spark pull request: [SPARK-2140] Updating heap memory calculation ...

2014-09-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2253#issuecomment-54634933
  
Jenkins, test this please





[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class

2014-09-05 Thread mrocklin
GitHub user mrocklin opened a pull request:

https://github.com/apache/spark/pull/2288

pyspark.sql.SQLContext is new-style class

Tiny PR making SQLContext a new-style class.  This allows various type logic to work more effectively.

```Python
In [1]: import pyspark

In [2]: pyspark.sql.SQLContext.mro()
Out[2]: [pyspark.sql.SQLContext, object]
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mrocklin/spark sqlcontext-new-style-class

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2288


commit a2dc02fabf940c4714cbcf9f5da35c79e0795150
Author: Matthew Rocklin mrock...@gmail.com
Date:   2014-09-05T14:51:25Z

pyspark.sql.SQLContext is new-style class







[GitHub] spark pull request: [SPARK-3260] yarn - pass acls along with execu...

2014-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2185





[GitHub] spark pull request: [SPARK-3375] spark on yarn container allocatio...

2014-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2275





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2287#issuecomment-54636076
  
Hi @wardviaene,

Do you have an example program that reproduces this bug?  We should 
probably add it as a regression test (see `python/pyspark/tests.py` for 
examples of how to do this).

(For other reviewers: you can browse SerializingAdapter's code at 
http://pydoc.net/Python/cloud/2.7.0/cloud.transport.adapter/)  It looks like 
this code is designed to handle the pickling of file() objects.  The Dill 
developers have recently been discussing how to pickle file handles: 
https://github.com/uqfoundation/dill/issues/57

It looks like `SerializingAdapter.max_transmit_data` acts as an upper limit on the sizes of closures that PiCloud would send to their service.  Unlike PiCloud, we don't have limits on closure sizes (there are warnings, but these are detected / enforced inside the JVM).  Therefore, I wonder if we should just remove this limit and allow the whole file to be read, rather than adding an obscure configuration option.





[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2288#issuecomment-54636484
  
Good catch!  While you're at it, are there any other old-style classes in 
PySpark that should be made into new-style ones?





[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2288#issuecomment-54636685
  
Also, do you mind opening a JIRA ticket on 
https://issues.apache.org/jira/browse/SPARK and editing the title of your pull 
request to reference it, e.g. `[SPARK-] Use new-style classes in PySpark`?





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread bbejeck
Github user bbejeck commented on the pull request:

https://github.com/apache/spark/pull/2227#issuecomment-54637031
  
Did any of the admins have a chance to check it out? Let me know if you want me to modify anything in it.





[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class

2014-09-05 Thread mrocklin
Github user mrocklin commented on the pull request:

https://github.com/apache/spark/pull/2288#issuecomment-54638388
  
Sure.  Next time I find a few free minutes.


On Fri, Sep 5, 2014 at 8:04 AM, Josh Rosen notificati...@github.com wrote:

 Also, do you mind opening a JIRA ticket on
 https://issues.apache.org/jira/browse/SPARK and editing the title of your
 pull request to reference it, e.g. [SPARK-] Use new-style classes in
 PySpark?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/2288#issuecomment-54636685.






[GitHub] spark pull request: [SPARK-3361] Expand PEP 8 checks to include EC...

2014-09-05 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2277#issuecomment-54638477
  
Jenkins, could you test this please?





[GitHub] spark pull request: pyspark.sql.SQLContext is new-style class

2014-09-05 Thread mrocklin
Github user mrocklin commented on the pull request:

https://github.com/apache/spark/pull/2288#issuecomment-54639788
  


```
mrocklin@notebook:~/workspace/spark$ git grep "^class \w*:"
mrocklin@notebook:~/workspace/spark$ 
```





[GitHub] spark pull request: [SPARK-3417] -Use of old-style classes in pysp...

2014-09-05 Thread mrocklin
Github user mrocklin commented on the pull request:

https://github.com/apache/spark/pull/2288#issuecomment-54639853
  
Done





[GitHub] spark pull request: Spark-3406 add a default storage level to pyth...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2280#issuecomment-54641296
  
It looks like `sql.py` overrides the default `persist()`, so you might want 
to update it there, too.  LGTM otherwise.





[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...

2014-09-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/2276#discussion_r17180863
  
--- Diff: 
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala 
---
@@ -96,7 +96,7 @@ private class YarnRMClientImpl(args: 
ApplicationMasterArguments) extends YarnRMC
 // Users can then monitor stderr/stdout on that node if required.
 appMasterRequest.setHost(Utils.localHostName())
 appMasterRequest.setRpcPort(0)
-appMasterRequest.setTrackingUrl(uiAddress)
+appMasterRequest.setTrackingUrl(uiAddress.replaceAll("^http(\\w)*://", ""))
--- End diff --

I would rather this be done with something more reliable, like the URI class, and just remove the scheme if it has one.

Also, can you add a comment noting that we are removing the scheme because Hadoop doesn't handle it?





[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

2014-09-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2257#issuecomment-54648499
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2270#discussion_r17182845
  
--- Diff: bin/pyspark ---
@@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS
 
 # For pyspark tests
 if [[ -n "$SPARK_TESTING" ]]; then
+  unset YARN_CONF_DIR
+  unset HADOOP_CONF_DIR
--- End diff --

If this problem only happens during testing, could we put these unsets in python/run-tests? pyspark will often be used as a plain Python shell.





[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...

2014-09-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/899#issuecomment-54652280
  
@zeodtr does this compile with anything < hadoop 2.4? If it doesn't, this is a no-go.





[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...

2014-09-05 Thread benoyantony
Github user benoyantony commented on the pull request:

https://github.com/apache/spark/pull/2276#issuecomment-54652578
  
Sure. I'll do both.
Does alpha correspond to Hadoop versions before YARN-1203? As you know, before YARN-1203, we cannot pass AM URLs with a scheme.






[GitHub] spark pull request: [SPARK-3286] - Cannot view ApplicationMaster U...

2014-09-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2276#issuecomment-54653232
  
No, alpha means pre-branch-2 Hadoop (I think; Hadoop branching is not an exact science). Anyway, there are stable releases without YARN-1203, so that case should probably be handled.

If there isn't an API to figure out the YARN version, I'd use reflection to detect a method that was added after YARN-1203 (preferably around this API), and only apply the fix if the method is available.
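A hedged sketch of that reflection check; the probed method name would be whatever post-YARN-1203 method gets chosen, so the helper below is generic:

```
// Returns true if `cls` has a public no-arg method named `name`,
// i.e. the running YARN version is new enough to include it.
def hasMethod(cls: Class[_], name: String): Boolean =
  try { cls.getMethod(name); true }
  catch { case _: NoSuchMethodException => false }
```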





[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/2270#discussion_r17184086
  
--- Diff: bin/pyspark ---
@@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS
 
 # For pyspark tests
 if [[ -n "$SPARK_TESTING" ]]; then
+  unset YARN_CONF_DIR
+  unset HADOOP_CONF_DIR
--- End diff --

Thanks for your comment.
As I mentioned in the JIRA, YARN_CONF_DIR and HADOOP_CONF_DIR are loaded in the pyspark script, and some tests like rdd.py are kicked off by pyspark from python/run-tests, so it doesn't make sense to put the unset in python/run-tests.





[GitHub] spark pull request: [SPARK-3361] Expand PEP 8 checks to include EC...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2277#issuecomment-54654607
  
Jenkins, retest this please.  (Not sure if Jenkins is programmed to listen 
to @nchammas or not...)





[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-05 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2144#issuecomment-54654606
  
@mateiz @JoshRosen @mattf run-tests will now try to run the tests for Spark core and SQL with PyPy.

One known issue is that serialization of arrays in PyPy is similar to Python 2.6, which is not supported by Pyrolite, so one test case has been skipped for it. I have added another one which does not depend on serialization of arrays.

Also, I have done some refactoring in cloudpickle to handle this in more portable ways (an approach also used by dill).
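For reference, the values in question are stdlib `array.array` objects; a minimal round-trip through PySpark's pickle-based serializer looks like this (a sketch, assuming `pyspark.serializers.PickleSerializer`; it is not the skipped test itself):

```
from array import array
from pyspark.serializers import PickleSerializer

ser = PickleSerializer()
data = array('d', [1.0, 2.0, 3.0])
# Round-trips cleanly on CPython 2.7; on PyPy the pickle payload for
# arrays resembles Python 2.6's, which the JVM-side unpickler rejects.
restored = ser.loads(ser.dumps(data))
```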





[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-05 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2144#issuecomment-54654638
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2227#issuecomment-54654828
  
Jenkins, this is ok to test.





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2227#issuecomment-54655129
  
Feels to me like it would be better to fix this in `Utils.memoryStringToMb`. That way all code using it benefits.

As for the behavior of that method, maybe it should throw an exception if there is no suffix and the value is < 1MB?
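A sketch of that suggested behavior, assuming suffixless values are interpreted as raw bytes (illustrative, not the actual Spark patch):

```
// Parse memory strings like "512m" or "2g" into megabytes, rejecting
// bare numbers small enough to be an obvious user error.
def memoryStringToMb(str: String): Int = {
  val lower = str.trim.toLowerCase
  if (lower.endsWith("k")) (lower.dropRight(1).toLong / 1024).toInt
  else if (lower.endsWith("m")) lower.dropRight(1).toInt
  else if (lower.endsWith("g")) lower.dropRight(1).toInt * 1024
  else {
    val bytes = lower.toLong
    if (bytes < 1024 * 1024) {
      // No suffix and under 1 MB: almost certainly a typo like "-m 1".
      throw new IllegalArgumentException(
        "Memory string '" + str + "' is below 1 MB; missing an m/g suffix?")
    }
    (bytes / (1024 * 1024)).toInt
  }
}
```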





[GitHub] spark pull request: [SPARK-3375] spark on yarn container allocatio...

2014-09-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2275#issuecomment-54655288
  
Oops. Thanks for fixing it.





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2227#discussion_r17184945
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/worker/WorkerArgumentsTest.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.deploy.worker
+
+import org.apache.spark.SparkConf
+import org.scalatest.FunSuite
+
+
+class WorkerArgumentsTest extends FunSuite {
+
+  test("Memory can't be set to 0 when cmd line args leave off M or G") {
+val conf = new SparkConf
+val args = Array("-m", "1", "spark://localhost:")
+intercept[IllegalStateException] {
+  new WorkerArguments(args, conf)
+}
+  }
+
+
+/* For this test an environment property for SPARK_WORKER_MEMORY was set
--- End diff --

In #2002, I added a mechanism that allows environment variables to be 
mocked in tests.  Take a look at that PR, `SparkConf.getEnv` in particular.  By 
using a custom SparkConf subclass, you can mock environment variables on a 
per-test basis: 
https://github.com/apache/spark/pull/2002/files#diff-e9fb6be5f96766cce96c4d60aea2fc59R45

If we find ourselves doing this in multiple places (my PR, here, ...) it 
might be nice to add some test helper classes for doing this more generically.  
That refactoring can happen in a separate PR, though, so for now it's probably 
fine to just copy my code snippet here.
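The pattern from that PR looks roughly like this (a sketch assuming the overridable `getenv` hook the PR introduces; see the linked diff for the real version):

```
import org.apache.spark.SparkConf

// A SparkConf whose environment lookups come from a supplied map, so a
// test can fake SPARK_WORKER_MEMORY etc. without touching the JVM's env.
class MockEnvSparkConf(env: Map[String, String]) extends SparkConf(false) {
  override def getenv(name: String): String = env.getOrElse(name, null)
}

// e.g. new MockEnvSparkConf(Map("SPARK_WORKER_MEMORY" -> "1"))
```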





[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/2270#discussion_r17184963
  
--- Diff: bin/pyspark ---
@@ -85,6 +85,8 @@ export PYSPARK_SUBMIT_ARGS
 
 # For pyspark tests
 if [[ -n "$SPARK_TESTING" ]]; then
+  unset YARN_CONF_DIR
+  unset HADOOP_CONF_DIR
--- End diff --

Thanks, I get it.





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2227#discussion_r17185052
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/worker/WorkerArgumentsTest.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.deploy.worker
+
+import org.apache.spark.SparkConf
+import org.scalatest.FunSuite
+
+
+class WorkerArgumentsTest extends FunSuite {
+
+  test("Memory can't be set to 0 when cmd line args leave off M or G") {
+val conf = new SparkConf
+val args = Array("-m", "1", "spark://localhost:")
+intercept[IllegalStateException] {
+  new WorkerArguments(args, conf)
+}
+  }
+
+
+/* For this test an environment property for SPARK_WORKER_MEMORY was set
--- End diff --

Oh, to be more specific: you'll have to change the code that reads the environment variable to use `SparkConf.getenv` instead of `System.getenv`; I only changed this for the environment variables used in my specific test because I didn't want to make a big cross-cutting change across the codebase (plus it would probably get broken by subsequent PRs; we should add a style-checker rule that complains about `System.getenv` uses if we plan on doing this change globally).





[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2270#issuecomment-54657139
  
This patch looks good to me.

@JoshRosen could you revisit this?





[GitHub] spark pull request: [SPARK-3030] [PySpark] Reuse Python worker

2014-09-05 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2259#issuecomment-54657265
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2270#issuecomment-54660424
  
Looks good to me, too.  Thanks for fixing this!





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1616#discussion_r17188055
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -313,14 +313,74 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file requested by the executor. Supports fetching the
file in a variety of ways,
+   * including HTTP, HDFS and files on a standard filesystem, based on the 
URL parameter.
+   *
+   * If `useCache` is true, first attempts to fetch the file from a local 
cache that's shared across
+   * executors running the same application.
+   *
+   * Throws SparkException if the target file already exists and has 
different contents than
+   * the requested file.
+   */
+  def fetchFile(
+  url: String,
+  targetDir: File,
+  conf: SparkConf,
+  securityMgr: SecurityManager,
+  hadoopConf: Configuration,
+  timestamp: Long,
+  useCache: Boolean) {
+val fileName = url.split("/").last
+val targetFile = new File(targetDir, fileName)
+if (useCache) {
+  val cachedFileName = url.hashCode + timestamp + "_cach"
--- End diff --

_cache





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1616#discussion_r17188080
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -313,14 +313,74 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file requested by the executor. Supports fetching the
file in a variety of ways,
+   * including HTTP, HDFS and files on a standard filesystem, based on the 
URL parameter.
+   *
+   * If `useCache` is true, first attempts to fetch the file from a local 
cache that's shared across
+   * executors running the same application.
+   *
+   * Throws SparkException if the target file already exists and has 
different contents than
+   * the requested file.
+   */
+  def fetchFile(
+  url: String,
+  targetDir: File,
+  conf: SparkConf,
+  securityMgr: SecurityManager,
+  hadoopConf: Configuration,
+  timestamp: Long,
+  useCache: Boolean) {
+val fileName = url.split("/").last
+val targetFile = new File(targetDir, fileName)
+if (useCache) {
+  val cachedFileName = url.hashCode + timestamp + "_cach"
+  val lockFileName = url.hashCode + timestamp + "_lock"
+  val localDir = new File(getLocalDir(conf))
+  val lockFile = new File(localDir, lockFileName)
--- End diff --

Why do we need a lock file? This seems a little expensive.





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1616#discussion_r17188168
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -313,14 +313,74 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file requested by the executor. Supports fetching the
file in a variety of ways,
+   * including HTTP, HDFS and files on a standard filesystem, based on the 
URL parameter.
+   *
+   * If `useCache` is true, first attempts to fetch the file from a local 
cache that's shared across
+   * executors running the same application.
+   *
+   * Throws SparkException if the target file already exists and has 
different contents than
+   * the requested file.
+   */
+  def fetchFile(
+  url: String,
+  targetDir: File,
+  conf: SparkConf,
+  securityMgr: SecurityManager,
+  hadoopConf: Configuration,
+  timestamp: Long,
+  useCache: Boolean) {
+val fileName = url.split("/").last
+val targetFile = new File(targetDir, fileName)
+if (useCache) {
+  val cachedFileName = url.hashCode + timestamp + "_cach"
+  val lockFileName = url.hashCode + timestamp + "_lock"
+  val localDir = new File(getLocalDir(conf))
+  val lockFile = new File(localDir, lockFileName)
--- End diff --

I think the idea here is that multiple executor JVMs are running on the same machine and we only want to download one copy of the file to the shared cache, so we use a lock file as a form of interprocess synchronization.
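A minimal sketch of that pattern using `java.nio` file locks (`withFileLock` is a made-up helper for illustration, not the PR's code):

```
import java.io.{File, RandomAccessFile}

// Run `body` while holding an exclusive OS-level lock on `lockFile`;
// a second JVM calling this blocks until the first releases the lock.
def withFileLock[T](lockFile: File)(body: => T): T = {
  val raf = new RandomAccessFile(lockFile, "rw")
  val lock = raf.getChannel.lock()
  try {
    body
  } finally {
    lock.release()
    raf.close()
  }
}
```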





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-54661638
  
Do we need to clean up the new cache files we created? Or is that handled automatically somewhere?





[GitHub] spark pull request: SPARK-3337 Paranoid quoting in shell to allow ...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2229#issuecomment-54661731
  
retest this please





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread wardviaene
Github user wardviaene commented on the pull request:

https://github.com/apache/spark/pull/2287#issuecomment-54661969
  
Hi @JoshRosen 

I added a test script in this pull request. Referencing `sys.stderr` in a class triggers the bug.





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2287#discussion_r17188526
  
--- Diff: python/pyspark/tests.py ---
@@ -180,6 +180,22 @@ def tearDown(self):
 self.sc.stop()
 sys.path = self._old_sys_path
 
+class CloudPickleTestCase(PySparkTestCase):
+def SetUp(self):
--- End diff --

This is capitalized (`SetUp`) so it won't be called by `unittest`.  Also, 
we should just end up inheriting the proper setup and teardown methods from 
PySparkTestCase, so you don't need these methods. 
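To spell out the `unittest` convention (a minimal illustration, not PR code): only the exact lowercase-s `setUp`/`tearDown` names are auto-invoked.

```
import unittest

class Example(unittest.TestCase):
    def setUp(self):     # runs before every test_* method
        self.value = 1

    def SetUp(self):     # never called automatically by unittest
        raise AssertionError("unreachable via the test runner")

    def test_value(self):
        self.assertEqual(self.value, 1)
```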





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread bbejeck
Github user bbejeck commented on the pull request:

https://github.com/apache/spark/pull/2227#issuecomment-54662304
  
Josh, 

Thanks for the heads up on testing with environment variables. I will look 
at the PR and make the required changes to the test.






[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2287#discussion_r17188600
  
--- Diff: python/pyspark/tests.py ---
@@ -180,6 +180,22 @@ def tearDown(self):
 self.sc.stop()
 sys.path = self._old_sys_path
 
+class CloudPickleTestCase(PySparkTestCase):
+def SetUp(self):
+PySparkTestCase.setUp(self)
+def tearDown(self):
+PySparkTestCase.tearDown(self)
+def test_CloudPickle(self):
--- End diff --

I'd probably go with `test_cloudpickle`, without the camel-case / capitalization.





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2287#discussion_r17188799
  
--- Diff: python/pyspark/tests.py ---
@@ -180,6 +180,22 @@ def tearDown(self):
 self.sc.stop()
 sys.path = self._old_sys_path
 
+class CloudPickleTestCase(PySparkTestCase):
+def SetUp(self):
+PySparkTestCase.setUp(self)
+def tearDown(self):
+PySparkTestCase.tearDown(self)
+def test_CloudPickle(self):
--- End diff --

Also, `test_cloudpickle` isn't a very descriptive name; it will be hard for people who come along and read this later to figure out what it is supposed to test. A better name would be `test_pickling_file_handles` (and maybe add a comment saying that it's a regression test for SPARK-3415).





[GitHub] spark pull request: SPARK-3178 setting SPARK_WORKER_MEMORY to a va...

2014-09-05 Thread bbejeck
Github user bbejeck commented on the pull request:

https://github.com/apache/spark/pull/2227#issuecomment-54662763
  
> Feels to me like it would be better to fix this in Utils.memoryStringToMb. That way all code using it benefits.

I thought the same thing, but I was not sure about making a change that would be cross-cutting, so I confined my change to the WorkerArguments class.





[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2287#discussion_r17188918
  
--- Diff: python/pyspark/tests.py ---
@@ -180,6 +180,22 @@ def tearDown(self):
 self.sc.stop()
 sys.path = self._old_sys_path
 
+class CloudPickleTestCase(PySparkTestCase):
+def SetUp(self):
+PySparkTestCase.setUp(self)
+def tearDown(self):
+PySparkTestCase.tearDown(self)
+def test_CloudPickle(self):
+self.t = self.CloudPickleTestClass()
+a = [ 1 , 2, 3, 4, 5 ]
+b = self.sc.parallelize(a)
+c = b.map(self.t.getOk)
+self.assertEquals('ok', c.first())
+class CloudPickleTestClass(object):
--- End diff --

Do you need to define a separate class to test this?  Maybe a simpler 
reproduction would be to directly instantiate CloudPickleSerializer and attempt 
to dump `sys.stderr` directly (or a function that references `sys.stderr` in 
its closure).
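Something along these lines (a sketch of the suggested reproduction, not the final test):

```
import sys
from pyspark.serializers import CloudPickleSerializer

ser = CloudPickleSerializer()
# Dumping a file handle exercised the bug directly; before the
# SPARK-3415 fix this call failed.
ser.dumps(sys.stderr)
```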





[GitHub] spark pull request: [SPARK-3176] Implement 'ABS and 'LAST' for sql

2014-09-05 Thread xinyunh
Github user xinyunh commented on the pull request:

https://github.com/apache/spark/pull/2099#issuecomment-54663384
  
Sorry, I forgot





[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread shaneknapp
GitHub user shaneknapp opened a pull request:

https://github.com/apache/spark/pull/2289

TEST ONLY DO NOT MERGE

TEST ONLY DO NOT MERGE

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shaneknapp/spark sknapptest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2289


commit 7973946433208fffbb7d7ac244f4e18af6e883ab
Author: shane knapp incompl...@gmail.com
Date:   2014-09-05T18:10:57Z

TEST ONLY DO NOT MERGE







[GitHub] spark pull request: [EC2] don't duplicate default values

2014-09-05 Thread nchammas
GitHub user nchammas opened a pull request:

https://github.com/apache/spark/pull/2290

[EC2] don't duplicate default values

This PR makes two minor changes to the `spark-ec2` script:

1. The script's input-parameter default values are duplicated in the help text. This is unnecessary. This PR replaces the duplicated info with the appropriate `optparse` placeholder.
2. The default Spark version currently needs to be updated by hand during each release, which is an error-prone process. This PR places that default value in an easy-to-spot place.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nchammas/spark spark-ec2-default-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2290


commit 0c6d3bbe90b81dc433791a82d26ddc695cacf1d7
Author: Nicholas Chammas nicholas.cham...@gmail.com
Date:   2014-09-05T18:33:09Z

don't duplicate default values







[GitHub] spark pull request: [SPARK-2491]: Fix When an fatal error is throw...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1482#issuecomment-54664325
  
This seems reasonable to me.  /cc @andrewor14 for another pair of eyes.

To recap [some discussion on the 
JIRA](https://issues.apache.org/jira/browse/SPARK-2491), the issue that this 
addresses is a scenario where the Executor JVM is in the process of exiting due 
to an uncaught exception and other shutdown hooks might have deleted files or 
otherwise performed cleanup that causes other still-running tasks to fail.  
These additional failures/errors are confusing when they appear in the log and 
make it hard to find the real failure that caused the executor JVM to exit.

@witgo If I understand correctly, the problem here is that confusing messages appear in the logs, not that the executor doesn't stop or doesn't perform cleanup? If that's the case, can we edit the PR's title to "[SPARK-2491] Don't handle uncaught exceptions from tasks that fail during executor shutdown"?





[GitHub] spark pull request: [EC2] don't duplicate default values

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2290#issuecomment-54664634
  
Woah, I didn't know optparse had `%default`.  Cool!
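For anyone else who hasn't seen it: `optparse` expands `%default` inside help strings. A tiny illustration (the option name is made up, not the spark-ec2 one):

```
from optparse import OptionParser

parser = OptionParser()
parser.add_option(
    "-v", "--spark-version", default="1.0.0",
    help="Version of Spark to use (default: %default)")
parser.print_help()   # the help text shows "default: 1.0.0"
```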




