[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43852089
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43852090
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15137/




[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43851355
  
I also assumed it was intentional, since output from user code typically
goes to stdout.

Regards
Mridul
On 22-May-2014 11:20 am, "Reynold Xin"  wrote:

> I actually thought this was intentional and it's been the case since
> Spark's inception. @mateiz  can you comment on
> this?




[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43850841
  
I actually thought this was intentional and it's been the case since 
Spark's inception. @mateiz can you comment on this? 




[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43849242
  
Merged build started. 




[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/852#issuecomment-43849233
  
 Merged build triggered. 




[GitHub] spark pull request: Default log4j.properties incorrectly sends all...

2014-05-21 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/852

Default log4j.properties incorrectly sends all output to stderr and none to 
stdout

https://issues.apache.org/jira/browse/SPARK-1899

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark SPARK-1899

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #852


commit 41af54c07509a1aabd9f0e0e111e7f13616f3cee
Author: Andrew Ash 
Date:   2014-05-22T04:54:43Z

Change log4j.properties.template

- Only send ERROR and higher to stderr
- Send everything to stdout

commit d094463519ba3b50293dc06bfe51717204a1e21f
Author: Andrew Ash 
Date:   2014-05-22T05:00:40Z

Use 4-year dates in default logging
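
As a rough illustration of the routing rule in the first commit (everything to stdout, ERROR and higher also to stderr), here is a toy Scala sketch. It does not use log4j: `SplitLogger` and `Level` are hypothetical names invented for this example, and the actual change is made in log4j.properties.template.

```scala
import java.io.PrintStream

// Hypothetical severity levels, ordered from least to most severe.
object Level extends Enumeration {
  val DEBUG, INFO, WARN, ERROR = Value
}

// Toy logger (not log4j): every message goes to `out`,
// and messages at ERROR or above are additionally written to `err`.
class SplitLogger(out: PrintStream, err: PrintStream) {
  def log(level: Level.Value, msg: String): Unit = {
    val line = s"$level $msg"
    out.println(line)
    if (level >= Level.ERROR) err.println(line)
  }
}
```

Constructing it as `new SplitLogger(System.out, System.err)` would mimic the proposed default; passing in-memory streams makes the split easy to check.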






[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/791#issuecomment-43848168
  
@mridulm @tdas I have moved `putLock.synchronized` into `ensureFreeSpace` 
and renamed this method to `getToBeDroppedBlocks`. I also updated the 
scaladoc to explain this selection, marking, then dropping process. Please take 
a look to see if I missed something.
Can one of the admins ask @AmplabJenkins to run the unit tests? I want to 
make sure my PR doesn't break any basic functionality.




[GitHub] spark pull request: [SPARK-1886] check executor id existence when ...

2014-05-21 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/827#issuecomment-43844695
  
Can't see any reason why it should be related... one more time...

Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43843669
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43843670
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15136/




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43843526
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43843528
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15135/




[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/791#issuecomment-43842861
  
@mridulm I checked all callers of MemoryStore#putValues and putBytes via the 
IDE. Only BlockManager calls them, and it does so with the block info 
synchronized. So maybe we don't need to worry about putting the same block in 
parallel?




[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...

2014-05-21 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/791#issuecomment-43842578
  
@tdas I think we shouldn't synchronize on this. When one thread is running 
`ensureFreeSpace`, others should not get into `ensureFreeSpace`, but should be 
able to add and remove blocks. So using a `putLock` is better.
As for the test, I haven't run one yet, but I'm going to. Can spark-perf do this?
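
To make the locking argument concrete, here is a minimal sketch of the idea, assuming a toy store rather than Spark's actual MemoryStore. `ToyStore`, its fields, and its eviction order are all hypothetical. A dedicated `putLock` serializes the "free space, then insert" path, while the map itself is guarded by a separate, short-lived lock so other threads can still remove blocks in the meantime.

```scala
import scala.collection.mutable

// Toy store; hypothetical names, not Spark's MemoryStore.
class ToyStore(maxBytes: Long) {
  // blockId -> size in bytes, kept in insertion order (oldest first).
  private val entries = mutable.LinkedHashMap.empty[String, Long]
  private var used = 0L
  // Held for the whole "make room, then insert" sequence.
  private val putLock = new Object

  // Needs only the short map lock, so it can run while another thread is in put().
  def remove(id: String): Unit = entries.synchronized {
    entries.remove(id).foreach(size => used -= size)
  }

  def contains(id: String): Boolean = entries.synchronized(entries.contains(id))

  def put(id: String, size: Long): Boolean = putLock.synchronized {
    remove(id) // replace any existing block with the same id
    // Selection: pick the oldest blocks until enough space would be free.
    val victims = entries.synchronized {
      var free = maxBytes - used
      val picked = mutable.ArrayBuffer.empty[String]
      val it = entries.iterator
      while (free < size && it.hasNext) {
        val (victimId, victimSize) = it.next()
        picked += victimId
        free += victimSize
      }
      if (free >= size) Some(picked.toList) else None
    }
    victims match {
      case None => false // cannot fit even after evicting everything
      case Some(ids) =>
        ids.foreach(remove) // dropping happens outside the map lock
        entries.synchronized {
          // Re-check, since other threads may have changed the map meanwhile.
          if (maxBytes - used >= size) { entries(id) = size; used += size; true }
          else false
        }
    }
  }
}
```

Only one thread at a time can be selecting victims, which is the property the comment asks for; whether this matches the final patch is not something the sketch claims.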




[GitHub] spark pull request: [SPARK-1880] [SQL] Eliminate unnecessary job e...

2014-05-21 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/825#issuecomment-43842591
  
@rxin Thanks a lot!
Well, should I close this PR?




[GitHub] spark pull request: [SPARK-1886] check executor id existence when ...

2014-05-21 Thread zhpengg
Github user zhpengg commented on the pull request:

https://github.com/apache/spark/pull/827#issuecomment-43841865
  
Any idea about the unit test failure? @aarondav 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43841781
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43841770
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43841490
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43841479
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage

2014-05-21 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/828#issuecomment-43840745
  
@tdas 
You're right. The code breaks the fault-tolerance properties of RDDs.
The ideal solution would be automatic cleanup and rebuilding of shuffle data.




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/851#issuecomment-43840241
  
I've merged this into master & branch-1.0.




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/851




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43838803
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43838804
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15134/




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43836661
  
This doesn't apply to standalone or Mesos. For these two modes, Spark 
submit translates `--jars` to `spark.jars`, then SparkContext uploads these 
jars to the HTTP server, and the executors pull from the server.
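
The first step of that flow (turning `--jars` into `spark.jars`) can be sketched roughly as follows. This is an illustrative assumption, not Spark's actual SparkSubmit code: `JarsSketch` and `jarsToConf` are hypothetical names, and the upload to the HTTP server and the executor-side fetch are not modeled.

```scala
// Hypothetical helper, for illustration only.
object JarsSketch {
  // Finds "--jars a.jar,b.jar" in the argument list and returns it
  // as a single "spark.jars" entry; returns an empty map if absent.
  def jarsToConf(args: Seq[String]): Map[String, String] = {
    val idx = args.indexOf("--jars")
    if (idx >= 0 && idx + 1 < args.length) {
      val jars = args(idx + 1).split(",").map(_.trim).filter(_.nonEmpty)
      Map("spark.jars" -> jars.mkString(","))
    } else {
      Map.empty
    }
  }
}
```

For example, `JarsSketch.jarsToConf(Seq("--jars", "a.jar, b.jar", "app.jar"))` yields a map with the single key `spark.jars`.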




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/850#discussion_r12930693
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
   def submitApp(appContext: ApplicationSubmissionContext) = {
     // Submit the application to the applications manager.
     logInfo("Submitting application to ASM")
-    super.submitApplication(appContext)
+    yarnClient.submitApplication(appContext)
   }
 
+  def getApplicationReport = yarnClient.getApplicationReport _
--- End diff --

I guess the same thing applies to stop() as well.




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread cmccabe
Github user cmccabe commented on a diff in the pull request:

https://github.com/apache/spark/pull/850#discussion_r12930683
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
   def submitApp(appContext: ApplicationSubmissionContext) = {
     // Submit the application to the applications manager.
     logInfo("Submitting application to ASM")
-    super.submitApplication(appContext)
+    yarnClient.submitApplication(appContext)
   }
 
+  def getApplicationReport = yarnClient.getApplicationReport _
+
+  def stop = yarnClient.stop _
--- End diff --

OK.




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread cmccabe
Github user cmccabe commented on a diff in the pull request:

https://github.com/apache/spark/pull/850#discussion_r12930649
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
   def submitApp(appContext: ApplicationSubmissionContext) = {
     // Submit the application to the applications manager.
     logInfo("Submitting application to ASM")
-    super.submitApplication(appContext)
+    yarnClient.submitApplication(appContext)
   }
 
+  def getApplicationReport = yarnClient.getApplicationReport _
--- End diff --

This is here so YarnClientSchedulerBackend can call 
Client#getApplicationReport.




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/850#discussion_r12930178
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
   def submitApp(appContext: ApplicationSubmissionContext) = {
     // Submit the application to the applications manager.
     logInfo("Submitting application to ASM")
-    super.submitApplication(appContext)
+    yarnClient.submitApplication(appContext)
   }
 
+  def getApplicationReport = yarnClient.getApplicationReport _
+
+  def stop = yarnClient.stop _
--- End diff --

Should `yarnClient.stop` be called directly just like other yarnClient 
methods?
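
For readers unfamiliar with the trailing-underscore syntax in the diff, the following sketch shows how a method value differs from plain delegation in Scala 2. `FakeYarnClient` and `Wrapper` are hypothetical stand-ins invented for this example; they are not the Hadoop `YarnClient` or Spark's `Client`.

```scala
// Hypothetical stand-in for a client with a side-effecting stop().
class FakeYarnClient {
  var stopped = false
  def stop(): Unit = { stopped = true }
}

class Wrapper(underlying: FakeYarnClient) {
  // Method value (Scala 2 syntax): evaluating `stopFn` only builds a
  // function object of type () => Unit; it does not call stop().
  def stopFn = underlying.stop _

  // Plain delegation: calls stop() whenever it is invoked.
  def stop(): Unit = underlying.stop()
}
```

With this shape, `wrapper.stopFn` on its own has no effect, while `wrapper.stopFn()` applies the returned function and `wrapper.stop()` delegates directly. That difference is one reason a reviewer might prefer the direct call.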




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/850#discussion_r12930168
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
   def submitApp(appContext: ApplicationSubmissionContext) = {
     // Submit the application to the applications manager.
     logInfo("Submitting application to ASM")
-    super.submitApplication(appContext)
+    yarnClient.submitApplication(appContext)
   }
 
+  def getApplicationReport = yarnClient.getApplicationReport _
--- End diff --

Is this necessary? You seem to be calling yarnClient.getApplicationReport 
directly. 






[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/851#issuecomment-43832566
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15132/




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43832565
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15133/




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/851#issuecomment-43832564
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43832563
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43831786
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43831793
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43831674
  
Jenkins, test this please




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/850#issuecomment-43831445
  
+1.  There's a separate version of Client under yarn/alpha that handles 
0.23, so this change shouldn't cause any problems on that front.




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/850#issuecomment-43831359
  
Does this API also work on Hadoop 0.23? We are constrained by the need to 
support that version.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43830170
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15131/




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43830169
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/850#issuecomment-43828253
  
+1 less private api usage yay!




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43827704
  
In standalone mode and on Mesos, does this fix require the JARs to be 
accessible from the same URL on all nodes?




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43825518
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43825524
  
Merged build started. 




[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage

2014-05-21 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/828#issuecomment-43825145
  
I don't think this cachePoint is a good idea at all. While it *can* give 
better performance, it fundamentally breaks the fault-tolerance properties of 
RDDs. If you cachePoint() an RDD with MEMORY_ONLY and the executor then dies, 
you have no way to recover the lost partitions, because there is no lineage 
information on how that RDD was created. All of Spark's operations maintain this 
guarantee of fault tolerance despite failed workers, and breaking that is a bad 
idea. So this is a fundamentally unsafe operation to expose to the end user.

In fact, this is the same reason why checkpoint() has been implemented using 
HDFS: the fault-tolerance property is maintained (data is saved to 
fault-tolerant storage) even if executors die. 

That said, there is a good middle ground here. We can do what 
cachePoint() does while ensuring that the data is replicated within the 
executors (so a better fault-tolerance guarantee) but not expose it to the users 
(so that it does not break public API semantics). This would be an ALS-only solution.





[GitHub] spark pull request: [SPARK-1880] [SQL] Eliminate unnecessary job e...

2014-05-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/825#issuecomment-43824842
  
Done & merged #836.




[GitHub] spark pull request: [SQL] SPARK-1800 Add broadcast hash join opera...

2014-05-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/734#discussion_r12927830
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -142,6 +136,68 @@ case class HashJoin(
 
 /**
  * :: DeveloperApi ::
+ * Performs and inner hash join of two child relations by first shuffling the data using the join
+ * keys.
+ */
+@DeveloperApi
+case class ShuffledHashJoin(
+leftKeys: Seq[Expression],
+rightKeys: Seq[Expression],
+buildSide: BuildSide,
+left: SparkPlan,
+right: SparkPlan) extends BinaryNode with HashJoin {
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil
+
+
+  def execute() = {
+buildPlan.execute().zipPartitions(streamedPlan.execute()) {
+  (buildIter, streamIter) => joinIterators(buildIter, streamIter)
+}
+  }
+}
+
+
+/**
+ * :: DeveloperApi ::
+ * Performs an inner hash join of two child relations.  When the operator is constructed, a Spark
+ * job is asynchronously started to calculate the values for the broadcasted relation.  This data
+ * is then placed in a Spark broadcast variable.  The streamed relation is not shuffled.
+ */
+@DeveloperApi
+case class BroadcastHashJoin(
+ leftKeys: Seq[Expression],
+ rightKeys: Seq[Expression],
+ buildSide: BuildSide,
+ left: SparkPlan,
+ right: SparkPlan)(@transient sc: SparkContext) extends BinaryNode with HashJoin {
+
+  override def otherCopyArgs = sc :: Nil
+
+  override def outputPartitioning: Partitioning = left.outputPartitioning
+
+  override def requiredChildDistribution =
+UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
+
+  @transient
+  lazy val broadcastFuture = future {
+   sc.broadcast(buildPlan.executeCollect())
--- End diff --

In Spark 1.0, with the newly added garbage collection mechanism, when the 
query plan itself goes out of scope, the broadcast variable should also be 
cleaned automatically.

Another way we can do this is to have some query context object we pass 
around the entire physical query plan which tracks the stuff we need to clean 
up. 
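
The second option above could look roughly like the following. This is only
a sketch in plain Scala with no Spark dependency: the class name
`QueryContext` and the method `registerCleanup` are hypothetical and do not
appear in the diff, and the printed messages stand in for real cleanup work
such as releasing a broadcast variable.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: a context object handed to every physical operator,
// which collects the cleanup actions the operators need.
final class QueryContext {
  private val cleanups = ArrayBuffer.empty[() => Unit]

  // Operators register whatever must be released once the query is done.
  def registerCleanup(action: () => Unit): Unit = cleanups += action

  // Run the registered actions in reverse registration order, at most once.
  def close(): Unit = {
    val pending = cleanups.toList.reverse
    cleanups.clear()
    pending.foreach(action => action())
  }
}

object QueryContextDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new QueryContext
    ctx.registerCleanup(() => println("release broadcast (placeholder)"))
    ctx.registerCleanup(() => println("delete temp files (placeholder)"))
    ctx.close()
  }
}
```

Clearing the list before running the actions makes a second `close()` a
no-op, which is convenient if several code paths may end the query.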




[GitHub] spark pull request: [SPARK-1896] Respect spark.master (and --maste...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/846#issuecomment-43824702
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15130/




[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/850#issuecomment-43824695
  
Can one of the admins verify this patch?




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/851#issuecomment-43824705
  
Merged build started. 




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/851#issuecomment-43824692
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1896] Respect spark.master (and --maste...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/846#issuecomment-43824700
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Configuration documentation updates

2014-05-21 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/851

Configuration documentation updates

1. Add  to configuration options
2. List env variables in tabular format to be consistent with other pages.
3. Moved Viewing Spark Properties section up.


This is against branch-1.0, but should be cherry picked into master as 
well. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark doc-config

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #851


commit 28ac0d3dd0b1cb9c79bc31409332aba78d868525
Author: Reynold Xin 
Date:   2014-05-21T22:47:51Z

Add  to configuration options, and list env variables in a table.






[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

2014-05-21 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/spark/pull/850

SPARK-1898: In deploy.yarn.Client, use YarnClient not YarnClientImpl

https://issues.apache.org/jira/browse/SPARK-1898

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #850


commit 33e87eb4efc23156757e06fba4b79e4c0aed6903
Author: Colin Patrick Mccabe 
Date:   2014-05-21T22:48:04Z

SPARK-1898: In deploy.yarn.Client, use YarnClient rather than YarnClientImpl






[GitHub] spark pull request: [SPARK-1889] [SQL] Apply splitConjunctivePredi...

2014-05-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/836




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/849#discussion_r12927323
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -228,7 +228,9 @@ object SparkSubmit {
   if (isUserJar(args.primaryResource)) {
 jars = jars ++ Seq(args.primaryResource)
   }
-  sysProps.put("spark.jars", jars.mkString(","))
+  if (jars.nonEmpty) {
+sysProps.put("spark.jars", jars.mkString(","))
+  }
 }
--- End diff --

This change ensures that spark.jars is not set to an empty string by 
SparkSubmit if no jars have been specified. This differs from the previous 
behavior, where an empty string was passed downstream to YARN, etc. What 
are the repercussions of this?

@pwendell 
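
One effect worth checking when answering this, shown with plain Scala
strings rather than Spark's actual configuration code: splitting an empty
string on a comma yields a single empty entry, not an empty array. A
consumer that splits `spark.jars` without filtering would therefore see one
blank jar path when the property is set to "". The helper `parseJars` is
hypothetical.

```scala
object JarListSketch {
  // Hypothetical helper, not taken from Spark: parse a comma-separated jar
  // list and drop blank entries.
  def parseJars(value: String): List[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    // Splitting "" yields one empty element rather than zero elements.
    println("".split(",").length)
    // With blanks filtered, "unset" and "set to empty" behave the same.
    println(parseJars("").length)
    println(parseJars("a.jar,b.jar").mkString(" "))
  }
}
```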




[GitHub] spark pull request: [SPARK-1889] [SQL] Apply splitConjunctivePredi...

2014-05-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/836#issuecomment-43823390
  
I've merged this into master & branch-1.0. Thanks!





[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/849#discussion_r12927311
  
--- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -993,7 +993,11 @@ object SparkILoop {
   implicit def loopToInterpreter(repl: SparkILoop): SparkIMain = repl.intp
   private def echo(msg: String) = Console println msg
 
-  def getAddedJars: Array[String] = Option(System.getenv("ADD_JARS")).map(_.split(',')).getOrElse(new Array[String](0))
+  def getAddedJars: Array[String] = {
+val envJars = sys.env.get("ADD_JARS")
--- End diff --

Would it make sense to just merge the two lists rather than choose one or 
the other?
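
A minimal sketch of what merging could look like, assuming the two sources
are the `ADD_JARS` environment variable and the `spark.jars` system
property. `mergeJars` is a hypothetical helper, not the code in this PR.

```scala
object MergeJarsSketch {
  // Hypothetical helper: combine two optional comma-separated jar lists,
  // keeping first-seen order and dropping blanks and duplicates.
  def mergeJars(envJars: Option[String], propJars: Option[String]): List[String] =
    (envJars.toList ++ propJars.toList)
      .flatMap(_.split(",").toList)
      .map(_.trim)
      .filter(_.nonEmpty)
      .distinct

  def main(args: Array[String]): Unit = {
    val merged = mergeJars(sys.env.get("ADD_JARS"), sys.props.get("spark.jars"))
    println(merged.mkString(","))
  }
}
```

Listing the environment variable first is an arbitrary choice here. If the
order of entries matters for class resolution, the precedence would need to
be decided explicitly.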




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43823039
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1889] [SQL] Apply splitConjunctivePredi...

2014-05-21 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/836#issuecomment-43823052
  
@ueshin Thank you for the change. It looks good to me.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43823045
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43822842
  
Jenkins, test this please




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43822767
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43822769
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15128/




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43822766
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43822768
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43822770
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15129/




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars (and --jars) i...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43822771
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15127/




[GitHub] spark pull request: [SPARK-1896] Respect --master and spark.master...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/846#issuecomment-43821205
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1896] Respect --master and spark.master...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/846#issuecomment-43821224
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...

2014-05-21 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/846#issuecomment-43820888
  
Jenkins test this please




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars and --jars in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43819194
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars and --jars in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43819199
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars and --jars in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43816528
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43816549
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43816530
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars and --jars in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/849#issuecomment-43816537
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1897] Respect spark.jars and --jars in ...

2014-05-21 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/849

[SPARK-1897] Respect spark.jars and --jars in spark-shell

Spark shell currently overwrites `spark.jars` with `ADD_JARS`. This means 
the extra jars added through `bin/spark-shell --jars jar1,jar2` are discarded.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark shell-jars

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #849


commit d8549f7f9732a6c8992873bc802628210fa69cc8
Author: Andrew Or 
Date:   2014-05-21T21:23:12Z

Respect spark.jars and --jars in spark-shell






[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43816325
  
@dbtsai Could you backport the patch to branch-0.9 and test it on your 
cluster?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Small updates to Streaming Programming Guide

2014-05-21 Thread jaceklaskowski
Github user jaceklaskowski commented on the pull request:

https://github.com/apache/spark/pull/830#issuecomment-43816303
  
Please review the changes that were introduced after @tdas's comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/848#discussion_r12923805
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
 
 extraClassPath.foreach(addClasspathEntry)
 
-addClasspathEntry(Environment.PWD.$())
+val cachedSecondaryJarLinks =
+  sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
 // Normally the users app.jar is last in case conflicts with spark jars
 if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

I will update the doc. Thanks!




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/848#discussion_r12923791
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
 
 extraClassPath.foreach(addClasspathEntry)
 
-addClasspathEntry(Environment.PWD.$())
+val cachedSecondaryJarLinks =
+  sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
 // Normally the users app.jar is last in case conflicts with spark jars
 if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

`spark.files.userClassPath` is a global configuration that controls the 
ordering of dynamically added jars, while `spark.yarn.user.classpath.first` is 
only for YARN. I agree it is a little confusing, but this is independent of 
this PR. We can create a new JIRA for it.
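
To illustrate why the ordering matters, here is a toy model in plain Scala.
It is not how either setting is implemented in Spark; it only assumes the
usual rule for a flat classpath, where the first entry containing a class
wins. The names `order` and `resolve` and the jar contents are hypothetical.

```scala
object ClasspathOrderSketch {
  // Put user jars before or after the framework jars, depending on a flag.
  def order(userJars: List[String], sparkJars: List[String], userFirst: Boolean): List[String] =
    if (userFirst) userJars ++ sparkJars else sparkJars ++ userJars

  // Return the first jar on the classpath that contains the class, if any.
  def resolve(
      classpath: List[String],
      contents: Map[String, Set[String]],
      className: String): Option[String] =
    classpath.find(jar => contents.getOrElse(jar, Set.empty[String]).contains(className))

  def main(args: Array[String]): Unit = {
    // Both jars ship the same (made-up) class, so ordering decides the winner.
    val contents = Map(
      "app.jar" -> Set("com.example.Json"),
      "spark.jar" -> Set("com.example.Json"))
    val userFirst = order(List("app.jar"), List("spark.jar"), userFirst = true)
    val sparkFirst = order(List("app.jar"), List("spark.jar"), userFirst = false)
    println(resolve(userFirst, contents, "com.example.Json"))
    println(resolve(sparkFirst, contents, "com.example.Json"))
  }
}
```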




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43815204
  
Yes, we can also control the ordering in this way.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43814642
  
It worked on the driver before, so the main issue is that those files are not
in the executor's distributed cache. But I like the idea of adding them
explicitly so we won't miss anything.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43814337
  
The symbolic links may not be under the PWD. That is why it didn't work 
before.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43812877
  
Thanks. It looks great to me, and better than my patch.

`cachedSecondaryJarLinks.foreach(addPwdClasspathEntry)` is not needed since
we already have `addPwdClasspathEntry("*")`. But since we add the jars
explicitly, we can change their priority later.

This patch also works for me.
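
One detail worth keeping in mind when iterating over the parsed jar list: in Scala (as in Java), `"".split(",")` returns an array containing a single empty string, not an empty array. The sketch below shows a defensive way to parse such a comma-separated value. `SecondaryJarsSketch` and `parseJarList` are hypothetical names, and whether the PR filters empty entries elsewhere is not visible in the quoted diff.

```scala
// Hypothetical helper, not Spark code: parses a comma-separated jar list.
object SecondaryJarsSketch {

  /** Splits the optional raw value on commas, trims each entry and drops
    * empty ones, so an unset or empty option yields an empty list.
    */
  def parseJarList(raw: Option[String]): Seq[String] =
    raw.getOrElse("").split(",").map(_.trim).filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Splitting the empty string still yields one (empty) element.
    println("".split(",").length)
    println(parseJarList(None).isEmpty)
    println(parseJarList(Some("a.jar, b.jar")).mkString(":"))
  }
}
```

Without the filter, a `foreach` over the result would run once with an empty entry when the option is unset.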




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/848#discussion_r12921709
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
 
 extraClassPath.foreach(addClasspathEntry)
 
-addClasspathEntry(Environment.PWD.$())
+val cachedSecondaryJarLinks =
+  sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
 // Normally the users app.jar is last in case conflicts with spark jars
 if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

PS: line 47 reads `* 1. In standalone mode, it will launch an
[[org.apache.spark.deploy.yarn.ApplicationMaster]]`.
Should it say cluster mode now?




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/848#discussion_r12921552
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
 
 extraClassPath.foreach(addClasspathEntry)
 
-addClasspathEntry(Environment.PWD.$())
+val cachedSecondaryJarLinks =
+  sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
 // Normally the users app.jar is last in case conflicts with spark jars
 if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

What's the difference between `spark.yarn.user.classpath.first` and
`spark.files.userClassPathFirst`? To me, they seem to be the same thing under
two different configuration names.




[GitHub] spark pull request: [SPARK-1822] SchemaRDD.count() should use opti...

2014-05-21 Thread kanzhang
Github user kanzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/841#discussion_r12921105
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala ---
@@ -274,6 +274,10 @@ class SchemaRDD(
   seed: Long) =
 new SchemaRDD(sqlContext, Sample(fraction, withReplacement, seed, logicalPlan))
 
+  override def count(): Long = {
--- End diff --

Sure, will do.




[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...

2014-05-21 Thread kanzhang
Github user kanzhang commented on the pull request:

https://github.com/apache/spark/pull/697#issuecomment-43810385
  
@rxin & @ahirreddy , thanks for the quick response!




[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...

2014-05-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/697




[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...

2014-05-21 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/697#issuecomment-43809349
  
Thanks. I've merged this into master & branch-1.0.




[GitHub] spark pull request: Enable repartitioning of graph over different ...

2014-05-21 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/719#issuecomment-43807656
  
Seems Jenkins is broken, but this looks good to me.




[GitHub] spark pull request: Enable repartitioning of graph over different ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/719#issuecomment-43804544
  
Build finished. 




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43804540
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/848#issuecomment-43804541
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15125/




[GitHub] spark pull request: Enable repartitioning of graph over different ...

2014-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/719#issuecomment-43804545
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15126/



