date:20150218

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4668


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3781#discussion_r24890677
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -148,19 +152,16 @@ private[spark] class SparkDeploySchedulerBackend(
    super.applicationId
  }
  
+  def setShutdownCallback(f: SparkDeploySchedulerBackend => Unit) {
--- End diff --

OK, but why do you need this setter now?



[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3781#issuecomment-74846885
  
  [Test build #27678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27678/consoleFull)
 for   PR 3781 at commit 
[`c146c93`](https://github.com/apache/spark/commit/c146c93b3df500881f716b5007304315a70fb641).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...

2015-02-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4628#issuecomment-74833670
  
@marmbrus I updated it with test cases.



[GitHub] spark pull request: [Minor] Minor doc fix in GBT classification ex...

GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/4672

[Minor] Minor doc fix in GBT classification example

numClassesForClassification has been renamed to numClasses.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MechCoder/spark minor-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4672


commit d2ddb7fb513dde84d34df08aa70c053042fa0ec8
Author: MechCoder manojkumarsivaraj...@gmail.com
Date:   2015-02-18T09:17:10Z

Minor doc fix in GBT classification example





[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74834330
  
ping @jkbradley ? I was not sure if I had to open a JIRA for this, as it is 
minor.



[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4611#issuecomment-74842751
  
As I said, on OS X you get the whole binary path, not just `java`:

```
ps -p ... -o comm=
...
/Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
```

That's why I was thinking of
`if ps -p $TARGET_ID -o comm= | grep -q java; then`

+ @nchammas for bash syntax thoughts.



[GitHub] spark pull request: [SPARK-5878] fix DataFrame.repartition() in Py...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4667



[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4628#issuecomment-74834156
  
  [Test build #27676 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27676/consoleFull)
 for   PR 4628 at commit 
[`ecb3bcd`](https://github.com/apache/spark/commit/ecb3bcd74914128cc65fa0c4b3454e1914d18a9f).
 * This patch merges cleanly.



[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74843700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27677/
Test PASSed.



[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4628#issuecomment-74843574
  
  [Test build #27676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27676/consoleFull)
 for   PR 4628 at commit 
[`ecb3bcd`](https://github.com/apache/spark/commit/ecb3bcd74914128cc65fa0c4b3454e1914d18a9f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4628#issuecomment-74843582
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27676/
Test PASSed.



[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74843689
  
  [Test build #27677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27677/consoleFull)
 for   PR 4672 at commit 
[`d2ddb7f`](https://github.com/apache/spark/commit/d2ddb7fb513dde84d34df08aa70c053042fa0ec8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

2015-02-18 Thread lukovnikov

Github user lukovnikov commented on the pull request:

https://github.com/apache/spark/pull/4650#issuecomment-74851018
  
style errors fixed



[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74834704
  
  [Test build #27677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27677/consoleFull)
 for   PR 4672 at commit 
[`d2ddb7f`](https://github.com/apache/spark/commit/d2ddb7fb513dde84d34df08aa70c053042fa0ec8).
 * This patch merges cleanly.



[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74834754
  
I will merge this back to 1.2. It really should just be an addendum to 
https://issues.apache.org/jira/browse/SPARK-4610



[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...

2015-02-18 Thread sarutak

Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/3781#discussion_r24893296
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -148,19 +152,16 @@ private[spark] class SparkDeploySchedulerBackend(
    super.applicationId
  }
  
+  def setShutdownCallback(f: SparkDeploySchedulerBackend => Unit) {
--- End diff --

It's no longer needed, so I'll remove it soon.



[GitHub] spark pull request: SPARK-5669 [BUILD] [HOTFIX] Spark assembly inc...

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/4673

SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed 
libgfortran, libgcc code via JBLAS

Correct exclusion path for JBLAS native libs.
(More explanation coming soon on the mailing list re: 1.3.0 RC1)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-5669.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4673


commit e29693cc1eceb5c7917a36d93e77a158915f2a0c
Author: Sean Owen so...@cloudera.com
Date:   2015-02-18T11:29:55Z

Correct exclusion path for JBLAS native libs





[GitHub] spark pull request: Avoid deprecation warnings in JDBCSuite.

2015-02-18 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4668#issuecomment-74832153
  
This is great. Thanks!




[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4672



[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4650#issuecomment-74851295
  
  [Test build #27680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull)
 for   PR 4650 at commit 
[`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-18 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74908505

Sorry, but this patch is not correct. As @growse mentions, when
`SPARK_LOCAL_DIRS` is not set, this code will try to change the permissions of
`/tmp` on Unix machines. It will also use `/tmp/` as the local dir for the
driver in client mode, which was the exact thing the original change was trying
to avoid.

The correct fix here, if you really care about cleaning up the extra
directory, is to export a different env variable from the `Worker`
([here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala#L134))
and handle that variable specially in `getOrCreateLocalRootDirs`. When that
new env variable is set, the code would behave just like the
`isRunningInYarnContainer()` case above the change you're making.

@srowen the current code shouldn't create a cascade of directories, but it
does create a two-level-deep spark- hierarchy for executors in standalone
mode.
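
The env-variable approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `EXAMPLE_EXECUTOR_DIRS`, `LocalDirPlanner`, and the two plan types are hypothetical names, not anything in Spark, and the real `getOrCreateLocalRootDirs` takes a `SparkConf` rather than a plain map. The idea is that when the `Worker` has exported its own directories, the executor uses them untouched (as in the YARN container case); otherwise it falls back to creating a subdirectory under `SPARK_LOCAL_DIRS` or the temp dir.

```scala
// Sketch only: EXAMPLE_EXECUTOR_DIRS and every name below are hypothetical.
sealed trait LocalDirPlan
// Directories were already created by the launcher; use them untouched.
final case class UseAsIs(dirs: Seq[String]) extends LocalDirPlan
// Create a fresh spark-* subdirectory under each of these roots.
final case class CreateUnder(roots: Seq[String]) extends LocalDirPlan

object LocalDirPlanner {
  val ExecutorDirsEnv = "EXAMPLE_EXECUTOR_DIRS"

  // Split a comma-separated list, dropping empty entries.
  private def splitDirs(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  def plan(env: Map[String, String], tmpDir: String): LocalDirPlan =
    env.get(ExecutorDirsEnv).map(splitDirs).filter(_.nonEmpty) match {
      case Some(dirs) => UseAsIs(dirs)
      case None =>
        val roots = env.get("SPARK_LOCAL_DIRS").map(splitDirs).filter(_.nonEmpty)
        CreateUnder(roots.getOrElse(Seq(tmpDir)))
    }
}
```

With a plan like `UseAsIs`, nothing would need to chmod or write directly into `/tmp`, which is the problem pointed out with the patch.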


[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-18 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74910370
  
Ah, wait, there's a second problem (which would result in the cascading 
directories, I think). `getLocalDir` should cache the local directory it 
returns, to avoid having to recreate it. (And should probably be made 
synchronized in the process.)



[GitHub] spark pull request: [SPARK-5507] Added documentation for BlockMatr...

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4664#issuecomment-74915682
  
LGTM. Merged into master and branch-1.3. Thanks!



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-18 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74916479
  
Hi, me again, sorry for the spam. Regarding my last comment, it's probably 
better if `getOrCreateLocalRootDirs()` caches its return value instead of 
`getLocalDir()`, since the former is called in several places, and doing that 
would also cover the `getLocalDir()` case.
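
A minimal sketch of the caching idea, assuming a simplified signature: the object below is a stand-in written for illustration, not Spark's `Utils`, and it takes the root paths directly instead of a `SparkConf`. The first call creates the directories; later calls return the cached result, and `synchronized` keeps two threads from both creating them.

```scala
import java.nio.file.{Files, Path}

// Sketch only: a stand-in for the caching idea, not Spark's Utils code.
object LocalRootDirs {
  // Guarded by this object's lock; set once on the first successful call.
  private var cached: Option[Seq[Path]] = None

  // How many times directories were actually created (for illustration).
  private var creations = 0
  def creationCount: Int = synchronized { creations }

  def getOrCreateLocalRootDirs(roots: Seq[Path]): Seq[Path] = synchronized {
    cached.getOrElse {
      // Create one spark-* directory under each root, exactly once.
      val dirs = roots.map(root => Files.createTempDirectory(root, "spark-"))
      creations += 1
      cached = Some(dirs)
      dirs
    }
  }

  // Reuses the cached roots, so it never creates a new directory on its own.
  def getLocalDir(roots: Seq[Path]): Path = getOrCreateLocalRootDirs(roots).head
}
```

Because `getLocalDir` goes through the cached method, caching at that one place covers both callers.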



[GitHub] spark pull request: [SPARK-5507] Added documentation for BlockMatr...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4664



[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4671#issuecomment-74913992
  
  [Test build #27682 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27682/consoleFull)
 for   PR 4671 at commit 
[`3168b4b`](https://github.com/apache/spark/commit/3168b4b19971f1e82c91f561d3abc3f3141dfa9b).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4661



[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...

2015-02-18 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4610#discussion_r24922476
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
@@ -66,9 +66,17 @@ private[sql] class DefaultSource
       mode match {
         case SaveMode.Append =>
           sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
-        case SaveMode.Overwrite =>
-          fs.delete(filesystemPath, true)
+        case SaveMode.Overwrite => {
+          try {
+            fs.delete(filesystemPath, true)
+          } catch {
+            case e: IOException =>
+              throw new IOException(
+                s"Unable to clear output directory ${filesystemPath.toString} prior"
+                  + s" to CREATE a JSON table AS SELECT:\n${e.toString}")
+          }
--- End diff --

@yanbohappy It seems we just throw a different error message here. Based on 
your JIRA description, I think you need to check whether `delete` returns true 
or false when data already exists.
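
Something along these lines is what I mean, as a rough sketch rather than the actual patch. To keep it self-contained, the `exists` and `delete` parameters are hypothetical stand-ins for `fs.exists(filesystemPath)` and `fs.delete(filesystemPath, true)`; Hadoop's `FileSystem.delete` returns a Boolean, so a failed delete can show up as `false` without any exception being thrown.

```scala
import java.io.IOException

// Sketch only: `exists` and `delete` stand in for the Hadoop FileSystem calls.
object OverwriteCheck {
  def clearOutputDir(
      path: String,
      exists: String => Boolean,
      delete: String => Boolean): Unit = {
    if (exists(path)) {
      val deleted =
        try delete(path)
        catch {
          case e: IOException =>
            throw new IOException(
              s"Unable to clear output directory $path: ${e.getMessage}", e)
        }
      // A false result means the data is still there, so fail loudly.
      if (!deleted) {
        throw new IOException(
          s"Unable to clear output directory $path: delete returned false")
      }
    }
  }
}
```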



[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4661#issuecomment-74915467
  
Merged into master and branch-1.3.



[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-74919610
  
test this please



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/4675

[SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release

For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL 
DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.

For SPARK-5892:
* Fix Python docs
* Various other cleanups

CC: @mengxr  (ML),  @davies  (Python docs)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark doc-review-1.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4675


commit e57282727570ce7a47cf4144ae2db33c874f6357
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T02:34:12Z

updated programming guide for ml and mllib

commit b05a80de67b645bf68d7ab87123720517d0222e1
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T02:35:22Z

organize imports. doc cleanups

commit a72c018ddf029496cb5a158d8a22aafd9f819483
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T02:36:50Z

made ChiSqTestResult appear in python docs

commit 695f3f62b202f319a2dbf9c6fd8436be280ca48d
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T02:37:19Z

partly done trying to fix inherit_doc for class hierarchies in python docs

commit 8cce91c47e9633a11c911e879e489df7c54324e1
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T19:24:27Z

GMM: removed old imports, added some doc

commit da16aef6800b16a708739c96bc1ef713043eb461
Author: Joseph K. Bradley jos...@databricks.com
Date:   2015-02-18T19:24:56Z

Fixed python mllib docs





[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74931339
  
  [Test build #27684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27684/consoleFull)
 for   PR 4675 at commit 
[`da16aef`](https://github.com/apache/spark/commit/da16aef6800b16a708739c96bc1ef713043eb461).
 * This patch merges cleanly.



[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4672#issuecomment-74932384
  
(belatedly) Thanks!



[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4676#issuecomment-74932367
  
  [Test build #27685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27685/consoleFull)
 for   PR 4676 at commit 
[`bfabd91`](https://github.com/apache/spark/commit/bfabd91d350fbb48c103896a585b362c7c823c2d).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...

2015-02-18 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/4583#issuecomment-74931875
  
@florianverhein - Sorry for the delay. I just tested this out and it seemed
to work okay. One thing that confused me is that it's not very clear where
the files end up on the master. For example, I passed in the directory
`/home/shivaram/dotfiles` as the argument. I think it would be good to rsync
this to `/root/dotfiles` on the master. Right now the files inside the
directory (say, `.vimrc`) are put directly in `/root/` (i.e. I got
`/root/.vimrc`).



[GitHub] spark pull request: [Spark 5889] Remove pid file after stopping se...

2015-02-18 Thread zhzhan

GitHub user zhzhan opened a pull request:

https://github.com/apache/spark/pull/4676

[Spark 5889] Remove pid file after stopping service.

Currently the pid file is not deleted when a service is stopped, which can
cause problems afterwards. This fix removes the pid file once the service has
stopped.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhzhan/spark spark-5889

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4676


commit e63bfa5dcd7f35e0101a8aa6631a1ae29b81a399
Author: Zhan Zhang zhaz...@gmail.com
Date:   2014-08-08T17:47:18Z

test

commit c0c7d2ae0dcd4d8921513910985e10f1f58e8ab4
Author: Zhan Zhang zhaz...@gmail.com
Date:   2015-01-07T21:01:45Z

squash all commits

commit bfabd91d350fbb48c103896a585b362c7c823c2d
Author: Zhan Zhang zhaz...@gmail.com
Date:   2015-02-18T19:22:23Z

spark-5889: remove pid file after stopping service





[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4671#issuecomment-74932737
  
  [Test build #27682 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27682/consoleFull)
 for   PR 4671 at commit 
[`3168b4b`](https://github.com/apache/spark/commit/3168b4b19971f1e82c91f561d3abc3f3141dfa9b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4671#issuecomment-74932750
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27682/
Test PASSed.



[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-74934591
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27683/
Test FAILed.



[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-74934575
  
  [Test build #27683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27683/consoleFull)
 for   PR 3850 at commit 
[`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74931104
  
Note: The altered examples in the spark.ml guide were copied from 
executable examples in the examples/ directory.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74932285
  
  [Test build #27686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27686/consoleFull)
 for   PR 4675 at commit 
[`34b067f`](https://github.com/apache/spark/commit/34b067fba0bb7602b69d0f5fdcde5ce470786de4).
 * This patch merges cleanly.



[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-74920638
  
  [Test build #27683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27683/consoleFull)
 for   PR 3850 at commit 
[`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12).
 * This patch merges cleanly.



[GitHub] spark pull request: SPARK-5570: No docs stating that `new SparkCon...

2015-02-18 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4665#issuecomment-74920473
  
Hey @ilganeli thanks for doing this. Can you also do this for the other 
`spark.driver.*` options? Like extra java opts, class paths etc.



[GitHub] spark pull request: SPARK-5548: Fix for AkkaUtilsSuite failure - a...

2015-02-18 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4653#discussion_r24927231
  
--- Diff: core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala ---
@@ -370,9 +371,13 @@ class AkkaUtilsSuite extends FunSuite with LocalSparkContext with ResetSystemPro
 val selection = slaveSystem.actorSelection(
   AkkaUtils.address(AkkaUtils.protocol(slaveSystem), "spark", "localhost", boundPort, "MapOutputTracker"))
 val timeout = AkkaUtils.lookupTimeout(conf)
-intercept[TimeoutException] {
-  slaveTracker.trackerActor = Await.result(selection.resolveOne(timeout * 2), timeout)
+val result = Try(Await.result(selection.resolveOne(timeout * 2), timeout))
+
+assert(result.isFailure === true)
+val exception = result match {
+  case Failure(ex) => ex
 }
--- End diff --

This will create a lot of warnings complaining that the match is not
exhaustive. I think you'll need to add a `case _ => fail(...)` to fix this.
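
A minimal sketch of the exhaustive match suggested above, assuming plain Scala rather than ScalaTest. The helper name `failureOf` and the `AssertionError` fallback are illustrative stand-ins (they are not part of the PR); inside the suite the fallback would be the `fail(...)` call mentioned in the comment.

```scala
import java.util.concurrent.TimeoutException
import scala.util.{Failure, Success, Try}

object ExhaustiveTryMatch {
  // Hypothetical helper: returns the exception held by a failed Try and
  // throws if the Try unexpectedly succeeded.
  def failureOf[T](result: Try[T]): Throwable = result match {
    case Failure(ex) => ex
    // Handling Success as well covers every subclass of the sealed Try type,
    // so the compiler no longer reports a non-exhaustive match.
    case Success(value) =>
      throw new AssertionError(s"Expected a failure but got: $value")
  }

  def main(args: Array[String]): Unit = {
    // Simulated lookup that always times out.
    val result = Try[Int](throw new TimeoutException("lookup timed out"))
    assert(result.isFailure)
    val exception = failureOf(result)
    assert(exception.isInstanceOf[TimeoutException])
    println(exception.getMessage)
  }
}
```

Matching on `Success` explicitly, rather than on `_`, keeps the failure message informative because the unexpected value can be included in it.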



[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...

2015-02-18 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4337#discussion_r24929712
  
--- Diff: 
external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala
 ---
@@ -113,7 +115,8 @@ class MQTTStreamSuite extends FunSuite with Eventually 
with BeforeAndAfter {
   }
 
   private def findFreePort(): Int = {
--- End diff --

We don't have a test utilities subproject, so this ends up getting
duplicated, but note that we also have duplication of classes like
LocalSparkContext; fixing this broader issue is outside the scope of this PR
(there are a few JIRAs to track the creation of a test utilities project,
though).
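
For readers unfamiliar with this kind of helper: the body of `findFreePort` is not shown in the diff, so the following is only a generic sketch of one common approach (binding to port 0 and letting the OS choose). It is an assumption for illustration, not the implementation used in MQTTStreamSuite.

```scala
import java.net.ServerSocket

object FreePort {
  // Asks the OS for an ephemeral port by binding to port 0, then releases it.
  // Note the inherent race: another process may take the port before the
  // caller binds to it again.
  def findFreePort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort
    finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    println(s"Free port: ${findFreePort()}")
  }
}
```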



[GitHub] spark pull request: Merge pull request #1 from apache/master

2015-02-18 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4553#issuecomment-74928572
  
It looks like this was opened by mistake; do you mind closing this issue?



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-18 Thread gurvindersingh

Github user gurvindersingh commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74931493
  
It would be nice to have this patch merged for the 1.3 release, as we plan
to use this feature with Mesos and Spark.



[GitHub] spark pull request: SPARK-5548: Fix for AkkaUtilsSuite failure - a...

2015-02-18 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4653#discussion_r24927173
  
--- Diff: core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala ---
@@ -370,9 +371,13 @@ class AkkaUtilsSuite extends FunSuite with LocalSparkContext with ResetSystemPro
 val selection = slaveSystem.actorSelection(
   AkkaUtils.address(AkkaUtils.protocol(slaveSystem), "spark", "localhost", boundPort, "MapOutputTracker"))
 val timeout = AkkaUtils.lookupTimeout(conf)
-intercept[TimeoutException] {
-  slaveTracker.trackerActor = Await.result(selection.resolveOne(timeout * 2), timeout)
+val result = Try(Await.result(selection.resolveOne(timeout * 2), timeout))
+
+assert(result.isFailure === true)
--- End diff --

you can just do `assert(result.isFailure)`



[GitHub] spark pull request: SPARK-5570: No docs stating that `new SparkCon...

2015-02-18 Thread ilganeli

Github user ilganeli commented on the pull request:

https://github.com/apache/spark/pull/4665#issuecomment-74931272
  
Sure @andrewor14, I presume their behavior is identical?



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74936971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27687/
Test FAILed.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74936955
  
  [Test build #27687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27687/consoleFull)
 for   PR 3074 at commit 
[`0d6d2b3`](https://github.com/apache/spark/commit/0d6d2b304d56b65d7e2fa61d762ae787d35a2e75).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4675#discussion_r24935321
  
--- Diff: docs/ml-guide.md ---
@@ -171,12 +171,12 @@ import org.apache.spark.sql.{Row, SQLContext}
 val conf = new SparkConf().setAppName("SimpleParamsExample")
 val sc = new SparkContext(conf)
 val sqlContext = new SQLContext(sc)
-import sqlContext._
+import sqlContext.implicits._
 
 // Prepare training data.
-// We use LabeledPoint, which is a case class.  Spark SQL can convert RDDs of case classes
-// into SchemaRDDs, where it uses the case class metadata to infer the schema.
-val training = sparkContext.parallelize(Seq(
+// We use LabeledPoint, which is a case class.  Spark SQL can convert RDDs of Java Beans
--- End diff --

This is in a Scala context. `case classes` or `case class instances` may
be better than `JavaBeans`.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74936967
  
  [Test build #27687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27687/consoleFull)
 for   PR 3074 at commit 
[`0d6d2b3`](https://github.com/apache/spark/commit/0d6d2b304d56b65d7e2fa61d762ae787d35a2e75).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

2015-02-18 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4675#discussion_r24936197
  
--- Diff: python/pyspark/ml/pipeline.py ---
@@ -18,7 +18,8 @@
 from abc import ABCMeta, abstractmethod
 
 from pyspark.ml.param import Param, Params
-from pyspark.ml.util import inherit_doc, keyword_only
+from pyspark.ml.util import keyword_only
+from pyspark.mllib.__init__ import inherit_doc
--- End diff --

from pyspark.mllib import inherit_doc



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74940888
  
@mateiz where do you suggest putting this Dockerfile? I have a Dockerfile
that builds Spark from source and depends on the Mesos image here:
https://github.com/tnachen/spark/blob/dockerfile/Dockerfile
@hellertime you can use this if you like, or modify it.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

2015-02-18 Thread mbofb

Github user mbofb commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74948488
  
Regarding the description of RowMatrix.computeSVD and
mllib-dimensionality-reduction.html: "We assume n is smaller than m." Is this
just a recommendation or a hard requirement? The condition does not seem to be
checked and does not cause an IllegalArgumentException; the processing
finishes even though the vectors have a higher dimension than the number of
vectors.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

2015-02-18 Thread mbofb

Github user mbofb commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74949165
  
Regarding the description of RowMatrix.computePrincipalComponents, or
RowMatrix in general: I got an exception.
java.lang.IllegalArgumentException: Argument with more than 65535 cols: 7949273
at org.apache.spark.mllib.linalg.distributed.RowMatrix.checkNumColumns(RowMatrix.scala:131)
at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeCovariance(RowMatrix.scala:318)
at org.apache.spark.mllib.linalg.distributed.RowMatrix.computePrincipalComponents(RowMatrix.scala:373)
It would be nice to document this 65535-column restriction (if it still
applies in 1.3).
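
To make the restriction concrete, here is a small sketch of a guard that produces the message quoted in the stack trace. The object and function names are hypothetical stand-ins modeled on that message, not a copy of the MLlib source. The arithmetic in `packedUpperSize` illustrates one plausible reason for the limit, which is an assumption not stated in this thread: the upper triangle of an n x n symmetric matrix stops fitting in an Int-indexed array once n exceeds 65535.

```scala
object ColumnLimit {
  // Limit taken from the exception message quoted above.
  val MaxCols: Int = 65535

  // Hypothetical stand-in for the check named in the stack trace.
  def checkNumColumns(cols: Int): Unit = {
    if (cols > MaxCols) {
      throw new IllegalArgumentException(
        s"Argument with more than $MaxCols cols: $cols")
    }
  }

  // Number of entries in the upper triangle (diagonal included) of an
  // n x n symmetric matrix.
  def packedUpperSize(n: Long): Long = n * (n + 1) / 2

  def main(args: Array[String]): Unit = {
    println(packedUpperSize(65535L) <= Int.MaxValue) // true  (2147450880)
    println(packedUpperSize(65536L) <= Int.MaxValue) // false (2147516416)
    checkNumColumns(65535)                           // passes silently
    try checkNumColumns(7949273)
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```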



[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4654#issuecomment-74951424
  
@MechCoder Could you share some performance comparison results?



[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4677#issuecomment-74953365
  
  [Test build #27689 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27689/consoleFull)
 for   PR 4677 at commit 
[`07c8f12`](https://github.com/apache/spark/commit/07c8f12bc72b11ae780095a73662b5e049dc6e22).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...

2015-02-18 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3861#issuecomment-74950927
  
We spoke a bit offline about this, but my feeling was that the best thing
here might be to add a way to launch the shuffle service as a standalone
application (initially, not one managed by Mesos) so that it can be shared
across Spark applications. That would involve writing some simple launch
scripts for it, similar to those for the existing daemons we launch, and you'd
ask users to launch the shuffle service the way they launch other storage
systems like HDFS. That's very simple and would avoid diverging a lot between
Mesos and the other modes. Longer term, we could have a single shared
shuffle service that is scheduled by Mesos.



[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...

Github user tnachen closed the pull request at:

https://github.com/apache/spark/pull/3861



[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...

Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/3861#issuecomment-74951847
  
Agreed, and it's currently being worked on. We can close this PR too.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-18 Thread mateiz

Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74951728
  
The docker folder is for test images, but it could be a good place for this 
one. I'll let @pwendell comment on it.

Does Apache Mesos publish a base Docker image? It would be easier to base 
it on that if that would get updated with each release.



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-02-18 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4221#issuecomment-74952993
  
Yeah, our auto-close doesn't work on PRs into release branches like this.



[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/4677

[SPARK-5436] [MLlib] Validate GradientBoostedTrees during train

Training can stop early if the decrease in error rate is less than a
certain tolerance (tol), or if the error increases because the model is
overfitting the training data.

This introduces a new method that takes a pair of RDDs, one for the
training data and the other for validation.
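
The stopping rule described above can be sketched in plain Scala, independent of Spark. This is only an illustration under stated assumptions: the name `bestIteration`, the idea of passing in a precomputed sequence of per-iteration validation errors, and the exact comparison against `tol` are all hypothetical and are not taken from the PR's code.

```scala
object EarlyStopping {
  // Given the validation error measured after each boosting iteration,
  // returns how many iterations to keep. Training stops at the first
  // iteration whose improvement over the previous one is below `tol`;
  // an increase in error counts as a negative improvement.
  def bestIteration(validationErrors: Seq[Double], tol: Double): Int = {
    require(validationErrors.nonEmpty, "need at least one validation error")
    val consecutive = validationErrors.zip(validationErrors.tail)
    val stopAt = consecutive.indexWhere { case (prev, cur) => prev - cur < tol }
    if (stopAt == -1) validationErrors.length else stopAt + 1
  }

  def main(args: Array[String]): Unit = {
    // The fourth iteration improves by only about 0.001, so three are kept.
    println(bestIteration(Seq(0.5, 0.4, 0.35, 0.349, 0.36), tol = 0.01))
  }
}
```

Treating an error increase as a negative improvement lets a single comparison cover both cases mentioned in the description.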

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MechCoder/spark spark-5436

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4677


commit 07c8f12bc72b11ae780095a73662b5e049dc6e22
Author: MechCoder manojkumarsivaraj...@gmail.com
Date:   2015-02-18T21:23:33Z

[SPARK-5436] Validate GradientBoostedTrees during train





[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/4677#issuecomment-74953724
  
@jkbradley I just wanted to know if this is in the right direction.



[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4677#issuecomment-74954211
  
  [Test build #27690 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27690/consoleFull)
 for   PR 4677 at commit 
[`7534d14`](https://github.com/apache/spark/commit/7534d145d8cf686221647bffeeb2c404dddc575d).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...

2015-02-18 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/4583#issuecomment-74954531
  
Hmm okay - My other concern was that the directory itself wasn't 
preserved, i.e. it might be better to put the deploy-root-dir into `/` as a 
directory (`/dotfiles/.vimrc` instead of `/.vimrc`)?



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74956760
  
Mesosphere does publish a Mesos image on each release (mesosphere/mesos), 
with each version tagged.
We don't tag the latest release with the :latest tag, but I could go change 
that for sure.



[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...

2015-02-18 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4671#issuecomment-74957281
  
Thanks, merged.



[GitHub] spark pull request: example of python converter for avrò output f...

2015-02-18 Thread daria-sukhareva

GitHub user daria-sukhareva opened a pull request:

https://github.com/apache/spark/pull/4678

example of python converter for avrò output format

I actually wanted to know whether I am doing this right, rather than to 
suggest merging it into the Spark repo.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daria-sukhareva/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4678


commit 2ba7b213572d6ce2056cfc2536b701ae689c7f98
Author: daria daria.sukhar...@rubikloud.com
Date:   2015-02-18T21:49:45Z

avrò output format





[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4677#issuecomment-74957382
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27690/
Test FAILed.



[GitHub] spark pull request: example of python converter for avrò output f...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4678#issuecomment-74957691
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-5710][SQL] Combines two adjacent Cast e...

2015-02-18 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4497#issuecomment-74957611
  
Agreed.  If there are concrete proposals for eliminating redundant casts, 
then we should discuss them on JIRA.  However, as is, this could change the answer 
and is thus an invalid optimization.  So, we should close this issue.
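
To illustrate why collapsing two adjacent casts can change the answer, here is a small sketch in plain Python. Python's `int()` and `float()` stand in for SQL-style casts, which is an assumption made for the example; the helper names are hypothetical and this is not Spark code.

```python
def cast_chain(value):
    """Apply two adjacent casts: first to int (truncating), then to float."""
    return float(int(value))


def collapsed_cast(value):
    """The 'optimized' form that keeps only the outer cast."""
    return float(value)


# The inner cast discards the fractional part, so dropping it is not safe.
print(cast_chain(1.5), collapsed_cast(1.5))
```

The two forms agree only when the inner cast loses no information, for example when widening an integer type.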



[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...

2015-02-18 Thread kayousterhout

Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/4671#issuecomment-74957462
  
Thanks!!



[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4677#issuecomment-74957377
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27689/
Test FAILed.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4675#discussion_r24935349
  
--- Diff: docs/mllib-guide.md ---
@@ -90,6 +90,21 @@ version 1.4 or newer.
 
 # Migration Guide
 
+## From 1.2 to 1.3
+
+In the `spark.mllib` package:
+
+* *(Breaking change)* In 
[`ALS`](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS), the 
extraneous method `solveLeastSquares` has been removed.  The `DeveloperApi` 
method `analyzeBlocks` was also removed.
--- End diff --

Shall we try to put the sections into code tabs? It is getting longer and 
longer.

For the `breaking change` entries, we should mention that they are experimental or 
developer APIs. `ALS.solveLeastSquares` is perhaps the only outlier.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4675#discussion_r24935325
  
--- Diff: docs/ml-guide.md ---
@@ -300,19 +302,21 @@ List<LabeledPoint> localTest = Lists.newArrayList(
 new LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
 new LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
 new LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5)));
-JavaSchemaRDD test = jsql.createDataFrame(jsc.parallelize(localTest), LabeledPoint.class);
+DataFrame test = jsql.createDataFrame(jsc.parallelize(localTest), LabeledPoint.class);
 
 // Make predictions on test documents using the Transformer.transform() method.
 // LogisticRegression.transform will only use the 'features' column.
-// Note that model2.transform() outputs a 'probability' column instead of the usual 'score'
-// column since we renamed the lr.scoreCol parameter previously.
-model2.transform(test).registerAsTable("results");
-JavaSchemaRDD results =
-jsql.sql("SELECT features, label, probability, prediction FROM results");
+// Note that model2.transform() outputs a 'myProbability' column instead of the usual
+// 'probability' column since we renamed the lr.probabilityCol parameter previously.
+model2.transform(test).registerTempTable("results");
+DataFrame results =
+jsql.sql("SELECT features, label, myProbability, prediction FROM results");
--- End diff --

With the DataFrame API, we don't need to call SQL now.



[GitHub] spark pull request: [SPARK-5673] [MLlib] Implement Streaming wrapp...

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4456#issuecomment-74936361
  
@catap Can you please add a description for this PR?  Thanks!



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-18 Thread hellertime

Github user hellertime commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74944182
  
@tnachen That Dockerfile you have is actually all that is needed for an 
example image; that it's based on the mesosphere image is even better!

I had hoped that there could be an actual image on the Docker Hub which 
could be referenced from the properties example. Is that image on the Docker 
Hub?



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74946826
  
  [Test build #27686 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27686/consoleFull)
 for   PR 4675 at commit 
[`34b067f`](https://github.com/apache/spark/commit/34b067fba0bb7602b69d0f5fdcde5ce470786de4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74946839
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27686/
Test PASSed.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74939602
  
  [Test build #27688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27688/consoleFull)
 for   PR 3074 at commit 
[`127aaa8`](https://github.com/apache/spark/commit/127aaa8050b34925e511b8d8131dfb1e75841be8).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74939621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27688/
Test FAILed.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74945101
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27684/
Test PASSed.



[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...

2015-02-18 Thread florianverhein

Github user florianverhein commented on the pull request:

https://github.com/apache/spark/pull/4583#issuecomment-74946543
  
Thanks @shivaram. I'm not sure I follow 100%. With that argument they 
should have ended up at e.g. /.vimrc (unless root is a subdirectory of dotfiles). 
The contents of '--deploy-root-dir' end up in /, not /root/ (i.e. as documented 
in the help). This is necessary because you may want to copy files elsewhere on 
the file system, e.g. /opt. It's just unfortunate that the existence of /root/ 
makes 'root' ambiguous. Therefore I made sure to use / in the help.
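
The mapping described above (contents of the deploy directory land directly under `/`) can be sketched as follows. This is not the `spark_ec2.py` implementation; the function name `plan_deploy` and its `target_root` parameter are hypothetical and only illustrate how source paths would map to destinations.

```python
import os


def plan_deploy(deploy_root_dir, target_root="/"):
    """List (source, destination) pairs for every file under deploy_root_dir.

    The directory's contents, not the directory itself, are placed under
    target_root, so <deploy_root_dir>/.vimrc maps to /.vimrc.
    """
    plan = []
    for dirpath, _dirnames, filenames in os.walk(deploy_root_dir):
        for name in filenames:
            source = os.path.join(dirpath, name)
            relative = os.path.relpath(source, deploy_root_dir)
            plan.append((source, os.path.join(target_root, relative)))
    return sorted(plan)
```

Under this scheme, getting a file into root's home directory would mean storing it as `<deploy_root_dir>/root/.vimrc`, while `<deploy_root_dir>/opt/...` would land under `/opt`.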



[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...

2015-02-18 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4420#issuecomment-74946838
  
@mccheah @mingyukim yeah, there isn't an OOM-proof solution at all because 
these are all heuristics. Even checking every element is not OOM-proof, since 
memory estimation is itself a heuristic that involves sampling. My only concern 
with exposing knobs here is that users will expect us to support these going 
forward, even though we may want to refactor this in the future in a way where 
those knobs don't make sense anymore. It's reasonable that users would consider it a 
regression if their tuning of those knobs stopped working.

So if possible, it would be good to adjust our heuristics to meet a wider 
range of use cases and then if we keep hearing more issues we can expose knobs. 
We can't have them meet every possible use case, since they are heuristics, but 
in this case I was wondering if we could have a strict improvement to the 
heuristics. @andrewor14 can you comment on whether this is indeed a strict 
improvement?

One of the main benefits of the new data frames API is that we will be able 
to have precise control over memory usage in a way that can avoid OOMs entirely. 
But for the current Spark API we are using this more ad-hoc memory estimation 
along with some heuristics.

I'm not 100% against exposing knobs either, but I'd be interested if some 
simple improvements fix your use case. 
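
To make the point about sampling concrete, here is a minimal sketch in plain Python of a sampling-based size estimate combined with a periodic spill check. It is not Spark's implementation; the function names, the sampling stride, and the check interval of 32 are assumptions chosen for illustration.

```python
import sys


def estimate_size(items, sample_every=10):
    """Extrapolate the total shallow size from every sample_every-th element."""
    if not items:
        return 0
    sample = items[::sample_every]
    average = sum(sys.getsizeof(x) for x in sample) / len(sample)
    return int(average * len(items))


def should_spill(items, threshold_bytes, check_every=32):
    """Only pay for an estimate every check_every elements."""
    if len(items) % check_every != 0:
        return False
    return estimate_size(items) >= threshold_bytes


# One large element that the sample happens to miss is invisible to the estimate.
uniform = ["a"] * 100
skewed = list(uniform)
skewed[5] = "x" * 10000
print(estimate_size(uniform), estimate_size(skewed))
```

Both lists get the same estimate even though the second is much larger, which is why no threshold on such an estimate can be OOM-proof.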




[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4676#issuecomment-74946963
  
  [Test build #27685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27685/consoleFull)
 for   PR 4676 at commit 
[`bfabd91`](https://github.com/apache/spark/commit/bfabd91d350fbb48c103896a585b362c7c823c2d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4676#issuecomment-74946981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27685/
Test PASSed.



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-02-18 Thread hellertime

Github user hellertime commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74937819
  
So perhaps putting an example Dockerfile in the `docker` subdirectory is 
not an appropriate thing to do... any suggestions on a better location for 
examples such as this? The `examples` directory also would be inappropriate I 
think.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

2015-02-18 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4675#discussion_r24936278
  
--- Diff: python/pyspark/mllib/__init__.py ---
@@ -33,3 +34,20 @@
 random.__name__ = 'random'
 random.RandomRDDs.__module__ = __name__ + '.random'
 sys.modules[__name__ + '.random'] = random
+
+
+def inherit_doc(cls):
--- End diff --

Move this into mllib/common.py ?
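
The diff above shows only the signature of `inherit_doc`. As a rough sketch of what such a class decorator might do (an assumption, not the body from this PR), the version below copies a missing docstring on a public method from the first parent class that defines one. The `Base` and `Child` classes are hypothetical.

```python
import inspect


def inherit_doc(cls):
    """Class decorator: fill in missing method docstrings from parent classes."""
    for name, member in vars(cls).items():
        # Skip private names and anything that is not a plain function.
        if name.startswith("_") or not inspect.isfunction(member):
            continue
        if member.__doc__:
            continue
        for parent in cls.__bases__:
            parent_member = getattr(parent, name, None)
            parent_doc = getattr(parent_member, "__doc__", None)
            if parent_doc:
                member.__doc__ = parent_doc
                break
    return cls


class Base:
    def predict(self, x):
        """Predict a label for x."""
        return 0


@inherit_doc
class Child(Base):
    def predict(self, x):
        return 1


print(Child.predict.__doc__)
```

Because the helper does not depend on anything in the package's `__init__.py`, it could live in a shared module as suggested.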



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-74939617
  
  [Test build #27688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27688/consoleFull)
 for   PR 3074 at commit 
[`127aaa8`](https://github.com/apache/spark/commit/127aaa8050b34925e511b8d8131dfb1e75841be8).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74945089
  
  [Test build #27684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27684/consoleFull)
 for   PR 4675 at commit 
[`da16aef`](https://github.com/apache/spark/commit/da16aef6800b16a708739c96bc1ef713043eb461).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...

2015-02-18 Thread mbofb

Github user mbofb commented on the pull request:

https://github.com/apache/spark/pull/4675#issuecomment-74948403
  
The description of RowMatrix.computeSVD and 
mllib-dimensionality-reduction.html should be more precise/explicit regarding 
the m x n matrix. From the current description I would conclude that n refers to 
the rows. According to 
http://math.stackexchange.com/questions/191711/how-many-rows-and-columns-are-in-an-m-x-n-matrix
 this way of describing a matrix is only used in particular domains. As a 
reader interested in applying SVD, I would prefer the more common m x n 
convention of rows x columns (e.g. 
http://en.wikipedia.org/wiki/Matrix_%28mathematics%29 ), which is also used in 
http://en.wikipedia.org/wiki/Latent_semantic_analysis (and also within the 
ARPACK manual:
"
N   Integer.  (INPUT) - Dimension of the eigenproblem. 
NEV Integer.  (INPUT) - Number of eigenvalues of OP to be computed. 0 < NEV < N. 
NCV Integer.  (INPUT) - Number of columns of the matrix V (less than or 
equal to N).
"
).
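
For readers who want the rows x columns convention spelled out, here is a small sketch in plain Python. The helper names are hypothetical; the shapes listed are those of a rank-k truncated SVD of an m x n matrix (U is m x k, s has k entries, V is n x k).

```python
def shape(matrix):
    """Return (m, n) = (number of rows, number of columns) of a list of rows."""
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    return m, n


def truncated_svd_shapes(m, n, k):
    """Shapes of the factors of a rank-k SVD of an m x n matrix."""
    if not 0 < k <= min(m, n):
        raise ValueError("k must satisfy 0 < k <= min(m, n)")
    return {"U": (m, k), "s": (k,), "V": (n, k)}


# Two rows, three columns: a 2 x 3 matrix.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
m, n = shape(a)
print(m, n, truncated_svd_shapes(m, n, k=2))
```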



[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...