[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925330 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -1,140 +0,0 @@ -/* --- End diff -- Yes. These are the two files which were copied over, then allowed to age while the originals were maintained. 1. Cut these files and things don't build any more. 2. Add the mvn changes and they do compile, except where `CachedTableSuite` had another method from the original tests pasted in. 3. Remove that method, the updated parent class exports the method, and all is well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84933397 [Test build #28986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28986/consoleFull) for PR 4491 at commit [`b522f23`](https://github.com/apache/spark/commit/b522f23438e119b2c987374ed6d64aa2b7317421).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec`
  * `case class CipherSuite(name: String, algoBlockSize: Int)`
  * `abstract case class CryptoCodec()`
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor`
  * `trait Encryptor`
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
  * `class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor`
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84933426 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28986/ Test FAILed.
[GitHub] spark pull request: [SPARK-6452] [SQL] Checks for missing attribut...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5129#issuecomment-84939647 [Test build #28991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28991/consoleFull) for PR 5129 at commit [`52cdc69`](https://github.com/apache/spark/commit/52cdc69fcbf40968628b62366891fd5e43b80299). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84855188 @steveloughran: I don't understand why we need to make `CryptoOutputStream.scala#close` thread-safe. Is there a situation where multiple threads call this function at the same time?
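As background for the question above: one common reason to guard `close()` is not only concurrent callers but double invocation, since a stream is often closed both by wrapping streams and by cleanup code on task failure. A minimal sketch of an idempotent close guard (illustrative names only, not the PR's actual `CryptoOutputStream` code):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative sketch, not the PR's code: a compare-and-set guard makes
// close() idempotent, so even if two threads race on close() the underlying
// resource is released exactly once.
class GuardedStream {
  private val closed = new AtomicBoolean(false)
  @volatile var releaseCount = 0 // stands in for freeing the cipher context

  def close(): Unit = {
    if (closed.compareAndSet(false, true)) {
      releaseCount += 1 // release resources here, exactly once
    }
  }
}
```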
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84855187 ok to test
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84855200 LGTM pending Jenkins.
[GitHub] spark pull request: [SPARK-6449][YARN] Report failure status if dr...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/5130#issuecomment-84860442

> Do InvocationTargetExceptions only wrap Exceptions and not all Throwables?

It will wrap Errors, too. I ran the following code on my machine:

```Scala
import scala.collection.mutable.ArrayBuffer

class Foo {}

object Foo {
  def main(args: Array[String]): Unit = {
    val a = ArrayBuffer[String]()
    while (true) {
      a += "11"
    }
  }
}

object Bar {
  def main(args: Array[String]): Unit = {
    val mainMethod = classOf[Foo].getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, null)
  }
}
```

and it outputs:

```
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at Bar$.main(Nio.scala:72)
	at Bar.main(Nio.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:99)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at Foo$.main(Nio.scala:62)
	at Foo.main(Nio.scala)
	... 11 more
```
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84897442 @liancheng
[GitHub] spark pull request: [SPARK-6466][SQL] Remove unnecessary attribute...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5134#issuecomment-84930028 [Test build #28988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28988/consoleFull) for PR 5134 at commit [`8e16206`](https://github.com/apache/spark/commit/8e16206aa7b8ece8521a64bfabdafbe925ce8e75). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84936129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28985/ Test FAILed.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84936098 [Test build #28985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28985/consoleFull) for PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6352] [SQL] Add DirectParquetOutputComm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5042#issuecomment-84939649 [Test build #28992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28992/consoleFull) for PR 5042 at commit [`9ae7545`](https://github.com/apache/spark/commit/9ae7545701f522702f2d0240367fc6fba06b7c26). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-6379][SQL] Support a functon to call us...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5061#discussion_r26916348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---

```Scala
@@ -212,6 +212,22 @@ class SQLContext(@transient val sparkContext: SparkContext)
   val udf: UDFRegistration = new UDFRegistration(this)

   /**
    * Call an user-defined function which is registered
    * in functionRegistry.
    * Example:
    * {{{
    * import org.apache.spark.sql._
    *
    * val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
    * val sqlctx = df.sqlContext
    * sqlctx.udf.register("simpleUdf", (v: Int) => v * v)
    * df.select($"id", sqlctx.callUdf("simpleUdf", $"value"))
```

--- End diff --

No `sqlCtx.` once this is moved.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84893111 @steveloughran: in Hadoop, if we need to add a native lib path to the Hadoop execution path, we export LD_LIBRARY_PATH: `export LD_LIBRARY_PATH=x`. In Hadoop, LD_LIBRARY_PATH is saved in ContainerLaunchContext#environment. So in Spark, if we need to add a native lib path to the Spark execution path, we just set [ContainerLaunchContext#environment](https://github.com/kellyzly/spark/blob/b522f23438e119b2c987374ed6d64aa2b7317421/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#l548).
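The mechanism described above can be sketched as follows; the map, helper path, and variable names here are illustrative assumptions, not Spark's actual `Client.scala` code:

```scala
import scala.collection.mutable

// Illustrative sketch: append a native library directory to LD_LIBRARY_PATH
// in the environment map that would be handed to the container launch context.
// The path below is an assumed example, not a real Spark default.
val env = mutable.HashMap[String, String]()
val nativeLibDir = "/usr/lib/hadoop/lib/native"
env("LD_LIBRARY_PATH") = env.get("LD_LIBRARY_PATH")
  .map(existing => s"$existing:$nativeLibDir") // preserve any existing entries
  .getOrElse(nativeLibDir)
```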
[GitHub] spark pull request: [SPARK-6356][SQL] Support the ROLLUP/CUBE/GROU...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/5080#issuecomment-84899209 @yhuai Any more comments on this?
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4986#issuecomment-84913707 What is the reason for adding a Save/Load version 1.0? What changes are expected to be made in future versions?
[GitHub] spark pull request: [SPARK-6466][SQL] Remove unnecessary attribute...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/5134 [SPARK-6466][SQL] Remove unnecessary attributes when resolving GroupingSets

When resolving `GroupingSets`, we currently list all outputs of `GroupingSets`'s child plan. However, the columns that are not in groupBy expressions and not used by aggregation expressions are unnecessary and can be removed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 remove_attr_expand

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5134.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5134

commit 8e16206aa7b8ece8521a64bfabdafbe925ce8e75
Author: Liang-Chi Hsieh vii...@gmail.com
Date: 2015-03-23T09:58:54Z

    Only keep necessary attribute output.
[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925134 --- Diff: pom.xml ---

```xml
@@ -1472,6 +1474,46 @@
         <groupId>org.scalatest</groupId>
         <artifactId>scalatest-maven-plugin</artifactId>
       </plugin>
+      <!-- Build the JARs -->
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-jar-plugin</artifactId>
+        <version>${maven-jar-plugin.version}</version>
+        <configuration>
+          <!-- Configuration of the archiver -->
+          <archive>
```

--- End diff --

Primarily to say what you want and the version. If the version control is cut, it's not needed any more.
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84843466 [Test build #28981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28981/consoleFull) for PR 4930 at commit [`b1d68bf`](https://github.com/apache/spark/commit/b1d68bfde905d469369d85fc7f935f1089b26c36). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6449][YARN] Report failure status if dr...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/5130#issuecomment-84860777

> If they wrap Errors as well, then the fix would be to replace `Exception` with `Throwable` in the match block of the `InvocationTargetException` cause.

This has been fixed in #4773
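The proposed fix can be sketched as follows; the helper name is hypothetical, not the actual Spark code:

```scala
import java.lang.reflect.InvocationTargetException

// Illustrative sketch: unwrap an InvocationTargetException and match its
// cause against Throwable rather than Exception, so that Errors such as
// OutOfMemoryError are also reported as driver failures.
def describeFailure(e: InvocationTargetException): String =
  e.getCause match {
    case t: Throwable => s"Driver failed with ${t.getClass.getName}"
    case _            => "Driver failed with an unknown cause"
  }
```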
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84876719 [Test build #28982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28982/consoleFull) for PR 4930 at commit [`2ce590f`](https://github.com/apache/spark/commit/2ce590f67c2e1404cba62b103f999ba119b02a37). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84876740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28982/ Test PASSed.
[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925089 --- Diff: pom.xml ---

```xml
@@ -158,6 +158,7 @@
     <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
     <snappy.version>1.1.1.6</snappy.version>
     <netlib.java.version>1.1.2</netlib.java.version>
+    <maven-jar-plugin.version>2.6</maven-jar-plugin.version>
```

--- End diff --

Lifted it from the Hadoop code. The parent one is at v2.4, so it comes down to whether you are happy with what that parent gives you or not. Easy to alter.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84940364 [Test build #28994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28994/consoleFull) for PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a). * This patch merges cleanly.
[GitHub] spark pull request: Update the command to use IPython notebook
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85198759 OK, never mind my question. I think it's clear you know what to do here and it's as you think it should be. I'll leave it open a bit for any other opinions, but if it's making the example work for more IPython versions, fine.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85221484 [Test build #29030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29030/consoleFull) for PR 5144 at commit [`2b5e23c`](https://github.com/apache/spark/commit/2b5e23c2402c8fbee73c49f1780c3219da1188fa). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85220925 [Test build #29030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29030/consoleFull) for PR 5144 at commit [`2b5e23c`](https://github.com/apache/spark/commit/2b5e23c2402c8fbee73c49f1780c3219da1188fa). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user hellertime commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85229862 @tnachen I'm stumped at the moment. I've gone so far as to exclude the explicit docker/spark-mesos/Dockerfile path, but it is still not excluded. I had put this down, so I haven't looked at it in a few days, nor merged in HEAD, but no, the .rat-excludes is still stopping me. It's probably a typo that I've stared at too long (:
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5118
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user hunglin commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85186794 @JoshRosen thanks for the suggestions. Let me work on those tonight.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85215859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29027/ Test PASSed.
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5145#issuecomment-85223025 Agree, I like this one. Fail-fast checks should go first.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5085#discussion_r26985197 --- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java --- @@ -47,10 +47,14 @@ * character. On Windows, the output is a command line suitable for direct execution from the * script. */ + + static String uberJarPath; --- End diff -- This looks really ugly. I'd really prefer plumbing this to the command builders through the constructor. It's a little bit more code but much cleaner.
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85183514 The master SBT build is currently broken for a few Hadoop profiles due to dependency issues. Do you think that this patch may have been responsible? I noticed that it wasn't tested by Jenkins prior to being merged (the last test was 18 days ago with an earlier version of the patch). See https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/1940/
[GitHub] spark pull request: Update the command to use IPython notebook
Github user yuecong commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85188711 Let me clarify my opinions. 1. Change `$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark` to `$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark`. We agree on this, since the old command no longer works with IPython 3.0. 2. Whether it is necessary to mention `%pylab inline`. I think it is, since it helps users understand that with the IPython notebook, unlike the IPython shell, they can visualize their data with pylab. 3. Whether we need to explain how to launch a notebook from the IPython notebook UI. I originally wrote the explanation based on IPython 3.0, but as you commented, the notebook UI differs between 2.x and 3.x, so I agree we don't need to explain it in detail; that keeps the guide suitable for all IPython versions. Hope the above clarifies my opinions. :)
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85192767 [Test build #29024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29024/consoleFull) for PR 4337 at commit [`16f109f`](https://github.com/apache/spark/commit/16f109f13a90d28c3d187f47cb2d0dcd5fc782bc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85192800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29024/ Test PASSed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85197419 [Test build #29026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29026/consoleFull) for PR 4435 at commit [`25cd894`](https://github.com/apache/spark/commit/25cd8948a4421aa90930cb8422647c9194240bc8). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85197443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29026/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26983769 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. +# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit=$1 +sha1=$2 + +MVN_BIN=`pwd`/build/mvn +CURR_CP_FILE=my-classpath.txt +MASTER_CP_FILE=master-classpath.txt + +${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \ --- End diff -- Yeah, it's required :/ I've tested without it and it fails at building `spark-networking`. This adds, for each run (of which there are two), around 4.5 mins, so 9 mins added to the build time. I also looked at seeing what `sbt` could output, but couldn't find anything. 
I also thought about treating this as a special-case test and grabbing the output from the generic Spark build that happens for each PR, but since we'd also have to build against the `master` branch, that didn't seem like a much better option.
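The core idea of the script under review — diff the PR branch's resolved classpath against master's to surface newly added dependencies — can be sketched in a few lines. This is an illustration in Scala, not the actual check (which is the bash script in the diff); the function name is hypothetical.

```scala
// Given two `mvn dependency:build-classpath` outputs (colon-separated on
// Unix), report jars present in the PR build but absent from master.
// Ordering is irrelevant, so sets suffice.
def newDependencies(masterClasspath: String, prClasspath: String): Set[String] = {
  def jars(cp: String): Set[String] = cp.split(':').filter(_.nonEmpty).toSet
  jars(prClasspath) -- jars(masterClasspath)
}
```

An empty result means the PR introduces no new dependencies, which is exactly the line Jenkins later posts ("This patch adds no new dependencies.").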
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5145 [SPARK-6477][Build]: Run MIMA tests before the Spark test suite This moves the MIMA checks to before the full Spark test suite so that, if a new PR fails the MIMA check, it will return much faster, having not run the entire test suite. This is preferable to the current scenario, where a user has to wait until the entire test suite completes before realizing it failed on a MIMA check; once the MIMA issues are fixed, the user then has to resubmit and rerun the full test suite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/brennonyork/spark SPARK-6477 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5145 commit 12b0aee58eaa6cd06d67bff5d778c6d4932f2209 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T21:56:15Z updated to put the mima checks before the spark test suite
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85221495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29030/ Test FAILed.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5074#issuecomment-85229756 @srowen I've got some more comments. Going to be fairly nitpicky on this because I think it'd benefit people to be as clear as possible.
[GitHub] spark pull request: [SPARK-6475][SQL] recognize array types when i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5146#issuecomment-85231185 [Test build #29035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29035/consoleFull) for PR 5146 at commit [`4f2df5e`](https://github.com/apache/spark/commit/4f2df5e807d256fdac5b4f9a5e1605dee5a1c38c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987263 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making the shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a --- End diff -- These first couple sentences are a little redundant with the previous paragraph.
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4986#discussion_r26976590 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -83,5 +95,82 @@ class GaussianMixtureModel( p(i) /= pSum } p - } + } +} + +@Experimental +object GaussianMixtureModel extends Loader[GaussianMixtureModel] { + + private object SaveLoadV1_0 { + +case class Data(weights: Array[Double], mus: Array[Vector], sigmas: Array[Matrix]) --- End diff -- As I mentioned before, let's flatten the data into rows, where each row corresponds to a Gaussian distribution.
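What the reviewer asks for might look like the sketch below: one row per Gaussian component instead of a single record holding three parallel arrays. Types are simplified stand-ins (plain arrays in place of MLlib's `Vector`/`Matrix`), and the names are illustrative, not the PR's actual save/load schema.

```scala
// One row per mixture component: a row-per-Gaussian layout maps naturally
// onto a DataFrame/Parquet table, unlike one record of three parallel arrays.
case class GaussianRow(weight: Double, mu: Array[Double], sigma: Array[Double])

def toRows(weights: Array[Double],
           mus: Array[Array[Double]],
           sigmas: Array[Array[Double]]): Seq[GaussianRow] = {
  require(weights.length == mus.length && mus.length == sigmas.length,
    "one weight, mean, and covariance per component")
  weights.indices.map(i => GaussianRow(weights(i), mus(i), sigmas(i)))
}
```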
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85215748 [Test build #29027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29027/consoleFull) for PR 5143 at commit [`a2e5e2d`](https://github.com/apache/spark/commit/a2e5e2d13e3f9c5c458593a3a8c992ae05d14845). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5085#discussion_r26985093 --- Diff: bin/spark-class --- @@ -40,36 +40,24 @@ else fi fi -# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath -# instead of looking for a jar file. -SPARK_LAUNCHER_CP= -if [ -f "$SPARK_HOME/RELEASE" ]; then - LAUNCHER_DIR="$SPARK_HOME/lib" - num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l) - if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then -echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2 -echo "You need to build Spark before running this program." 1>&2 -exit 1 - fi - - LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true) - if [ "$num_jars" -gt "1" ]; then -echo "Found multiple Spark launcher jars in $LAUNCHER_DIR:" 1>&2 -echo "$LAUNCHER_JARS" 1>&2 -echo "Please remove all but one jar." 1>&2 -exit 1 - fi - - SPARK_LAUNCHER_CP="${LAUNCHER_DIR}/${LAUNCHER_JARS}" -else - LAUNCHER_DIR="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION" - if [ ! -d "$LAUNCHER_DIR/classes" ]; then -echo "Failed to find Spark launcher classes in $LAUNCHER_DIR." 1>&2 -echo "You need to build Spark before running this program." 1>&2 -exit 1 - fi - SPARK_LAUNCHER_CP="$LAUNCHER_DIR/classes" +# Find assembly jar +SPARK_ASSEMBLY_JAR= +ASSEMBLY_DIR="$SPARK_HOME/lib" --- End diff -- Where are you looking for the assembly under `assembly/target/scala-$SPARK_SCALA_VERSION`? That's needed to not break dev builds.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85228821 [Test build #29034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29034/consoleFull) for PR 4027 at commit [`6d04da1`](https://github.com/apache/spark/commit/6d04da11e44d395416f208a20d250c17c672fcc9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4129#issuecomment-85229133 @CodingCat sorry you're right, I didn't realize CPUS_PER_TASK was configured to that flag. LGTM
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85228796 [Test build #29033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29033/consoleFull) for PR 5142 at commit [`c6744b8`](https://github.com/apache/spark/commit/c6744b82776263889c7a5eb7664835419834d28b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987355 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making the shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. 
If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be --- End diff -- `sortBy` would repartition the data, negating the original shuffle we're talking about, so it's maybe not worth mentioning here.
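The ordering point in the quoted doc can be illustrated without a cluster: a partition is just an iterator of elements, and sorting inside it (what `rdd.mapPartitions` would apply per partition) yields a predictable order without triggering another shuffle. A minimal sketch:

```scala
// Sort the contents of a single partition. Passed to rdd.mapPartitions, this
// gives deterministic per-partition ordering; unlike sortBy, it moves no data
// between partitions (and so does not undo the shuffle being discussed).
def sortWithinPartition[T](partition: Iterator[T])
                          (implicit ord: Ordering[T]): Iterator[T] =
  partition.toSeq.sorted.iterator
```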
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85180705 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179771 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29028/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179766 [Test build #29028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29028/consoleFull) for PR 5093 at commit [`2bb5527`](https://github.com/apache/spark/commit/2bb5527e2dc67dae1b4834eac3aaac07f3a76b32). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179769 [Test build #29028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29028/consoleFull) for PR 5093 at commit [`2bb5527`](https://github.com/apache/spark/commit/2bb5527e2dc67dae1b4834eac3aaac07f3a76b32). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch adds no new dependencies.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85179800 [Test build #29027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29027/consoleFull) for PR 5143 at commit [`a2e5e2d`](https://github.com/apache/spark/commit/a2e5e2d13e3f9c5c458593a3a8c992ae05d14845). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5847] Allow for namespacing metrics by ...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/4632#issuecomment-85180665 Thanks @pwendell. I had stumbled across that [SPARK-3377](https://issues.apache.org/jira/browse/SPARK-3377) work as well. I think there are solid arguments for each of these use-cases being supported: * `app.id`-prefixing can be pathologically hard on Graphite's disk I/O for short-running jobs. * `app.name`-prefixing is no good if you have jobs running simultaneously. Here are three options for supporting both (all defaulting to `app.id` but providing an escape hatch): 1. Only admit `id` and `name` values here, and use the value from the appropriate config key. The main downside is that we would essentially introduce two new, made-up magic strings to do this; `id` and `name`? `app.id` and `app.name`? At that point, we're basically at… 2. Allow usage of any existing conf value as the metrics prefix, which is what this PR currently does. 3. Default to `app.id` but allow the user to specify a string that is used as the metrics' prefix (as opposed to a string that keys into `SparkConfig`), e.g. `--conf spark.metrics.prefix=my-app-name`; * this could be a `--conf` param or happen in the `MetricsConfig`'s file. I feel like doing this via the `MetricsConfig`'s `spark.metrics.conf` file makes more sense than adding another `--conf` param, but I could be persuaded otherwise. "It seems a bit weird to hard code handling of this particular configuration in the MetricsConfig class." This bit I disagree with; plenty of config params are {read by, relevant to} just one class.
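Option 3 from the comment above might look like the sketch below. The key `spark.metrics.prefix` is taken from the comment itself and is not an existing Spark configuration; `conf` is a plain map standing in for `SparkConf`.

```scala
// Default the metrics prefix to the application id, but let the user override
// it with an explicit, human-chosen prefix (the "escape hatch").
def metricsPrefix(conf: Map[String, String]): String =
  conf.getOrElse("spark.metrics.prefix",
    conf.getOrElse("spark.app.id", "unknown"))
```

This keeps the short-running-job case (override with a stable name so Graphite isn't flooded with one prefix per run) and the simultaneous-jobs case (fall back to the unique app id) both workable.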
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85182245 [Test build #29029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29029/consoleFull) for PR 5142 at commit [`170d6f9`](https://github.com/apache/spark/commit/170d6f971f29049715cb4aff919ac4e6d7855020). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85186045 Seems fine to me.
[GitHub] spark pull request: [SPARK-][MESOS] Add cluster mode support for M...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85219844 @andrewor14 Let me know what you think!
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user vlyubin commented on a diff in the pull request: https://github.com/apache/spark/pull/4859#discussion_r26986539 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala --- @@ -115,18 +116,21 @@ private[sql] class DefaultSource extends RelationProvider { numPartitions.toInt) } val parts = JDBCRelation.columnPartition(partitionInfo) -JDBCRelation(url, table, parts)(sqlContext) +val properties = new Properties() // Additional properties that we will pass to getConnection +parameters.foreach(kv => properties.setProperty(kv._1, kv._2)) +JDBCRelation(url, table, parts, properties)(sqlContext) } } private[sql] case class JDBCRelation( url: String, table: String, -parts: Array[Partition])(@transient val sqlContext: SQLContext) +parts: Array[Partition], +properties: Properties = null)(@transient val sqlContext: SQLContext) --- End diff -- No particular reason really; both are fine with DriverManager's `getConnection()`. I've switched to an empty properties map; I guess it is in fact neater than null.
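The conversion added in the diff, copying the remaining options into a `java.util.Properties` for JDBC's `getConnection`, can be sketched in isolation (a minimal sketch; `toProperties` is an illustrative helper, not the PR's code):

```scala
import java.util.Properties

// Copy string key/value options into a Properties object, the form that
// java.sql.DriverManager.getConnection(url, props) expects. This mirrors the
// pattern `parameters.foreach(kv => properties.setProperty(kv._1, kv._2))`
// from the diff above.
def toProperties(parameters: Map[String, String]): Properties = {
  val properties = new Properties()
  parameters.foreach { case (k, v) => properties.setProperty(k, v) }
  properties
}

val props = toProperties(Map("user" -> "sa", "password" -> ""))
```

Passing an empty `Properties` rather than `null`, as the author chose, keeps `getConnection` callers from having to branch on the argument.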
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26986920 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it is grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and --- End diff -- "Re-arranging" and "copying" are redundant. Also, be consistent on "shuffle" vs. "the shuffle".
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85220210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29029/ Test FAILed.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85220152 [Test build #29029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29029/consoleFull) for PR 5142 at commit [`170d6f9`](https://github.com/apache/spark/commit/170d6f971f29049715cb4aff919ac4e6d7855020). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6287][MESOS] Add dynamic allocation to ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4984#issuecomment-85228355 @pwendell @andrewor14
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85228515 [Test build #29032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29032/consoleFull) for PR 4859 at commit [`7a8cfda`](https://github.com/apache/spark/commit/7a8cfdaa897e2a9a312f500c530c97a3fa27a5be). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85228145 @hellertime are you able to figure out the RAT problem?
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987121 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it is grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. --- End diff -- Not sure what "array" means here. Maybe replace with just "co-located to compute the result value".
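The per-key combining that the quoted passage describes can be illustrated without a cluster; this sketch mimics `reduceByKey` on plain Scala collections (an illustration of the semantics only: a real job must first shuffle all values for a key onto one partition):

```scala
// Local stand-in for RDD.reduceByKey: group pairs by key, then fold each
// group's values with the reduce function. On a cluster, the values for one
// key may start out on different partitions or machines, which is why a
// shuffle is needed before this combining step can finish.
def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

val counts = reduceByKeyLocal(Seq(("a", 1), ("b", 1), ("a", 1)))(_ + _)
// counts == Map("a" -> 2, "b" -> 1)
```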
[GitHub] spark pull request: [SPARK-6475][SQL] recognize array types when i...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/5146 [SPARK-6475][SQL] recognize array types when infer data types from JavaBeans Right now if there is an array field in a JavaBean, the user would see an exception in `createDataFrame`. @liancheng You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark SPARK-6475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5146 commit 4f2df5e807d256fdac5b4f9a5e1605dee5a1c38c Author: Xiangrui Meng m...@databricks.com Date: 2015-03-23T22:23:58Z recognize array types when infer data types from JavaBeans
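Schema inference of this kind hinges on reflection recognizing array classes and their element types; a minimal standalone sketch of that check (the `describeType` helper is hypothetical, not part of the PR):

```scala
// Java reflection flags array classes via isArray and exposes the element
// type via getComponentType; bean schema inference needs exactly this to map
// an array field (e.g. int[]) to an array type of its element type instead
// of failing on an unrecognized class.
def describeType(clazz: Class[_]): String =
  if (clazz.isArray) s"array of ${clazz.getComponentType.getName}"
  else clazz.getName

val arrayDesc  = describeType(classOf[Array[Int]]) // "array of int"
val scalarDesc = describeType(classOf[String])     // "java.lang.String"
```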
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85103963 [Test build #29003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29003/consoleFull) for PR 4435 at commit [`a066055`](https://github.com/apache/spark/commit/a066055441f370598bdef7868ff3bd51b4f0136d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26957574 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- @@ -0,0 +1,412 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rpc + +import java.net.URI + +import scala.concurrent.{Await, Future} +import scala.concurrent.duration._ +import scala.language.postfixOps +import scala.reflect.ClassTag + +import org.apache.spark.{Logging, SparkException, SecurityManager, SparkConf} +import org.apache.spark.util.{AkkaUtils, Utils} + +/** + * An RPC environment. [[RpcEndpoint]]s need to register itself with a name to [[RpcEnv]] to + * receives messages. Then [[RpcEnv]] will process messages sent from [[RpcEndpointRef]] or remote + * nodes, and deliver them to corresponding [[RpcEndpoint]]s. + * + * [[RpcEnv]] also provides some methods to retrieve [[RpcEndpointRef]]s given name or uri. + */ +private[spark] abstract class RpcEnv(conf: SparkConf) { + + private[spark] val defaultLookupTimeout = AkkaUtils.lookupTimeout(conf) + + /** + * Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement + * [[RpcEndpoint.self]]. 
+ */ + private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Return the address that [[RpcEnv]] is listening to. + */ + def address: RpcAddress + + /** + * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not + * guarantee thread-safety. + */ + def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] should + * make sure thread-safely sending messages to [[RpcEndpoint]]. + * + * Thread-safety means processing of one message happens before processing of the next message by + * the same [[RpcEndpoint]]. In the other words, changes to internal fields of a [[RpcEndpoint]] + * are visible when processing the next message, and fields in the [[RpcEndpoint]] need not be + * volatile or equivalent. + * + * However, there is no guarantee that the same thread will be executing the same [[RpcEndpoint]] + * for different messages. + */ + def setupThreadSafeEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Retrieve the [[RpcEndpointRef]] represented by `url` asynchronously. + */ + def asyncSetupEndpointRefByUrl(url: String): Future[RpcEndpointRef] + + /** + * Retrieve the [[RpcEndpointRef]] represented by `url`. This is a blocking action. + */ + def setupEndpointRefByUrl(url: String): RpcEndpointRef = { +Await.result(asyncSetupEndpointRefByUrl(url), defaultLookupTimeout) + } + + /** + * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName` + * asynchronously. + */ + def asyncSetupEndpointRef( + systemName: String, address: RpcAddress, endpointName: String): Future[RpcEndpointRef] = { +asyncSetupEndpointRefByUrl(uriOf(systemName, address, endpointName)) + } + + /** + * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName`. + * This is a blocking action. 
+ */ + def setupEndpointRef( + systemName: String, address: RpcAddress, endpointName: String): RpcEndpointRef = { +setupEndpointRefByUrl(uriOf(systemName, address, endpointName)) + } + + /** + * Stop [[RpcEndpoint]] specified by `endpoint`. + */ + def stop(endpoint: RpcEndpointRef): Unit + + /** + * Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully, + * call [[awaitTermination()]] straight after [[shutdown()]]. + */ + def shutdown(): Unit +
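The thread-safety contract quoted for `setupThreadSafeEndpoint` (each message's processing happens-before the next, with no guarantee of a single fixed thread) matches what a per-endpoint serial executor provides; a minimal sketch under that assumption, not the actual `RpcEnv` code:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// One serial executor per endpoint: handling of message n happens-before
// handling of message n+1, so the endpoint's internal state needs no volatile
// or locking, even though a pool-backed variant could run different messages
// on different threads.
class SerialEndpoint[T](handle: T => Unit) {
  private val exec = Executors.newSingleThreadExecutor()
  def send(msg: T): Unit = exec.execute(() => handle(msg))
  def shutdownAndWait(): Unit = {
    exec.shutdown()
    exec.awaitTermination(5, TimeUnit.SECONDS)
  }
}

var sum = 0 // safe: only ever touched by the endpoint's serial executor
val ep = new SerialEndpoint[Int](n => sum += n)
(1 to 100).foreach(ep.send)
ep.shutdownAndWait()
// all 100 increments are applied in order before shutdown returns
```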
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958295 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- (same `RpcEnv.scala` diff context quoted in the preceding comment)
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958330 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- (same `RpcEnv.scala` diff context quoted in the preceding comment)
[GitHub] spark pull request: Update the command to use IPython notebook
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85116122 But does this then work with ipython 2? I wouldn't want to necessarily 'break' support, even if it's just in an example. Or are two examples called for? Ideally, one example is good, even if it's deprecated in new ipython versions.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26959105 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,77 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the License); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an AS IS BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. 
+# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit="$1" +sha1="$2" + +CURR_CP_FILE="my-classpath.txt" +MASTER_CP_FILE="master-classpath.txt" + +./build/mvn clean compile dependency:build-classpath | \ + sed -n -e '/Building Spark Project Assembly/,$p' | \ + grep --context=1 -m 2 "Dependencies classpath:" | \ + head -n 3 | \ + tail -n 1 | \ + tr ":" "\n" | \ + rev | \ + cut -d "/" -f 1 | \ + rev | \ + sort > ${CURR_CP_FILE} + +# Checkout the master branch to compare against +git checkout apache/master + +./build/mvn clean compile dependency:build-classpath | \ + sed -n -e '/Building Spark Project Assembly/,$p' | \ + grep --context=1 -m 2 "Dependencies classpath:" | \ + head -n 3 | \ + tail -n 1 | \ + tr ":" "\n" | \ + rev | \ + cut -d "/" -f 1 | \ + rev | \ + sort > ${MASTER_CP_FILE} + +DIFF_RESULTS=`diff my-classpath.txt master-classpath.txt` + +if [ -z "${DIFF_RESULTS}" ]; then + echo " * This patch adds no new dependencies." +else + # Pretty print the new dependencies + new_deps=$(echo "${DIFF_RESULTS}" | grep ">" | cut -d " " -f2 | awk '{print " * " $1}') + echo " * This patch **adds the following new dependencies:**\n${new_deps}" --- End diff -- Was thinking the same thing actually. I'll make sure to include that before this WIP is completed. Thanks!
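The comparison at the end of the script, diffing the two sorted jar lists to surface additions, amounts to a set difference; the same logic sketched in Scala (illustrative only; the script itself uses `diff` and `grep`):

```scala
// A new dependency is an entry on the PR branch's classpath that is absent
// from master's classpath: a set difference of the two jar-name lists,
// sorted for stable output like the script's sorted files.
def newDependencies(prClasspath: Seq[String], masterClasspath: Seq[String]): Seq[String] =
  (prClasspath.toSet -- masterClasspath.toSet).toSeq.sorted

val added = newDependencies(
  Seq("spark-core.jar", "guava-14.0.jar", "shiny-new-lib-1.0.jar"), // hypothetical jar names
  Seq("spark-core.jar", "guava-14.0.jar"))
// added == Seq("shiny-new-lib-1.0.jar")
```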
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5139#issuecomment-85116031 [Test build #29004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29004/consoleFull) for PR 5139 at commit [`dfdf3ef`](https://github.com/apache/spark/commit/dfdf3efff1d83f5644469b87d10044ac8329fed3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118421 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29006/ Test FAILed.
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85118446 [Test build #29005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29005/consoleFull) for PR 5140 at commit [`d739640`](https://github.com/apache/spark/commit/d739640308ca0884bf5cd678dbedf3cc85c3cec9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118397 [Test build #29006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29006/consoleFull) for PR 5093 at commit [`291a8fe`](https://github.com/apache/spark/commit/291a8fea27d1aadf7db28936ef56762e5d74eb7b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118409 [Test build #29006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29006/consoleFull) for PR 5093 at commit [`291a8fe`](https://github.com/apache/spark/commit/291a8fea27d1aadf7db28936ef56762e5d74eb7b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch adds no new dependencies.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26959685 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala ---
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rpc.akka
+
+import java.net.URI
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.concurrent.{Await, Future}
+import scala.concurrent.duration._
+import scala.language.postfixOps
+import scala.reflect.ClassTag
+import scala.util.control.NonFatal
+
+import akka.actor.{ActorSystem, ExtendedActorSystem, Actor, ActorRef, Props, Address}
+import akka.pattern.{ask => akkaAsk}
+import akka.remote.{AssociationEvent, AssociatedEvent, DisassociatedEvent, AssociationErrorEvent}
+import org.apache.spark.{SparkException, Logging, SparkConf}
+import org.apache.spark.rpc._
+import org.apache.spark.util.{ActorLogReceive, AkkaUtils}
+
+/**
+ * A RpcEnv implementation based on Akka.
+ *
+ * TODO Once we remove all usages of Akka in other place, we can move this file to a new project and
+ * remove Akka from the dependencies.
+ *
+ * @param actorSystem
+ * @param conf
+ * @param boundPort
+ */
+private[spark] class AkkaRpcEnv private[akka] (
+    val actorSystem: ActorSystem, conf: SparkConf, boundPort: Int)
+  extends RpcEnv(conf) with Logging {
+
+  private val defaultAddress: RpcAddress = {
+    val address = actorSystem.asInstanceOf[ExtendedActorSystem].provider.getDefaultAddress
+    // In some test case, ActorSystem doesn't bind to any address.
+    // So just use some default value since they are only some unit tests
+    RpcAddress(address.host.getOrElse("localhost"), address.port.getOrElse(boundPort))
+  }
+
+  override val address: RpcAddress = defaultAddress
+
+  /**
+   * A lookup table to search a [[RpcEndpointRef]] for a [[RpcEndpoint]]. We need it to make
+   * [[RpcEndpoint.self]] work.
+   */
+  private val endpointToRef = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]()
+
+  /**
+   * Need this map to remove `RpcEndpoint` from `endpointToRef` via a `RpcEndpointRef`
+   */
+  private val refToEndpoint = new ConcurrentHashMap[RpcEndpointRef, RpcEndpoint]()
+
+  private def registerEndpoint(endpoint: RpcEndpoint, endpointRef: RpcEndpointRef): Unit = {
+    endpointToRef.put(endpoint, endpointRef)
+    refToEndpoint.put(endpointRef, endpoint)
+  }
+
+  private def unregisterEndpoint(endpointRef: RpcEndpointRef): Unit = {
+    val endpoint = refToEndpoint.remove(endpointRef)
+    if (endpoint != null) {
+      endpointToRef.remove(endpoint)
+    }
+  }
+
+  /**
+   * Retrieve the [[RpcEndpointRef]] of `endpoint`.
+   */
+  override def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef = {
+    val endpointRef = endpointToRef.get(endpoint)
+    require(endpointRef != null, s"Cannot find RpcEndpointRef of ${endpoint} in ${this}")
+    endpointRef
+  }
+
+  override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
+    setupThreadSafeEndpoint(name, endpoint)
+  }
+
+  override def setupThreadSafeEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
+    @volatile var endpointRef: AkkaRpcEndpointRef = null
+    // Use lazy because the Actor needs to use `endpointRef`.
+    // So `actorRef` should be created after assigning `endpointRef`.
+    lazy val actorRef = actorSystem.actorOf(Props(new Actor with ActorLogReceive with Logging {
+
+      require(endpointRef != null)
+      registerEndpoint(endpoint, endpointRef)
+
+      override def preStart(): Unit = {
+        // Listen for remote client network events
+        context.system.eventStream.subscribe(self, classOf[AssociationEvent])
+        safelyCall(endpoint) {
+          endpoint.onStart()
+        }
[GitHub] spark pull request: [SPARK-6256] [MLlib] MLlib Python API parity c...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4997#discussion_r26953234 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -111,9 +111,11 @@ private[python] class PythonMLLibAPI extends Serializable { initialWeights: Vector, regParam: Double, regType: String, - intercept: Boolean): JList[Object] = { + intercept: Boolean, --- End diff -- Should this be addIntercept to match the Scala named argument?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26953213 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -557,7 +557,6 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C TOK_TABLEPROPERTIES), children) val (db, tableName) = extractDbNameTableName(tableNameParts) - CreateTableAsSelect(db, tableName, nodeToPlan(query), allowExisting != None, Some(node)) --- End diff -- Currently, it is. If we are sure that `CreateTableAsSelect` is only used by the Hive dialect, we can remove the `Option`.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85098523 [Test build #29000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29000/consoleFull) for PR 4435 at commit [`0be5120`](https://github.com/apache/spark/commit/0be51209b88364fb3df2d65cf7ae2c1456c58629). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958734 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [SPARK-3533][Core][PySpark] Add saveAsTextFile...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4895#issuecomment-85115634 My entirely personal opinion is that I'm neutral on whether this is worth more API methods.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26959384 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85117276 I'm not sure how Mesos and YARN clusters are started/stopped (nor do I have such clusters on which to test), so I'm not sure how this will affect them. I think the way I did this should be safe - it's mostly just moving code around - but I could use a knowledgeable set of eyes to be sure.
[GitHub] spark pull request: [SPARK-6468][Block Manager] Fix the race condi...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5136#discussion_r26959388 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -91,7 +90,12 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkConf /** List all the files currently stored on disk by the disk manager. */ def getAllFiles(): Seq[File] = { // Get all the files inside the array of array of directories -subDirs.flatten.filter(_ != null).flatMap { dir => +subDirs.flatMap { dir => --- End diff -- How can you see a file that hasn't been created? It's assigned to the array after `mkdir()`.
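The pattern under discussion (a fixed-size array of subdirectory slots filled lazily, so unfilled slots are null and listing must skip them) can be sketched as follows. This is a minimal illustration, not Spark's actual `DiskBlockManager` internals; the names are invented for the example.

```scala
import java.io.File

// Minimal sketch, assuming a fixed-size array of subdirectory slots that
// are filled lazily: a slot stays null until its directory is created.
object SubDirListing {
  val subDirs: Array[Array[File]] = Array.fill(4)(new Array[File](4))

  // Simulate lazy creation: only one subdirectory has been made so far.
  subDirs(0)(0) = new File("/tmp/spark-local/00")

  // Listing must skip the null slots; mapping over unfilled slots would
  // otherwise hand null to File operations downstream.
  def allDirs: Seq[File] = subDirs.flatten.filter(_ != null).toSeq
}
```

With only one slot filled, `SubDirListing.allDirs` contains a single directory; the null filter is what keeps the fifteen unfilled slots out of the result.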
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4986#issuecomment-85086722 We want to allow the model data to be extended (with defaults to allow backwards compatibility). There might be unforeseeable reasons to change the format, too.
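The backwards-compatibility idea above (extend the saved format, but default the new fields so old saves still load) can be sketched like this. The `Metadata` fields and version numbers are hypothetical, not MLlib's actual save/load format:

```scala
// Hedged sketch of loading versioned model metadata with defaults for
// fields added in later format versions. All names are illustrative.
case class Metadata(version: Int, k: Int, seed: Option[Long])

object ModelLoader {
  def load(fields: Map[String, String]): Metadata = {
    val version = fields("version").toInt
    val k = fields("k").toInt
    // `seed` did not exist in the original format; absent values fall
    // back to None, which keeps old saves loadable under the new schema.
    val seed = fields.get("seed").map(_.toLong)
    Metadata(version, k, seed)
  }
}
```

A version-1 save without the `seed` field loads with `seed = None`, while newer saves carry the extra field through unchanged.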
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-85096699 @sryza When creating a Mesos task, one usually defines the resources required for the execution of the task and the resources required to run the Mesos executor. Again, the executor's role is to initiate execution of the task and report task statuses, but it can do anything else if it's a custom executor provided by the user. (You can skip defining an executor, in which case Mesos provides a default one and also adds a default resource padding for it.) In Spark's fine-grained mode we do have a custom executor, org.apache.spark.executor.MesosExecutorBackend, and the cores assigned are just for running this executor alone, which runs one per slave per app (it can run multiple Spark tasks).
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-85096806 Fractional values are definitely supported, since it's just CPU shares in the end. We should make it a double.
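Reading the setting as a `Double` rather than an `Int` is what makes fractional CPU shares (e.g. 0.5) representable. A minimal sketch; the key name and default follow the discussion in this PR but are illustrative here, and a plain `Map` stands in for `SparkConf`:

```scala
// Sketch of parsing the Mesos executor-cores setting as a Double so
// fractional CPU shares are accepted. Key name and default illustrative.
object MesosCores {
  def executorCores(conf: Map[String, String]): Double =
    conf.get("spark.mesos.mesosExecutorCores").map(_.toDouble).getOrElse(1.0)
}
```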
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85101090 [Test build #29001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29001/consoleFull) for PR 5118 at commit [`6c8ffab`](https://github.com/apache/spark/commit/6c8ffab396d76e329100c9c33a609f1b993e1abb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld closed the pull request at: https://github.com/apache/spark/pull/3699
[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3699#issuecomment-85112967 I'm redoing this in the latest code, remaking the PR from scratch, to alleviate merge issues. I'll post the new PR here when it's made.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958715 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/5139 [SPARK-6369] [SQL] [WIP] Uses commit coordinator to help committing Hive and Parquet tables This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables. This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side. TODO - [ ] Add tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-6369 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5139.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5139 commit dfdf3efff1d83f5644469b87d10044ac8329fed3 Author: Cheng Lian l...@databricks.com Date: 2015-03-23T17:21:35Z Uses commit coordinator to help committing Hive and Parquet tables
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/5140 [Spark-4848] Stand-alone cluster: Allow differences between workers with multiple instances This refixes #3699 with the latest code. This fixes SPARK-4848. I've changed the stand-alone cluster scripts to allow different workers to have different numbers of instances, with both the port and the web-UI port following along appropriately. I did this by moving the loop over instances from start-slaves and stop-slaves (on the master) to start-slave and stop-slave (on the worker). While I was at it, I changed SPARK_WORKER_PORT to work the same way as SPARK_WORKER_WEBUI_PORT, since the new methods work fine for both. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nkronenfeld/spark-1 feature/spark-4848 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5140.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5140 commit d739640308ca0884bf5cd678dbedf3cc85c3cec9 Author: Nathan Kronenfeld nkronenf...@oculusinfo.com Date: 2015-03-23T17:28:44Z Move looping through instances from the master to the workers, so that each worker respects its own number of instances and web-ui port