[GitHub] spark pull request: [SPARK-5011][SQL] Add support for WITH SERDEPR...

2014-12-31 Thread tianyi
Github user tianyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/3847#discussion_r22376322
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -70,13 +73,33 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
    * CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
    * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
+   * OR,
+   * For other external datasources not only a kind of file like:avro, parquet, json, but a cluster database, like: cassandra an hbase etc...
+   * DDL like this:
+   * CREATE TEMPORARY TABLE cassandraTable
+   * USING org.apache.spark.sql.cassandra
+   * WITH SERDEPROP(serialization.format=1, cassandra.columns.mapping=key,data)
+   * TBLPROPERTIES(cassandra.keyspace.name = cassandra_keyspace)
    */
   protected lazy val createTable: Parser[LogicalPlan] =
-    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
-      case tableName ~ provider ~ opts =>
-        CreateTableUsing(tableName, provider, opts)
+    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~
+      (OPTIONS ~> options).? ~ (WITH ~> SERDEPROP ~> properties).? ~
+      (TBLPROP ~> properties).? ^^ {
+      case tableName ~ provider ~ opts ~ serdeprop ~ tblprop =>
+        val optionParams = opts.getOrElse(Map[String,String]())
+        val serdeParams = serdeprop.getOrElse(Map[String,String]())
+        val tblParams = tblprop.getOrElse(Map[String,String]())
+        //TODO: in order to not break current interface, simple union them, if interface changes, also change this
--- End diff --

exceeds limitation of line length


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5011][SQL] Add support for WITH SERDEPR...

2014-12-31 Thread tianyi
Github user tianyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/3847#discussion_r22376331
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -70,13 +73,33 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
    * CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
    * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
+   * OR,
+   * For other external datasources not only a kind of file like:avro, parquet, json, but a cluster database, like: cassandra an hbase etc...
+   * DDL like this:
+   * CREATE TEMPORARY TABLE cassandraTable
+   * USING org.apache.spark.sql.cassandra
+   * WITH SERDEPROP(serialization.format=1, cassandra.columns.mapping=key,data)
+   * TBLPROPERTIES(cassandra.keyspace.name = cassandra_keyspace)
    */
   protected lazy val createTable: Parser[LogicalPlan] =
-    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
-      case tableName ~ provider ~ opts =>
-        CreateTableUsing(tableName, provider, opts)
+    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~
+      (OPTIONS ~> options).? ~ (WITH ~> SERDEPROP ~> properties).? ~
+      (TBLPROP ~> properties).? ^^ {
+      case tableName ~ provider ~ opts ~ serdeprop ~ tblprop =>
+        val optionParams = opts.getOrElse(Map[String,String]())
+        val serdeParams = serdeprop.getOrElse(Map[String,String]())
+        val tblParams = tblprop.getOrElse(Map[String,String]())
+        //TODO: in order to not break current interface, simple union them, if interface changes, also change this
--- End diff --

and need a space after //





[GitHub] spark pull request: [SPARK-5011][SQL] Add support for WITH SERDEPR...

2014-12-31 Thread tianyi
Github user tianyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/3847#discussion_r22376333
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -70,13 +73,33 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
    * CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
    * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
+   * OR,
+   * For other external datasources not only a kind of file like:avro, parquet, json, but a cluster database, like: cassandra an hbase etc...
+   * DDL like this:
+   * CREATE TEMPORARY TABLE cassandraTable
+   * USING org.apache.spark.sql.cassandra
+   * WITH SERDEPROP(serialization.format=1, cassandra.columns.mapping=key,data)
+   * TBLPROPERTIES(cassandra.keyspace.name = cassandra_keyspace)
    */
   protected lazy val createTable: Parser[LogicalPlan] =
-    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~ (OPTIONS ~> options) ^^ {
-      case tableName ~ provider ~ opts =>
-        CreateTableUsing(tableName, provider, opts)
+    CREATE ~ TEMPORARY ~ TABLE ~> ident ~ (USING ~> className) ~
+      (OPTIONS ~> options).? ~ (WITH ~> SERDEPROP ~> properties).? ~
+      (TBLPROP ~> properties).? ^^ {
+      case tableName ~ provider ~ opts ~ serdeprop ~ tblprop =>
+        val optionParams = opts.getOrElse(Map[String,String]())
+        val serdeParams = serdeprop.getOrElse(Map[String,String]())
+        val tblParams = tblprop.getOrElse(Map[String,String]())
+        //TODO: in order to not break current interface, simple union them, if interface changes, also change this
+        val passedParams = optionParams ++ serdeParams ++ tblParams
+        passedParams.foreach(println)
+        CreateTableUsing(tableName, provider, passedParams)
     }
 
+  protected lazy val properties: Parser[Map[String,String]] =
+    "(" ~> repsep(equalStrKVPair, ",") <~ ")" ^^ { case s: Seq[(String, String)] => s.toMap }
+
+  protected lazy val equalStrKVPair: Parser[(String, String)] = stringLit ~ "=" ~ stringLit ^^ { case k ~ "=" ~ v => (k,v) }
--- End diff --

exceeds limitation of line length
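
For readers following the review: the `optionParams ++ serdeParams ++ tblParams` line in the diff depends on how Scala's `Map` union treats duplicate keys. The following is a minimal sketch, assuming a hypothetical `MergeParamsSketch.merge` helper that is not part of the pull request. It suggests that when the same key appears in more than one clause, the right-most map wins, so table properties would silently override serde properties and options.

```scala
// Hypothetical helper for illustration only; not code from the pull request.
object MergeParamsSketch {
  // Merge the three optional clause maps. With `++`, a key that appears
  // in several maps keeps the value from the right-most map.
  def merge(
      opts: Option[Map[String, String]],
      serdeProps: Option[Map[String, String]],
      tblProps: Option[Map[String, String]]): Map[String, String] =
    opts.getOrElse(Map.empty[String, String]) ++
      serdeProps.getOrElse(Map.empty[String, String]) ++
      tblProps.getOrElse(Map.empty[String, String])
}
```

For example, `MergeParamsSketch.merge(Some(Map("path" -> "/a")), None, Some(Map("path" -> "/b")))` yields a map in which `path` is `/b`. Whether that precedence is desirable is a design question for the data source interface.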





[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68428985
  
  [Test build #24948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24948/consoleFull) for PR 3844 at commit [`b1ac4ad`](https://github.com/apache/spark/commit/b1ac4ad62ff6d537f669699d5da49bc4ee1ab154).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter `






[GitHub] spark pull request: [SPARK-5011][SQL] Add support for WITH SERDEPR...

2014-12-31 Thread tianyi
Github user tianyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/3847#discussion_r22376350
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -70,13 +73,33 @@ private[sql] class DDLParser extends StandardTokenParsers with PackratParsers wi
    * CREATE TEMPORARY TABLE avroTable
    * USING org.apache.spark.sql.avro
    * OPTIONS (path ../hive/src/test/resources/data/files/episodes.avro)
+   * OR,
+   * For other external datasources not only a kind of file like:avro, parquet, json, but a cluster database, like: cassandra an hbase etc...
--- End diff --

exceeds limitation of line length





[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68428989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24948/
Test FAILed.





[GitHub] spark pull request: SPARK-4963 [SQL] Add copy to SQL's Sample oper...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3827#issuecomment-68429063
  
  [Test build #24947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24947/consoleFull) for PR 3827 at commit [`65c4e7c`](https://github.com/apache/spark/commit/65c4e7cdb906412abc154bb25cdfd49b6d53e9f9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-4963 [SQL] Add copy to SQL's Sample oper...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3827#issuecomment-68429064
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24947/
Test PASSed.





[GitHub] spark pull request: SPARK-2757 [BUILD] [STREAMING] Add Mima test f...

2014-12-31 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/3842#issuecomment-68429122
  
I believe we need to track only the Avro classes and the SparkSink class
(not sure if we need binary compat for this either - I think API compat is
all we need for even the SparkSink class). Other than that we should be
fine since the jar that contains the sink itself has the other classes, so
binary compat isn't an issue.

On Tuesday, December 30, 2014, Sean Owen notificati...@github.com wrote:

 @harishreedharan https://github.com/harishreedharan might know better;
 it was his suggestion to track this with MiMa. I suppose it can be enabled,
 to err on the side of tracking these things, and exclude/disable later when
 needed?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/3842#issuecomment-68426102.






[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68429312
  
  [Test build #24949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24949/consoleFull) for PR 3844 at commit [`4b58094`](https://github.com/apache/spark/commit/4b580943de5137e947d1a6cdadd054020932ed8e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter `






[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68429316
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24949/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68429438
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24945/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68429433
  
  [Test build #24945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24945/consoleFull) for PR 3849 at commit [`8c387ce`](https://github.com/apache/spark/commit/8c387ce76850caf2163dd82dc5c79b079788a921).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3825#issuecomment-68429462
  
I'd rather use fewer Akka features than more, since this will make it
easier to replace Akka with our own RPC layer in the future.  Therefore, I'd
much prefer to not allow exceptions to trigger actor restarts / state clearing.
I think that adding an experimental Akka feature like persistence would be a
huge risk for little obvious gain.

I'm not sure if the "heartbeat from unknown worker" case can ever occur if we
don't clear the master's state, because I think that workers only begin sending
heartbeats once a master has ack'd their registration, in which case the master
would know that it was a previously-registered worker and instruct it to
reconnect.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-68429851
  
  [Test build #24951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24951/consoleFull) for PR 3237 at commit [`bb35d1a`](https://github.com/apache/spark/commit/bb35d1a3e14703c1ca71a8b3b463a65b15cace3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3325][Streaming] Add a parameter to the...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3237#issuecomment-68429853
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24951/
Test FAILed.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread YanTangZhai
Github user YanTangZhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3794#discussion_r22376680
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -178,7 +178,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient private var partitions_ : Array[Partition] = getPartitions
--- End diff --

Sorry. This approach may cause an error like the following:
Exception in thread "main" java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
    at com.google.common.collect.MapMakerInternalMap.put(MapMakerInternalMap.java:3499)
    at org.apache.spark.rdd.HadoopRDD$.putCachedMetadata(HadoopRDD.scala:273)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:151)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:173)
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:181)
    at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:97)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:561)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:471)
since jobConfCacheKey has not been initialized at that time.
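
The failure mode described above is a general Scala initialization-order pitfall: a superclass constructor that calls an overridable method runs before the subclass's fields have been assigned. The sketch below uses hypothetical class names (`EagerBase`, `EagerChild`, `LazyBase`, `LazyChild`) that only mirror the shape of the `RDD` / `HadoopRDD` relationship; it is not Spark code, and the `lazy val` variant is shown as one common way to defer the call, not as the fix chosen for this pull request.

```scala
// Hypothetical classes for illustration; they only mimic the shape of the problem.
abstract class EagerBase {
  // Evaluated inside the superclass constructor, before subclass fields are set.
  val partitions: Seq[String] = getPartitions
  protected def getPartitions: Seq[String]
}

class EagerChild extends EagerBase {
  // Still null while EagerBase's constructor is running.
  private val cacheKey: String = "conf-key"
  protected def getPartitions: Seq[String] = Seq(cacheKey.toUpperCase)
}

abstract class LazyBase {
  // Deferred until first access, after the subclass is fully constructed.
  lazy val partitions: Seq[String] = getPartitions
  protected def getPartitions: Seq[String]
}

class LazyChild extends LazyBase {
  private val cacheKey: String = "conf-key"
  protected def getPartitions: Seq[String] = Seq(cacheKey.toUpperCase)
}

object InitOrderDemo {
  // True if constructing the eager variant throws a NullPointerException.
  def eagerFails: Boolean =
    try { new EagerChild; false } catch { case _: NullPointerException => true }

  // The lazy variant reads cacheKey only after construction has finished.
  def lazyPartitions: Seq[String] = new LazyChild().partitions
}
```

In the eager variant, `cacheKey` plays the role that `jobConfCacheKey` plays in the stack trace: it is read from within the superclass constructor while it is still null.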





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/3856

[SPARK-5032] [graphx] Remove GraphX MIMA exclude for 1.3

Since GraphX is no longer alpha as of 1.2, MimaExcludes should not exclude 
GraphX for 1.3


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark graphx-mima

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3856


commit a3fea4282f9f96d6b5bb5d378ba6198160d84c31
Author: Joseph K. Bradley jos...@databricks.com
Date:   2014-12-31T08:30:51Z

removed graphx mima exclude for 1.3







[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68430068
  
  [Test build #24953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24953/consoleFull) for PR 3856 at commit [`a3fea42`](https://github.com/apache/spark/commit/a3fea4282f9f96d6b5bb5d378ba6198160d84c31).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68430096
  
This is not fixed yet; I need to include some Mima excludes for GraphX, it 
seems.  I'll update this within a day once I track down the JIRAs to associate 
with the excludes.





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-68431205
  
  [Test build #24952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24952/consoleFull) for PR 3637 at commit [`dc5647e`](https://github.com/apache/spark/commit/dc5647ee1438228f53f79037e7b47a8d1ac2d61b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `// where index i corresponds to class i (i = 0, 1).`
  * `new Param(this, probabilityCol, column name for predicted class conditional probabilities,`
  * `class VectorUDT extends UserDefinedType[Vector] `






[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3637#issuecomment-68431209
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24952/
Test PASSed.





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/3857

[SPARK-5035] [Streaming] ReceiverMessage trait should extend Serializable

Spark Streaming's ReceiverMessage trait should extend Serializable in order 
to fix a subtle bug that only occurs when running on a real cluster:

If you attempt to send a fire-and-forget message to a remote Akka actor and 
that message cannot be serialized, then this seems to lead to more-or-less 
silent failures. As an optimization, Akka skips message serialization for 
messages sent within the same JVM. As a result, Spark's unit tests will never 
fail due to non-serializable Akka messages, but these will cause mostly-silent 
failures when running on a real cluster.

Before this patch, here was the code for ReceiverMessage:

```
/** Messages sent to the NetworkReceiver. */
private[streaming] sealed trait ReceiverMessage
private[streaming] object StopReceiver extends ReceiverMessage
```

Since ReceiverMessage does not extend Serializable and StopReceiver is a 
regular `object`, not a `case object`, StopReceiver will throw serialization 
errors. As a result, graceful receiver shutdown is broken on real clusters but 
works in local and local-cluster modes. If you want to reproduce this, try 
running the word count example from the Streaming Programming Guide in the 
Spark shell:

```
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val ssc = new StreamingContext(sc, Seconds(10))
// Create a DStream that will connect to hostname:port, like localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
import org.apache.spark.streaming.StreamingContext._
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print()
ssc.start()
Thread.sleep(1)
ssc.stop(true, true)
```

Prior to this patch, this would work correctly in local mode but fail when 
running against a real cluster.
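
The serialization behavior described above can be checked without a cluster by handing the message objects to Java serialization directly. The following is a rough sketch with hypothetical stand-in types (`PlainMessage`, `PlainStop`, `SerializableMessage`, `SerializableStop`) and a hypothetical `SerializationCheck.canSerialize` helper; none of these are Spark's actual definitions, and it assumes the remote transport uses Java serialization for these messages.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for the message types; not Spark's definitions.
sealed trait PlainMessage
object PlainStop extends PlainMessage // regular object, trait is not Serializable

sealed trait SerializableMessage extends Serializable
object SerializableStop extends SerializableMessage

object SerializationCheck {
  /** Returns true if Java serialization accepts the value. */
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

Here `SerializationCheck.canSerialize(PlainStop)` is expected to be false and `SerializationCheck.canSerialize(SerializableStop)` true, which corresponds to the before and after states of `ReceiverMessage` in this patch.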

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark SPARK-5035

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3857


commit 71d0eae7658641b9a820b86e8017dc9c7d3c6029
Author: Josh Rosen joshro...@databricks.com
Date:   2014-12-31T09:20:37Z

[SPARK-5035] ReceiverMessage trait should extend Serializable.







[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68431882
  
  [Test build #24954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24954/consoleFull) for PR 3857 at commit [`71d0eae`](https://github.com/apache/spark/commit/71d0eae7658641b9a820b86e8017dc9c7d3c6029).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68431968
  
/cc @tdas.

This might fix both https://issues.apache.org/jira/browse/SPARK-4986 and 
https://issues.apache.org/jira/browse/SPARK-2892, although fully resolving them 
may take additional changes (e.g. replacing the 10-second timeout with a 
configurable one).

I want to give a huge thanks to @cleaton for filing SPARK-4986 and for 
coming up with a workaround patch for SPARK-4986 which helped to spot this 
issue.





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68432275
  
Also, this was a really nasty bug because it seems very hard to test for 
this in Spark's own unit tests.  Akka has a configuration option to force all 
messages to be serialized, even between local actors, but unfortunately this 
breaks Spark core because we send some non-serializable SparkContext references 
when initializing the DAGScheduler actor.

Can we force serialization by spinning up separate actor systems for the 
master / worker / executor processes when running in local-cluster mode?  Or is 
there some other way that we can selectively force serialization in order to 
uncover these sorts of issues?

We can definitely reproduce these sorts of issues in my 
spark-integration-tests system, since that uses multiple JVMs, but for 
completeness's sake I guess we'd need that tool's suites to send all of the 
remote messages (so this could be a lot of test code duplication). 

Maybe the simplest (general) preventative test would have been something 
that just tries to call the Java serializer on an instance of each message 
class, so we just test serializability independent of Akka.  Checking for 
serializability (either manually or through handwritten tests) should be part 
of our review checklist when adding new Akka messages.
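
A minimal sketch of such a preventative test, using only the JDK: round-trip one instance of each message class through Java serialization. The types below (`DemoMessage`, `StopDemo`, `UnsafeDemo`) are hypothetical stand-ins for illustration, not Spark's actual message classes.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectInputStream, ObjectOutputStream}

// Hypothetical message types, used only for illustration.
sealed trait DemoMessage extends Serializable
case class StopDemo(reason: String) extends DemoMessage

// A plain class whose hierarchy never mixes in Serializable.
class UnsafeDemo(val reason: String)

object SerializabilityCheck {
  /** Writes `value` with the Java serializer and reads it back. */
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  /** True if `value` survives Java serialization, false if the serializer rejects it. */
  def isSerializable(value: AnyRef): Boolean =
    try { roundTrip(value); true }
    catch { case _: NotSerializableException => false }
}
```

In a test suite, one assertion per message class (for example `assert(SerializabilityCheck.isSerializable(StopDemo("shutdown")))`) would fail as soon as a newly added message was not serializable, independent of Akka.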





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68432566
  
Also, just to sanity check and make sure I haven't overlooked something, at 
least one other person besides me should run the `spark-shell` reproduction 
listed in the PR description.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68432871
  
  [Test build #24955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull)
 for   PR 3849 at commit 
[`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68433088
  
  [Test build #24956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24956/consoleFull)
 for   PR 3844 at commit 
[`fc8eb28`](https://github.com/apache/spark/commit/fc8eb286db6aa8e78a567537996011f554eed969).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3795#issuecomment-68433103
  
For Enums, this patch seems like a strict improvement over the status quo.  
The strengthening of the array checks is the only potentially controversial 
change, but I think it's extremely unlikely to break user programs (it could 
only affect users who tried to use CombineByKey with array keys and a custom 
serializer, which seems like an unlikely use case); besides, any program that 
this breaks was likely giving the wrong answer / results, so it's better to 
fail loudly.

I guess there are still a few cases that could slip through the cracks:

- Java users who use custom serializers
- Cases where the Java API uses the wrong manifest and can't tell that 
we've passed an array.

I think both of these cases can only be detected with runtime-checks on the 
first record being shuffled.  Maybe we should add those as part of a separate 
PR, though, if we think they're worthwhile.
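
To illustrate why array keys tend to give wrong results, here is a small standalone sketch (plain Scala collections, no Spark; the object and method names are made up for the example). JVM arrays use reference equality and identity hash codes, so two arrays with the same contents are treated as different keys.

```scala
object ArrayKeyDemo {
  // Counts distinct keys by grouping with an ordinary hash-based map.
  def distinctKeys[K](pairs: Seq[(K, Int)]): Int = pairs.groupBy(_._1).size

  val a = Array(1, 2, 3)
  val b = Array(1, 2, 3) // same contents, different object

  // Arrays as keys: content-equal arrays land in separate groups.
  val arrayKeyGroups: Int = distinctKeys(Seq(a -> 1, b -> 1))

  // Converting to a Seq gives content-based equals/hashCode, so the keys merge.
  val seqKeyGroups: Int = distinctKeys(Seq(a.toSeq -> 1, b.toSeq -> 1))
}
```

Here `arrayKeyGroups` is 2 while `seqKeyGroups` is 1, which is the kind of silent miscount that failing loudly is meant to prevent.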





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68433311
  
  [Test build #24953 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24953/consoleFull)
 for   PR 3856 at commit 
[`a3fea42`](https://github.com/apache/spark/commit/a3fea4282f9f96d6b5bb5d378ba6198160d84c31).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68433315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24953/
Test FAILed.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68434376
  
@mengxr The implementation has been renamed and moved to `linalg.Vectors`. Would 
you like to test it again?





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68435325
  
  [Test build #24954 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24954/consoleFull)
 for   PR 3857 at commit 
[`71d0eae`](https://github.com/apache/spark/commit/71d0eae7658641b9a820b86e8017dc9c7d3c6029).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5035] [Streaming] ReceiverMessage trait...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3857#issuecomment-68435331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24954/
Test PASSed.





[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68435869
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24956/
Test FAILed.





[GitHub] spark pull request: [SPARK-4631] unit test for MQTT

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3844#issuecomment-68435866
  
  [Test build #24956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24956/consoleFull)
 for   PR 3844 at commit 
[`fc8eb28`](https://github.com/apache/spark/commit/fc8eb286db6aa8e78a567537996011f554eed969).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter`






[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68436420
  
  [Test build #24957 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24957/consoleFull)
 for   PR 3794 at commit 
[`fd87518`](https://github.com/apache/spark/commit/fd87518d7f81de1a122cfad25a88956a596ccd4f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68437209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24955/
Test FAILed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68437206
  
  [Test build #24955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24955/consoleFull)
 for   PR 3849 at commit 
[`0b10526`](https://github.com/apache/spark/commit/0b1052611f7fd7f71232ba3f2c0505e5711080e1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread YanTangZhai
Github user YanTangZhai commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68438167
  
@JoshRosen Thanks for your comments. I've updated it according to your 
comments and contrived a simple example as follows:
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (reduceByKey) on Spark 1.x

val inputfile1 = "./testin/in_1.txt"
val inputfile2 = "./testin/in_2.txt"
val tempfile = "./testtmp"
val outputfile = "./testout"
val sc = new SparkContext(new SparkConf())
// Job 1: word count over the first input, written as "word,count" lines.
sc.textFile(inputfile1)
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 1)
  .map { kv => kv._1 + "," + kv._2.toString }
  .saveAsTextFile(tempfile)
// Job 2: merge the intermediate counts with the second input and re-aggregate.
val wordCounts1 = sc.textFile(tempfile)
val wordCounts2 = sc.textFile(inputfile2)
val wordCounts = wordCounts1.union(wordCounts2)
wordCounts.map { line =>
    val kv = line.split(",")
    (kv(0), Integer.parseInt(kv(1)))
  }
  .reduceByKey(_ + _, 1)
  .map { kv => kv._1 + "," + kv._2.toString }
  .saveAsTextFile(outputfile)
```
./testin/in_1.txt (23 bytes) and ./testin/in_2.txt (19 bytes) are both local 
files.
- Before optimization,
 - job1
   New stage creation took 0.729638 s, of which HadoopRDD.getPartitions 
took 0.710247 s.
 - job2
   New stage creation took 0.882241 s, of which HadoopRDD.getPartitions 
took 0.850668 + 0.023490 s.
- After optimization,
 - job1
   HadoopRDD.getPartitions took 0.802133 s.
   New stage creation took 0.029328 s.
 - job2
   HadoopRDD.getPartitions took 0.464713 + 0.022568 s.
   New stage creation took 0.001773 s.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68438540
  
  [Test build #24958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24958/consoleFull)
 for   PR 3794 at commit 
[`74c1dec`](https://github.com/apache/spark/commit/74c1dec31ba9ded5a82640f7354aa6231169281c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68441198
  
**[Test build #24957 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24957/consoleFull)**
 for PR 3794 at commit 
[`fd87518`](https://github.com/apache/spark/commit/fd87518d7f81de1a122cfad25a88956a596ccd4f)
 after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68441201
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24957/
Test FAILed.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68441848
  
  [Test build #24959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24959/consoleFull)
 for   PR 3222 at commit 
[`2948c58`](https://github.com/apache/spark/commit/2948c583e77d636b001d484872abf4e76a2f02dd).
 * This patch merges cleanly.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68442070
  
  [Test build #24960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24960/consoleFull)
 for   PR 3222 at commit 
[`ebd07ad`](https://github.com/apache/spark/commit/ebd07addf5883cfdeacc88a0b551fd5c9a2245e6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68443755
  
**[Test build #24958 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24958/consoleFull)**
 for PR 3794 at commit 
[`74c1dec`](https://github.com/apache/spark/commit/74c1dec31ba9ded5a82640f7354aa6231169281c)
 after a configured wait of `120m`.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-68443757
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24958/
Test FAILed.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68445236
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24959/
Test FAILed.





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68445231
  
  [Test build #24959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24959/consoleFull)
 for   PR 3222 at commit 
[`2948c58`](https://github.com/apache/spark/commit/2948c583e77d636b001d484872abf4e76a2f02dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AdaGradUpdater(`
  * `class DBN(val stackedRBM: StackedRBM)`
  * `class MLP(`
  * `class MomentumUpdater(val momentum: Double) extends Updater `
  * `class RBM(`
  * `class StackedRBM(val innerRBMs: Array[RBM])`
  * `case class MinstItem(label: Int, data: Array[Int]) `
  * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`






[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2014-12-31 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/3825#issuecomment-68445280
  
It doesn't seem to me that usage of the newer Akka persistence API is 
called for, but it does seem that wrapping the `receive` in a try-catch is 
trying to do the job for which Akka's `SupervisorStrategy` is intended.  I 
can't recommend the hand-rolled try-catch approach.

http://doc.akka.io/docs/akka/2.3.4/general/supervision.html#supervision
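
As a rough sketch of that alternative, written against the classic Akka 2.3 actor API and not taken from the PR: a parent actor declares a `supervisorStrategy`, and exceptions thrown from the child's `receive` are routed to it instead of being caught by hand. The actor names and the choice of which exceptions restart the child are assumptions for illustration.

```scala
import java.io.IOException
import scala.concurrent.duration._
import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart}

// Hypothetical child actor: it simply lets exceptions propagate.
class DemoChild extends Actor {
  def receive = {
    case "fail" => throw new IOException("simulated connection loss")
    case msg    => sender() ! msg
  }
}

object DemoSupervision {
  // Maps a failure to a directive; this particular mapping is an assumption.
  val decider: SupervisorStrategy.Decider = {
    case _: IOException => Restart  // treated as transient: restart the child
    case _: Exception   => Escalate // anything else: hand the failure upwards
  }
}

// Hypothetical parent actor that supervises DemoChild.
class DemoParent extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute)(DemoSupervision.decider)

  private val child = context.actorOf(Props[DemoChild], "demo-child")

  def receive = { case msg => child forward msg }
}
```

Keeping the decider as a separate value makes the failure policy visible in one place and lets it be checked without starting an actor system.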





[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68446626
  
  [Test build #24960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24960/consoleFull)
 for   PR 3222 at commit 
[`ebd07ad`](https://github.com/apache/spark/commit/ebd07addf5883cfdeacc88a0b551fd5c9a2245e6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AdaGradUpdater(`
  * `class DBN(val stackedRBM: StackedRBM)`
  * `class MLP(`
  * `class MomentumUpdater(val momentum: Double) extends Updater `
  * `class RBM(`
  * `class StackedAutoEncoder(val stackedRBM: StackedRBM)`
  * `class StackedRBM(val innerRBMs: Array[RBM])`
  * `case class MinstItem(label: Int, data: Array[Int]) `
  * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`






[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3222#issuecomment-68446628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24960/
Test PASSed.





[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-68453172
  
Hmm, looks like this change somehow broke a PySpark Streaming test:

```
======================================================================
FAIL: Basic operation test for DStream.mapPartitions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pyspark/streaming/tests.py", line 228, in test_mapPartitions
    self._test_func(rdds, func, expected)
  File "pyspark/streaming/tests.py", line 114, in _test_func
    self.assertEqual(expected, result)
AssertionError: [[3, 7], [11, 15], [19, 23]] != []
```





[GitHub] spark pull request: [SPARK-5037] dynamically loaded DStreams imple...

2014-12-31 Thread industrial-sloth
GitHub user industrial-sloth opened a pull request:

https://github.com/apache/spark/pull/3858

[SPARK-5037] dynamically loaded DStreams implementation and example

This PR adds a new reflection-based method of creating input DStreams to
the Scala StreamingContext, and wires it through to the Python streaming API.

Creating DStream instances directly by reflection runs into trouble with
unwanted objects getting dragged into closures, so I worked around this by
defining a new abstract serializable `ReflectedDStreamFactory` class. The idea
is that one subclasses this with a concrete implementation that directly
instantiates the desired InputDStream; the StreamingContext then uses
reflection to dynamically load this new factory implementation. This PR also
has an example showing how this works with the existing ZeroMQ example code in
both the Scala and Python streaming APIs.

Parameters are passed into the input DStream indirectly: they first go into
the factory constructor, and the factory implementation is required to pass
them on to the DStream instance. At the moment these parameters are limited to
String type, which I think should cover the majority of use cases, but it
should be possible to generalize this further.

I'm putting this out there for comment; suggestions and alternative
approaches are more than welcome.
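
To make the factory idea concrete, here is a minimal Python sketch of the pattern described above. It is not the code in this PR: `ReflectedStreamFactory`, `EchoStreamFactory`, `load_factory`, and the `demo_factories` module are hypothetical names, and a plain list stands in for the InputDStream a real factory would build.

```python
import importlib
import sys
import types
from abc import ABC, abstractmethod


class ReflectedStreamFactory(ABC):
    """Hypothetical stand-in for the abstract factory described in the PR."""

    def __init__(self, *params: str):
        # Only string parameters are accepted, mirroring the stated limitation.
        if not all(isinstance(p, str) for p in params):
            raise TypeError("factory parameters must be strings")
        self.params = params

    @abstractmethod
    def create(self):
        """Directly instantiate the desired input stream."""


class EchoStreamFactory(ReflectedStreamFactory):
    def create(self):
        # A real factory would construct an input stream here; a list of the
        # forwarded parameters stands in for it in this sketch.
        return list(self.params)


def load_factory(qualified_name: str, *params: str) -> ReflectedStreamFactory:
    """Look up a factory class by its dotted name and construct it."""
    module_name, _, class_name = qualified_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not issubclass(cls, ReflectedStreamFactory):
        raise TypeError(f"{qualified_name} is not a ReflectedStreamFactory")
    return cls(*params)


# Register a throwaway module so the by-name lookup works within one file.
_demo = types.ModuleType("demo_factories")
_demo.EchoStreamFactory = EchoStreamFactory
sys.modules["demo_factories"] = _demo

factory = load_factory("demo_factories.EchoStreamFactory",
                       "tcp://127.0.0.1:1234", "topic")
stream = factory.create()
```

The caller only ever handles a class name and string arguments; everything specific to the stream type stays inside the concrete factory, which is what keeps unrelated objects out of the calling code.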

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/industrial-sloth/spark reflected-dstreams

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3858


commit 2ffec19c21348934911a56a14799a0ddcae5e4da
Author: industrial-sloth industrial-sl...@users.noreply.github.com
Date:   2014-12-31T16:54:48Z

dynamically leaded DStreams implementation and example







[GitHub] spark pull request: [SPARK-5037] dynamically loaded DStreams imple...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3858#issuecomment-68457064
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68457321
  
@JoshRosen took care of the minor edits for ya!





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68457422
  
[Test build #24961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24961/consoleFull) for PR 3561 at commit [`5e0fce1`](https://github.com/apache/spark/commit/5e0fce1cd0a6b4b413c31e8ca214c11c569c6164).
 * This patch merges cleanly.





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-68457889
  
test this please





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-68458124
  
[Test build #24962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24962/consoleFull) for PR 3850 at commit [`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...

2014-12-31 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3562#issuecomment-68458684
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3562#issuecomment-68458773
  
[Test build #24963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24963/consoleFull) for PR 3562 at commit [`6e97f72`](https://github.com/apache/spark/commit/6e97f72ca401e21e6ef81f7a0535b96801776e6f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68459442
  
add to whitelist





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68459446
  
test this please





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68459776
  
[Test build #24964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24964/consoleFull) for PR 3643 at commit [`f28b275`](https://github.com/apache/spark/commit/f28b275e153b3d093bf063c53efe1dea91084918).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68462221
  
[Test build #24965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24965/consoleFull) for PR 3856 at commit [`30f8bb4`](https://github.com/apache/spark/commit/30f8bb4cbe472a536c9506a7365e76f736adcb33).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68462618
  
[Test build #558 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/558/consoleFull) for PR 3854 at commit [`0f1d96e`](https://github.com/apache/spark/commit/0f1d96e2b4292edbf0a4c9db82fc2969016b0587).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68462718
  
@tgaloppo Ok, thanks for the fix! LGTM once the tests are done.

@tgaloppo @mengxr As long as this method is being edited, do you like the
name `predictMembership` for soft clustering? I assume we may eventually
use the same term for other clustering methods, including LDA.





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68462934
  
[Test build #24961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24961/consoleFull) for PR 3561 at commit [`5e0fce1`](https://github.com/apache/spark/commit/5e0fce1cd0a6b4b413c31e8ca214c11c569c6164).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68462940
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24961/





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68463177
  
Hey @jkbradley - are you sure those excludes are necessary? All of the
patches you mentioned are already in Spark 1.2; excludes should only be
relevant to things that changed between 1.2 and the master branch. Ideally we
haven't made any breaking changes since the 1.2 release.





[GitHub] spark pull request: [SPARK-5038][SQL] Add explicit return type for...

2014-12-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/3859

[SPARK-5038][SQL] Add explicit return type for implicit functions in Spark SQL



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark sql-implicits

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3859.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3859


commit 30c2c2463fd7fafbce2f072e9f81ec813b9e6589
Author: Reynold Xin r...@databricks.com
Date:   2014-12-31T19:21:09Z

[SPARK-5038] Add explicit return type for implicit functions in Spark SQL.







[GitHub] spark pull request: [SPARK-5038][SQL] Add explicit return type for...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3859#issuecomment-68463483
  
[Test build #24966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24966/consoleFull) for PR 3859 at commit [`30c2c24`](https://github.com/apache/spark/commit/30c2c2463fd7fafbce2f072e9f81ec813b9e6589).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5038] Add explicit return type for impl...

2014-12-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/3860

[SPARK-5038] Add explicit return type for implicit functions.

This is a follow-up PR for the rest of Spark (outside Spark SQL).

The original PR for Spark SQL can be found at 
https://github.com/apache/spark/pull/3859

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark implicit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3860


commit 73702f9ad2725c51fb1a58aa8c3b58eb1b0fd88d
Author: Reynold Xin r...@databricks.com
Date:   2014-12-31T19:29:45Z

[SPARK-5038] Add explicit return type for implicit functions.

This is a follow up PR for rest of Spark (outside Spark SQL).

The original PR for Spark SQL can be found at 
https://github.com/apache/spark/pull/3859







[GitHub] spark pull request: [SPARK-5038] Add explicit return type for impl...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3860#issuecomment-68464128
  
[Test build #24967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24967/consoleFull) for PR 3860 at commit [`73702f9`](https://github.com/apache/spark/commit/73702f9ad2725c51fb1a58aa8c3b58eb1b0fd88d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3562#issuecomment-68464188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24963/





[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3562#issuecomment-68464184
  
[Test build #24963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24963/consoleFull) for PR 3562 at commit [`6e97f72`](https://github.com/apache/spark/commit/6e97f72ca401e21e6ef81f7a0535b96801776e6f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68464357
  
I ran some quick tests with random sparsity patterns.  Averaged over 1000 
iterations, it's definitely faster:

length | v1 sparsity | v2 sparsity | new time    | old time    | speedup
------ | ----------- | ----------- | ----------- | ----------- | -------
1000   | 1           | 0.5         | 9.42E-06    | 6.73E-04    | 71.44
1000   | 1           | 0.1         | 1.69E-06    | 5.50E-05    | 32.43
1000   | 1           | 0.01        | 1.90E-06    | 3.30E-05    | 17.40
1000   | 0.5         | 0.1         | 9.89E-06    | 7.17E-05    | 7.25
1000   | 0.5         | 0.01        | 2.54E-06    | 5.80E-05    | 22.80
1000   | 0.1         | 0.01        | 1.95E-06    | 5.82E-05    | 29.84
1      | 1           | 0.5         | 1.11E-05    | 2.30E-04    | 20.73
1      | 1           | 0.1         | 1.03E-05    | 2.54E-04    | 24.54
1      | 1           | 0.01        | 8.69E-06    | 3.90E-04    | 44.92
1      | 0.5         | 0.1         | 1.47E-05    | 3.90E-04    | 26.63
1      | 0.5         | 0.01        | 8.63E-06    | 4.03E-04    | 46.76
1      | 0.1         | 0.01        | 1.81E-06    | 5.96E-04    | 329.01
10     | 1           | 0.5         | 9.27E-05    | 0.004039351 | 43.60
10     | 1           | 0.1         | 9.06E-05    | 0.001540544 | 17.01
10     | 1           | 0.01        | 8.71E-05    | 0.002636216 | 30.25
10     | 0.5         | 0.1         | 1.15E-04    | 0.003777669 | 32.76
10     | 0.5         | 0.01        | 9.61E-05    | 0.004879063 | 50.79
10     | 0.1         | 0.01        | 1.89E-05    | 0.003148419 | 166.29
100    | 1           | 0.5         | 0.001017196 | 0.05418411  | 53.27
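
For readers wondering why sparsity matters so much in these numbers, the sketch below shows one way a squared-distance routine can avoid touching zero entries: it walks the two sorted index arrays in a single merge pass. This is an illustration only, not the code from the PR; `sparse_sqdist` and its argument layout (parallel lists of sorted indices and values) are assumptions made for the example.

```python
def sparse_sqdist(idx1, val1, idx2, val2):
    """Squared Euclidean distance between two sparse vectors.

    Each vector is given as a list of strictly increasing indices plus a
    parallel list of the non-zero values at those indices (hypothetical
    layout chosen for this sketch).
    """
    i = j = 0
    total = 0.0
    # Merge pass: advance whichever side has the smaller index.
    while i < len(idx1) and j < len(idx2):
        if idx1[i] == idx2[j]:
            diff = val1[i] - val2[j]
            i += 1
            j += 1
        elif idx1[i] < idx2[j]:
            diff = val1[i]  # the other vector is zero at this index
            i += 1
        else:
            diff = val2[j]  # this vector is zero at that index
            j += 1
        total += diff * diff
    # Whatever remains on either side is compared against implicit zeros.
    total += sum(v * v for v in val1[i:])
    total += sum(v * v for v in val2[j:])
    return total


# v1 = {0: 1.0, 3: 2.0}, v2 = {3: 5.0, 4: 1.0}
dist = sparse_sqdist([0, 3], [1.0, 2.0], [3, 4], [5.0, 1.0])
```

The work is proportional to the number of stored entries rather than to the vector length, which is the kind of saving a sparsity-aware routine is after.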






[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68464642
  
@viirya Thanks for the updates!  LGTM pending Jenkins





[GitHub] spark pull request: [SPARK-5032] [graphx] Remove GraphX MIMA exclu...

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3856#issuecomment-68464735
  
@pwendell  Check out the MIMA failures before the excludes:
```
[error]  * abstract method unpersist(Boolean)org.apache.spark.graphx.Graph in class org.apache.spark.graphx.Graph does not have a correspondent in old version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem](org.apache.spark.graphx.Graph.unpersist)
[error]  * abstract method checkpoint()Unit in class org.apache.spark.graphx.Graph does not have a correspondent in old version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem](org.apache.spark.graphx.Graph.checkpoint)
[error]  * method fromEdges(org.apache.spark.rdd.RDD,scala.reflect.ClassTag,scala.reflect.ClassTag)org.apache.spark.graphx.EdgeRDD in object org.apache.spark.graphx.EdgeRDD has now a different result type; was: org.apache.spark.graphx.EdgeRDD, is now: org.apache.spark.graphx.impl.EdgeRDDImpl
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem](org.apache.spark.graphx.EdgeRDD.fromEdges)
[error]  * abstract method filter(scala.Function1,scala.Function2)org.apache.spark.graphx.EdgeRDD in class org.apache.spark.graphx.EdgeRDD does not have a correspondent in new version
[error]    filter with: ProblemFilters.exclude[MissingMethodProblem](org.apache.spark.graphx.EdgeRDD.filter)
[error]  * method filter(scala.Function1,scala.Function2)org.apache.spark.graphx.EdgeRDD in class org.apache.spark.graphx.impl.EdgeRDDImpl has now a different result type; was: org.apache.spark.graphx.EdgeRDD, is now: org.apache.spark.graphx.impl.EdgeRDDImpl
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem](org.apache.spark.graphx.impl.EdgeRDDImpl.filter)
```

Were the version tags not placed on the right commits?





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68465045
  
[Test build #24964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24964/consoleFull) for PR 3643 at commit [`f28b275`](https://github.com/apache/spark/commit/f28b275e153b3d093bf063c53efe1dea91084918).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68465046
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24964/





[GitHub] spark pull request: [SPARK-1010] Clean up uses of System.setProper...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3739#issuecomment-68465243
  
It doesn't look like there are any new test failures / flakiness that can 
be attributed to this patch, so I've finished backporting this to `branch-1.2` 
(1.2.1), `branch-1.1` (1.1.2), and `branch-1.0` (1.0.3).





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3643





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread tgaloppo
Github user tgaloppo commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68465266
  
@jkbradley I am not crazy about the name predictMembership(); to me it
implies a hard assignment. A simple change like predictMemberships() might
be clearer, or predictSoft(), or (thinking from a slightly different
direction) allocate(). Any of those should be robust enough to reuse for soft
k-means or LDA (or other such partial-assignment algorithms).





[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3643#issuecomment-68465304
  
Merged into master. Thanks! (minor TODO: Though `sqdist` is touched in 
MLUtilsSuite, it would be nice to add unit tests to `VectorsSuite`.)





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68465381
  
I finished my backports of the other patch, so I'm going to merge this now. 
 Thanks!





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3561





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68465590
  
It may be hard for users to tell the difference between `predict` and 
`predictMembership`, because `predict` is also predicting the membership. 
`predictFuzzy` or `predictSoft` sounds better to me.
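
To make the hard versus soft distinction in this naming discussion concrete, here is a minimal sketch. It is not the MLlib implementation: the object name `SoftVsHardSketch` and the `scores` parameter (one positive, unnormalized weight per cluster for a single point) are assumptions made for illustration, and only the method names `predict` and `predictSoft` come from the thread.

```scala
// Illustrative only: not the GaussianMixtureModel code under review.
object SoftVsHardSketch {
  // Soft assignment: normalize the per-cluster scores so they sum to 1,
  // giving one membership probability per cluster.
  // Assumes the scores are positive and their sum is non-zero.
  def predictSoft(scores: Array[Double]): Array[Double] = {
    val total = scores.sum
    scores.map(_ / total)
  }

  // Hard assignment: the index of the single most likely cluster.
  def predict(scores: Array[Double]): Int =
    scores.indexOf(scores.max)
}
```

For scores `Array(1.0, 3.0)`, `predict` returns the single index `1`, while `predictSoft` returns the memberships `0.25` and `0.75`; both describe membership, which is why the name `predictMembership` alone may not distinguish them.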





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-68465589
  
**[Test build #24962 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24962/consoleFull)**
for PR 3850 at commit
[`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12)
after a configured wait of `120m`.





[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...

2014-12-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3850#issuecomment-68465596
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24962/
Test FAILed.





[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3561#issuecomment-68465783
  
Alright, I've merged this to `master` (1.3.0), `branch-1.2` (1.2.1), 
`branch-1.1` (1.1.2), and `branch-1.0` (1.0.3).





[GitHub] spark pull request: [SPARK-794][Core] Remove sleep() in ClusterSch...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3851#issuecomment-68465933
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-794][Core] Remove sleep() in ClusterSch...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3851#issuecomment-68466150
  
[Test build #24968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24968/consoleFull)
for PR 3851 at commit
[`04c3e64`](https://github.com/apache/spark/commit/04c3e648021fa38acdde0745d4f7f961ef125dc1).
 * This patch merges cleanly.





[GitHub] spark pull request: Integrate external shuffle service to coarse g...

2014-12-31 Thread tnachen
GitHub user tnachen opened a pull request:

https://github.com/apache/spark/pull/3861

Integrate external shuffle service to coarse grained Mesos mode



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tnachen/spark mesos_shuffle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3861


commit 60df548387412ae12e6bd8439d48931aa564a22b
Author: Timothy Chen tnac...@apache.org
Date:   2014-12-05T05:55:42Z

Launch External Shuffle Service with mesos

commit 145bf5b9578d8087fb926e8fd73a8b04c34d07aa
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-13T20:17:47Z

Support total and kill executors in coarse grained mesos mode.

commit 03ee4f79c86c846c923ec83e17dd5ea7805091f6
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-17T01:36:56Z

Propogate the shuffle service setting.

commit 7434bb22858899a8808edee16c11c7bd68263828
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-20T01:27:42Z

Implement a new executor for coarse grained mesos mode.

commit 25331b1216889a9abfb40e583b56657ba45f840e
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-24T05:28:09Z

Launch executor with shell and add traces

commit 1aca094a1a20e961c76f65c98139ead4de8e4eab
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-30T08:12:42Z

Fix destroying executor.

commit 5c9fd75b2ae48f8a2c8c0b6cf8ebd7ab58e84b18
Author: Timothy Chen tnac...@gmail.com
Date:   2014-12-31T20:10:19Z

Only process status update if task is still tracked.







[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68466378
  
+1 for predictSoft





[GitHub] spark pull request: Integrate external shuffle service to coarse g...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3861#issuecomment-68466436
  
[Test build #24969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24969/consoleFull)
for PR 3861 at commit
[`5c9fd75`](https://github.com/apache/spark/commit/5c9fd75b2ae48f8a2c8c0b6cf8ebd7ab58e84b18).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68466535
  
[Test build #558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/558/consoleFull)
for PR 3854 at commit
[`0f1d96e`](https://github.com/apache/spark/commit/0f1d96e2b4292edbf0a4c9db82fc2969016b0587).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4835] [WIP] Disable validateOutputSpecs...

2014-12-31 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3832#issuecomment-68466612
  
I'd be glad to add a test here, although this might be a little tricky 
since the old behavior resulted in silent failures; I should be able to come up 
with a test though.

Regarding the streaming-specific 
`spark.streaming.hadoop.validateOutputSpecs` setting, which of the following 
behaviors is more intuitive?

1. Streaming jobs always respect the Streaming version of the setting and 
non-streaming jobs respect the regular version.  If the streaming checks are 
enabled but the core checks are disabled, then we do output spec validation for 
streaming.
2. The Streaming version is just a gate which controls whether the core 
setting also applies to streaming jobs.  If the streaming setting is true but 
the core setting is false, then the checks are not applied.

Which of these makes more sense?  I think that option 2 is a better 
backwards-compatibility escape hatch / flag.
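
The two options can be written as small decision functions, which may make the difference easier to compare. This is a sketch only, not code from the patch: the object and function names are hypothetical, `core` stands for the value of the regular setting, and `streaming` stands for the value of `spark.streaming.hadoop.validateOutputSpecs`.

```scala
// Illustrative only: hypothetical helpers, not part of the pull request.
object ValidateOutputSpecsSketch {
  // Option 1: streaming jobs consult only the streaming setting,
  // non-streaming jobs consult only the core setting.
  def option1(isStreamingJob: Boolean, core: Boolean, streaming: Boolean): Boolean =
    if (isStreamingJob) streaming else core

  // Option 2: the streaming setting is a gate on top of the core setting,
  // so a streaming job is validated only when both are true.
  def option2(isStreamingJob: Boolean, core: Boolean, streaming: Boolean): Boolean =
    if (isStreamingJob) core && streaming else core
}
```

The options differ only for a streaming job with the streaming setting true and the core setting false: `option1` validates output specs, `option2` does not.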





[GitHub] spark pull request: SPARK-5020 [MLlib] GaussianMixtureModel.predic...

2014-12-31 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3854#issuecomment-68466757
  
streaming failure




