[GitHub] spark pull request: Support cross building for Scala 2.11
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62295971 [Test build #23115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23115/consoleFull) for PR 3159 at commit [`5dcd602`](https://github.com/apache/spark/commit/5dcd602ca04d90d80066d6405920a684749aeea4). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Support cross building for Scala 2.11
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62295972 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23115/ Test FAILed.
[GitHub] spark pull request: Support cross building for Scala 2.11
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62296123 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23114/ Test FAILed.
[GitHub] spark pull request: Support cross building for Scala 2.11
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62296120 [Test build #23114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23114/consoleFull) for PR 3159 at commit [`5dcd602`](https://github.com/apache/spark/commit/5dcd602ca04d90d80066d6405920a684749aeea4). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4237][BUILD] Fix MANIFEST.MF in maven a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3103#issuecomment-62296677 [Test build #23116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23116/consoleFull) for PR 3103 at commit [`8332304`](https://github.com/apache/spark/commit/8332304f00130c4a7ff429d3892c55e02494a0c0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4237][BUILD] Fix MANIFEST.MF in maven a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3103#issuecomment-62296681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23116/ Test PASSed.
[GitHub] spark pull request: [SPARK-3971][SQL] Backport #2843 to branch-1.1
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3113#issuecomment-62297437 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23117/consoleFull) for PR 3113 at commit [`d354161`](https://github.com/apache/spark/commit/d3541613da1c3e5b309645cb103d9a4a972b812b). * This patch merges cleanly.
[GitHub] spark pull request: Update RecoverableNetworkWordCount.scala
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2735#discussion_r20057909 --- Diff: examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala --- @@ -114,7 +115,7 @@ object RecoverableNetworkWordCount { val Array(ip, IntParam(port), checkpointDirectory, outputPath) = args val ssc = StreamingContext.getOrCreate(checkpointDirectory, () => { -createContext(ip, port, outputPath) +createContext(ip, port, outputPath, checkpointDirectory) --- End diff -- @tdas Can I double-check that it's correct to call `StreamingContext.checkpoint` only within the create context function, as opposed to always calling it on the result of `StreamingContext.getOrCreate`? That is, if it reads checkpoint data, does it already configure itself to continue using that checkpoint directory?
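The contract being asked about can be modeled in miniature: the create function runs only when no checkpoint exists and is responsible for configuring checkpointing itself, while a context restored from a checkpoint already carries its checkpoint directory. A toy Python sketch of that get-or-create semantics (the names `FakeContext` and `get_or_create` are illustrative, not Spark's API):

```python
# Toy model of StreamingContext.getOrCreate semantics: the create
# function configures checkpointing; a context restored from an
# existing checkpoint already knows its checkpoint directory.
_checkpoint_store = {}  # stands in for the checkpoint filesystem


class FakeContext:
    def __init__(self):
        self.checkpoint_dir = None

    def checkpoint(self, directory):
        # Called inside the create function, mirroring ssc.checkpoint(dir).
        self.checkpoint_dir = directory
        _checkpoint_store[directory] = self


def get_or_create(checkpoint_dir, create_fn):
    # Restore if checkpoint data exists; otherwise build a fresh context.
    if checkpoint_dir in _checkpoint_store:
        return _checkpoint_store[checkpoint_dir]  # already configured
    return create_fn()


def create_context():
    ctx = FakeContext()
    ctx.checkpoint("/tmp/cp")  # configure checkpointing inside create_fn
    return ctx


first = get_or_create("/tmp/cp", create_context)   # fresh context
second = get_or_create("/tmp/cp", create_context)  # restored, same config
```

In this toy model the second call never invokes `create_context`, which is why the checkpoint configuration only needs to appear inside the create function.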
[GitHub] spark pull request: [WIP][SPARK-3530][MLLIB] pipeline and paramete...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3099#issuecomment-62299039 @shivaram IMHO it would be good to have the developer API updates as well and to test a couple more pipelines before we push this out. I'll try to get a branch based on this PR ready next week for feedback. Not sure if we want to do a mega-PR, though; hopefully it can be kept as a separate follow-up. > Also I am not sure I fully understand the difference between the User API and the Developer API These are loose terms; part of the Developer API will actually be public. E.g., Classifier will be public since it will be needed for the boosting API. But most users won't have to worry about these abstract classes, and the classes will include some private[ml] methods to make developers' lives easier.
[GitHub] spark pull request: [SPARK-3971][SQL] Backport #2843 to branch-1.1
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3113#issuecomment-62299144 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23117/consoleFull) for PR 3113 at commit [`d354161`](https://github.com/apache/spark/commit/d3541613da1c3e5b309645cb103d9a4a972b812b). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3971][SQL] Backport #2843 to branch-1.1
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3113#issuecomment-62299146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23117/ Test PASSed.
[GitHub] spark pull request: [SPARK-3971][SQL] Backport #2843 to branch-1.1
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3113#issuecomment-62299785 @marmbrus Backported #2164 to fix the Jenkins build failure (ParquetQuerySuite). Should be ready to go.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62302445 There are a few conflicts with the master branch. I will rebase my PR branch and then force-push it.
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-62302608 Additionally, I think spark.driver.host is useful in all client modes, including standalone, Mesos (I don't know it very well), and yarn-client mode. When the cluster cannot resolve the client's hostname, we must set this configuration to the client's IP address to avoid failing to connect to the driver. If I understood it wrong, please correct me.
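As a hedged illustration of the workaround described above (the IP address below is hypothetical), the setting can be supplied in `spark-defaults.conf` on the client machine or passed at submit time:

```properties
# spark-defaults.conf on the client machine (hypothetical address):
spark.driver.host  10.1.2.3

# or equivalently on the command line:
#   spark-submit --conf spark.driver.host=10.1.2.3 ...
```

Either way, executors will connect back to the driver at that address instead of a hostname the cluster cannot resolve.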
[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...
Github user SaintBacchus closed the pull request at: https://github.com/apache/spark/pull/2874
[GitHub] spark pull request: [SPARK-3971][SQL] Backport #2843 to branch-1.1
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3113#issuecomment-62304982 @marmbrus However, I didn't quite get why #2164 fixes those Parquet tests. In particular, why did you say the original test cases were order dependent?
[GitHub] spark pull request: Change the initial iteration num of ruleExecut...
GitHub user DoingDone9 opened a pull request: https://github.com/apache/spark/pull/3174 Change the initial iteration num of ruleExecutor from 1 to 0 Change the initial iteration num of ruleExecutor from 1 to 0. You can merge this pull request into a Git repository by running: $ git pull https://github.com/DoingDone9/spark catalyst_issue_01 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3174.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3174 commit fe0bc4ca5656ba8ec490428748fecc22948b7d95 Author: DoingDone9 799203...@qq.com Date: 2014-11-09T14:49:08Z Change the first iteration num of ruleExecutor from 1 to 0
[GitHub] spark pull request: Change the initial iteration num of ruleExecut...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3174#issuecomment-62305964 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user helena commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r20059145 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import java.util.Properties + +import scala.reflect.ClassTag + +import kafka.producer.{ProducerConfig, KeyedMessage, Producer} + +import org.apache.spark.Logging +import org.apache.spark.rdd.RDD +import org.apache.spark.streaming.dstream.DStream + +/** + * Import this object in this form: + * {{{ + * import org.apache.spark.streaming.kafka.KafkaWriter._ + * }}} + * + * Once imported, the `writeToKafka` method can be called on any [[DStream]] object in this form: + * {{{ + * dstream.writeToKafka(producerConfig, f) + * }}} + */ +object KafkaWriter { + import scala.language.implicitConversions + /** + * This implicit method allows the user to call dstream.writeToKafka(..)
 + * @param dstream - DStream to write to Kafka + * @tparam T - The type of the DStream + * @tparam K - The type of the key to serialize to + * @tparam V - The type of the value to serialize to + * @return + */ + implicit def createKafkaOutputWriter[T: ClassTag, K, V](dstream: DStream[T]): KafkaWriter[T] = { +new KafkaWriter[T](dstream) + } +} + +/** + * + * This class can be used to write data to Kafka from Spark Streaming. To write data to Kafka + * simply `import org.apache.spark.streaming.kafka.KafkaWriter._` in your application and call + * `dstream.writeToKafka(producerConf, func)` + * + * Here is an example: + * {{{ + * // Adding this line allows the user to call dstream.writeToKafka(..) + * import org.apache.spark.streaming.kafka.KafkaWriter._ + * + * class ExampleWriter { + * val instream = ssc.queueStream(toBe) + * val producerConf = new Properties() + * producerConf.put("serializer.class", "kafka.serializer.DefaultEncoder") + * producerConf.put("key.serializer.class", "kafka.serializer.StringEncoder") + * producerConf.put("metadata.broker.list", "kafka.example.com:5545") + * producerConf.put("request.required.acks", "1") + * instream.writeToKafka(producerConf, + *(x: String) => new KeyedMessage[String, String]("default", null, x)) + * ssc.start() + * } + * + * }}} + * @param dstream - The [[DStream]] to be written to Kafka + * + */ +class KafkaWriter[T: ClassTag](@transient dstream: DStream[T]) extends Serializable with Logging { + + /** + * To write data from a DStream to Kafka, call this function after creating the DStream. Once + * the DStream is passed into this function, all data coming from the DStream is written out to + * Kafka. The properties instance takes the configuration required to connect to the Kafka + * brokers in the standard Kafka format. The serializerFunc is a function that converts each + * element of the RDD to a Kafka [[KeyedMessage]]. This closure should be serializable - so it + * should use only instances of Serializables.
 + * @param producerConfig The configuration that can be used to connect to Kafka + * @param serializerFunc The function to convert the data from the stream into Kafka + * [[KeyedMessage]]s. + * @tparam K The type of the key + * @tparam V The type of the value + * + */ + def writeToKafka[K, V](producerConfig: Properties, +serializerFunc: T => KeyedMessage[K, V]): Unit = { + +// Broadcast the producer to avoid sending it every time. +val broadcastedConfig = dstream.ssc.sc.broadcast(producerConfig) + +def func = (rdd: RDD[T]) => { + rdd.foreachPartition(events => { +// The ForEachDStream runs the function locally on the driver. +// This code
[GitHub] spark pull request: Change the initial iteration num of ruleExecut...
Github user DoingDone9 commented on the pull request: https://github.com/apache/spark/pull/3174#issuecomment-62306218 I am new, but I think it should be 0, not 1.
[GitHub] spark pull request: [SPARK-4274] [SQL] Print informative message w...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3139#discussion_r20059176 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala --- @@ -341,12 +341,21 @@ abstract class HiveComparisonTest val query = new TestHive.HiveQLQueryExecution(queryString) try { (query, prepareAnswer(query, query.stringResult())) } catch { case e: Throwable => + val logicalQueryInString = try { +query.toString --- End diff -- Oh, I didn't know this. Thank you @marmbrus very much for the code snippet. After debugging, I think you are right: the exception is thrown by `${executedPlan.codegenEnabled}`, and `executedPlan` is null if something goes wrong in parsing or analysis, etc. I've updated the code again.
[GitHub] spark pull request: [SPARK-4274] [SQL] Print informative message w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3139#issuecomment-62306341 [Test build #23118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23118/consoleFull) for PR 3139 at commit [`f5d7146`](https://github.com/apache/spark/commit/f5d714662d4a2e487d42531c4df6dfcf0c49b296). * This patch merges cleanly.
[GitHub] spark pull request: Sets SQL operation state to ERROR when excepti...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3175 Sets SQL operation state to ERROR when exception is thrown In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark fix-op-state Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3175 commit 6d4c1fed5e701c79de1e1489342e0d167159ba12 Author: Cheng Lian l...@databricks.com Date: 2014-11-09T10:08:43Z Sets SQL operation state to ERROR when exception is thrown
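The fix described in this PR follows a common pattern: transition the operation's state inside the exception handler rather than leaving it at RUNNING. A minimal Python sketch of that pattern (the state names loosely mirror the Thrift API; this is not HiveThriftServer2's actual code):

```python
from enum import Enum


class OperationState(Enum):
    RUNNING = 1
    FINISHED = 2
    ERROR = 3


def run_statement(execute):
    """Run a statement, recording ERROR instead of staying RUNNING."""
    state = OperationState.RUNNING
    try:
        execute()
        state = OperationState.FINISHED
    except Exception:
        # The bug being fixed: without this branch the state stayed
        # RUNNING, so a status query reported a still-running statement.
        state = OperationState.ERROR
    return state
```

Without the `except` branch, a failed statement would report the same state as an in-flight one, which is exactly the `GetOperationStatus` symptom the PR describes.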
[GitHub] spark pull request: [SPARK-4308][SQL] Sets SQL operation state to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3175#issuecomment-62306503 [Test build #23119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23119/consoleFull) for PR 3175 at commit [`6d4c1fe`](https://github.com/apache/spark/commit/6d4c1fed5e701c79de1e1489342e0d167159ba12). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4308][SQL] Follow up of #3175 for branc...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3176 [SPARK-4308][SQL] Follow up of #3175 for branch 1.1 The PR for the master branch can't be backported to branch 1.1 directly because of Hive 0.13.1 support. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark fix-op-state-for-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3176.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3176 commit 8791d87661f91a72fbd605bdfc9dd56bfa621821 Author: Cheng Lian l...@databricks.com Date: 2014-11-09T15:16:51Z This is a follow up of #3175 for branch 1.1
[GitHub] spark pull request: [SPARK-2213][SQL] Sort Merge Join
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3173#issuecomment-62306671 It's really nice to have Sort-Merge-Join, as we have met join queries that couldn't run to completion in real cases. One high-level comment on this: can we also keep the `ShuffleHashJoin`? It can still be faster than Sort-Merge-Join in some cases; all we need is a configuration/strategy to map to different join operators. BTW: do you have any performance comparison results that can be shared with us?
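For readers unfamiliar with the trade-off being discussed: a sort-merge join advances two cursors over inputs sorted on the join key, so it never materializes a full hash table the way a shuffled hash join does. A minimal Python sketch of the inner-join case on already-sorted inputs (illustrative only; Spark's implementation operates on shuffled, partitioned rows):

```python
def sort_merge_join(left, right):
    """Inner join of two lists of (key, value) pairs sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left cursor is behind; advance it
        elif lk > rk:
            j += 1          # right cursor is behind; advance it
        else:
            # Matching keys: emit the cross product of the two runs
            # sharing this key, rewinding the right cursor per left row.
            j_start = j
            while i < len(left) and left[i][0] == lk:
                j = j_start
                while j < len(right) and right[j][0] == lk:
                    out.append((lk, left[i][1], right[j][1]))
                    j += 1
                i += 1
    return out
```

The memory footprint here is bounded by the size of one run of duplicate keys, which is why sort-merge handles large skewed joins that a hash join cannot fit in memory, at the cost of the sort.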
[GitHub] spark pull request: [SPARK-4308][SQL] Follow up of #3175 for branc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3176#issuecomment-62306835 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23120/consoleFull) for PR 3176 at commit [`8791d87`](https://github.com/apache/spark/commit/8791d87661f91a72fbd605bdfc9dd56bfa621821). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62307500 [Test build #23121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23121/consoleFull) for PR 2906 at commit [`691c49a`](https://github.com/apache/spark/commit/691c49adf9751193f3b8928211e77d307ef44c37). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4295][External]Fix exception in SparkSi...
GitHub user maji2014 opened a pull request: https://github.com/apache/spark/pull/3177 [SPARK-4295][External]Fix exception in SparkSinkSuite Handle exception in SparkSinkSuite, please refer to [SPARK-4295] You can merge this pull request into a Git repository by running: $ git pull https://github.com/maji2014/spark spark-4295 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3177.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3177 commit c807bf66a8d945708af0f620576255cc133ffe46 Author: maji2014 ma...@asiainfo.com Date: 2014-11-09T15:58:50Z Fix exception in SparkSinkSuite
[GitHub] spark pull request: [SPARK-4295][External]Fix exception in SparkSi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3177#issuecomment-62308120 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4274] [SQL] Print informative message w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3139#issuecomment-62308752

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23118/
Test PASSed.
[GitHub] spark pull request: [SPARK-4274] [SQL] Print informative message w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3139#issuecomment-62308750

[Test build #23118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23118/consoleFull) for PR 3139 at commit [`f5d7146`](https://github.com/apache/spark/commit/f5d714662d4a2e487d42531c4df6dfcf0c49b296).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4308][SQL] Sets SQL operation state to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3175#issuecomment-62309229

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23119/
Test PASSed.
[GitHub] spark pull request: [SPARK-4308][SQL] Sets SQL operation state to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3175#issuecomment-62309225

[Test build #23119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23119/consoleFull) for PR 3175 at commit [`6d4c1fe`](https://github.com/apache/spark/commit/6d4c1fed5e701c79de1e1489342e0d167159ba12).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-2811 upgrade algebird to 0.8.1
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2947#issuecomment-62309276

Looks good then.

On Nov 9, 2014 8:24 AM, "Adam Pingel" <notificati...@github.com> wrote:

> Algebird 0.8.1 for Scala 2.11 is on the central repo:
> http://search.maven.org/#artifactdetails%7Ccom.twitter%7Calgebird_2.11%7C0.8.1%7Cjar
>
> — Reply to this email directly or view it on GitHub
> https://github.com/apache/spark/pull/2947#issuecomment-62288693.
[GitHub] spark pull request: [SPARK-4309][SQL] Date type support for Thrift...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3178

[SPARK-4309][SQL] Date type support for Thrift server

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark date-for-thriftserver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3178.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3178

commit 70b1becb6d0e852cad6baa9457a1e741036347fd
Author: Cheng Lian <l...@databricks.com>
Date: 2014-11-09T16:20:46Z

    Adds Date support for HiveThriftServer2 (Hive 0.12.0)

commit 313248c8545b105b2ac83d0062ba0306fabd7859
Author: Cheng Lian <l...@databricks.com>
Date: 2014-11-09T16:39:59Z

    Updates HiveShim for 0.13.1
[GitHub] spark pull request: [SPARK-4309][SQL] Date type support for Thrift...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3178#issuecomment-62309727

[Test build #23122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23122/consoleFull) for PR 3178 at commit [`313248c`](https://github.com/apache/spark/commit/313248c8545b105b2ac83d0062ba0306fabd7859).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4000][Build] Uploads HiveCompatibilityS...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2993#issuecomment-62309812

@pwendell ping :)
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62310159

[Test build #23121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23121/consoleFull) for PR 2906 at commit [`691c49a`](https://github.com/apache/spark/commit/691c49adf9751193f3b8928211e77d307ef44c37).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaHierarchicalClustering `
  * `trait HierarchicalClusteringConf extends Serializable `
  * `class HierarchicalClustering(`
  * `class HierarchicalClusteringModel(object):`
  * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62310177

[Test build #23123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23123/consoleFull) for PR 3158 at commit [`c5fb299`](https://github.com/apache/spark/commit/c5fb299c3327a78fb9ab1988e46f64a2bdd83807).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62310162

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23121/
Test FAILed.
[GitHub] spark pull request: [SPARK-4308][SQL] Follow up of #3175 for branc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3176#issuecomment-62310223

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23120/
Test PASSed.
[GitHub] spark pull request: [SPARK-4308][SQL] Follow up of #3175 for branc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3176#issuecomment-62310222

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23120/consoleFull) for PR 3176 at commit [`8791d87`](https://github.com/apache/spark/commit/8791d87661f91a72fbd605bdfc9dd56bfa621821).

* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62311505

[Test build #23123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23123/consoleFull) for PR 3158 at commit [`c5fb299`](https://github.com/apache/spark/commit/c5fb299c3327a78fb9ab1988e46f64a2bdd83807).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class LhsLiteral(x: Any) `
  * `final class MutableDate extends MutableValue `
  * `final class MutableTimestamp extends MutableValue `
  * `class RichDate(milliseconds: Long) extends Date(milliseconds) `
  * `class RichTimestamp(milliseconds: Long) extends Timestamp(milliseconds) `
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62311507

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23123/
Test FAILed.
[GitHub] spark pull request: [SPARK-4309][SQL] Date type support for Thrift...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3178#issuecomment-62312335

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23122/
Test PASSed.
[GitHub] spark pull request: [SPARK-4309][SQL] Date type support for Thrift...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3178#issuecomment-62312332

[Test build #23122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23122/consoleFull) for PR 3178 at commit [`313248c`](https://github.com/apache/spark/commit/313248c8545b105b2ac83d0062ba0306fabd7859).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-3648: Provide a script for fetching remo...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3165#issuecomment-62313063

Okay i'm gonna close this. If one of you guys could quickly add docs on our wiki, that would be great.
[GitHub] spark pull request: SPARK-3648: Provide a script for fetching remo...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/3165
[GitHub] spark pull request: [SPARK-4047] - Generate runtime warnings for e...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2894#issuecomment-62313396

@varadharajan Good suggestion about documenting algs for LR; I'll make a note to do that for the upcoming release. Thank you for the PR! LGTM
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-62313507

[Test build #514 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/514/consoleFull) for PR 3022 at commit [`c15405c`](https://github.com/apache/spark/commit/c15405c78345e9a46549a398c6b59bed80274f9e).

* This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4079] [CORE] Default to LZF if Snappy n...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3119#issuecomment-62313527

Could this instead just throw an exception when Snappy is configured but not supported? We typically try not to silently mutate configs in the background in favor of giving users an actionable exception. I think this could be accomplished by just modifying `SnappyCompressionCodec` to guard the creation of an input stream or output stream with a check as to whether Snappy is enabled, and throw an exception if it is not enabled.

The current approach could lead to very confusing failure behavior. For instance, say a user has the Snappy native library installed on some machines but not others. What will happen is that there will be a stream corruption exception somewhere inside of Spark where one node writes data as Snappy and another reads it as LZF. And to figure out what caused this, a user will have to troll through executor logs for a somewhat innocuous-looking `WARN` statement.

@rxin designed this codec interface (I think), so maybe he has more comments also.
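A minimal sketch of the fail-fast guard pwendell is suggesting. Everything here is illustrative: the class name, the `snappyAvailable` probe, and the exception message are assumptions, not the actual `SnappyCompressionCodec` source; the only real API assumed is snappy-java's `Snappy`, `SnappyInputStream`, and `SnappyOutputStream`, which the Spark codec wraps.

```scala
import java.io.{InputStream, OutputStream}

// Illustrative fail-fast codec: refuse to create Snappy streams when the
// native library cannot be loaded, instead of silently falling back to LZF.
class GuardedSnappyCodec {
  // Assumed availability probe: asking for the native library version
  // throws if the native Snappy library is missing on this node.
  private lazy val snappyAvailable: Boolean =
    try { org.xerial.snappy.Snappy.getNativeLibraryVersion; true }
    catch { case _: Throwable => false }

  private def requireSnappy(): Unit =
    if (!snappyAvailable) {
      throw new IllegalArgumentException(
        "Snappy is configured as the compression codec, but its native " +
        "library could not be loaded on this node.")
    }

  def compressedOutputStream(s: OutputStream): OutputStream = {
    requireSnappy() // actionable error here, not stream corruption later
    new org.xerial.snappy.SnappyOutputStream(s)
  }

  def compressedInputStream(s: InputStream): InputStream = {
    requireSnappy()
    new org.xerial.snappy.SnappyInputStream(s)
  }
}
```

The point of the design is that the failure surfaces on the misconfigured node with a clear message, rather than as a cross-node stream corruption error that must be traced back through executor logs.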
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20060444

--- Diff: docs/configuration.md ---
@@ -563,8 +566,8 @@ Apart from these, the following properties are also available, and may be useful
     </ul>
   </td>
   <td>
-    Default number of tasks to use across the cluster for distributed shuffle operations
-    (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+    Default number of output partitions for operations like <code>join</code>,
--- End diff --

Should this say "number of shuffle partitions"? It's slightly weird to me to say "output" when this refers to something that is totally internal to Spark - it's output on the map side but input on the read side. In other cases I think "output" tends to mean things like saving as HDFS data, etc.
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20060453

--- Diff: docs/configuration.md ---
@@ -556,6 +556,9 @@ Apart from these, the following properties are also available, and may be useful
   <tr>
     <td><code>spark.default.parallelism</code></td>
     <td>
+      For distributed shuffle operations like <code>reduceByKey</code> and <code>join</code>, the
+      largest number of partitions in parent RDD. For operations like <code>parallelize</code> with
--- End diff --

Is this just the number of partitions in the parent RDD (why "largest"?) Doesn't the parent RDD have a fixed number of partitions? Or is this a maximum across all parents...?
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3107#issuecomment-62313723

Had some minor wording questions.
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20060629

--- Diff: docs/configuration.md ---
@@ -556,6 +556,9 @@ Apart from these, the following properties are also available, and may be useful
   <tr>
     <td><code>spark.default.parallelism</code></td>
     <td>
+      For distributed shuffle operations like <code>reduceByKey</code> and <code>join</code>, the
+      largest number of partitions in parent RDD. For operations like <code>parallelize</code> with
--- End diff --

I was worried "the number of partitions of the largest parent RDD" could be construed as the number of partitions in the parent RDD containing the most data. Do you think "the largest number of partitions in _a_ parent RDD" or "the largest number of partitions in one of the operation's input RDDs" would be more clear?
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20060674

--- Diff: docs/configuration.md ---
@@ -563,8 +566,8 @@ Apart from these, the following properties are also available, and may be useful
     </ul>
   </td>
   <td>
-    Default number of tasks to use across the cluster for distributed shuffle operations
-    (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+    Default number of output partitions for operations like <code>join</code>,
--- End diff --

My thinking was that Spark's APIs have no mention of the concept of a "shuffle partition" (e.g. the term is referenced nowhere on https://spark.apache.org/docs/latest/programming-guide.html), but even novice Spark users are meant to understand that every transformation has input and output RDDs and that every RDD has a number of partitions. Maybe "Default number of partitions for the RDDs produced by operations like ..."?
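The default being debated can be illustrated in a spark-shell session. This is a sketch of the documented behavior, not text from the PR; it assumes an already-created `SparkContext` named `sc` and that `spark.default.parallelism` is left unset.

```scala
// Two pair RDDs with different partition counts.
val a = sc.parallelize(Seq((1, "x"), (2, "y")), 4) // 4 partitions
val b = sc.parallelize(Seq((1, "p"), (2, "q")), 8) // 8 partitions

// With spark.default.parallelism unset, a shuffle operation like join
// defaults to the largest partition count among its parent RDDs,
// so the result here should have 8 partitions.
val joined = a.join(b)
println(joined.partitions.length)
```

This is exactly the "largest number of partitions in a parent RDD" case the reviewers are trying to phrase: neither parent's partitioning is reused; the bigger of the two counts wins.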
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3022#issuecomment-62316919

[Test build #514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/514/consoleFull) for PR 3022 at commit [`c15405c`](https://github.com/apache/spark/commit/c15405c78345e9a46549a398c6b59bed80274f9e).

* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `class GaussianMixtureModel(val w: Array[Double], val mu: Array[Vector], val sigma: Array[Matrix]) `
[GitHub] spark pull request: SPARK-4276 fix for two working thread
Github user squito commented on the pull request: https://github.com/apache/spark/pull/3141#issuecomment-62320554

I agree w/ TD, I don't think this change is necessary. I think we should close this and, @svar29, maybe you can discuss the problem you are running into on the spark-user mailing list; hopefully we can help you out there.
[GitHub] spark pull request: [SPARK-4260] Httpbroadcast should set connecti...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/3122#issuecomment-62320949

This looks good, but could you also explain what necessitates this change? Did you observe some error? If nothing else, just putting the error you observed in the JIRA would help somebody else find this patch if they run into the error as well.
[GitHub] spark pull request: Change the initial iteration num of ruleExecut...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3174#issuecomment-62321314

-1 This breaks the logic of the loop. For example if maxIterations is 1, now it will execute twice.
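The off-by-one srowen describes is easy to see with a stripped-down model of the loop. This is illustrative only, not the actual `RuleExecutor` source: `run` counts how many times the rule batch would be applied for a given starting value of the iteration counter.

```scala
// Simplified model of a RuleExecutor-style loop: apply the batch, bump the
// counter, and stop once the counter exceeds maxIterations.
def run(maxIterations: Int, start: Int): Int = {
  var iteration = start
  var executions = 0
  var continue = true
  while (continue) {
    executions += 1 // stands in for "apply the batch of rules once"
    iteration += 1
    if (iteration > maxIterations) continue = false
  }
  executions
}

println(run(maxIterations = 1, start = 1)) // 1 execution, as intended
println(run(maxIterations = 1, start = 0)) // 2 executions: the off-by-one
```

With the counter starting at 1, `maxIterations = 1` yields exactly one pass; starting at 0 delays the exit check by one round, so the batch runs twice.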
[GitHub] spark pull request: [SPARK-4295][External]Fix exception in SparkSi...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3177#issuecomment-62321381

Can you clarify how the tests pass if an exception is thrown? Does that also need a fix?
[GitHub] spark pull request: [SPARK-3936] Add aggregateMessages, which supe...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/3100#discussion_r20062181

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/TripletFields.scala ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.graphx
+
+/**
+ * Represents a subset of the fields of an [[EdgeTriplet]] or [[EdgeContext]]. This allows the
+ * system to populate only those fields for efficiency.
+ */
+class TripletFields private (
+    val useSrc: Boolean,
+    val useDst: Boolean,
+    val useEdge: Boolean)
--- End diff --

Maybe I'm just missing it, but it seems like `useEdge` is never used.
[GitHub] spark pull request: SPARK-1344 [DOCS] Scala API docs for top metho...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/3168#issuecomment-62323279 lgtm
[GitHub] spark pull request: SPARK-971 [DOCS] Link to Confluence wiki from ...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/3169#issuecomment-62323284 lgtm
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62323346 [Test build #23124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23124/consoleFull) for PR 2906 at commit [`cfdf842`](https://github.com/apache/spark/commit/cfdf8429bf4afb3e7a6a329dd285fe48429aec46). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/3079#discussion_r20062337 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -113,8 +117,12 @@ class RangePartitioner[K : Ordering : ClassTag, V]( private var ordering = implicitly[Ordering[K]] // An array of upper bounds for the first (partitions - 1) partitions - private var rangeBounds: Array[K] = { -if (partitions <= 1) { + @volatile private var valRB: Array[K] = null --- End diff -- `valRB` is a kinda confusing name. I think the convention would be to name it `_rangeBounds`. E.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/FutureAction.scala#L111
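The underscore-backing-field convention squito points to can be sketched outside Spark. The following is a hypothetical Python rendering (names are illustrative, not Spark's API) of deferring a data-driven computation behind a public accessor backed by a private field:

```python
import threading

class RangeBoundsHolder:
    """Sketch of the backing-field convention: a private, underscore-prefixed
    attribute computed lazily behind the public name."""

    def __init__(self, compute):
        self._lock = threading.Lock()
        self._range_bounds = None  # backing field, filled on first access
        self._compute = compute

    @property
    def range_bounds(self):
        # Double-checked locking so concurrent readers observe a single
        # computed value, mirroring the @volatile field in the diff.
        if self._range_bounds is None:
            with self._lock:
                if self._range_bounds is None:
                    self._range_bounds = self._compute()
        return self._range_bounds
```

The point of the convention is that callers only ever see `range_bounds`; the mutable `_range_bounds` stays an implementation detail.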
[GitHub] spark pull request: [SPARK-3936] Add aggregateMessages, which supe...
Github user ankurdave commented on a diff in the pull request: https://github.com/apache/spark/pull/3100#discussion_r20062658 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/TripletFields.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graphx + +/** + * Represents a subset of the fields of an [[EdgeTriplet]] or [[EdgeContext]]. This allows the + * system to populate only those fields for efficiency. + */ +class TripletFields private ( +val useSrc: Boolean, +val useDst: Boolean, +val useEdge: Boolean) --- End diff -- Yeah, we don't currently use it since it's cheap to access the edge attributes, but I think @jegonzal added it in case our internal representation changes and it becomes useful.
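For readers following the thread, the shape of the class under discussion can be sketched in plain Python (a simplified stand-in, not GraphX's actual implementation): immutable flags plus named presets, so callers declare up front which triplet fields they will read and the system can skip populating the rest.

```python
class TripletFields:
    """Simplified sketch of the TripletFields idea: three booleans saying
    which parts of an edge triplet the caller intends to read."""

    def __init__(self, use_src, use_dst, use_edge):
        self.use_src = use_src
        self.use_dst = use_dst
        self.use_edge = use_edge

# Named presets (hypothetical names) so call sites read declaratively.
TripletFields.NONE = TripletFields(False, False, False)
TripletFields.SRC = TripletFields(True, False, False)
TripletFields.DST = TripletFields(False, True, False)
TripletFields.ALL = TripletFields(True, True, True)
```

A caller that only aggregates over source attributes would pass `TripletFields.SRC`, letting the engine avoid shipping destination attributes at all.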
[GitHub] spark pull request: [SPARK-4000][Build] Uploads HiveCompatibilityS...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2993#issuecomment-62325053 LGTM
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62325994 [Test build #23124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23124/consoleFull) for PR 2906 at commit [`cfdf842`](https://github.com/apache/spark/commit/cfdf8429bf4afb3e7a6a329dd285fe48429aec46). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaHierarchicalClustering ` * `trait HierarchicalClusteringConf extends Serializable ` * `class HierarchicalClustering(` * `class HierarchicalClusteringModel(object):` * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62325997 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23124/ Test FAILed.
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20063433 --- Diff: docs/configuration.md --- @@ -556,6 +556,9 @@ Apart from these, the following properties are also available, and may be useful <tr> <td><code>spark.default.parallelism</code></td> <td> +For distributed shuffle operations like <code>reduceByKey</code> and <code>join</code>, the +largest number of partitions in parent RDD. For operations like <code>parallelize</code> with --- End diff -- Yeah - if you just add "in a parent RDD" then that seems good!
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62328415 [Test build #23125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23125/consoleFull) for PR 2906 at commit [`b0b061e`](https://github.com/apache/spark/commit/b0b061edc4c2ad42deda00bb664534e1334b50e5). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4230. Doc for spark.default.parallelism ...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3107#discussion_r20063448 --- Diff: docs/configuration.md --- @@ -563,8 +566,8 @@ Apart from these, the following properties are also available, and may be useful </ul> </td> <td> -Default number of tasks to use across the cluster for distributed shuffle operations -(<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user. +Default number of output partitions for operations like <code>join</code>, --- End diff -- Ah I see - what about "Default number of partitions in RDD's returned by join, reduceByKey..."?
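The rule these two doc tweaks describe can be made concrete with a small, hypothetical helper (plain Python, no Spark dependency): when `spark.default.parallelism` is set it wins; otherwise shuffle operations like `join` and `reduceByKey` default to the largest partition count among the parent RDDs.

```python
def default_num_partitions(parent_partition_counts, spark_default_parallelism=None):
    """Sketch of the documented behavior for shuffle operations:
    use spark.default.parallelism when the user set it, otherwise
    the largest number of partitions in a parent RDD."""
    if spark_default_parallelism is not None:
        return spark_default_parallelism
    return max(parent_partition_counts)
```

So joining a 4-partition RDD with an 8-partition RDD yields 8 output partitions by default, but setting `spark.default.parallelism=20` would yield 20.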
[GitHub] spark pull request: SPARK-4156 [MLLIB] EM algorithm for GMMs
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/3022#discussion_r20063810 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GMMExpectationMaximization.scala --- @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.clustering + +import breeze.linalg.{DenseVector => BreezeVector, DenseMatrix => BreezeMatrix} +import breeze.linalg.{Transpose, det, inv} +import org.apache.spark.rdd.RDD +import org.apache.spark.mllib.linalg.{Matrices, Vector, Vectors} +import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext} +import org.apache.spark.SparkContext.DoubleAccumulatorParam + +/** + * Expectation-Maximization for multivariate Gaussian Mixture Models. 
+ * + */ +object GMMExpectationMaximization { + /** + * Trains a GMM using the given parameters + * + * @param data training points stored as RDD[Vector] + * @param k the number of Gaussians in the mixture + * @param maxIterations the maximum number of iterations to perform + * @param delta change in log-likelihood at which convergence is considered achieved + */ + def train(data: RDD[Vector], k: Int, maxIterations: Int, delta: Double): GaussianMixtureModel = { +new GMMExpectationMaximization().setK(k) + .setMaxIterations(maxIterations) + .setDelta(delta) + .run(data) + } + + /** + * Trains a GMM using the given parameters + * + * @param data training points stored as RDD[Vector] + * @param k the number of Gaussians in the mixture + * @param maxIterations the maximum number of iterations to perform + */ + def train(data: RDD[Vector], k: Int, maxIterations: Int): GaussianMixtureModel = { +new GMMExpectationMaximization().setK(k).setMaxIterations(maxIterations).run(data) + } + + /** + * Trains a GMM using the given parameters + * + * @param data training points stored as RDD[Vector] + * @param k the number of Gaussians in the mixture + */ + def train(data: RDD[Vector], k: Int): GaussianMixtureModel = { +new GMMExpectationMaximization().setK(k).run(data) + } +} + +/** + * This class performs multivariate Gaussian expectation maximization. It will + * maximize the log-likelihood for a mixture of k Gaussians, iterating until + * the log-likelihood changes by less than delta, or until it has reached + * the max number of iterations. 
+ */ +class GMMExpectationMaximization private ( +private var k: Int, +private var delta: Double, +private var maxIterations: Int) extends Serializable { + + // Type aliases for convenience + private type DenseDoubleVector = BreezeVector[Double] + private type DenseDoubleMatrix = BreezeMatrix[Double] + + // A default instance, 2 Gaussians, 100 iterations, 0.01 log-likelihood threshold + def this() = this(2, 0.01, 100) + + /** Set the number of Gaussians in the mixture model. Default: 2 */ + def setK(k: Int): this.type = { +this.k = k +this + } + + /** Set the maximum number of iterations to run. Default: 100 */ + def setMaxIterations(maxIterations: Int): this.type = { +this.maxIterations = maxIterations +this + } + + /** + * Set the largest change in log-likelihood at which convergence is + * considered to have occurred. + */ + def setDelta(delta: Double): this.type = { +this.delta = delta +this + } + + /** Machine precision value used to ensure matrix conditioning */ + private val eps = math.pow(2.0, -52) + + /** Perform expectation maximization */ + def run(data: RDD[Vector]): GaussianMixtureModel = { +val ctx = data.sparkContext + +// we will operate on the data as breeze data +val breezeData = data.map{ u =>
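The convergence rule stated in the class doc above (iterate until the log-likelihood changes by less than `delta`, or `maxIterations` is hit) can be sketched independently of the PR's code. In this hypothetical Python sketch, `step` stands in for one combined E+M pass and returns the new log-likelihood:

```python
def run_em(step, delta, max_iterations):
    """Run an EM-style loop: stop when the log-likelihood improves by less
    than delta, or after max_iterations passes. Returns the number of
    iterations actually performed."""
    ll = float("-inf")  # no likelihood yet, so the first pass never converges
    for it in range(1, max_iterations + 1):
        new_ll = step(ll)
        if abs(new_ll - ll) < delta:
            return it
        ll = new_ll
    return max_iterations
```

A `step` that returns the same value twice in a row stops the loop; a `step` that keeps improving by more than `delta` runs out the iteration budget.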
[GitHub] spark pull request: [SPARK-4017] show progress bar in console and ...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/3029#issuecomment-62330615 this is awesome!
[GitHub] spark pull request: Support cross building for Scala 2.11
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3159#issuecomment-62330823 [Test build #23126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23126/consoleFull) for PR 3159 at commit [`542adea`](https://github.com/apache/spark/commit/542adeaf216cbfd5fbe2a99887e66224cc0f988d). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-1344 [DOCS] Scala API docs for top metho...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3168#issuecomment-62330950 Thanks, I pulled this in.
[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-62330965 I agree with @pwendell. It seems like the right thing to do is just fix Broadcast ... and if we can't, then wouldn't you also want to turn off Broadcast even for big closures?
[GitHub] spark pull request: [SPARK-1957] [WIP] Pluggable Diskstore for Blo...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/907#issuecomment-62331200 This is useful as a prototype, but I'd prefer to close this issue rather than keep it as an active review. We can use this as a starting point if we revisit the internal interfaces here.
[GitHub] spark pull request: Updates to shell globbing in run-example and s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/449#issuecomment-62331162 This is stale so let's close this issue.
[GitHub] spark pull request: SPARK-1972: Added support for tracking custom ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/918#issuecomment-62331367 I'd like to close this issue for now and keep the JIRA around. This is a completely reasonable way to accomplish adding custom metrics, but this overlaps a good amount with Accumulators and their display in the UI - which I think is our longer term API for doing things like this. Anyways let's keep this patch and the JIRA around and we can consider it in the future.
[GitHub] spark pull request: [SPARK-2165] spark on yarn: add support for se...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1279#issuecomment-62331405 Okay let's close this issue for now and he can reopen it if he has time.
[GitHub] spark pull request: SPARK-971 [DOCS] Link to Confluence wiki from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3169
[GitHub] spark pull request: SPARK-1344 [DOCS] Scala API docs for top metho...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3168
[GitHub] spark pull request: [SPARK-3051] Support looking-up named accumula...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2438#issuecomment-62332004 Hey @nfergu - I was looking for any older PRs that have fallen through the cracks and came across this. This is a very well written patch - kudos! When I suggested this registry concept initially, I was actually envisioning this happening in user space rather than in Spark itself. I think automatically broadcasting all named accumulators is not going to work because some applications create thousands of accumulators (e.g. streaming applications), and it could end up with an unexpected performance regression. For some applications this might be acceptable though. How hard would it be for a user-space library to implement this rather than having it be inside of Spark proper?
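The user-space registry pwendell describes could be as small as the following sketch (all names hypothetical; a real version would hold Spark accumulator handles rather than plain values): the application, not Spark, keeps a thread-safe name-to-accumulator map, so nothing has to be broadcast per accumulator.

```python
import threading

class NamedRegistry:
    """Hypothetical user-space registry: a thread-safe mapping from a
    name to an accumulator-like object, maintained by the application."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, name, acc):
        with self._lock:
            self._entries[name] = acc

    def lookup(self, name):
        # Returns None if no accumulator was registered under this name.
        with self._lock:
            return self._entries.get(name)
```

Because the registry lives in application code, it only tracks the accumulators the application chooses to register, sidestepping the thousands-of-accumulators case pwendell raises.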
[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2411#issuecomment-62332025 Let's close this issue for now.
[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2411
[GitHub] spark pull request: [SPARK-2671] BlockObjectWriter should create p...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/1580
[GitHub] spark pull request: [SPARK-3171] Don't print meaningless informati...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2078
[GitHub] spark pull request: [SPARK-3551] Remove redundant putting FetchRes...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2413
[GitHub] spark pull request: [SPARK-3106] Fix the race condition issue abou...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2019
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-62332299 This has mostly gone stale so I'd suggest we close this issue and revisit this later. This is a decent idea, but it does complicate things a good amount, and this particular piece of code IMO is already quite complicated. As with any performance change, it would be useful to quantify the performance problems observed as a result of this issue. For instance, has it been observed as a bottleneck in real clusters? Putting information of this type on the JIRA would be useful.
[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/283#issuecomment-62332367 I'd suggest we close this issue for now and go to the JIRA to discuss whether the feature is needed and how high of a priority it is.
[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1465#issuecomment-62332453 I'm going to close this issue as wontfix.
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-62332899 @pwendell, I updated a [design doc](https://issues.apache.org/jira/secure/attachment/12679822/Spark-3000%20Design%20Doc.pdf) for [SPARK-3000](https://issues.apache.org/jira/browse/SPARK-3000) several days ago, which is also mainly intended to resolve this issue. There might be some performance problems in some cases; you can have a look at [this](https://github.com/apache/spark/pull/2134).
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62332987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23125/ Test PASSed.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-62332985 [Test build #23125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23125/consoleFull) for PR 2906 at commit [`b0b061e`](https://github.com/apache/spark/commit/b0b061edc4c2ad42deda00bb664534e1334b50e5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaHierarchicalClustering ` * `trait HierarchicalClusteringConf extends Serializable ` * `class HierarchicalClustering(` * `class HierarchicalClusteringModel(object):` * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-6214 @srowen and @rnowling , Sorry for my complicated commits. I modified my source code. Could you review my PR? - I addressed what you pointed out. - I added a function to cut the cluster tree of a trained hierarchical clustering model at a given dendrogram height. - I rebased my PR on the latest master branch and then force-pushed my branch, because there were a few conflicts with it. Thanks,
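The "cut the cluster tree at a dendrogram height" operation mentioned above can be illustrated with a rough SciPy analogue; this is not the PR's Spark/MLlib API, just a minimal sketch of the same idea (every merge in the tree above the cut height is undone, yielding flat cluster labels):

```python
# Illustrative sketch only: SciPy stand-in for cutting a hierarchical
# clustering tree at a dendrogram height; the names and data here are
# assumptions, not the PR's actual API.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming two well-separated groups.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Build the cluster tree (Ward linkage), then cut it at height 1.0:
# merges whose distance exceeds the height are discarded, leaving
# one flat cluster label per point.
tree = linkage(points, method="ward")
labels = fcluster(tree, t=1.0, criterion="distance")

print(len(set(labels)))  # the cut separates the two groups
```

Raising the cut height toward the root merges everything into one cluster; lowering it toward zero gives one cluster per point.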
[GitHub] spark pull request: [SPARK-4310][WebUI] Sort 'Submitted' column in...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/3179 [SPARK-4310][WebUI] Sort 'Submitted' column in Stage page by time You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-4310 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3179.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3179 commit fb03b354add078ae47db55b79282636fe74ea7dc Author: zsxwing zsxw...@gmail.com Date: 2014-11-10T02:30:06Z Sort 'Submitted' column in Stage page by time