[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85121921 [Test build #29007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29007/consoleFull) for PR 5093 at commit [`b98f78c`](https://github.com/apache/spark/commit/b98f78c8f652b27e66d3fe554b9b972927017658). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85121933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29007/ Test FAILed.
[GitHub] spark pull request: [SPARK-6471][SQL]: Metastore schema should onl...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/5141 [SPARK-6471][SQL]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns Currently, in the parquet relation 2 implementation, an error is thrown if the merged schema is not exactly the same as the metastore schema. But to support cases like deletion of a column using the replace columns command, we can relax the restriction so that the query will work even if the metastore schema is only a subset of the merged parquet schema. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark replace_col Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5141.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5141 commit 5f2f4674084b4f6202c0eb884b798f0980659b4b Author: Yash Datta yash.da...@guavus.com Date: 2015-03-23T17:35:45Z SPARK-6471: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns
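The relaxed check described in this PR can be sketched in plain Scala. Note that `Field` and `isSubsetOf` below are hypothetical stand-ins for illustration, not Spark's actual schema types:

```scala
// Hypothetical stand-in for a schema field; Spark's real types differ.
case class Field(name: String, dataType: String)

// The metastore schema is acceptable if every one of its columns appears,
// with a matching type, in the merged parquet schema (i.e. it is a subset),
// rather than requiring the two schemas to be exactly equal.
def isSubsetOf(metastore: Seq[Field], parquet: Seq[Field]): Boolean = {
  val parquetByName = parquet.map(f => f.name -> f.dataType).toMap
  metastore.forall(f => parquetByName.get(f.name).contains(f.dataType))
}

val parquetSchema = Seq(Field("a", "int"), Field("b", "string"), Field("c", "double"))
val afterReplaceColumns = Seq(Field("a", "int"), Field("c", "double")) // column b dropped

assert(isSubsetOf(afterReplaceColumns, parquetSchema))      // now allowed
assert(!isSubsetOf(Seq(Field("d", "int")), parquetSchema))  // unknown column still rejected
```

Under this rule, dropping a column via `ALTER TABLE ... REPLACE COLUMNS` leaves the metastore schema a strict subset of the merged parquet schema, and the query still works.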
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5124#discussion_r26961357 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -805,7 +806,7 @@ class DAGScheduler( } val properties = if (jobIdToActiveJob.contains(jobId)) { - jobIdToActiveJob(stage.jobId).properties + jobIdToActiveJob(stage.jobId).properties.orNull --- End diff -- I don't know if there's a good reason for this, but I don't think we can change it at this point without breaking binary compatibility. We could use annotations / comments to make those fields' nullability more apparent, though.
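The `.orNull` change quoted in the diff bridges an `Option` field to call sites that still expect a nullable reference. A minimal illustration (the `ActiveJob` stand-in below is generic, not Spark's actual class):

```scala
import java.util.Properties

// Hypothetical stand-in for a class whose field became an Option.
case class ActiveJob(properties: Option[Properties])

val withProps = ActiveJob(Some(new Properties))
val withoutProps = ActiveJob(None)

// Option.orNull unwraps Some(x) to x and None to null, so downstream
// null-checking code keeps working after the field becomes an Option,
// without changing the binary signature of the consuming method.
val p: Properties = withoutProps.properties.orNull
assert(p == null)
assert(withProps.properties.orNull != null)
```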
[GitHub] spark pull request: [SPARK-6463] [SQL]AttributeSet.equal should co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5133#issuecomment-85127620 [Test build #29010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29010/consoleFull) for PR 5133 at commit [`035ea67`](https://github.com/apache/spark/commit/035ea6726353cd14455fc2552fd8262cf3bffcf8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85127615 [Test build #29011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29011/consoleFull) for PR 4435 at commit [`99764e1`](https://github.com/apache/spark/commit/99764e1afc48608ad6f0a81778a6f03e1ca7a4f1). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85132931 (Sorry, that should have been `SparkContext.localProperties.initialValue` above; I've revised my comment)
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85142892 [Test build #29018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29018/consoleFull) for PR 5142 at commit [`e661a8f`](https://github.com/apache/spark/commit/e661a8f3b146eef23aa668b2c321fecdc8fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960440 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rpc + +import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException} + +import scala.collection.mutable + +import scala.concurrent.Await + +import scala.concurrent.duration._ + +import scala.language.postfixOps + +import org.scalatest.{BeforeAndAfterAll, FunSuite} +import org.scalatest.concurrent.Eventually._ + +import org.apache.spark.{SparkException, SparkConf} + +/** + * Common tests for an RpcEnv implementation.
+ */ +abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll { + + var env: RpcEnv = _ + + override def beforeAll(): Unit = { +val conf = new SparkConf() +env = createRpcEnv(conf, "local", 12345) + } + + override def afterAll(): Unit = { +if (env != null) { + env.shutdown() +} + } + + def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv + + test("send a message locally") { +@volatile var message: String = null +val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint { + override val rpcEnv = env + + override def receive = { +case msg: String => message = msg + } +}) +rpcEndpointRef.send("hello") +eventually(timeout(5 seconds), interval(10 millis)) { + assert("hello" === message) +} + } + + test("send a message remotely") { +@volatile var message: String = null +// Set up a RpcEndpoint using env +env.setupEndpoint("send-remotely", new RpcEndpoint { + override val rpcEnv = env + + override def receive = { +case msg: String => message = msg + } +}) + +val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345) +// Use anotherEnv to find out the RpcEndpointRef +val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely") +try { + rpcEndpointRef.send("hello") + eventually(timeout(5 seconds), interval(10 millis)) { +assert("hello" === message) + } +} finally { + anotherEnv.shutdown() + anotherEnv.awaitTermination() +} + } + + test("send a RpcEndpointRef") { +val endpoint = new RpcEndpoint { + override val rpcEnv = env + + override def receiveAndReply(context: RpcCallContext) = { +case "Hello" => context.reply(self) +case "Echo" => context.reply("Echo") + } +} +val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint) + +val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello") +val reply = newRpcEndpointRef.askWithReply[String]("Echo") +assert("Echo" === reply) + } + + test("ask a message locally") { +val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint { + override val rpcEnv = env + + override def
receiveAndReply(context: RpcCallContext) = { +case msg: String => { + context.reply(msg) +} + } +}) +val reply = rpcEndpointRef.askWithReply[String]("hello") +assert("hello" === reply) + } + + test("ask a message remotely") { +env.setupEndpoint("ask-remotely", new RpcEndpoint { + override val rpcEnv = env + + override def receiveAndReply(context: RpcCallContext) = { +case msg: String => { + context.reply(msg) +} + } +}) + +val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345) +// Use anotherEnv to find out the RpcEndpointRef +val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85123710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29002/ Test FAILed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85123697 [Test build #29002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29002/consoleFull) for PR 4435 at commit [`1f361c8`](https://github.com/apache/spark/commit/1f361c88d6170a2aae01257bacbc4eebc159202e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AllJobsResource(uiRoot: UIRoot) ` * `class AllRDDResource(uiRoot: UIRoot) ` * `class AllStagesResource(uiRoot: UIRoot) ` * `class ApplicationListResource(uiRoot: UIRoot) ` * `class CustomObjectMapper extends ContextResolver[ObjectMapper]` * `class SparkEnumSerializer extends JsonSerializer[SparkEnum] ` * `class ExecutorListResource(uiRoot: UIRoot) ` * `class JsonRootResource extends UIRootFromServletContext ` * `trait UIRootFromServletContext ` * `class NotFoundException(msg: String) extends WebApplicationException(` * `class OneApplicationResource(uiRoot: UIRoot) ` * `class OneJobResource(uiRoot: UIRoot) ` * `class OneRDDResource(uiRoot: UIRoot) ` * `class OneStageAttemptResource(uiRoot: UIRoot) ` * `class OneStageResource(uiRoot: UIRoot) ` * `class SecurityFilter extends ContainerRequestFilter with UIRootFromServletContext ` * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException("It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85124958 Jenkins, this is ok to test.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85125446 [Test build #29009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29009/consoleFull) for PR 5124 at commit [`687434c`](https://github.com/apache/spark/commit/687434c9ab65601dde095d3cf6bb2f0de2ea90e1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85129025 [Test build #29013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29013/consoleFull) for PR 4435 at commit [`51eaedb`](https://github.com/apache/spark/commit/51eaedbc864dc41aa5d803b8f3c19cc40bb3040e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29014/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131065 [Test build #29014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29014/consoleFull) for PR 5093 at commit [`f2abc8c`](https://github.com/apache/spark/commit/f2abc8c49490970f7b0bd5829a0696655beb4c09). * This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131060 [Test build #29014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29014/consoleFull) for PR 5093 at commit [`f2abc8c`](https://github.com/apache/spark/commit/f2abc8c49490970f7b0bd5829a0696655beb4c09).
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5014#issuecomment-85131959 [Test build #28999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28999/consoleFull) for PR 5014 at commit [`5b611cb`](https://github.com/apache/spark/commit/5b611cb5b3cbdcd39ce08c15ead83921866d1c5d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85132747 @sarutak it sounds like the plan is to significantly change the implementation; if that's the case, then yes, closing this PR and opening a new one when the new functionality is ready is the right strategy. FYI: there's been some effort towards implementing a much more restricted version of this that uses D3: https://issues.apache.org/jira/browse/SPARK-6418.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135520 [Test build #29016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29016/consoleFull) for PR 5142 at commit [`6a61364`](https://github.com/apache/spark/commit/6a6136424ab2805148e141471fb2e22d37223d05). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26965565 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java --- @@ -57,6 +57,8 @@ protected final String REPOSITORIES = "--repositories"; protected final String STATUS = "--status"; protected final String TOTAL_EXECUTOR_CORES = "--total-executor-cores"; + protected final String PRINCIPAL = "--principal"; --- End diff -- nit: should probably be moved below with other YARN-only options. Also, sorting.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5142 [SPARK-4086][GraphX]: Fold-style aggregation for VertexRDD Adds five new methods to the `VertexRDD` suite to allow for fold-style calling conventions. Those methods are: * `leftZipJoinWithFold` * `leftJoinWithFold` * `innerZipJoinWithFold` * `innerJoinWithFold` * `aggregateUsingIndexWithFold` Each of the above has a set of tests within the `VertexRDDSuite` to ensure proper functionality. You can merge this pull request into a Git repository by running: $ git pull https://github.com/brennonyork/spark SPARK-4086 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5142 commit c2ef961e1168bb2de57bc4e12d118d9d5883345b Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-18T21:32:48Z added leftJoin*WithFold commit 639046c6fcc0f9f82f5b1fa6fc4092efa2a6ecff Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-19T22:57:51Z added innerJoin with folds commit 1229f9fa3ddcadc39a17d1af0146275208f4c34e Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:07:19Z added aggregateUsingIndexWithFold commit 98197e743cf71d954fd45a456a88a7ae2ff47888 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:10:46Z updated test to better demonstrate the aggregate fold-style values correctly being passed in commit 6a6136424ab2805148e141471fb2e22d37223d05 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:25:03Z added proper docstrings
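The fold-style calling convention the PR describes can be illustrated on plain Scala maps. This is a sketch of the convention only, with made-up names; the actual `VertexRDD` signatures in the patch may differ:

```scala
// A left join that, instead of yielding Option[B] for missing keys,
// folds in a default value, so the combiner never has to handle Option.
def leftJoinWithFold[K, A, B, C](left: Map[K, A], right: Map[K, B])
                                (default: B)(combine: (K, A, B) => C): Map[K, C] =
  left.map { case (k, a) => k -> combine(k, a, right.getOrElse(k, default)) }

val degrees = Map(1L -> 3, 2L -> 1)   // all vertices
val scores = Map(1L -> 0.5)           // vertex 2 has no score

// Missing vertices fold in 0.0 rather than surfacing None to the combiner.
val joined = leftJoinWithFold(degrees, scores)(0.0)((_, d, s) => d * s)
assert(joined == Map(1L -> 1.5, 2L -> 0.0))
```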
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5139#issuecomment-85141856 @aarondav if you have time, I'd appreciate your input here.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85132588 It looks like this NPE bug has been around for a while, but it seems pretty hard to hit (which is probably why it hasn't been reported before). I think that we should be able to trigger / reproduce this by creating a new SparkContext, ensuring that the thread-local properties are null, launching a long-running job, then attempting to cancel all jobs in some non-existent job group. Can we add a regression test for this? It shouldn't be too hard if my hunch is right. It looks like we don't directly expose the Properties object to users, so if we wanted to we could go even further and convert all of the upstream nullable `Properties` into `Option[Properties]` as well. If you look at the call chain leading to this use of `properties`, it looks like it can only be null if no local properties have ever been set in the job-submitting thread, its parent thread, or any of its other ancestor threads. Therefore, maybe we can just eliminate the whole null / Option business entirely by ensuring that the thread-local has an `initialValue` instead of having it be `null` in some circumstances and not others. Here's my suggestion: - Add a regression test and confirm that it reproduces the original bug. - Override `SparkContext.initialValue` to return a new empty properties object (since this is [how we lazily initialize](https://github.com/hunglin/spark/blob/687434c9ab65601dde095d3cf6bb2f0de2ea90e1/core/src/main/scala/org/apache/spark/SparkContext.scala#L478) the properties in the existing code). Update the other parts of SparkContext that set this to account for this change. - Add a few `assert(properties != null)` checks so that we catch errors up-front. I'd add these checks at the entry points of the DAGScheduler, e.g. the `private[spark]` `submitJob` methods that are called from SparkContext. Your patch looks good overall, but I think we should just fix the underlying messiness if we can.
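The `initialValue` approach suggested above can be shown with plain JVM classes. This is a sketch using the JDK's `InheritableThreadLocal` directly, not Spark's actual field:

```scala
import java.util.Properties

// With initialValue overridden, the thread-local is never null: each
// thread lazily gets a fresh Properties on first access, and child
// threads inherit a copy of the parent's value via childValue.
val localProperties = new InheritableThreadLocal[Properties] {
  override def initialValue(): Properties = new Properties()
  override def childValue(parent: Properties): Properties =
    parent.clone().asInstanceOf[Properties]
}

val props = localProperties.get() // non-null even before any set()
props.setProperty("spark.jobGroup.id", "my-group")
assert(localProperties.get().getProperty("spark.jobGroup.id") == "my-group")
```

With this in place, code that previously had to tolerate a `null` thread-local (the source of the NPE) can simply assert non-nullness at its entry points.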
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966209 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- shouldn't the return type here be `RDD[T]`, since `EmptyRDD` is `private[spark]` and just an implementation detail?
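The concern in this comment is that declaring (or inferring) an overly specific return type leaks a private implementation class into the public API. A generic illustration with made-up names, not Spark's actual classes:

```scala
// Hypothetical stand-in for a public abstraction like RDD.
trait Dataset[T] { def count: Int }

object Datasets {
  // Implementation detail, hidden from the API surface.
  private[this] class EmptyDataset[T] extends Dataset[T] { def count = 0 }

  // Declaring the public supertype as the return type keeps EmptyDataset
  // out of the API, so it can be renamed or removed without breaking callers.
  def empty[T]: Dataset[T] = new EmptyDataset[T]
}

assert(Datasets.empty[Int].count == 0)
```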
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-85142115 Looks OK to me. The code in `Client.scala` is getting pretty hard to follow, would probably benefit from some cleanup later on...
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85147160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29005/ Test PASSed.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-85147251 Thanks! Merged to branch-1.2
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85147094 [Test build #29005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29005/consoleFull) for PR 5140 at commit [`d739640`](https://github.com/apache/spark/commit/d739640308ca0884bf5cd678dbedf3cc85c3cec9).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960403 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rpc

import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException}

import scala.collection.mutable
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.language.postfixOps

import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.scalatest.concurrent.Eventually._

import org.apache.spark.{SparkException, SparkConf}

/**
 * Common tests for an RpcEnv implementation.
 */
abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll {

  var env: RpcEnv = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
    env = createRpcEnv(conf, "local", 12345)
  }

  override def afterAll(): Unit = {
    if (env != null) {
      env.shutdown()
    }
  }

  def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv

  test("send a message locally") {
    @volatile var message: String = null
    val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })
    rpcEndpointRef.send("hello")
    eventually(timeout(5 seconds), interval(10 millis)) {
      assert("hello" === message)
    }
  }

  test("send a message remotely") {
    @volatile var message: String = null
    // Set up a RpcEndpoint using env
    env.setupEndpoint("send-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely")
    try {
      rpcEndpointRef.send("hello")
      eventually(timeout(5 seconds), interval(10 millis)) {
        assert("hello" === message)
      }
    } finally {
      anotherEnv.shutdown()
      anotherEnv.awaitTermination()
    }
  }

  test("send a RpcEndpointRef") {
    val endpoint = new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case "Hello" => context.reply(self)
        case "Echo" => context.reply("Echo")
      }
    }
    val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint)

    val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello")
    val reply = newRpcEndpointRef.askWithReply[String]("Echo")
    assert("Echo" === reply)
  }

  test("ask a message locally") {
    val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })
    val reply = rpcEndpointRef.askWithReply[String]("hello")
    assert("hello" === reply)
  }

  test("ask a message remotely") {
    env.setupEndpoint("ask-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
```
[GitHub] spark pull request: [SPARK-6471][SQL]: Metastore schema should onl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5141#issuecomment-85122746 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85122775 Sorry, I had no time until last weekend, but I do now. I'll address that soon.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123943 [Test build #29008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29008/consoleFull) for PR 5093 at commit [`6912584`](https://github.com/apache/spark/commit/69125849f4ca32d17e1db6fa47f61c9b992a9a94).
* This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123947 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29008/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123940 [Test build #29008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29008/consoleFull) for PR 5093 at commit [`6912584`](https://github.com/apache/spark/commit/69125849f4ca32d17e1db6fa47f61c9b992a9a94).
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85124453 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29003/ Test FAILed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85124440 [Test build #29003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29003/consoleFull) for PR 4435 at commit [`a066055`](https://github.com/apache/spark/commit/a066055441f370598bdef7868ff3bd51b4f0136d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class AllStagesResource(uiRoot: UIRoot) `
* `class OneStageResource(uiRoot: UIRoot) `
* `class ApplicationInfo(`
* `class ExecutorStageSummary(`
* `class ExecutorSummary(`
* `class JobData(`
* `class RDDStorageInfo(`
* `class RDDDataDistribution(`
* `class RDDPartitionInfo(`
* `class StageData(`
* `class TaskData(`
* `class TaskMetrics(`
* `class InputMetrics(`
* `class OutputMetrics(`
* `class ShuffleReadMetrics(`
* `class ShuffleWriteMetrics(`
* `class AccumulableInfo (`
* `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-6463] [SQL]AttributeSet.equal should co...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5133#issuecomment-85126028 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85134053 [Test build #29015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29015/consoleFull) for PR 4337 at commit [`16f109f`](https://github.com/apache/spark/commit/16f109f13a90d28c3d187f47cb2d0dcd5fc782bc).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26965498 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java --- @@ -108,6 +110,8 @@ { REPOSITORIES }, { STATUS }, { TOTAL_EXECUTOR_CORES }, +{ PRINCIPAL}, +{ KEYTAB} --- End diff -- nit: can you add these in sorted order, and add a trailing `,` to the last one?
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966555 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- it should - except then it broke binary compatibility :(
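A minimal sketch of the trade-off discussed in this thread, using hypothetical stand-in classes (`Rdd`, `EmptyRdd`, `Ctx` are not the real Spark types): making the inferred result type explicit keeps the method's erased signature unchanged, while widening it to the parent type would change the signature in bytecode and break callers compiled against the old jar.

```scala
// Stand-ins for RDD / EmptyRDD; in Spark, EmptyRDD is private[spark].
class Rdd[T]
class EmptyRdd[T] extends Rdd[T]

class Ctx {
  // The PR's change: spell out the type the compiler already inferred.
  // Bytecode signature is identical to the inferred version, so it is
  // binary compatible.
  def emptyRddExplicit[T]: EmptyRdd[T] = new EmptyRdd[T]

  // The cleaner API squito suggests: declare the abstract supertype.
  // This changes the erased return type from EmptyRdd to Rdd, which is a
  // binary-incompatible change for code linked against the old signature.
  def emptyRddWidened[T]: Rdd[T] = new EmptyRdd[T]
}
```

Spark's MiMa binary-compatibility checks would flag the widened variant, which is presumably why the concrete return type was kept.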
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85141005 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29011/ Test FAILed.
[GitHub] spark pull request: [SPARK-3533][Core][PySpark] Add saveAsTextFile...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4895#issuecomment-85119879 @srowen [SPARK-3533](https://issues.apache.org/jira/browse/SPARK-3533) has a lot of votes and watchers, and there are a few linked questions on Stack Overflow from there, the most popular one being [this question](http://stackoverflow.com/q/23995040/877069), which has 12 upvotes ATM and close to 4,000 views in about a year, as well as several linked questions asking about the same thing. From a user perspective, I can definitely say that this is a sought-after method.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960291 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rpc

import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException}

import scala.collection.mutable
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.language.postfixOps

import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.scalatest.concurrent.Eventually._

import org.apache.spark.{SparkException, SparkConf}

/**
 * Common tests for an RpcEnv implementation.
 */
abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll {

  var env: RpcEnv = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
    env = createRpcEnv(conf, "local", 12345)
  }

  override def afterAll(): Unit = {
    if (env != null) {
      env.shutdown()
    }
  }

  def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv

  test("send a message locally") {
    @volatile var message: String = null
    val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })
    rpcEndpointRef.send("hello")
    eventually(timeout(5 seconds), interval(10 millis)) {
      assert("hello" === message)
    }
  }

  test("send a message remotely") {
    @volatile var message: String = null
    // Set up a RpcEndpoint using env
    env.setupEndpoint("send-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely")
    try {
      rpcEndpointRef.send("hello")
      eventually(timeout(5 seconds), interval(10 millis)) {
        assert("hello" === message)
      }
    } finally {
      anotherEnv.shutdown()
      anotherEnv.awaitTermination()
    }
  }

  test("send a RpcEndpointRef") {
    val endpoint = new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case "Hello" => context.reply(self)
        case "Echo" => context.reply("Echo")
      }
    }
    val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint)

    val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello")
    val reply = newRpcEndpointRef.askWithReply[String]("Echo")
    assert("Echo" === reply)
  }

  test("ask a message locally") {
    val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })
    val reply = rpcEndpointRef.askWithReply[String]("hello")
    assert("hello" === reply)
  }

  test("ask a message remotely") {
    env.setupEndpoint("ask-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
```
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4588#issuecomment-8519 Left mostly minor comments, otherwise looks good. We can iron out any kinks later. There's just some odd code in the test suite, where you're calling `stop` on a shared RPC env variable. That looks a little suspicious.
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85129177 [Test build #29012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29012/consoleFull) for PR 4859 at commit [`9f32724`](https://github.com/apache/spark/commit/9f327244eb3ca3ad3a483570ab82999869973150).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85131353 This LGTM, we'll merge this and later add a jdbc() version.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85131296 I apologize for my late reply. I had no time until last weekend. Actually, I'm reconsidering what should be visualized and how, and I'm trying an implementation. I could show you a concrete implementation in a few weeks. Should I close this PR for now and reopen it later?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5014#issuecomment-85131979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28999/ Test PASSed.
[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/5037#issuecomment-85133430 BTW, I really think we should merge this soon for 1.3.1
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135353 [Test build #29017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29017/consoleFull) for PR 5093 at commit [`56f74a8`](https://github.com/apache/spark/commit/56f74a8a1e7f4c808827ba1f0b09f1f3b40db028).
* This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135359 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29017/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135350 [Test build #29017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29017/consoleFull) for PR 5093 at commit [`56f74a8`](https://github.com/apache/spark/commit/56f74a8a1e7f4c808827ba1f0b09f1f3b40db028).
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26965253 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- Let's get rid of the type parameter and rename it to `CreateHiveTableAsSelect` (be a little bit more specific on what this one does). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
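A hedged sketch of the shape yhuai is suggesting. `LogicalPlan` and `HiveTable` here are hypothetical stand-ins for the real Catalyst and metastore types; the field names come from the quoted diff. Dropping the type parameter means the node always carries a concrete table description instead of an `Option[T]`:

```scala
// Stand-in for org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.
trait LogicalPlan

// Stand-in for the Hive metastore table description that T represented.
case class HiveTable(name: String)

// Suggested rename: type parameter removed, desc made concrete and mandatory.
case class CreateHiveTableAsSelect(
    tableName: String,
    child: LogicalPlan,
    allowExisting: Boolean,
    desc: HiveTable) // was: desc: Option[T] = None on CreateTableAsSelect[T]
```

The rename also makes the node's Hive-specific role explicit instead of hiding it behind a generic `T`.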
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135767 [Test build #29016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29016/consoleFull) for PR 5142 at commit [`6a61364`](https://github.com/apache/spark/commit/6a6136424ab2805148e141471fb2e22d37223d05).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135778 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29016/ Test FAILed.
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85136374 [Test build #29001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29001/consoleFull) for PR 5118 at commit [`6c8ffab`](https://github.com/apache/spark/commit/6c8ffab396d76e329100c9c33a609f1b993e1abb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85136390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29001/ Test PASSed.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26966346 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -540,6 +560,27 @@ private[spark] class Client( amContainer } + def setupCredentials(): Unit = { +if (args.principal != null) { + Preconditions.checkNotNull( --- End diff -- sorry for flip-flopping here. The methods in `Preconditions` work weirdly in Scala as you've noticed. Should probably use Scala's `require()` here (which throws IllegalArgumentException, which is also a little more correct).
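The `require()` suggestion can be sketched outside of Spark. The object and parameter names below are hypothetical stand-ins for the `Client` arguments being validated, not the actual Spark code:

```scala
// Sketch of the suggestion above: Scala's built-in require() replaces
// Guava's Preconditions.checkNotNull. The names here are illustrative only.
object CredentialsCheck {
  def setupCredentials(principal: String, keytab: String): Unit = {
    if (principal != null) {
      // require throws IllegalArgumentException with the given message,
      // unlike Preconditions.checkNotNull, which throws NullPointerException.
      require(keytab != null, "Keytab must be specified when principal is specified.")
    }
  }
}
```

The point of the suggestion is that `IllegalArgumentException` more accurately describes "caller passed an inconsistent combination of arguments" than `NullPointerException` does.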
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85139282 @kayousterhout The basic idea has not changed, but I am trying to use vis.js instead of D3.js because vis.js makes it easy to build a rich timeline view. This is an under-development version of the new implementation. [Timeline view for an application] ![2015-03-23 11 38 50](https://cloud.githubusercontent.com/assets/4736016/6787702/da5b6a06-d151-11e4-89c5-d8d1ba68297f.png) [Timeline view for a stage] ![2015-03-23 11 40 09](https://cloud.githubusercontent.com/assets/4736016/6787735/f54c25e4-d151-11e4-8a7a-2f6d9b0325be.png) Actually, I talked with Matei in New York last week, showed him the implementation, and got some feedback. One piece of feedback was that each square, which represents a task, should show its proportion of duration, as you are suggesting in SPARK-6418.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966597 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- BTW, functions like this are why we should always declare types explicitly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85140966 [Test build #29011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29011/consoleFull) for PR 4435 at commit [`99764e1`](https://github.com/apache/spark/commit/99764e1afc48608ad6f0a81778a6f03e1ca7a4f1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AllStagesResource(uiRoot: UIRoot) ` * `class OneStageResource(uiRoot: UIRoot) ` * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26967311 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- This is written up a bit more in https://issues.apache.org/jira/browse/SPARK-2331 which should be reopened for the new 2+ bucket in JIRA. This should be fixed when binary compatibility can be broken. Yes, big +1 to tightening up types like this.
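The problem behind the `emptyRDD` discussion can be illustrated with a toy example (the `Box`/`EmptyBox` classes below are hypothetical, not Spark's): with an inferred result type, the method's public signature silently becomes the narrow implementation type, and callers can come to depend on it, which is why the signature can no longer be relaxed to the supertype without breaking binary compatibility:

```scala
// Toy illustration of why public APIs should declare return types explicitly.
class Box[T]
class EmptyBox[T] extends Box[T] { def isEmpty: Boolean = true }

object Inferred  { def make[T] = new EmptyBox[T] }          // inferred type: EmptyBox[T]
object Explicit  { def make[T]: Box[T] = new EmptyBox[T] }  // declared type: Box[T]

// This compiles only because the inferred signature leaked the subclass;
// changing Inferred.make to return Box[T] would now break this caller.
val leaked: Boolean = Inferred.make[Int].isEmpty
// With the explicit signature, callers see only the intended interface.
val b: Box[Int] = Explicit.make[Int]
```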
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/4027#discussion_r26987588 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala --- @@ -63,20 +63,25 @@ private[spark] class CoarseMesosSchedulerBackend( // Maximum number of cores to acquire (TODO: we'll need more flexible controls here) val maxCores = conf.get("spark.cores.max", Int.MaxValue.toString).toInt + val maxExecutorsPerSlave = conf.getInt("spark.mesos.coarse.executors.max", 1) + val maxCpusPerExecutor = conf.getInt("spark.mesos.coarse.cores.max", Int.MaxValue) --- End diff -- It's quite hard to differentiate this from spark.cores.max, since spark.cores itself is already very vague: it's the configuration that sets the total number of cores a Spark app can schedule. spark.mesos.coarse.cores.max is the maximum number of cores a coarse-grained Spark executor can take, and the scheduler will schedule anywhere from 1 up to spark.mesos.coarse.cores.max cores. So calling it coresPerExecutor doesn't seem right, as it's not a hard value that the scheduler tries to schedule. How about spark.mesos.coarse.coresPerExecutor.max?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85232216 @hellertime how about just adding the Apache license header at the top of the Dockerfile?
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user calvinjia commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85234685 @JoshRosen Just from a quick glance at the output log, it seems to be a style issue (lines over 100 characters). I don't think this patch should have caused the issues, since the errors have been the same as the ones since build #1937.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26988014 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. /tr /table +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + + Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be +used to perform a global sort. A similar operation, +[`repartitionAndSortWithinPartitions`](#Repartition2Link) coupled with `mapPartitions`, +may be used to enact a Hadoop-style shuffle. + +Operations which can cause a shuffle include **repartition** operations like +[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'byKey** operations +(except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink), and +**join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink). + + Performance Impact +**Shuffle** is an expensive operation since it involves disk I/O, data serialization, and +network I/O. To organize data for the shuffle, Spark generates two sets of tasks - map tasks to +organize the data, and a set of reduce tasks to aggregate it. Internally, results from individual +map jobs are kept in memory until they can't fit. Then, these are sorted based on the target reduce +task and written to a single file. On the reduce side, tasks read the relevant sorted blocks. + +Certain shuffle operations can consume significant amounts of heap memory since they generate hash +tables in memory. Specifically, `reduceByKey` and `aggregateByKey` on the map-side and `'byKey` +operations on the reduce-side. When data does not fit in memory Spark will spill these tables to +disk, incurring the additional overhead of disk I/O and increased garbage collection. + +Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files +are not cleaned up from Spark's temporary storage until Spark is stopped, which means that +long-running Spark jobs may consume available disk space. This is done so the shuffle doesn't need +to be re-computed if the lineage is re-computed. The temporary storage directory is specified by the +`spark.local.dir` configuration parameter when configuring the Spark context. + +Shuffle behavior can be fine-tuned by adjusting a variety of configuration parameters. See the --- End diff -- fine-tuned → tuned. Also, can we link to the relevant tuning section?
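The all-to-all step described in the quoted docs can be sketched without Spark. The following is a toy model of hash partitioning plus per-partition reduction (illustrative only, not Spark's implementation): every map-side record is routed to a reduce partition by hashing its key, which is why all values for a key end up co-located where `reduceByKey` can combine them.

```scala
// Toy sketch of the shuffle's all-to-all step (not Spark's code).
object ShuffleSketch {
  // Route a key to a reduce-side partition; the double modulo keeps the
  // result non-negative even for negative hashCodes.
  def partitionFor(key: Any, numPartitions: Int): Int =
    ((key.hashCode % numPartitions) + numPartitions) % numPartitions

  // "Map side": group every record by its target partition (the all-to-all move).
  def shuffle[K, V](records: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => partitionFor(k, numPartitions) }

  // "Reduce side": within one partition, combine all values for each key.
  def reduceByKey[K, V](partition: Seq[(K, V)], f: (V, V) => V): Map[K, V] =
    partition.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}
```

Because routing depends only on the key, every occurrence of a key lands in the same partition, so each reduce task can produce the final value for its keys without seeing any other partition.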
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85256109 [Test build #29037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29037/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85256117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29037/ Test PASSed.
[GitHub] spark pull request: [SPARK-6325] [core,yarn] Do not change target ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5018#issuecomment-85262710 @vanzin @sryza Thanks for working on the fix. I was away for the past week and did not have the chance to review this before it went in. Regarding the code being overly complicated, the reason why the bookkeeping is done in each of the three places you pointed out is the following: - We need to do it in `ExecutorAllocationManager` so we don't keep requesting beyond the configured maximum - We need to do it in `CoarseGrainedExecutorBackend` because the user can bypass the dynamic scaling logic and explicitly request executors through `sc.requestTotalExecutors`. - We need to do it in `YarnAllocator` to ensure we don't over-allocate containers, regardless of whether `sc.requestTotalExecutors` or the dynamic scaling logic is used. My intent is also to simplify the code so as to minimize the possibility of this feature breaking again in a future release. If either of you have concrete suggestions on refactoring it for this purpose, I would love to hear them.
[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4944#issuecomment-85264973 Yeah, it looks okay to me, but I would also feel more comfortable if a second core committer took a look.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26995821 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala --- @@ -169,8 +177,8 @@ private[deploy] class DriverRunner( runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise) } - def runCommandWithRetry(command: ProcessBuilderLike, initialize: Process = Unit, -supervise: Boolean) { + def runCommandWithRetry( + command: ProcessBuilderLike, initialize: Process = Unit, supervise: Boolean) { --- End diff -- Should this have an explicit `: Unit = `?
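The `: Unit =` question refers to Scala's procedure syntax (`def f(...) { ... }` with no `=`), which always returns `Unit` implicitly and was later deprecated. A toy sketch (not the Spark method under review) of why the explicit annotation is preferred:

```scala
// Illustration of the ": Unit =" question above (hypothetical names, not Spark code).
object ProcedureSyntax {
  // Explicit and unambiguous: declared to return Unit.
  def logLength(s: String): Unit = {
    println(s"length=${s.length}")
  }

  // Without an annotation the result type is inferred from the body (here Int);
  // a stray "=" or a missing one can silently change a method's public type.
  def lengthOf(s: String) = s.length
}
```

Spelling out `: Unit =` makes the intended signature part of the source rather than an artifact of inference, which matters for the same binary-compatibility reasons discussed elsewhere in this PR.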
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5145#issuecomment-85267784 LGTM
[GitHub] spark pull request: [SPARK-6478] New RDD.pipeWithPartition method
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5147#issuecomment-85275593 I'm a little hesitant to add a new `withPartition` or `withSplit`-like method, since we've been deprecating those in favor of using things like TaskContext. Can you address your use case with `TaskContext.get` and `printPipeContext`? For example, how about this:

```scala
myRDD.pipe(
  command = ...,
  printPipeContext = p => p("PARTITION=" + TaskContext.get.partitionId()))
```

Is the problem that you want to be able to store the partition as part of the command or environment? If so, then maybe we could generalize this so that the function is invoked with a TaskContext instead of a Partition (in other words, change it to `pipeWithContext`).
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275422 [Test build #29042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29042/consoleFull) for PR 4259 at commit [`66a4dc3`](https://github.com/apache/spark/commit/66a4dc31aa56ced603cd1172719dc4510fcdbaa1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987728 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. /tr /table +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + + Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be +used to perform a global sort. A similar operation, +[`repartitionAndSortWithinPartitions`](#Repartition2Link) coupled with `mapPartitions`, +may be used to enact a Hadoop-style shuffle. + +Operations which can cause a shuffle include **repartition** operations like +[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'byKey** operations +(except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink), and +**join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink). + + Performance Impact +**Shuffle** is an expensive operation since it involves disk I/O, data serialization, and +network I/O. To organize data for the shuffle, Spark generates two sets of tasks - map tasks to +organize the data, and a set of reduce tasks to aggregate it. Internally, results from individual +map jobs are kept in memory until they can't fit. Then, these are sorted based on the target reduce --- End diff -- map jobs → map tasks. Slightly more precise to say that they're sorted based on the target partition (because multiple partitions could end up in the same task if `coalesce` is called).
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85235696 retest this please
[GitHub] spark pull request: New RDD.pipeWithPartition method
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5147#issuecomment-85235660 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85235864 [Test build #29036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29036/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user calvinjia commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85237240 @srowen Oh I see the build still failed after the fix to that patch. Strange that there would be issues between now and the last test run for this patch, since there should not have been any dependency changes.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85243242 [Test build #29038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29038/consoleFull) for PR 5144 at commit [`df925b7`](https://github.com/apache/spark/commit/df925b780348e72e3a6f592590f2e868e74cf8a3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/5085#issuecomment-85244894 Thanks for the comments @vanzin. Will address them soon.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85247503 [Test build #29033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29033/consoleFull) for PR 5142 at commit [`c6744b8`](https://github.com/apache/spark/commit/c6744b82776263889c7a5eb7664835419834d28b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85247520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29033/ Test PASSed.
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26994637 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- `CreateTableAsSelect` is designed as a common logical plan node; that's why I made `desc` a `T`, and also an optional parameter. Otherwise, every SQL dialect would have to implement its own `CTAS` node (logical plan). Or is `CreateTableUsingAsSelect` a more generic interface for the same purpose?
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26996263 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. +# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit=$1 +sha1=$2 + +MVN_BIN=`pwd`/build/mvn +CURR_CP_FILE=my-classpath.txt +MASTER_CP_FILE=master-classpath.txt + +${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \ --- End diff -- If that's the case, we should probably gate this entire thing in a check as to whether any pom.xml files are modified. Then for most builds this will not add any time, since most builds do not modify dependencies.
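The gating check suggested above could be sketched roughly as follows, matching the bash of the script under review. This is a minimal illustration, assuming the changed-file list comes from `git diff` against master in the Jenkins environment; the function name and messages are assumptions, not part of the PR:

```shell
#!/usr/bin/env bash
# Sketch of a pom.xml gate: only run the (slow) Maven dependency diff
# when the PR actually touches a pom.xml file.

# Count how many changed paths (one per line on stdin) are pom.xml files.
count_pom_changes() {
  grep -c 'pom\.xml$' || true
}

# In Jenkins the list of changed paths would come from git, e.g.:
#   git diff --name-only master..."$ghprbActualCommit" | count_pom_changes
changed=$(printf 'core/pom.xml\nREADME.md\n' | count_pom_changes)

if [ "$changed" -eq 0 ]; then
  echo "No pom.xml changes; skipping dependency check."
else
  echo "Found $changed modified pom.xml file(s); running dependency check."
fi
```

With a gate like this, the common case (no build-file changes) exits immediately and adds essentially no time to a Jenkins run.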
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275731 [Test build #29042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29042/consoleFull) for PR 4259 at commit [`66a4dc3`](https://github.com/apache/spark/commit/66a4dc31aa56ced603cd1172719dc4510fcdbaa1). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275887 @jkbradley and @mengxr I just rebased it. Will do a couple of optimizations to avoid the scaling on the datasets, which can be done in the optimization instead. You guys can start to give me feedback so we have ample time to address issues before 1.4. Thanks.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29042/ Test FAILed.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85278373 [Test build #29043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29043/consoleFull) for PR 4259 at commit [`ea3e1dc`](https://github.com/apache/spark/commit/ea3e1dc55583d1fdd69c74a0201c5743a0baef2a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/4027#discussion_r26988123 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala --- @@ -204,35 +209,43 @@ private[spark] class CoarseMesosSchedulerBackend( for (offer <- offers) { val slaveId = offer.getSlaveId.toString -val mem = getResource(offer.getResourcesList, "mem") -val cpus = getResource(offer.getResourcesList, "cpus").toInt -if (totalCoresAcquired < maxCores && -mem >= MemoryUtils.calculateTotalMemory(sc) && -cpus >= 1 && +var totalMem = getResource(offer.getResourcesList, "mem") +var totalCpus = getResource(offer.getResourcesList, "cpus").toInt --- End diff -- I'm calling it remainingCores
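The capping behavior under discussion in this thread reduces to a small helper: from each offer, acquire only as many cores as still fit under the configured cap. A hedged sketch — the name `coresToAcquire` and the standalone-function shape are assumptions for illustration, not the PR's actual code:

```scala
// From a single Mesos offer, take only the cores that remain under the
// configured cap (what the review thread calls "remainingCores").
// Hypothetical helper for illustration.
def coresToAcquire(offerCpus: Int, totalCoresAcquired: Int, maxCores: Int): Int =
  math.max(0, math.min(offerCpus, maxCores - totalCoresAcquired))
```

For example, with `maxCores = 12` and 10 cores already acquired, an 8-core offer contributes only 2 more cores; once the cap is reached, further offers contribute none.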
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85239748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29034/ Test FAILed.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85239717 [Test build #29034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29034/consoleFull) for PR 4027 at commit [`6d04da1`](https://github.com/apache/spark/commit/6d04da11e44d395416f208a20d250c17c672fcc9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-6480 [CORE] histogram() bucket function ...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/5148 SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-6480 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5148 commit 23ec01e8276478f716ebd6307eb88d7d1581ef14 Author: Sean Owen so...@cloudera.com Date: 2015-03-23T23:21:25Z Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly
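The kind of edge-condition fix this PR describes can be sketched as follows for evenly spaced buckets: derive the bucket index from the overall range and clamp the maximum value into the last bucket. This is an illustration of the approach, not necessarily the exact code that was merged:

```scala
// Map a value e into one of `count` evenly spaced buckets over [min, max].
// Edge cases handled: NaN and out-of-range values fall in no bucket, and
// e == max lands in the last bucket rather than one past the end.
// Illustrative sketch of the fix's approach.
def fastBucketFunction(min: Double, max: Double, count: Int)(e: Double): Option[Int] = {
  if (e.isNaN || e < min || e > max) {
    None
  } else {
    // Scale by the whole range rather than dividing by a per-bucket width,
    // which reduces floating-point drift near bucket boundaries.
    val bucketNumber = (((e - min) / (max - min)) * count).toInt
    Some(math.min(bucketNumber, count - 1))
  }
}
```

For instance, with 5 buckets over [0, 10], the value 10.0 would naively index bucket 5 (out of range); the clamp places it in bucket 4.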
[GitHub] spark pull request: [SPARK-5961][Streaming]Allow specific nodes in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/5114#discussion_r26993189 --- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala --- @@ -44,12 +44,14 @@ import org.jboss.netty.handler.codec.compression._ private[streaming] class FlumeInputDStream[T: ClassTag]( - @transient ssc_ : StreamingContext, - host: String, - port: Int, - storageLevel: StorageLevel, - enableDecompression: Boolean -) extends ReceiverInputDStream[SparkFlumeEvent](ssc_) { +@transient ssc_ : StreamingContext, --- End diff -- What has changed in these lines? Why are they in the diff?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85254916 [Test build #29036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29036/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-85257353 @tgravescs You are right. Maybe we should provide two choices: IP and hostname. Both would be figured out automatically by Spark.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26995786 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala --- @@ -33,6 +33,8 @@ import org.apache.spark.deploy.master.DriverState import org.apache.spark.deploy.master.DriverState.DriverState import org.apache.spark.util.{Clock, SystemClock} +import scala.collection.mutable --- End diff -- Nit: this should be grouped with the other Scala imports.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26996110 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -70,7 +71,8 @@ class JdbcRDD[T: ClassTag]( }).toArray } - override def compute(thePart: Partition, context: TaskContext) = new NextIterator[T] { + override def compute(thePart: Partition, context: TaskContext): Iterator[T] = new NextIterator[T] + { --- End diff -- This brace on a line by itself looks a bit funny to me; maybe we could split at the argument list instead?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26997137 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- I think `CreateTableUsingAsSelect` is just for the data source API?