[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152155372
  
Looks great! I look forward to getting this merged. Once you address the 
comments I will do so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152155531
  
Merged build started.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152155500
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152155517
  
Merged build started.





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152155505
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152173539
  
Merged build started.





[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9351#issuecomment-152173452
  
**[Test build #44587 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44587/consoleFull)**
 for PR 9351 at commit 
[`16c5b89`](https://github.com/apache/spark/commit/16c5b8914a49eb2a55e68fe5cf7022a5fcee34fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152173505
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9351#issuecomment-152173585
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44587/
Test PASSed.





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152178361
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/9339#issuecomment-152178217
  
That's a known flaky pyspark test. Change LGTM.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread dragos
Github user dragos commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152180069
  
The serializer is delegating to the context class loader for instantiating 
classes it receives on the wire. When this class loader is missing (`null`), 
the JVM looks up the class in the *primordial* classloader, which usually 
contains only the JDK classes.
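
The lookup behaviour described above can be illustrated with a minimal sketch. `ContextClassLoaderSketch`, `lookup`, and `withContextClassLoader` are hypothetical names chosen for this example; they are not Spark APIs, and this is not the one-line change made in the PR.

```scala
import scala.util.Try

// Hypothetical helper object, for illustration only.
object ContextClassLoaderSketch {

  // Look a class up through an explicit loader. Passing `null` asks the
  // bootstrap ("primordial") loader, which normally sees only JDK classes.
  def lookup(className: String, loader: ClassLoader): Option[Class[_]] =
    Try(Class.forName(className, false, loader)).toOption

  // Run `body` with `loader` installed as the thread's context class loader,
  // restoring the previous loader afterwards.
  def withContextClassLoader[A](loader: ClassLoader)(body: => A): A = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }
}
```

With a `null` loader, `lookup("java.lang.String", null)` should succeed, while looking up an application class such as `ContextClassLoaderSketch` itself should fail, which matches the symptom of a missing context class loader.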







[GitHub] spark pull request: [SPARK-11393][SQL] CoGroupedIterator should re...

2015-10-29 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/9346#issuecomment-152157478
  
Maybe "not idempotent" is not the right term for this problem. 
`GroupedIterator` has a special 
[constraint](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala#L93-L95)
 that differs from a normal iterator, and `CoGroupedIterator` breaks this 
constraint under the condition described in the PR description.





[GitHub] spark pull request: [SPARK-11393][SQL] CoGroupedIterator should re...

2015-10-29 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/9346#issuecomment-152158690
  
btw, since https://github.com/apache/spark/pull/9330 has been merged, the 
problem is not that an extra empty group is generated, but that the last group 
becomes empty.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9182#discussion_r43377401
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala
 ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped.
+ *
+ * 1. For implementations to be loadable by [[SchedulerExtensionServices]],
+ * they must provide an empty constructor.
+ * 2. The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ *
+ * The attempt ID will be set if the service is started within a YARN application master;
+ * there is then a different attempt ID for every time that AM is restarted.
+ * When the service binding is instantiated on a client, there's no attempt ID, as it lacks
+ * this information.
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId YARN attemptID -if known.
+ */
+case class SchedulerExtensionServiceBinding(
+sparkContext: SparkContext,
+applicationId: ApplicationId,
+attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ *
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+if (started.getAndSet(true)) {
+  logWarning("Ignoring re-entrant start operation")
+  return
+}
+require(binding.sparkContext != null, "Null context parameter")
+require(binding.applicationId != null, "Null appId parameter")
+this.binding = binding
+val sparkContext = binding.sparkContext
+val appId = binding.applicationId
+val attemptId = binding.attemptId
+logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+  s" and attemptId $attemptId")
+
+services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+  .map { s =>
+s.split(",").map(_.trim()).filter(!_.isEmpty)
+  .map { sClass =>
+val instance = Utils.classForName(sClass)
+  

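The contract stated in the Scaladoc of the diff above (a repeated `start()` is a no-op, and `stop()` is idempotent and safe without a prior `start()`) could be met along the following lines. This is a minimal sketch under stated assumptions: `IdempotentServiceSketch` and its counters are hypothetical and do not appear in the PR, and resetting the flag in `stop()` is a design choice of this example, not something the diff shows.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical service used only to illustrate the start/stop contract.
class IdempotentServiceSketch {
  private val started = new AtomicBoolean(false)
  @volatile private var starts = 0
  @volatile private var releases = 0

  def startCount: Int = starts
  def releaseCount: Int = releases

  def start(): Unit = {
    // getAndSet returns the previous value, so only the first caller proceeds.
    if (started.getAndSet(true)) return
    starts += 1
  }

  def stop(): Unit = {
    // Release resources only if start() ran and stop() has not already run.
    if (started.getAndSet(false)) {
      releases += 1
    }
  }
}
```

Calling `stop()` on a fresh instance does nothing, and calling it twice after `start()` releases resources only once.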
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152164064
  
**[Test build #44593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/consoleFull)**
 for PR 9182 at commit 
[`8a6a1f1`](https://github.com/apache/spark/commit/8a6a1f13235fd00dcc58c4106b0314098f961e67).





[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9352#issuecomment-152169589
  
**[Test build #44586 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44586/consoleFull)**
 for PR 9352 at commit 
[`4048c2d`](https://github.com/apache/spark/commit/4048c2dc5626e926a04774bffecaf7c6a6ac4cf7).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9352#issuecomment-152169728
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44586/
Test FAILed.





[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9351#issuecomment-152173584
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-152174392
  
Merged build started.





[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-152174382
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152176171
  
**[Test build #44596 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/consoleFull)**
 for PR 9287 at commit 
[`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ExternalShuffleService(`
   * `  case class BoundPortsResponse(`





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152176180
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152176182
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/
Test FAILed.





[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8872#issuecomment-152151363
  
@tnachen can you address the comments? I would like to get this merged. 
Also it's still failing style tests.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152151242
  
ok to test





[GitHub] spark pull request: [SPARK-11361][Streaming] Show scopes of RDD op...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9315#issuecomment-152151178
  
@tdas This looks good. I just think the code can be simplified a little.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152156815
  
**[Test build #44591 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44591/consoleFull)**
 for PR 9282 at commit 
[`ec1c11b`](https://github.com/apache/spark/commit/ec1c11b6f599d950abb5c0496c2c85c5951f9fa7).





[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-29 Thread selvinsource
Github user selvinsource commented on the pull request:

https://github.com/apache/spark/pull/9057#issuecomment-152173191
  
@JasmineGeorge, it would be great if you could add a test for the validator 
to ensure the exported XML file can be loaded in JPMML and produces the same 
scoring results.

Please use my latest branch

https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class

I renamed the datasets to be generic so that we can use them for different 
algorithms; for example, iris can be used for both kmeans and multiclass 
logistic regression.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread jacek-lewandowski
Github user jacek-lewandowski commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152174907
  
jenkins, test this please





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152177498
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9182#discussion_r43374435
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -17,17 +17,17 @@
 
 package org.apache.spark.scheduler.cluster
 
-import scala.collection.mutable.ArrayBuffer
-import scala.concurrent.{Future, ExecutionContext}
+import scala.concurrent.{ExecutionContext, Future}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
 
-import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.rpc._
-import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.ui.JettyUtils
-import org.apache.spark.util.{ThreadUtils, RpcUtils}
-
-import scala.util.control.NonFatal
+import org.apache.spark.util.{RpcUtils, ThreadUtils}
+import org.apache.spark.{Logging, SparkContext}
--- End diff --

I know what's up. It's sorting alphabetically within a group, and `{` comes 
after the alphabet, so child packages come first. I'll review these things by 
hand and will have to do the same through the other patches. Something to call 
out in the Spark style guide, maybe: it does cover the IDEA import patterns, 
but not this quirk.
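
The ordering quirk can be reproduced with a plain string sort. This is a minimal sketch: `ImportOrderSketch` is a hypothetical name, and it assumes the import organiser compares the text character by character, which the diff above suggests but does not confirm.

```scala
// Hypothetical demo object; the import strings are taken from the diff above.
object ImportOrderSketch {
  val imports: List[String] = List(
    "org.apache.spark.{Logging, SparkContext}",
    "org.apache.spark.rpc._",
    "org.apache.spark.scheduler._",
    "org.apache.spark.util.{RpcUtils, ThreadUtils}"
  )

  // Default String ordering compares character codes. '{' (0x7B) is greater
  // than 'z' (0x7A), so the braced import sorts after every child package.
  val sorted: List[String] = imports.sorted
}
```

Here `ImportOrderSketch.sorted` places `org.apache.spark.{Logging, SparkContext}` last, which is the order seen in the reorganised imports.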





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152153722
  
**[Test build #44589 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/consoleFull)**
 for PR 9291 at commit 
[`8bcb3dc`](https://github.com/apache/spark/commit/8bcb3dc16dd07916ef829bceced46f1d436d1b10).





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9253#discussion_r43375184
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -706,6 +706,23 @@ abstract class RDD[T: ClassTag](
   }
 
   /**
+   * Spark's internal mapPartitions method which skips closure cleaning. To be used carefully
+   * only if we are sure that the RDD elements are serializable and don't require closure
+   * cleaning
+   *
+   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
--- End diff --

just use `@param` here
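
A minimal sketch of the Scaladoc shape being asked for, using a hypothetical `mapChunks` helper rather than the RDD method in the diff. The names `ParamDocSketch`, `mapChunks`, `chunks`, `f` and `dropEmpty` are made up for illustration; the point is only that each parameter gets its own `@param` tag instead of a sentence in the body.

```scala
object ParamDocSketch {
  /**
   * Applies `f` to each chunk and returns the transformed chunks.
   *
   * @param chunks input data, already split into chunks
   * @param f function applied to the iterator over a single chunk
   * @param dropEmpty whether chunks that produce no output are dropped from the result
   */
  def mapChunks[T, U](
      chunks: Seq[Seq[T]],
      f: Iterator[T] => Iterator[U],
      dropEmpty: Boolean = false): Seq[Seq[U]] = {
    // Materialize each transformed chunk so the result can be inspected.
    val mapped = chunks.map(chunk => f(chunk.iterator).toList)
    if (dropEmpty) mapped.filter(_.nonEmpty) else mapped
  }
}
```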





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152155075
  
Looks like this is a regression from 1.5.1 so we should definitely fix it. 
Even though this change is only one line it could change a lot of things. Can 
we verify that it doesn't cause any new regressions? @dragos can you explain to 
us the root cause of the issue?





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152155205
  
retest this please





[GitHub] spark pull request: [SPARK-9552] Add force control for killExecuto...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7888#issuecomment-152151508
  
@vanzin can you have a look?





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152152043
  
Merged build started.





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152151993
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152152042
  
Merged build started.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152151986
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152180565
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152180570
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/
Test PASSed.





[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...

2015-10-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9355#discussion_r43390028
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -238,7 +238,7 @@ object YarnSparkHadoopUtil {
 if (Utils.isWindows) {
   escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID p")
 } else {
-  "-XX:OnOutOfMemoryError='kill %p'"
+  "-XX:OnOutOfMemoryError='echo OnOutOfMemoryError; kill %p'"
--- End diff --

Does this require `bash` to interpret, and do we know the JVM would execute 
the command in a shell? If you've tested this and it works, OK.





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152151854
  
retest this please





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9182#discussion_r43373740
  
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped.
+ *
+ * 1. For implementations to be loadable by [[SchedulerExtensionServices]],
+ * they must provide an empty constructor.
+ * 2. The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ *
+ * The attempt ID will be set if the service is started within a YARN application master;
+ * there is then a different attempt ID for every time that AM is restarted.
+ * When the service binding is instantiated on a client, there's no attempt ID, as it lacks
+ * this information.
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId YARN attemptID -if known.
+ */
+case class SchedulerExtensionServiceBinding(
+sparkContext: SparkContext,
+applicationId: ApplicationId,
+attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ *
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+if (started.getAndSet(true)) {
+  logWarning("Ignoring re-entrant start operation")
+  return
+}
+require(binding.sparkContext != null, "Null context parameter")
+require(binding.applicationId != null, "Null appId parameter")
+this.binding = binding
+val sparkContext = binding.sparkContext
+val appId = binding.applicationId
+val attemptId = binding.attemptId
+logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+  s" and attemptId $attemptId")
+
+services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+  .map { s =>
+s.split(",").map(_.trim()).filter(!_.isEmpty)
+  .map { sClass =>
+val instance = Utils.classForName(sClass)
+  
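
The quoted diff is cut off here. The lifecycle contract its Scaladoc describes (a repeated `start()` is a no-op, and `stop()` is idempotent and safe even if `start()` never ran) can be sketched with the same `AtomicBoolean.getAndSet` guard the diff uses. Everything below is a simplified stand-in: `Service` and `CountingService` are hypothetical names, the binding is reduced to a `String`, and resetting the flag in `stop()` is an assumption, since the original `stop()` is not visible in the excerpt.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the trait in the diff; the binding is a plain String here.
trait Service {
  def start(binding: String): Unit
  def stop(): Unit
}

class CountingService extends Service {
  private val started = new AtomicBoolean(false)
  // Counters exist only so the behaviour can be observed.
  @volatile var startCount = 0
  @volatile var stopCount = 0

  override def start(binding: String): Unit = {
    // getAndSet returns the previous value, so only the first caller sees false.
    if (!started.getAndSet(true)) startCount += 1
  }

  override def stop(): Unit = {
    // Teardown runs only if the service was actually running (assumed behaviour).
    if (started.getAndSet(false)) stopCount += 1
  }
}
```

With this guard, calling `stop()` before `start()`, or calling either method twice in a row, changes nothing after the first effective call.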

[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152154833
  
retest this please





[GitHub] spark pull request: [SPARK-11383][Docs] Replaced example code in m...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9353#issuecomment-152157846
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152162385
  
Merged build started.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152162358
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9291#issuecomment-152180307
  
**[Test build #44589 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/consoleFull)**
 for PR 9291 at commit 
[`8bcb3dc`](https://github.com/apache/spark/commit/8bcb3dc16dd07916ef829bceced46f1d436d1b10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152188739
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9253#discussion_r43375231
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -706,6 +706,23 @@ abstract class RDD[T: ClassTag](
   }
 
   /**
+   * Spark's internal mapPartitions method which skips closure cleaning. To be used carefully
+   * only if we are sure that the RDD elements are serializable and don't require closure
+   * cleaning
--- End diff --

can you add that this is mainly for performance improvements? Also you're 
missing a period at the end.





[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9003#issuecomment-152175141
  
**[Test build #44595 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44595/consoleFull)**
 for PR 9003 at commit 
[`ff363cc`](https://github.com/apache/spark/commit/ff363cca57e2b1c2bb28e281d014d33b930fd603).





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175275
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175308
  
Merged build started.





[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9354#issuecomment-152182642
  
Merged build started.





[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9354#issuecomment-152182613
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152187108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44591/
Test PASSed.





[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9282#issuecomment-152187102
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152156215
  
**[Test build #44592 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44592/consoleFull)**
 for PR 9253 at commit 
[`6a9f738`](https://github.com/apache/spark/commit/6a9f738bb3008cadc7ce855fd33115fbb29d1c0a).





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9182#discussion_r43378157
  
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,41 @@ private[spark] abstract class YarnSchedulerBackend(
 
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
 
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-mode schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+* Bind to YARN. This *must* be done before calling [[start()]].
+*
+* @param appId YARN application ID
+* @param attemptId Optional YARN attempt ID
+*/
+  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
+this.appId = appId
+this.attemptId = attemptId
+  }
+
+  override def start() {
+require(appId != null, "application ID unset")
+val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
+services.start(binding)
--- End diff --

But do you need the parsed information? e.g. `ApplicationId` has a "cluster 
timestamp" and an id; I don't see much use in providing those separately to 
these services, the string id seems good enough in my view.
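
To illustrate the point about the string id: assuming the usual textual form `application_<clusterTimestamp>_<id>`, a service that did need the two parts could recover them from the string without any YARN types. This is a standard-library-only sketch; `AppIdSketch`, `ParsedAppId` and `parse` are hypothetical names, not part of Hadoop or of the patch under review.

```scala
import scala.util.Try

object AppIdSketch {
  // Hypothetical value type mirroring the two fields mentioned in the comment.
  final case class ParsedAppId(clusterTimestamp: Long, id: Int)

  // Assumed textual form: application_<clusterTimestamp>_<id>
  private val Pattern = """application_(\d+)_(\d+)""".r

  def parse(s: String): Option[ParsedAppId] = s match {
    // Try guards against digit runs too long for Long or Int.
    case Pattern(ts, id) => Try(ParsedAppId(ts.toLong, id.toInt)).toOption
    case _ => None
  }
}
```

For example, `AppIdSketch.parse("application_1446076800000_0042")` yields the timestamp and the numeric id, while an unrelated string yields `None`.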





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175077
  
**[Test build #44594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/consoleFull)**
 for PR 9287 at commit 
[`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152177512
  
Merged build started.





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152179766
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152179768
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/
Test FAILed.





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152179761
  
**[Test build #44597 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/consoleFull)**
 for PR 9340 at commit 
[`ab42465`](https://github.com/apache/spark/commit/ab42465d44393a869fef7a3d9f674f77f9155793).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaAssociationRulesExample`
   * `public class JavaPrefixSpanExample`
   * `public class JavaSimpleFPGrowth`





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152181074
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-29 Thread JasmineGeorge
Github user JasmineGeorge commented on the pull request:

https://github.com/apache/spark/pull/9057#issuecomment-152181045
  
Sorry, I can't get to it until next Wednesday. Can someone else take over?





[GitHub] spark pull request: [SPARK-11235] [network] Add ability to stream ...

2015-10-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/9206#discussion_r43389586
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import com.google.common.base.Preconditions;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.CompositeByteBuf;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelInboundHandlerAdapter;
+
+/**
+ * A customized frame decoder that allows intercepting raw data.
+ * 
+ * This behaves like Netty's frame decoder (with hardcoded parameters that match this library's
+ * needs), except it allows an interceptor to be installed to read data directly before it's
+ * framed.
+ *
+ * Unlike Netty's frame decoder, each frame is dispatched to child handlers as soon as it's
+ * decoded, instead of building as many frames as the current buffer allows and dispatching
+ * all of them. This allows a child handler to install an interceptor if needed.
+ *
+ * If an interceptor is installed, framing stops, and data is instead fed directly to the
+ * interceptor. When the interceptor indicates that it doesn't need to read any more data,
+ * framing resumes. Interceptors should not hold references to the data buffers provided
+ * to their handle() method.
+ */
+public class TransportFrameDecoder extends ChannelInboundHandlerAdapter {
+
+  public static final String HANDLER_NAME = "frameDecoder";
+  private static final int LENGTH_SIZE = 8;
+  private static final int MAX_FRAME_SIZE = Integer.MAX_VALUE;
+
+  private CompositeByteBuf buffer;
+  private volatile Interceptor interceptor;
+
+  @Override
+  public void channelRead(ChannelHandlerContext ctx, Object data) throws Exception {
+    ByteBuf in = (ByteBuf) data;
+
+    if (buffer == null) {
+      buffer = in.alloc().compositeBuffer();
+    }
+
+    buffer.writeBytes(in);
+
+    while (buffer.isReadable()) {
+      feedInterceptor();
+      if (interceptor != null) {
+        continue;
+      }
+
+      ByteBuf frame = decodeNext();
+      if (frame != null) {
+        ctx.fireChannelRead(frame);
+      } else {
+        break;
+      }
+    }
+
+    // We can't discard read sub-buffers if there are other references to the buffer (e.g.
+    // through slices used for framing). This assumes that code that retains references
+    // will call retain() from the thread that called "fireChannelRead()" above, otherwise
+    // ref counting will go awry.
+    if (buffer != null && buffer.refCnt() == 1) {
+      buffer.discardReadComponents();
+    }
+  }
+
+  protected ByteBuf decodeNext() throws Exception {
+    if (buffer.readableBytes() < LENGTH_SIZE) {
+      return null;
+    }
+
+    int frameLen = (int) buffer.readLong() - LENGTH_SIZE;
--- End diff --

doh, sorry I totally missed that, this is fine.  I guess I have just seen 
it the other way in some examples.
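
For reference, the length-prefix convention visible in the quoted `decodeNext()` (an 8-byte length that counts the prefix itself, so the payload size is `readLong() - LENGTH_SIZE`) can be sketched without Netty. This is only an illustration on top of `java.nio.ByteBuffer`: the class name `FrameSketch`, the `encode` helper, and the mark/reset handling of incomplete frames are assumptions, since the quoted diff is cut off before it shows how a short buffer is handled.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-alone sketch of the framing convention; not the PR's code. */
final class FrameSketch {
  // Size of the length prefix in bytes; mirrors LENGTH_SIZE in the quoted diff.
  static final int LENGTH_SIZE = 8;

  /** Encodes a payload as [8-byte total length][payload]; the length includes the prefix. */
  static ByteBuffer encode(byte[] payload) {
    ByteBuffer out = ByteBuffer.allocate(LENGTH_SIZE + payload.length);
    out.putLong(LENGTH_SIZE + payload.length);
    out.put(payload);
    out.flip();
    return out;
  }

  /** Returns the next complete frame's payload, or null if more bytes are needed. */
  static byte[] decodeNext(ByteBuffer in) {
    if (in.remaining() < LENGTH_SIZE) {
      return null;
    }
    in.mark();
    // Same arithmetic as the diff: the prefix counts itself, so subtract it.
    int frameLen = (int) in.getLong() - LENGTH_SIZE;
    if (in.remaining() < frameLen) {
      // Assumption: rewind and wait for more data when the frame is incomplete.
      in.reset();
      return null;
    }
    byte[] frame = new byte[frameLen];
    in.get(frame);
    return frame;
  }
}
```

Calling `decodeNext` in a loop until it returns null yields one frame at a time, which is the property the class comment relies on to let a child handler install an interceptor between frames.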





[GitHub] spark pull request: [SPARK-2750][WEB UI]Add Https support for Web ...

2015-10-29 Thread jacek-lewandowski
Github user jacek-lewandowski commented on the pull request:

https://github.com/apache/spark/pull/5664#issuecomment-152194366
  
@WangTaoTheTonic can you rebase and squash?





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152152727
  
**[Test build #44590 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/consoleFull)**
 for PR 9258 at commit 
[`824be91`](https://github.com/apache/spark/commit/824be9104bfb81b260912dc86a0dba7508d1d3f5).





[GitHub] spark pull request: [SPARK-11383][Docs] Replaced example code in m...

2015-10-29 Thread rishabhbhardwaj
GitHub user rishabhbhardwaj opened a pull request:

https://github.com/apache/spark/pull/9353

[SPARK-11383][Docs] Replaced example code in mllib using include_example

I have made the required changes in 
mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them.
Kindly review it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rishabhbhardwaj/spark SPARK-11383

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9353


commit d152cb5ac855eeeac962a4b547f6f96522fd1223
Author: Rishabh Bhardwaj 
Date:   2015-10-19T06:42:56Z

[ SPARK-11180 ] [ SQL ] DataFrame.na.fill does not support Boolean Type

commit a53a20d756cfd26ca37acf9dbbd0b4e034f430d8
Author: Rishabh Bhardwaj 
Date:   2015-10-20T09:50:28Z

Merge remote-tracking branch 'upstream/master'

commit 870cbb384db84ffcc128114b38b495095e424ace
Author: Rishabh Bhardwaj 
Date:   2015-10-26T09:58:48Z

Merge remote-tracking branch 'upstream/master'

commit a21b0ed6d86811e5eedff0e4634da010062d225b
Author: Rishabh Bhardwaj 
Date:   2015-10-29T08:57:54Z

Merge remote-tracking branch 'upstream/master'

commit f40fcc182bb82d7d12aeb98b080b7362bd75ee4e
Author: Rishabh Bhardwaj 
Date:   2015-10-29T11:54:55Z

[SPARK-11383][Docs] Replace example code in 
mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example







[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9182#discussion_r43376459
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
 ---
@@ -51,6 +51,41 @@ private[spark] abstract class YarnSchedulerBackend(
 
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
 
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-mode schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+    * Bind to YARN. This *must* be done before calling [[start()]].
+    *
+    * @param appId YARN application ID
+    * @param attemptId Optional YARN attempt ID
+    */
+  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
+    this.appId = appId
+    this.attemptId = attemptId
+  }
+
+  override def start() {
+    require(appId != null, "application ID unset")
+    val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
+    services.start(binding)
--- End diff --

String parsing has proven fairly brittle in the past: the move from single 
to multiple attempts broke all apps trying to do it across versions (e.g. a 
Hadoop 2.2 parser in a 2.5 cluster). Unless you want to base-64 encode the 
protobuf representation, I'd avoid that.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152164469
  
**[Test build #44593 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/consoleFull)**
 for PR 9182 at commit 
[`8a6a1f1`](https://github.com/apache/spark/commit/8a6a1f13235fd00dcc58c4106b0314098f961e67).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152164476
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/
Test FAILed.





[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152164473
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9352#issuecomment-152169726
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152176928
  
Jenkins test this please





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152178362
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/
Test PASSed.





[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...

2015-10-29 Thread jacek-lewandowski
GitHub user jacek-lewandowski opened a pull request:

https://github.com/apache/spark/pull/9354

SPARK-11402: Use ChildRunnerProvider to create ExecutorRunner and 
DriverRunner

Abstracted ExecutorRunner and DriverRunner. The current implementations 
were renamed to ExecutorRunnerImpl and DriverRunnerImpl respectively.
Added a way to provide a custom implementation of the runners by defining 
their factories.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacek-lewandowski/spark SPARK-11402

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9354.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9354


commit a2d2ec8a555d5a2adf20cb2cf29fc76b02e923a6
Author: Jacek Lewandowski 
Date:   2015-10-15T15:08:21Z

SPARK-11402: Use ChildRunnerProvider to create ExecutorRunner and 
DriverRunner

Abstracted ExecutorRunner and DriverRunner. The current implementations 
were renamed to ExecutorRunnerImpl and DriverRunnerImpl respectively.
Added a way to provide a custom implementation of the runners by defining 
their factories.







[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9182#issuecomment-152188763
  
Merged build started.





[GitHub] spark pull request: [SPARK-10958] Use json4s 3.3.0. Formats is now...

2015-10-29 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8992#issuecomment-152193338
  
Do you mind closing this PR?





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152154426
  
@zsxwing I took a quick look and I have a high level question. Why not just 
do the checkpointing iterator? IIUC this approach involves reading the iterator 
back from disk to return the values. Wouldn't that be potentially expensive? 
Also, this doesn't fix it for local checkpointing.

If we have a general checkpointing iterator, then RDD doesn't have to 
change much and we don't need to introduce another `CheckpointManager`, which I 
find a little clunky.
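
To make the idea concrete, here is a minimal sketch of what a general checkpointing iterator could look like: a wrapper that hands each element to a writer as the consumer pulls it, so the data is computed once and never read back from disk. It is plain Java with no Spark dependency, and the names `CheckpointingIterator`, `sink`, and `onComplete` are all hypothetical stand-ins rather than anything in this PR or in Spark.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical sketch: passes elements through while writing each one to a sink. */
final class CheckpointingIterator<T> implements Iterator<T> {
  private final Iterator<T> underlying;
  private final Consumer<T> sink;      // assumption: stands in for a checkpoint writer
  private final Runnable onComplete;   // assumption: e.g. close and commit the checkpoint file
  private boolean completed = false;

  CheckpointingIterator(Iterator<T> underlying, Consumer<T> sink, Runnable onComplete) {
    this.underlying = underlying;
    this.sink = sink;
    this.onComplete = onComplete;
  }

  @Override
  public boolean hasNext() {
    boolean more = underlying.hasNext();
    if (!more && !completed) {
      // Run the completion hook exactly once, when the source is exhausted.
      completed = true;
      onComplete.run();
    }
    return more;
  }

  @Override
  public T next() {
    T value = underlying.next();
    sink.accept(value);  // write the element as it is consumed
    return value;
  }
}
```

One caveat with this shape is that the completion hook only fires if the consumer drains the iterator, so a partially consumed partition would leave an incomplete checkpoint that would need separate handling.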





[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9339#issuecomment-152170766
  
**[Test build #44588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44588/consoleFull)**
 for PR 9339 at commit 
[`7d80528`](https://github.com/apache/spark/commit/7d8052830cdf6456e4a8e3233c943bccf595dc9d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9339#issuecomment-152171116
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44588/
Test FAILed.





[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9339#issuecomment-152171108
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175508
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175595
  
**[Test build #44596 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/consoleFull)**
 for PR 9287 at commit 
[`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175502
  
**[Test build #44594 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/consoleFull)**
 for PR 9287 at commit 
[`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ExternalShuffleService(`
   * `case class BoundPortsResponse(`





[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9287#issuecomment-152175511
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/
Test FAILed.





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9258#issuecomment-152178180
  
**[Test build #44590 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/consoleFull)**
 for PR 9258 at commit 
[`824be91`](https://github.com/apache/spark/commit/824be9104bfb81b260912dc86a0dba7508d1d3f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152179279
  
**[Test build #44597 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/consoleFull)**
 for PR 9340 at commit 
[`ab42465`](https://github.com/apache/spark/commit/ab42465d44393a869fef7a3d9f674f77f9155793).





[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9340#issuecomment-152181139
  
Merged build started.





[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.

2015-10-29 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9339#issuecomment-152192964
  
merged to master





[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...

2015-10-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9253#issuecomment-152196314
  
**[Test build #44592 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44592/consoleFull)**
 for PR 9253 at commit 
[`6a9f738`](https://github.com/apache/spark/commit/6a9f738bb3008cadc7ce855fd33115fbb29d1c0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-11348 Replace addOnCompleteCallback with...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9356#issuecomment-152201206
  
 Merged build triggered.





[GitHub] spark pull request: SPARK-11348 Replace addOnCompleteCallback with...

2015-10-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9356#issuecomment-152201243
  
Merged build started.





[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...

2015-10-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8587#discussion_r43395879
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
@@ -524,6 +525,133 @@ case class Sum(child: Expression) extends DeclarativeAggregate {
   override val evaluateExpression = Cast(currentSum, resultType)
 }
 
+/**
+ * Compute Pearson correlation between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NaN.
+ *
+ * Definition of Pearson correlation can be found at
+ * http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
+ *
+ * @param left one of the expressions to compute correlation with.
+ * @param right another expression to compute correlation with.
+ */
+case class Corr(
+left: Expression,
+right: Expression,
+mutableAggBufferOffset: Int = 0,
+inputAggBufferOffset: Int = 0)
+  extends ImperativeAggregate {
+
+  def children: Seq[Expression] = Seq(left, right)
+
+  def nullable: Boolean = false
+
+  def dataType: DataType = DoubleType
+
+  def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
+
+  def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  def inputAggBufferAttributes: Seq[AttributeReference] = aggBufferAttributes.map(_.newInstance())
+
+  val aggBufferAttributes: Seq[AttributeReference] = Seq(
+AttributeReference("xAvg", DoubleType)(),
+AttributeReference("yAvg", DoubleType)(),
+AttributeReference("Ck", DoubleType)(),
+AttributeReference("MkX", DoubleType)(),
+AttributeReference("MkY", DoubleType)(),
+AttributeReference("count", LongType)())
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
+copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
+copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  override def initialize(buffer: MutableRow): Unit = {
+(0 until 5).map(idx => buffer.setDouble(mutableAggBufferOffset + idx, 0.0))
+buffer.setLong(mutableAggBufferOffset + 5, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+val x = left.eval(input).asInstanceOf[Double]
+val y = right.eval(input).asInstanceOf[Double]
+
+var xAvg = buffer.getDouble(mutableAggBufferOffset)
+var yAvg = buffer.getDouble(mutableAggBufferOffset + 1)
+var Ck = buffer.getDouble(mutableAggBufferOffset + 2)
+var MkX = buffer.getDouble(mutableAggBufferOffset + 3)
+var MkY = buffer.getDouble(mutableAggBufferOffset + 4)
+var count = buffer.getLong(mutableAggBufferOffset + 5)
+
+val deltaX = x - xAvg
+val deltaY = y - yAvg
+count += 1
+xAvg += deltaX / count
+yAvg += deltaY / count
+Ck += deltaX * (y - yAvg)
+MkX += deltaX * (x - xAvg)
+MkY += deltaY * (y - yAvg)
+
+buffer.setDouble(mutableAggBufferOffset, xAvg)
+buffer.setDouble(mutableAggBufferOffset + 1, yAvg)
+buffer.setDouble(mutableAggBufferOffset + 2, Ck)
+buffer.setDouble(mutableAggBufferOffset + 3, MkX)
+buffer.setDouble(mutableAggBufferOffset + 4, MkY)
+buffer.setLong(mutableAggBufferOffset + 5, count)
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1: MutableRow, buffer2: InternalRow): Unit = {
+val count2 = buffer2.getLong(inputAggBufferOffset + 5)
+
+if (count2 > 0) {
--- End diff --

We only need to consider the count in buffer2. I will add documentation for it.
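
For readers following the thread, here is a minimal standalone sketch of why checking only the second buffer's count can be enough. It is not the code from the pull request: `CorrState` and its method names are hypothetical stand-ins for the six buffer slots in the diff, and the merge step uses the pairwise co-moment formula from the Wikipedia page cited in the diff. Under these assumptions, a non-zero count on the incoming side guarantees a non-zero combined count, so the divisions are safe even when the receiving buffer is still empty.

```scala
// Hypothetical, self-contained stand-in for the aggregation buffer in the diff.
// Field names mirror the buffer attributes; nothing here is Spark API.
final case class CorrState(
    xAvg: Double, yAvg: Double, ck: Double, mkX: Double, mkY: Double, count: Long) {

  // Single-row update, following the same running-mean scheme as the diff.
  def update(x: Double, y: Double): CorrState = {
    val deltaX = x - xAvg
    val deltaY = y - yAvg
    val n = count + 1
    val newXAvg = xAvg + deltaX / n
    val newYAvg = yAvg + deltaY / n
    CorrState(
      newXAvg,
      newYAvg,
      ck + deltaX * (y - newYAvg),
      mkX + deltaX * (x - newXAvg),
      mkY + deltaY * (y - newYAvg),
      n)
  }

  // Merge partial results. Only other.count is checked: if it is positive,
  // the combined count n is positive too, so no division by zero can occur,
  // and an empty `this` simply takes over the values of `other`.
  def merge(other: CorrState): CorrState =
    if (other.count == 0) this
    else {
      val n = count + other.count
      val deltaX = other.xAvg - xAvg
      val deltaY = other.yAvg - yAvg
      val weight = count.toDouble * other.count / n
      CorrState(
        xAvg + deltaX * other.count / n,
        yAvg + deltaY * other.count / n,
        ck + other.ck + deltaX * deltaY * weight,
        mkX + other.mkX + deltaX * deltaX * weight,
        mkY + other.mkY + deltaY * deltaY * weight,
        n)
    }

  // 0.0 / 0.0 evaluates to NaN, so empty input yields NaN.
  def corr: Double = ck / math.sqrt(mkX * mkY)
}

object CorrState {
  val empty: CorrState = CorrState(0.0, 0.0, 0.0, 0.0, 0.0, 0L)
}
```

Folding two halves of a dataset with `update` and combining them with `merge` should give the same correlation, up to floating-point error, as folding the whole dataset in one pass.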





[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...

2015-10-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8587#discussion_r43395797
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala ---
@@ -556,6 +556,33 @@ abstract class AggregationQuerySuite extends QueryTest with SQLTestUtils with Te
 Row(0, null, 1, 1, null, 0) :: Nil)
   }
 
+  test("pearson correlation") {
+val df = Seq.tabulate(10)(i => (1.0 * i, 2.0 * i, i * -1.0)).toDF("a", "b", "c")
+val corr1 = df.repartition(2).groupBy().agg(corr("a", "b")).collect()(0).getDouble(0)
+assert(math.abs(corr1 - 1.0) < 1e-12)
+val corr2 = df.groupBy().agg(corr("a", "c")).collect()(0).getDouble(0)
+assert(math.abs(corr2 + 1.0) < 1e-12)
+// non-trivial example. To reproduce in python, use:
+// >>> from scipy.stats import pearsonr
+// >>> import numpy as np
+// >>> a = np.array(range(20))
+// >>> b = np.array([x * x - 2 * x + 3.5 for x in range(20)])
+// >>> pearsonr(a, b)
+// (0.95723391394758572, 3.8902121417802199e-11)
+// In R, use:
+// > a <- 0:19
+// > b <- mapply(function(x) x * x - 2 * x + 3.5, a)
+// > cor(a, b)
+// [1] 0.957233913947585835
+val df2 = Seq.tabulate(20)(x => (1.0 * x, x * x - 2 * x + 3.5)).toDF("a", "b")
+val corr3 = df2.groupBy().agg(corr("a", "b")).collect()(0).getDouble(0)
+assert(math.abs(corr3 - 0.95723391394758572) < 1e-12)
+
+val df3 = Seq.tabulate(0)(i => (1.0 * i, 2.0 * i)).toDF("a", "b")
+val corr4 = df3.groupBy().agg(corr("a", "b")).collect()(0).getDouble(0)
+assert(corr4.isNaN)
+  }
--- End diff --

I will add ImplicitCastInputTypes to case class Corr, so that the other 
NumericTypes can be automatically cast to double.




