[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151808021 **[Test build #44521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/consoleFull)** for PR 9331 at commit [`90927e6`](https://github.com/apache/spark/commit/90927e6e4cd46a6752fe4cdd7d1214112d218278). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246322

--- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/SimpleExtensionService.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
--- End diff --

got it
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246273

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend(
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
+
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-side schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+   * Bind to YARN. This *must* be done before calling [[start()]].
+   *
+   * @param appId YARN application ID
+   * @param attemptId Optional YARN attempt ID
+   */
+  protected def bindToYARN(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
--- End diff --

OK
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151827220 **[Test build #44525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44525/consoleFull)** for PR 8744 at commit [`e89959c`](https://github.com/apache/spark/commit/e89959cb7592e92b4306e357dc200d259ede814d).
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151837034 ok to test
[GitHub] spark pull request: [SPARK-11188][SQL][WIP] Elide stacktraces in b...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9194#issuecomment-151842731 As a side note, I think the target user of `bin/spark-sql` is probably less advanced, so all the info logs probably aren't that useful. It doesn't have to be in this PR, but I'd be generally supportive of shipping a different default log4j config for this binary that logs to the console at WARN instead.
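A default config of the shape marmbrus describes could look like the following log4j 1.x properties sketch. This is illustrative only; the file name, packaging, and pattern layout here are assumptions, not something decided in this thread:

```
# Hypothetical default for bin/spark-sql: console logging at WARN instead of INFO
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```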
[GitHub] spark pull request: Header formatting fix
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9312#issuecomment-151847282 Hm, OK. The thing is, unfortunately, a fair number of the copyright headers aren't strictly correctly formatted; many start with a javadoc-style opening, for example. Functionally it makes no difference, and the automatic copyright checker deals with it all, so it doesn't really seem worth fixing this up everywhere. I can maybe see fixing this when a file's header is being changed for other reasons anyway.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819613 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44517/
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246543

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private var sparkContext: SparkContext = _
+  private var appId: ApplicationId = _
+  private var attemptId: Option[ApplicationAttemptId] = _
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
--- End diff --

OK, saving binding as a field; converting the others to local vars.
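The re-entrancy guard under discussion (an AtomicBoolean that turns repeated start() calls into no-ops, with stop() safe even if start() was never invoked) is self-contained enough to sketch outside Spark. A minimal Java analogue, with all class and method names illustrative rather than Spark's own:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal analogue of the start()/stop() contract discussed above:
// start() ignores re-entrant calls; stop() is idempotent and must
// succeed even if start() was never invoked.
class ExtensionServices {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private int startCount = 0; // visible effect, for demonstration only

    void start(String binding) {
        if (started.getAndSet(true)) {
            return; // ignore re-entrant start
        }
        if (binding == null) {
            throw new IllegalArgumentException("Null binding parameter");
        }
        startCount++; // stands in for "load and start child services"
    }

    void stop() {
        // getAndSet(false) makes repeated stop() calls harmless
        started.getAndSet(false);
    }

    int startCount() {
        return startCount;
    }
}
```

getAndSet gives the check-then-act a single atomic step, so two racing callers cannot both run the start body.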
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151827062 **[Test build #44526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44526/consoleFull)** for PR 5423 at commit [`2c1db93`](https://github.com/apache/spark/commit/2c1db93bb1fe72a03e4b866741b6b803b30bb2b3).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151827066 **[Test build #44524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44524/consoleFull)** for PR 9182 at commit [`a4358d5`](https://github.com/apache/spark/commit/a4358d5b23dd2d7db706574124e4c69d1171ffb4).
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832701 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832617 **[Test build #44520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/consoleFull)** for PR 9200 at commit [`b2dd6b8`](https://github.com/apache/spark/commit/b2dd6b87865eed5519d8ad278e09ba17c1334c6c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BernoulliSampler[T: ClassTag](fraction: Double,`
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9257#discussion_r43251769

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -365,33 +366,37 @@ private[spark] class SecurityManager(sparkConf: SparkConf)
    * we throw an exception.
    */
   private def generateSecretKey(): String = {
-    if (!isAuthenticationEnabled) return null
-    // first check to see if the secret is already set, else generate a new one if on yarn
-    val sCookie = if (SparkHadoopUtil.get.isYarnMode) {
-      val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(sparkSecretLookupKey)
-      if (secretKey != null) {
-        logDebug("in yarn mode, getting secret from credentials")
-        return new Text(secretKey).toString
+    if (!isAuthenticationEnabled) {
+      null
+    } else if (SparkHadoopUtil.get.isYarnMode) {
+      // In YARN mode, the secure cookie will be created by the driver and stashed in the
+      // user's credentials, where executors can get it. The check for an array of size 0
+      // is because of the test code in YarnSparkHadoopUtilSuite.
+      val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(SECRET_LOOKUP_KEY)
+      if (secretKey == null || secretKey.length == 0) {
+        val rnd = new SecureRandom()
+        val length = sparkConf.getInt("spark.authenticate.secretBitLength", 256) / 8
+        val secret = new Array[Byte](length)
+        rnd.nextBytes(secret)
+
+        val cookie = HashCodes.fromBytes(secret).toString()
+        SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, cookie)
+        cookie
       } else {
-        logDebug("getSecretKey: yarn mode, secret key from credentials is null")
--- End diff --

ok
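The new code path quoted in this diff draws `spark.authenticate.secretBitLength` random bits with SecureRandom and renders them as a hex cookie via Guava's HashCodes. A JDK-only Java sketch of the same idea, where manual hex encoding stands in for Guava and the names are illustrative:

```java
import java.security.SecureRandom;

class SecretKeyDemo {
    // Mirrors the diff: draw N random bits and render them as a hex cookie.
    static String generateSecretKey(int secretBitLength) {
        int length = secretBitLength / 8;
        byte[] secret = new byte[length];
        new SecureRandom().nextBytes(secret);
        StringBuilder cookie = new StringBuilder();
        for (byte b : secret) {
            cookie.append(String.format("%02x", b)); // two hex chars per byte
        }
        return cookie.toString();
    }
}
```

With the default of 256 bits this yields a 32-byte secret, i.e. a 64-character hex string.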
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151839311 Merged build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252081

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
--- End diff --

When will this be not set? I assume in client mode? Could you mention that in the scaladoc above?
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151841140 **[Test build #44527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44527/consoleFull)** for PR 9326 at commit [`402d8e4`](https://github.com/apache/spark/commit/402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2).
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824900 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...
Github user gliptak commented on the pull request: https://github.com/apache/spark/pull/9263#issuecomment-151833067 This last run had a different failure than the previous run with the same code ...
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253567

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    val sparkContext = binding.sparkContext
+    val appId = binding.applicationId
+    val attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+              .newInstance()
+              .asInstanceOf[SchedulerExtensionService]
+            // bind this service
+            instance.start(binding)
+            logInfo(s"Service $sClass started")
+            instance
+          }
+      }.map(_.toList).getOrElse(Nil)
--- End diff --

minor: instead of another call to `map` you could add the `toList` call to the code inside the previous closure.
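The review suggestion is to build the list inside the single closure instead of mapping a second time over the Option's result. A JDK Optional/Stream analogue of the config parsing, with hypothetical names throughout, shows the shape being asked for:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ServiceConfigDemo {
    // Parse a comma-separated service-class list; the List is built inside
    // the single map() call rather than by a second map over its result.
    static List<String> parseServices(Optional<String> conf) {
        return conf
            .map(s -> Arrays.stream(s.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList()))
            .orElse(List.of());
    }
}
```

Folding the collection step into the closure removes one traversal and keeps the "absent config means empty list" default in a single `orElse`.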
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254205

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend(
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
+
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-side schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+   * Bind to YARN. This *must* be done before calling [[start()]].
+   *
+   * @param appId YARN application ID
+   * @param attemptId Optional YARN attempt ID
+   */
+  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
+    this.appId = appId
+    this.attemptId = attemptId
+  }
+
+  override def start() {
+    require(appId != null, "application ID unset")
+    val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
+    services.start(binding)
+    super.start()
+  }
+
+  override def stop(): Unit = {
+    super.stop()
--- End diff --

super minor, but maybe do a try..finally here just in case `super.stop()` throws?
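The point of the suggested try..finally is that the extension services still get shut down even when `super.stop()` throws. A minimal Java sketch of that shape, with all names hypothetical:

```java
class BackendStopDemo {
    static boolean servicesStopped = false;

    // Stands in for a super.stop() that can fail mid-shutdown.
    static void stopSuper() {
        throw new IllegalStateException("super.stop() failed");
    }

    // Even when stopSuper() throws, the finally block still runs,
    // so the extension services are always stopped.
    static void stop() {
        try {
            stopSuper();
        } finally {
            servicesStopped = true; // stands in for services.stop()
        }
    }
}
```

The exception still propagates to the caller; the finally block only guarantees the cleanup runs first.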
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254282 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/ExtensionServiceIntegrationSuite.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import org.scalatest.BeforeAndAfter + +import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite} + +/** + * Test the integration with [[SchedulerExtensionServices]] + */ +class ExtensionServiceIntegrationSuite extends SparkFunSuite + with BeforeAndAfter + with Logging { + + val applicationId = new StubApplicationId(0, L) + val attemptId = new StubApplicationAttemptId(applicationId, 1) + var sparkCtx: SparkContext = _ + + /* + * Setup phase creates the spark context + */ + before { +val sparkConf = new SparkConf() +sparkConf.set(SchedulerExtensionServices.SPARK_YARN_SERVICES, + "org.apache.spark.scheduler.cluster.SimpleExtensionService") --- End diff -- nit: `classOf[SimpleExtensionService].getName()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
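The nit above — preferring `classOf[SimpleExtensionService].getName()` over a hand-typed class-name string — can be illustrated with a small, self-contained Scala sketch. The class below is a stand-in for Spark's test service, not the real one; the point is that the `classOf` form is compiler-checked and survives renames, while the string literal breaks silently:

```scala
// Stand-in for org.apache.spark.scheduler.cluster.SimpleExtensionService.
class SimpleExtensionService

object ClassNameConfig {
  // Compiler-verified: a rename or move of the class updates this value.
  val checked: String = classOf[SimpleExtensionService].getName

  // Hand-typed literal: nothing catches a typo or a later package move.
  val unchecked: String = "SimpleExtensionService"
}
```

Both values resolve to the same class name today; the difference only shows up when the class is refactored.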
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151850104 **[Test build #44529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44529/consoleFull)** for PR 9330 at commit [`9282e48`](https://github.com/apache/spark/commit/9282e488d58859a473e1413a611719829846971a).
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151810105 **[Test build #44523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/consoleFull)** for PR 9196 at commit [`b52a98d`](https://github.com/apache/spark/commit/b52a98d75b340e0f8d290deae528057bb5d28738).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246410 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/StubApplicationAttemptId.scala --- @@ -0,0 +1,50 @@ +/* --- End diff -- I'm using them more in the tests in the later patches. I can (and will) move them into the test helper, but be assured, there's a lot more tests to come.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249575 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -96,46 +97,64 @@ class GroupedIterator private( ret } - def fetchNextGroupIterator(): Boolean = { -if (currentRow != null || input.hasNext) { - val inputIterator = new Iterator[InternalRow] { -// Return true if we have a row and it is in the current group, or if fetching a new row is -// successful. -def hasNext = { - (currentRow != null && keyOrdering.compare(currentGroup, currentRow) == 0) || -fetchNextRowInGroup() -} + private def fetchNextGroupIterator(): Boolean = { +assert(currentIterator eq null) + +if (currentRow.eq(null) && input.hasNext) { + currentRow = input.next() +} + +if (currentRow eq null) { + // These is no data left, return false. + false +} else { + // Skip to next group. + while (input.hasNext && keyOrdering.compare(currentGroup, currentRow) == 0) { +currentRow = input.next() + } + + if (keyOrdering.compare(currentGroup, currentRow) == 0) { +// These is no more group. return false. --- End diff -- nit: "there" or maybe more clearly "we are no longer in the current group, return false."
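The control flow under review — advancing a key-sorted iterator past the remaining rows of the current group so each group is visited exactly once — can be sketched without any Spark dependencies. The names below are illustrative, not the actual `GroupedIterator` API:

```scala
import scala.collection.mutable.ListBuffer

// Walk a key-sorted iterator; record each key once, skipping the
// within-group rows, in the spirit of GroupedIterator's group-skip loop.
object GroupSkipper {
  def groupKeys(sorted: Iterator[(Int, String)]): List[Int] = {
    val keys = ListBuffer.empty[Int]
    var current: Option[Int] = None
    while (sorted.hasNext) {
      val (k, _) = sorted.next()
      if (!current.contains(k)) { // key changed: a new group starts here
        keys += k
        current = Some(k)
      } // otherwise we are still inside the current group; skip the row
    }
    keys.toList
  }
}
```

The real code additionally exposes a lazy per-group iterator and compares rows with a `keyOrdering`; the sketch keeps only the skip-to-next-group shape.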
[GitHub] spark pull request: [SPARK-11313][SQL] implement cogroup on DataSe...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9324#issuecomment-151836209 Thanks! Merging to master.
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43251011 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala --- @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging System.clearProperty("SPARK_YARN_MODE") } } + + test("Obtain tokens For HiveMetastore") { +val hadoopConf = new Configuration() +hadoopConf.set("hive.metastore.kerberos.principal", "bob") +// thrift picks up on port 0 and bails out, without trying to talk to endpoint +hadoopConf.set("hive.metastore.uris", "http://localhost:0") +val util = new YarnSparkHadoopUtil +val e = intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice") +} +assertNestedHiveException(e) +// expect exception trapping code to unwind this hive-side exception +assertNestedHiveException(intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastore(hadoopConf) +}) + } + + def assertNestedHiveException(e: InvocationTargetException): Throwable = { +val inner = e.getCause +if (inner == null) { + fail("No inner cause", e) +} +if (!inner.isInstanceOf[HiveException]) { + fail(s"Not a hive exception", inner) +} +inner + } + + test("handleTokenIntrospectionFailure") { +val util = new YarnSparkHadoopUtil +// downgraded exceptions +util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe")) --- End diff -- I think that because there's really only one exception that's currently interesting, you need more code to implement this "shared policy" approach than just catching the one interesting exception in each call site. It's true that if you need to modify the policy you'd need to duplicate code (or switch to your current approach), but then do you envision needing to do that? What if the policy for each service needs to be different?
Personally I think that the current approach is a little confusing for someone reading the code (and inconsistent; for example the current code catches `Exception` and then feeds it to a method that matches on `Throwable`), and because the policy is so simple, the sharing argument doesn't justify making the code harder to follow.
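The test under discussion asserts on exceptions nested inside `InvocationTargetException`, the wrapper that reflective calls use: the real failure sits in `getCause`, so assertions must unwrap it. A minimal, Spark-free sketch of that unwrapping, with a plain `RuntimeException` standing in for Hive's `HiveException`:

```scala
import java.lang.reflect.InvocationTargetException

// Reflective invocation wraps the callee's exception; dig out the cause
// before asserting on its type or message.
object CauseUnwrap {
  def innerCause(e: InvocationTargetException): Throwable = {
    val inner = e.getCause
    require(inner != null, "No inner cause")
    inner
  }
}
```

The suite's `assertNestedHiveException` does the same thing with an added `isInstanceOf[HiveException]` check, which this sketch omits to stay dependency-free.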
[GitHub] spark pull request: [SPARK-11313][SQL] implement cogroup on DataSe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9324
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/9257#discussion_r43251680 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -365,33 +366,37 @@ private[spark] class SecurityManager(sparkConf: SparkConf) * we throw an exception. */ private def generateSecretKey(): String = { -if (!isAuthenticationEnabled) return null -// first check to see if the secret is already set, else generate a new one if on yarn -val sCookie = if (SparkHadoopUtil.get.isYarnMode) { - val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(sparkSecretLookupKey) - if (secretKey != null) { -logDebug("in yarn mode, getting secret from credentials") -return new Text(secretKey).toString +if (!isAuthenticationEnabled) { + null +} else if (SparkHadoopUtil.get.isYarnMode) { + // In YARN mode, the secure cookie will be created by the driver and stashed in the + // user's credentials, where executors can get it. The check for an array of size 0 + // is because of the test code in YarnSparkHadoopUtilSuite. + val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(SECRET_LOOKUP_KEY) + if (secretKey == null || secretKey.length == 0) { +val rnd = new SecureRandom() +val length = sparkConf.getInt("spark.authenticate.secretBitLength", 256) / 8 +val secret = new Array[Byte](length) +rnd.nextBytes(secret) + +val cookie = HashCodes.fromBytes(secret).toString() + SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, cookie) +cookie } else { -logDebug("getSecretKey: yarn mode, secret key from credentials is null") --- End diff -- I'd prefer to see this one left. Otherwise there is no easy way to see what it's doing for the secret. In general I'm against removing debug stuff unless it's really noisy. This should only be printed once and can be useful for debugging user settings or issues with secrets.
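The code path in the diff derives the shared secret from `SecureRandom`: pick a configurable bit length, fill that many bytes, then encode them as a string. A dependency-free sketch of the same idea — the real code encodes with Guava's `HashCodes.fromBytes`, so the plain hex encoding here is an assumption, not Spark's exact output format:

```scala
import java.security.SecureRandom

// Generate an n-bit random secret and hex-encode it, mirroring the
// SecurityManager.generateSecretKey logic quoted above.
object SecretGen {
  def generate(bitLength: Int): String = {
    val secret = new Array[Byte](bitLength / 8) // e.g. 256 bits -> 32 bytes
    new SecureRandom().nextBytes(secret)
    secret.map(b => f"${b & 0xff}%02x").mkString // 2 hex chars per byte
  }
}
```

With the default of 256 bits this yields a 64-character hex string, which in the real code is then stashed in the user's Hadoop credentials for executors to read.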
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151806056 Merged build triggered.
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151806066 Merged build started.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151804357 **[Test build #44520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/consoleFull)** for PR 9200 at commit [`b2dd6b8`](https://github.com/apache/spark/commit/b2dd6b87865eed5519d8ad278e09ba17c1334c6c).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244486 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private var sparkContext: SparkContext = _ + private var appId: ApplicationId = _ + private var attemptId: Option[ApplicationAttemptId] = _ + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +sparkContext = binding.sparkContext +appId = binding.applicationId +attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + +s" and attemptId $attemptId") --- End diff -- fixed, +lines directly below
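The `start()`/`stop()` contract spelled out in the diff — start is single-shot, and stop MUST be idempotent and succeed even if `start()` was never invoked — is enforced with an `AtomicBoolean` guard. A standalone sketch of that lifecycle pattern (class and field names here are illustrative, not Spark's):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Lifecycle guard pattern from SchedulerExtensionServices: getAndSet makes
// each transition happen at most once, even under concurrent callers.
class OneShotService {
  private val started = new AtomicBoolean(false)
  private val stopped = new AtomicBoolean(false)
  var startCount = 0 // exposed only so the behavior is observable

  def start(): Unit = {
    if (started.getAndSet(true)) return // ignore re-entrant start
    startCount += 1                     // real code loads child services here
  }

  def stop(): Unit = {
    if (stopped.getAndSet(true)) return // idempotent; safe before start()
    // real code would stop child services here, exactly once
  }
}
```

Because `getAndSet` is atomic, two threads racing into `start()` cannot both pass the guard, which is why the diff uses it instead of a plain `var`.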
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824678 **[Test build #44521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/consoleFull)** for PR 9331 at commit [`90927e6`](https://github.com/apache/spark/commit/90927e6e4cd46a6752fe4cdd7d1214112d218278). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151839373 Merged build started.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151847794 Merged build triggered.
[GitHub] spark pull request: [SPARK-10185] [SQL] Feat sql comma separated p...
Github user edvald commented on the pull request: https://github.com/apache/spark/pull/8416#issuecomment-151805526 Hey all. Just ran into this bug when upgrading to 1.5.1, very glad it was resolved! That said, I may not be able to run the updated code in my scenario - is there a suggested workaround for 1.5.x for loading multiple files, instead of using comma separated paths?
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151808462 Merged build started.
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151808446 Merged build triggered.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249637 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GroupedIteratorSuite.scala --- @@ -0,0 +1,65 @@ +package org.apache.spark.sql.execution --- End diff -- Need to add the apache header
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151841537 Merged build started.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252317 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +val sparkContext = binding.sparkContext +val appId = binding.applicationId +val attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + + s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) + .map { s => + s.split(",").map(_.trim()).filter(!_.isEmpty) --- End diff -- nit: indentation here is weird. 
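The expression vanzin flags for indentation — `s.split(",").map(_.trim()).filter(!_.isEmpty)` — is the step that parses the comma-separated `spark.yarn.services` value into class names. A standalone sketch of just that parsing, isolated from the config plumbing:

```scala
// Split a comma-separated config value into class names, trimming
// whitespace and dropping empty entries, as the diff above does.
object ServiceListParser {
  def parse(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq
}
```

In the real code each resulting name is then instantiated via `Utils.classForName` and started; the parsing itself is this one line.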
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151841517 Merged build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253907 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend( private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf) + /** Application ID. Must be set by a subclass before starting the service */ + private var appId: ApplicationId = null + + /** Attempt ID. This is unset for client-side schedulers */ --- End diff -- nit: "client mode schedulers" is more in line with how the rest of code refers to things.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151845427 Looks OK to me, mostly just style nits. Also, needs a rebase.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151809495 Merged build triggered.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151809518 Merged build started.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9196#discussion_r43243744 --- Diff: R/pkg/R/functions.R --- @@ -2111,3 +2133,66 @@ setMethod("ntile", jc <- callJStatic("org.apache.spark.sql.functions", "ntile", as.integer(x)) column(jc) }) + +#' percentRank +#' +#' Window function: returns the relative rank (i.e. percentile) of rows within a window partition. +#' +#' This is computed by: +#' +#' (rank of row in its partition - 1) / (number of rows in the partition - 1) +#' +#' This is equivalent to the PERCENT_RANK function in SQL. +#' +#' @rdname percentRank +#' @name percentRank +#' @family window_funcs +#' @export +#' @examples \dontrun{percentRank()} +setMethod("percentRank", + signature(x = "missing"), + function() { +jc <- callJStatic("org.apache.spark.sql.functions", "percentRank") +column(jc) + }) + +#' rank +#' +#' Window function: returns the rank of rows within a window partition. +#' +#' The difference between rank and denseRank is that denseRank leaves no gaps in ranking +#' sequence when there are ties. That is, if you were ranking a competition using denseRank +#' and had three people tie for second place, you would say that all three were in second +#' place and that the next person came in third. +#' +#' This is equivalent to the RANK function in SQL. +#' +#' @rdname rank +#' @name rank +#' @family window_funcs +#' @export +#' @examples \dontrun{rank()} +setMethod("rank", --- End diff -- Since base::rank() has a different signature from this rank(), it is possible to expose both of them under the same name rank().
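The formula quoted in the docs above can be checked outside Spark. This is a plain-Python sketch (not SparkR's API; the function names here are illustrative) showing how rank, denseRank, and percentRank differ on ties:

```python
def rank(values):
    # Competition ranking, as in SQL RANK: ties share a rank,
    # and the next rank after a tie is skipped (1, 2, 2, 4).
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def dense_rank(values):
    # Dense ranking, as in SQL DENSE_RANK: ties share a rank,
    # with no gaps afterwards (1, 2, 2, 3).
    uniq = sorted(set(values))
    return [uniq.index(v) + 1 for v in values]

def percent_rank(values):
    # (rank of row in its partition - 1) / (rows in partition - 1),
    # matching the PERCENT_RANK formula quoted in the R docs above.
    n = len(values)
    return [(r - 1) / (n - 1) for r in rank(values)]

scores = [10, 20, 20, 30]
print(rank(scores))          # [1, 2, 2, 4]
print(dense_rank(scores))    # [1, 2, 2, 3]
print(percent_rank(scores))  # 0.0, 1/3, 1/3, 1.0
```

With three people tied for second place (as in the denseRank example above), rank would report 2, 2, 2, 5 while denseRank reports 2, 2, 2, 3.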
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819415 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819418 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819173 **[Test build #44523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/consoleFull)** for PR 9196 at commit [`b52a98d`](https://github.com/apache/spark/commit/b52a98d75b340e0f8d290deae528057bb5d28738). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819610 Merged build finished. Test FAILed.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819457 **[Test build #44517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44517/consoleFull)** for PR 9227 at commit [`f6a2c01`](https://github.com/apache/spark/commit/f6a2c01bb06c03a31f5efce5d7bb634ad364d775). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151825740 Build triggered.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151825768 Build started.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -83,11 +83,12 @@ class GroupedIterator private( /** Holds a copy of an input row that is in the current group. */ var currentGroup = currentRow.copy() - var currentIterator: Iterator[InternalRow] = null + assert(keyOrdering.compare(currentGroup, currentRow) == 0) --- End diff -- This is the whole row, not just the key. This allows us to do the equality check on the key columns only (which might short circuit) instead of doing a full projection on each row to extract the key columns.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -83,11 +83,12 @@ class GroupedIterator private( /** Holds a copy of an input row that is in the current group. */ var currentGroup = currentRow.copy() - var currentIterator: Iterator[InternalRow] = null + assert(keyOrdering.compare(currentGroup, currentRow) == 0) + var currentIterator = createGroupValuesIterator() // Return true if we already have the next iterator or fetching a new iterator is successful. - def hasNext: Boolean = currentIterator != null || fetchNextGroupIterator + def hasNext: Boolean = currentIterator.ne(null) || fetchNextGroupIterator --- End diff -- I think these are the same, and I prefer the idiomatic version.
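The behaviour under review — walking a key-sorted input and handing out one group of key-equal rows at a time, comparing only keys rather than whole rows — can be sketched in Python. This is a simplified analogue of what GroupedIterator does, not Spark's implementation:

```python
def grouped_iterator(rows, key):
    """Yield (group_key, rows_in_group) from input already sorted by key.

    Like the Scala GroupedIterator being discussed, each incoming row's key
    is compared against the current group's key; the first mismatch closes
    the current group and starts a new one.
    """
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        return  # empty input: no groups at all
    group = [first]
    for row in it:
        # Key comparison only, not full-row equality (the point made above).
        if key(row) == key(group[0]):
            group.append(row)
        else:
            yield key(group[0]), group
            group = [row]
    yield key(group[0]), group

rows = [("a", 1), ("a", 2), ("b", 3)]
for k, grp in grouped_iterator(rows, key=lambda r: r[0]):
    print(k, grp)
```

A Python generator sidesteps the `currentIterator != null` bookkeeping debated above, since exhaustion is signalled by `StopIteration` rather than a sentinel.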
[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...
Github user gliptak commented on a diff in the pull request: https://github.com/apache/spark/pull/9263#discussion_r43250312 --- Diff: python/pyspark/mllib/tests.py --- @@ -76,7 +76,8 @@ pass ser = PickleSerializer() -sc = SparkContext('local[4]', "MLlib tests") +conf = SparkConf().set("spark.driver.allowMultipleContexts", "true") --- End diff -- Reviewing the tests.py-s https://github.com/apache/spark/blob/master/python/pyspark/streaming/tests.py initiates SparkContext differently: ``` @classmethod def setUpClass(cls): class_name = cls.__name__ conf = SparkConf().set("spark.default.parallelism", 1) cls.sc = SparkContext(appName=class_name, conf=conf) cls.sc.setCheckpointDir("/tmp") @classmethod def tearDownClass(cls): cls.sc.stop() # Clean up in the JVM just in case there has been some issues in Python API try: jSparkContextOption = SparkContext._jvm.SparkContext.get() if jSparkContextOption.nonEmpty(): jSparkContextOption.get().stop() except: pass ``` Could this approach be retrofitted into https://github.com/apache/spark/blob/master/python/pyspark/mllib/tests.py to allow for concurrency?
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43251937 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} --- End diff -- nit: this goes before the previous import
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252433 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +val sparkContext = binding.sparkContext +val appId = binding.applicationId +val attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + + s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) + .map { s => + s.split(",").map(_.trim()).filter(!_.isEmpty) +.map { sClass => + val instance = Utils.classForName(sClass) +.newInstance() 
--- End diff -- Hmmm... `SchedulerExtensionServices` should probably mention that implementations must have an empty constructor.
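The empty-constructor requirement flagged here falls out of the reflective loading in the diff above: classes named in `spark.yarn.services` are instantiated with `newInstance()`, which takes no arguments. A minimal Python analogue (class and registry names are hypothetical, for illustration only) makes the constraint visible:

```python
class SchedulerExtensionService:
    """Sketch of the trait: start/stop, with stop expected to be idempotent."""
    def start(self, binding): ...
    def stop(self): ...

class AuditService(SchedulerExtensionService):
    def __init__(self):           # must be a no-arg constructor:
        self.started = False      # the loader below calls cls() with no arguments
    def start(self, binding):
        self.started = True

# Stand-in for Utils.classForName; a hypothetical name -> class registry.
REGISTRY = {"AuditService": AuditService}

def load_services(conf_value, binding):
    """Parse the comma-separated service list, instantiate and start each entry,
    mirroring the split/trim/filter/instantiate chain in the quoted Scala."""
    names = [s.strip() for s in conf_value.split(",") if s.strip()]
    services = []
    for name in names:
        instance = REGISTRY[name]()  # fails here if __init__ required arguments
        instance.start(binding)
        services.append(instance)
    return services

svcs = load_services("AuditService, ", binding=None)
```

A service whose `__init__` takes mandatory arguments would raise a `TypeError` at the `REGISTRY[name]()` call, which is exactly why the review asks for the constraint to be documented.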
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254504 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/StubApplicationAttemptId.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + + --- End diff -- nit: nuke the extra empty lines.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151847822 Merged build started.
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/9331 [SPARK-11369] [ML] [R] SparkR glm should support setting standardize SparkR glm currently supports: ```formula, family = c("gaussian", "binomial"), data, lambda = 0, alpha = 0``` We should also support setting standardize, which has been defined in the [design documentation](https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-11369 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9331 commit 90927e6e4cd46a6752fe4cdd7d1214112d218278 Author: Yanbo Liang Date: 2015-10-28T11:12:42Z SparkR glm should support setting standardize
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43242989 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala --- @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging System.clearProperty("SPARK_YARN_MODE") } } + + test("Obtain tokens For HiveMetastore") { +val hadoopConf = new Configuration() +hadoopConf.set("hive.metastore.kerberos.principal", "bob") +// thrift picks up on port 0 and bails out, without trying to talk to endpoint +hadoopConf.set("hive.metastore.uris", "http://localhost:0") +val util = new YarnSparkHadoopUtil +val e = intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice") +} +assertNestedHiveException(e) +// expect exception trapping code to unwind this hive-side exception +assertNestedHiveException(intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastore(hadoopConf) +}) + } + + def assertNestedHiveException(e: InvocationTargetException): Throwable = { +val inner = e.getCause +if (inner == null) { + fail("No inner cause", e) +} +if (!inner.isInstanceOf[HiveException]) { + fail(s"Not a hive exception", inner) +} +inner + } + + test("handleTokenIntrospectionFailure") { +val util = new YarnSparkHadoopUtil +// downgraded exceptions +util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe")) --- End diff -- As soon as this patch is in I'll turn to [SPARK-11317](https://issues.apache.org/jira/browse/SPARK-11317), which is essentially "apply the same catching, filtering and reporting strategy for HBase tokens as for Hive ones". It's not as critical as this one (token retrieval is working), but as nothing gets logged except "InvocationTargetException" with no stack trace, trying to recognise the issue is a Kerberos auth problem, let alone trying to fix it, is a weekend's effort, rather than 20 minutes worth. 
Because the policy goes in both places, having it separate and re-usable makes it a zero-cut-and-paste reuse, with that single test for failures without having to mock up failures across two separate clauses. And future maintenance costs are kept down if someone ever decides to change the policy again. Would you be happier if I cleaned up the HBase code as part of this same patch? Because I can and it will make the benefits of the factored out behaviour clearer. It's just messy to fix two things in one patch, especially if someone ever needs to play cherry pick or reverting games.
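The shared "catch, filter, report" policy described above can be sketched: expected reflection failures (an optional class missing from the classpath) are downgraded to a log line, while anything else propagates. This is an illustrative Python analogue, not the Scala helper's actual signature:

```python
import logging

log = logging.getLogger("token")

def handle_token_introspection_failure(service, exc):
    """Downgrade benign introspection failures; re-raise real errors.

    Sketch of the reusable policy discussed above: if the optional service
    class (Hive, HBase, ...) is simply not present, log and swallow so token
    retrieval for other services continues; any other failure is re-raised
    with its context intact instead of vanishing inside a bare
    InvocationTargetException.
    """
    if isinstance(exc, (ImportError, AttributeError)):
        # The Python stand-ins for ClassNotFoundException / NoSuchMethodError.
        log.info("%s class not found: %s", service, exc)
        return None
    raise exc

# A missing optional dependency is merely logged:
handle_token_introspection_failure("hive", ImportError("no hive on classpath"))
```

Keeping the filter in one function is what lets a single unit test cover the failure policy for both the Hive and HBase code paths.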
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151809791 **[Test build #44522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244882 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private var sparkContext: SparkContext = _ + private var appId: ApplicationId = _ + private var attemptId: Option[ApplicationAttemptId] = _ + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +sparkContext = binding.sparkContext +appId = binding.applicationId +attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + +s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) +.map { s 
=> + s.split(",").map(_.trim()).filter(!_.isEmpty) +.map { sClass => +val instance = Utils.classForName(sClass) +.newInstance() +.asInstanceOf[SchedulerExtensionService] +// bind this service +instance.start(binding) +logInfo(s"Service $sClass started") +instance + } +}.map(_.toList).getOrElse(Nil) + } + + /**
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818616 **[Test build #44522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `logInfo(s"$service class not found $e")` * `logDebug("$service class not found", e)`
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818859 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11349] [ML] Support transform string la...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/9302#issuecomment-151806442 Furthermore, I think we should provide a param named ```family``` for ```RFormula``` to indicate the estimator/model that will be applied to the DataFrame transformed by this ```RFormula``` transformer; then we can do a stricter label validation check.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244977

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---

@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A service that can be started and stopped.
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service.
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads extension services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private var sparkContext: SparkContext = _
+  private var appId: ApplicationId = _
+  private var attemptId: Option[ApplicationAttemptId] = _
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    sparkContext = binding.sparkContext
+    appId = binding.applicationId
+    attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+              .newInstance()

--- End diff --

I thought about that, but consider this: when would you want failure to load your listed extension services to be something not to fail on? Do you want it to quietly downgrade, or to fail noisily? Maybe we could make it an option.
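The loading pattern under review (and the start-once guard around it) can be sketched as follows. This is a hedged, self-contained sketch: the names `ExtensionService` and `ExtensionServices` are illustrative, not Spark's actual API, and it uses plain `Class.forName` in place of Spark's `Utils.classForName`.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative stand-in for SchedulerExtensionService (not Spark's API).
trait ExtensionService {
  def start(): Unit
  def stop(): Unit
}

// Illustrative container: reads a comma-separated list of class names,
// instantiates each reflectively, starts it, and guards the whole operation
// with an AtomicBoolean so a re-entrant start is a no-op.
class ExtensionServices {
  private val started = new AtomicBoolean(false)
  private var services: List[ExtensionService] = Nil

  /** Instantiate and start every class named in `conf`; a bad name fails loudly. */
  def start(conf: Option[String]): List[ExtensionService] = {
    if (started.getAndSet(true)) {
      // ignore the re-entrant start; keep the services from the first call
      return services
    }
    services = conf.map { s =>
      s.split(",").map(_.trim).filter(_.nonEmpty).map { className =>
        val instance = Class.forName(className)
          .getDeclaredConstructor()
          .newInstance()
          .asInstanceOf[ExtensionService]
        instance.start()
        instance
      }.toList
    }.getOrElse(Nil)
    services
  }

  /** Idempotent: safe even if start() never ran. */
  def stop(): Unit = {
    services.foreach(_.stop())
    services = Nil
  }
}
```

The loud-vs-quiet question in the review lives entirely in the `Class.forName` call: wrapping it in a try/catch that logs and skips the entry would be the "quiet downgrade" option, while letting the exception propagate (as above) fails the application fast.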
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151825736 Build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151825730 Build triggered.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151825769 Build started.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151825741 Build started.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/ Test FAILed.
[GitHub] spark pull request: [SPARK-11303] [SQL] filter should not be pushe...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9294#issuecomment-151843230 I'm going to pick this into branch-1.5 too.
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151843133 **[Test build #44528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44528/consoleFull)** for PR 9257 at commit [`b4a29bf`](https://github.com/apache/spark/commit/b4a29bf56dbee1e60d36df8d2272e7bfc8794f3b).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253815

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---

@@ -17,17 +17,17 @@
 package org.apache.spark.scheduler.cluster

-import scala.collection.mutable.ArrayBuffer
-import scala.concurrent.{Future, ExecutionContext}
+import scala.concurrent.{ExecutionContext, Future}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}

-import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.rpc._
-import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.ui.JettyUtils
-import org.apache.spark.util.{ThreadUtils, RpcUtils}
-
-import scala.util.control.NonFatal
+import org.apache.spark.util.{RpcUtils, ThreadUtils}
+import org.apache.spark.{Logging, SparkContext}

--- End diff --

nit: move before previous import
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43255063

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala ---

@@ -83,11 +83,12 @@ class GroupedIterator private(
   /** Holds a copy of an input row that is in the current group. */
   var currentGroup = currentRow.copy()
-  var currentIterator: Iterator[InternalRow] = null
+  assert(keyOrdering.compare(currentGroup, currentRow) == 0)

--- End diff --

Ah, sorry I missed it
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/9335 [SPARK-11376] [SQL] Removes duplicated `mutableRow` field This PR fixes a mistake in the code generated by `GenerateColumnAccessor`. Interestingly, although the code is illegal in Java (the class has two fields with the same name), Janino accepts it happily and accidentally works properly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-11376.fix-generated-code Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9335.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9335 commit 06a56f293bf013f304aac1eee56c2fa3f2bf0f92 Author: Cheng Lian Date: 2015-10-28T14:12:07Z Removes duplicated `mutableRow` field
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9335#issuecomment-151863211 Merged build started.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151863159 Merged build triggered.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151863260 Merged build started.
[GitHub] spark pull request: [SPARK-11378][STREAMING] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9336#issuecomment-151870654 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43266858

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---

@@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
     System.clearProperty("SPARK_YARN_MODE")
   }
 }
+
+  test("Obtain tokens For HiveMetastore") {
+    val hadoopConf = new Configuration()
+    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
+    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+    val util = new YarnSparkHadoopUtil
+    val e = intercept[InvocationTargetException] {
+      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+    }
+    assertNestedHiveException(e)
+    // expect exception trapping code to unwind this hive-side exception
+    assertNestedHiveException(intercept[InvocationTargetException] {
+      util.obtainTokenForHiveMetastore(hadoopConf)
+    })
+  }
+
+  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
+    val inner = e.getCause
+    if (inner == null) {
+      fail("No inner cause", e)
+    }
+    if (!inner.isInstanceOf[HiveException]) {
+      fail(s"Not a hive exception", inner)
+    }
+    inner
+  }
+
+  test("handleTokenIntrospectionFailure") {
+    val util = new YarnSparkHadoopUtil
+    // downgraded exceptions
+    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))

--- End diff --

BTW, if you really want to implement a shared policy, I'd recommend adding something like `scala.util.control.NonFatal`. That makes the exception handling cleaner; it would look more like this:

    try {
      // code that can throw
    } catch {
      case IgnorableException(e) => logDebug(...)
    }
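The extractor-based policy suggested above could be sketched as follows. This is an illustrative assumption, not Spark's actual code: the name `IgnorableException`, the exception types it matches, and the `TokenGuard.introspect` wrapper are all invented for the example, in the style of `scala.util.control.NonFatal`.

```scala
// A custom extractor (hypothetical name) that matches only the exception
// types a token-introspection failure may safely downgrade to a log message.
object IgnorableException {
  def unapply(t: Throwable): Option[Throwable] = t match {
    case _: ClassNotFoundException | _: NoSuchMethodException => Some(t)
    case _ => None
  }
}

object TokenGuard {
  /** Run `body`, downgrading ignorable failures instead of propagating them. */
  def introspect(body: => String): String =
    try {
      body
    } catch {
      // any exception not matched here still propagates to the caller
      case IgnorableException(e) => s"downgraded: ${e.getMessage}"
    }
}
```

Matching on `IgnorableException(e)` keeps the downgrade policy in one place, so every call site that reflects on optional classes shares the same list of ignorable types instead of repeating ad-hoc catch clauses.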
[GitHub] spark pull request: [SPARK-9552] Add force control for killExecuto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7888#issuecomment-151875843 **[Test build #44534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44534/consoleFull)** for PR 7888 at commit [`c23f887`](https://github.com/apache/spark/commit/c23f887b62a75415bab74036e78d03b92b1a5541).
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151876039 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151875771 **[Test build #44526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44526/consoleFull)** for PR 5423 at commit [`2c1db93`](https://github.com/apache/spark/commit/2c1db93bb1fe72a03e4b866741b6b803b30bb2b3). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151877399 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151878000 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44525/ Test PASSed.
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151880853 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151883050 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-11371 Make "mean" an alias for "avg" ope...
GitHub user ted-yu opened a pull request: https://github.com/apache/spark/pull/9332 SPARK-11371 Make "mean" an alias for "avg" operator From Reynold in the thread 'Exception when using some aggregate operators' (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)". We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ted-yu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9332.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9332 commit f1447f0cba860a84ed60929b4871936198fe4150 Author: tedyu Date: 2015-10-28T14:12:12Z SPARK-11371 Make "mean" an alias for "avg" operator
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/9335#issuecomment-151862392 cc @davies
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151866430 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44532/ Test FAILed.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151866425 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11378][STREAMING] make StreamingContext...
GitHub user manygrams opened a pull request: https://github.com/apache/spark/pull/9336 [SPARK-11378][STREAMING] make StreamingContext.awaitTerminationOrTimeout return properly This adds a failing test checking that `awaitTerminationOrTimeout` returns the expected value, and then fixes that failing test with the addition of a `return`. @tdas @zsxwing You can merge this pull request into a Git repository by running: $ git pull https://github.com/manygrams/spark fix_await_termination_or_timeout Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9336.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9336 commit 7bd9a3fc8593c8d1dce07a9223683bbb8d39cf10 Author: Nick Evans Date: 2015-10-28T14:40:41Z make StreamingContext.awaitTerminationOrTimeout return properly