[GitHub] spark issue #22179: [SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Upgrade K...

2018-09-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22179
  
Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it'...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22333


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22333
  
Merged to master


---




[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22112


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22112
  
Thanks! Merged to master.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215446604
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala ---
@@ -51,7 +51,11 @@ private[spark] class KubernetesDriverBuilder(
 provideJavaStep: (
   KubernetesConf[KubernetesDriverSpecificConf]
 => JavaDriverFeatureStep) =
-new JavaDriverFeatureStep(_)) {
+new JavaDriverFeatureStep(_),
--- End diff --

The argument type is a function, with the default value assigned in the 
constructor declaration. It might be cleaner to write this as follows:

- The constructor in the class declaration has _no_ default arguments.
- Provide a no-arg auxiliary constructor (`def this()`) that passes all the 
default implementations to the full-args constructor.

In this case default arguments hurt more than they help: in the unit tests 
we always supply mocks for _every_ arg, so we only ever use all defaults 
(real implementations) or no defaults (test implementations).
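A minimal sketch of the suggested pattern; the names here are illustrative, not the real `KubernetesDriverBuilder` signature:

```scala
// Sketch only: the primary constructor takes every collaborator explicitly,
// with no default arguments. Names are hypothetical.
class BuilderSketch(provideJavaStep: String => String) {

  // No-arg auxiliary constructor wires in the default (real) implementation
  // by delegating to the full-args constructor.
  def this() = this(conf => s"JavaDriverFeatureStep($conf)")

  def build(conf: String): String = provideJavaStep(conf)
}

// Production code uses the no-arg constructor; tests pass mocks explicitly.
val real = new BuilderSketch()
val mocked = new BuilderSketch(_ => "mockStep")
```

This keeps the "all real" and "all mocked" wiring in exactly two places, instead of spreading defaults across every parameter.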


---




[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22270
  
@dilipbiswal Any update on this PR?


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r21566
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopStepsOrchestrator.scala ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.features.OptionRequirements
+import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+
+private[spark] class HadoopStepsOrchestrator(
+  conf: SparkConf,
+  kubernetesResourceNamePrefix: String,
+  hadoopConfDir: String,
+  hadoopConfigMapName: String,
+  isKerberosEnabled: Boolean) extends Logging {
+
+  private val maybePrincipal = conf.get(KUBERNETES_KERBEROS_PRINCIPAL)
+  private val maybeKeytab = conf.get(KUBERNETES_KERBEROS_KEYTAB)
+    .map(k => new File(k))
+  private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+  private val maybeExistingSecretItemKey =
+    conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+  private val maybeRenewerPrincipal =
+    conf.get(KUBERNETES_KERBEROS_RENEWER_PRINCIPAL)
+
+  require(maybeKeytab.forall( _ => isKerberosEnabled ),
+    "You must enable Kerberos support if you are specifying a Kerberos Keytab")
+
+  require(maybeExistingSecretName.forall( _ => isKerberosEnabled ),
+    "You must enable Kerberos support if you are specifying a Kerberos Secret")
+
+  OptionRequirements.requireBothOrNeitherDefined(
+    maybeKeytab,
+    maybePrincipal,
+    "If a Kerberos keytab is specified you must also specify a Kerberos principal",
+    "If a Kerberos principal is specified you must also specify a Kerberos keytab")
+
+  OptionRequirements.requireBothOrNeitherDefined(
+    maybeExistingSecretName,
+    maybeExistingSecretItemKey,
+    "If a secret storing a Kerberos Delegation Token is specified you must also" +
+      " specify the label where the data is stored",
+    "If a secret data item-key where the data of the Kerberos Delegation Token is specified" +
+      " you must also specify the name of the secret")
+
+  def getHadoopSteps(kubeTokenManager: KubernetesHadoopDelegationTokenManager):
--- End diff --

We did this before, in the prototype, but now that we have the new feature 
steps API I'm not sure about having two tiers of steps. I think it would be 
clearer to use only the top-level feature steps and do all the Hadoop work 
in a single step class. Specific submodules can still be factored out into 
separate classes, without the strict requirement that they share the same 
interface.
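A toy sketch of this layout, with every name hypothetical (not the actual K8s feature-step API): one top-level step does all Hadoop configuration, and the submodules are plain classes that share no step interface.

```scala
// Hypothetical top-level step interface, standing in for the real
// feature-step API. All names are illustrative.
trait FeatureStepSketch {
  def configurePod(pod: String): String
}

// Submodules factored out as plain helper classes; they are not forced
// to implement a common "step" interface.
class HadoopConfMapHelper {
  def apply(pod: String): String = s"$pod+hadoopConf"
}

class KerberosConfHelper {
  def apply(pod: String): String = s"$pod+kerberos"
}

// One top-level step composes the helpers directly, so there is only a
// single tier of steps from the orchestrator's point of view.
class HadoopFeatureStepSketch extends FeatureStepSketch {
  private val confMap = new HadoopConfMapHelper
  private val kerberos = new KerberosConfHelper

  override def configurePod(pod: String): String =
    kerberos(confMap(pod))
}
```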


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95729/
Test FAILed.


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22333
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95728/
Test PASSed.


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22333
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22288
  
**[Test build #95729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95729/testReport)** for PR 22288 at commit [`87c4e57`](https://github.com/apache/spark/commit/87c4e57bb966078c8a78eabc5a5e4b6f60c78f28).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22192
  
> I'm looking up other spark-submit options that can be used to provide the 
jar to other nodes

I don't think there is anything like that currently, so using `--jars` for 
this will only work in YARN mode. But that can just be a documentation note, 
when/if this feature is documented.

(It may work with Kubernetes too, but I haven't really tried that since 2.3.)


---




[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22333
  
**[Test build #95728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95728/testReport)** for PR 22333 at commit [`14d1017`](https://github.com/apache/spark/commit/14d10172f9d62ba9a8eabec5aaa11759757278bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-09-05 Thread pgandhi999
Github user pgandhi999 commented on the issue:

https://github.com/apache/spark/pull/22144
  
@cloud-fan I have updated the PR and added the configurable flags for 
Window UDAF and Hive UDAF for now. I would appreciate hearing your thoughts. 
Thank you.


---




[GitHub] spark issue #21525: [SPARK-24513][ML] Attribute support in UnaryTransformer

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21525
  
**[Test build #95732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95732/testReport)** for PR 21525 at commit [`cfeae4d`](https://github.com/apache/spark/commit/cfeae4d5282be98d619da97f350c11339057e92f).


---




[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22271
  
**[Test build #4332 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4332/testReport)** for PR 22271 at commit [`7651652`](https://github.com/apache/spark/commit/76516529e31a5507afc0a1de31b30104a4d7c6ad).


---




[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22282
  
**[Test build #95731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95731/testReport)** for PR 22282 at commit [`b0f64e9`](https://github.com/apache/spark/commit/b0f64e91cb4f6306a7c0c60d4a17f1a0aacb3a51).


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215429800
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep}
+import org.apache.spark.internal.Logging
+
+ /**
+  * This is the main method that runs the hadoopConfigurationSteps defined
+  * by the HadoopStepsOrchestrator. These steps are run to modify the
+  * SparkPod and Kubernetes Resources using the additive method of the feature steps
+  */
+  */
+private[spark] class HadoopGlobalFeatureDriverStep(
+  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+   private val hadoopTestOrchestrator =
+ kubernetesConf.getHadoopStepsOrchestrator
--- End diff --

fits in previous line.


---




[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-09-05 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21710
  
@felixcheung 
Is there anything else I need to change? If not, could this PR be merged 
into 2.4? Thanks!


---




[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR

2018-09-05 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21649
  
@felixcheung 
Is there anything else I need to change? If not, could this PR be merged 
into 2.4? Thanks!


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215433952
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosSecretResolverStep.scala ---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+
+ /**
+  * This step assumes that you have already done all the heavy lifting in retrieving a
--- End diff --

This whole "steps" thing used in the k8s backend always throws me off... 
but given the comment, why isn't the code in this step just part of the step 
that actually does the required initialization of the delegation token stuff?


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215431441
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, ContainerBuilder, KeyToPathBuilder, PodBuilder}
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+    * Mounting the DT secret for both the Driver and the executors
+    *
+    * @param dtSecretName Name of the secret that stores the Delegation Token
+    * @param dtSecretItemKey Name of the Item Key storing the Delegation Token
+    * @param userName Name of the SparkUser to set SPARK_USER
+    * @param fileLocation Location of the krb5 file
+    * @param krb5ConfName Name of the ConfigMap for Krb5
+    * @param pod Input pod to be appended to
+    * @return a modified SparkPod
+    */
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: String,
+  krb5ConfName: String,
+  pod: SparkPod) : SparkPod = {
+  val krb5File = new File(fileLocation)
--- End diff --

Whole body is indented too far.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215429715
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep}
+import org.apache.spark.internal.Logging
+
+ /**
+  * This is the main method that runs the hadoopConfigurationSteps defined
+  * by the HadoopStepsOrchestrator. These steps are run to modify the
+  * SparkPod and Kubernetes Resources using the additive method of the feature steps
+  */
+  */
+private[spark] class HadoopGlobalFeatureDriverStep(
+  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
--- End diff --

nit: params in their own line are double-indented. Also happens in other 
places.
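For reference, a sketch of the single (4-space) indentation that Spark's Scala style uses for parameters on their own line; the class and fields here are hypothetical stand-ins:

```scala
// Parameters on their own line get a 4-space indent (8 would be
// double-indented); the extends clause then sits at 2 spaces.
class ParamIndentSketch(
    kubernetesConf: String,
    extraParam: Int) {

  def describe: String = s"$kubernetesConf/$extraParam"
}
```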


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215433543
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosKeytabResolverStep.scala ---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io._
+import java.security.PrivilegedExceptionAction
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+
+ /**
+  * This step does all the heavy lifting for Delegation Token logic. This step
+  * assumes that the job user has either specified a principal and keytab or ran
+  * $kinit before running spark-submit. With a TGT stored locally, by running
+  * UGI.getCurrentUser you are able to obtain the current user, alternatively
+  * you can run UGI.loginUserFromKeytabAndReturnUGI and by running .doAs run
+  * as the logged into user instead of the current user. With the Job User principal
+  * you then retrieve the delegation token from the NameNode and store values in
+  * DelegationToken. Lastly, the class puts the data into a secret. All this is
+  * appended to the current HadoopSpec which in turn will append to the current
+  * DriverSpec.
+  */
+private[spark] class HadoopKerberosKeytabResolverStep(
+    submissionSparkConf: SparkConf,
+    kubernetesResourceNamePrefix : String,
+    maybePrincipal: Option[String],
+    maybeKeytab: Option[File],
+    maybeRenewerPrincipal: Option[String],
+    tokenManager: KubernetesHadoopDelegationTokenManager)
+  extends HadoopConfigurationStep with Logging {
+
+  override def configureHadoopSpec(hSpec: HadoopConfigSpec): HadoopConfigSpec = {
+    val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+    if (!tokenManager.isSecurityEnabled) {
+      throw new SparkException("Hadoop not configured with Kerberos")
+    }
+    val maybeJobUserUGI =
+      for {
+        principal <- maybePrincipal
+        keytab <- maybeKeytab
+      } yield {
+        logDebug("Logged into KDC with keytab using Job User UGI")
+        tokenManager.loginUserFromKeytabAndReturnUGI(
+          principal,
+          keytab.toURI.toString)
+      }
+    // In the case that keytab is not specified we will read from Local Ticket Cache
+    val jobUserUGI = maybeJobUserUGI.getOrElse(tokenManager.getCurrentUser)
+    val originalCredentials = jobUserUGI.getCredentials
+    // It is necessary to run as jobUserUGI because logged in user != Current User
+    val (tokenData, renewalInterval) = jobUserUGI.doAs(
+      new PrivilegedExceptionAction[(Array[Byte], Long)] {
+        override def run(): (Array[Byte], Long) = {
+          val hadoopTokenManager: HadoopDelegationTokenManager =
+            new HadoopDelegationTokenManager(submissionSparkConf, hadoopConf)
+          tokenManager.getDelegationTokens(
+            originalCredentials,
+            submissionSparkConf,
+            hadoopConf,
+            hadoopTokenManager)
+        }})
+    if (tokenData.isEmpty) throw new SparkException(s"Did not obtain any delegation tokens")
--- End diff --

Just use `require`?
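A sketch of that suggestion, extracted into a standalone helper for illustration (`validateTokens` is a hypothetical name). Note the trade-off: `require` throws `IllegalArgumentException` rather than `SparkException`.

```scala
// Guard rewritten with require instead of an explicit throw. Unlike
// `throw new SparkException(...)`, require raises IllegalArgumentException.
def validateTokens(tokenData: Array[Byte]): Unit =
  require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
```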


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215430312
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, KUBERNETES_KERBEROS_PROXY_USER}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, HadoopConfigSpec, HadoopConfigurationStep}
+import org.apache.spark.internal.Logging
+
+ /**
+  * This is the main method that runs the hadoopConfigurationSteps defined
--- End diff --

All this seems to just say "Runs the configuration steps defined by 
HadoopStepsOrchestrator" which is a lot shorter.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215428372
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -225,6 +225,60 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_PROXY_USER =
+ConfigBuilder("spark.kubernetes.kerberos.proxyUser")
+  .doc("Specify the proxy user " +
+"for HadoopUGI login for the Driver + Executors")
+  .internal()
+  .stringConf
+  .createWithDefault("false")
+
+  val KUBERNETES_KERBEROS_KRB5_FILE =
+ConfigBuilder("spark.kubernetes.kerberos.krb5location")
+  .doc("Specify the location of the krb5 file " +
+"to be mounted on the driver and executors for Secure HDFS")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_KEYTAB =
+ConfigBuilder("spark.kubernetes.kerberos.keytab")
+  .doc("Specify the location of keytab " +
--- End diff --

Fits in one line. Probably similar for other confs you're adding.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215435777
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala
 ---
@@ -17,8 +17,8 @@
 package org.apache.spark.scheduler.cluster.k8s
 
 import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesExecutorSpecificConf, KubernetesRoleSpecificConf, SparkPod}
-import org.apache.spark.deploy.k8s.features._
-import org.apache.spark.deploy.k8s.features.{BasicExecutorFeatureStep, 
EnvSecretsFeatureStep, LocalDirsFeatureStep, MountSecretsFeatureStep}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.features.{BasicExecutorFeatureStep, 
EnvSecretsFeatureStep, HadoopConfExecutorFeatureStep, 
HadoopSparkUserExecutorFeatureStep, KerberosConfExecutorFeatureStep, 
KubernetesFeatureConfigStep, LocalDirsFeatureStep, MountSecretsFeatureStep, 
MountVolumesFeatureStep}
--- End diff --

Same: if the import list is getting this noisy, just import everything.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215429895
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala
 ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, 
ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, 
KUBERNETES_KERBEROS_PROXY_USER}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import 
org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, 
HadoopConfigSpec, HadoopConfigurationStep}
+import org.apache.spark.internal.Logging
+
+ /**
+  * This is the main method that runs the hadoopConfigurationSteps defined
+  * by the HadoopStepsOrchestrator. These steps are run to modify the
+  * SparkPod and Kubernetes Resources using the additive method of the 
feature steps
+  */
+private[spark] class HadoopGlobalFeatureDriverStep(
+  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+   private val hadoopTestOrchestrator =
+ kubernetesConf.getHadoopStepsOrchestrator
+   require(kubernetesConf.hadoopConfDir.isDefined &&
+ hadoopTestOrchestrator.isDefined, "Ensure that HADOOP_CONF_DIR is 
defined")
--- End diff --

keep condition in previous line, break only the message if it doesn't fit.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215428441
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -225,6 +225,60 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_PROXY_USER =
+ConfigBuilder("spark.kubernetes.kerberos.proxyUser")
+  .doc("Specify the proxy user " +
+"for HadoopUGI login for the Driver + Executors")
+  .internal()
+  .stringConf
+  .createWithDefault("false")
+
+  val KUBERNETES_KERBEROS_KRB5_FILE =
+ConfigBuilder("spark.kubernetes.kerberos.krb5location")
+  .doc("Specify the location of the krb5 file " +
+"to be mounted on the driver and executors for Secure HDFS")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_KEYTAB =
+ConfigBuilder("spark.kubernetes.kerberos.keytab")
+  .doc("Specify the location of keytab " +
+"for Kerberos in order to access Secure HDFS")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_PRINCIPAL =
+ConfigBuilder("spark.kubernetes.kerberos.principal")
+  .doc("Specify the principal " +
+"for Kerberos in order to access Secure HDFS")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_RENEWER_PRINCIPAL =
+ConfigBuilder("spark.kubernetes.kerberos.renewer.principal")
+  .doc("Specify the principal " +
+"you wish to renew and retrieve your Kerberos values with")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_DT_SECRET_NAME =
+ConfigBuilder("spark.kubernetes.kerberos.tokensecret.name")
+  .doc("Specify the name of the secret where " +
+"your existing delegation token is stored. This removes the need " 
+
+"for the job user to provide any keytab for launching a job")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY =
+ConfigBuilder("spark.kubernetes.kerberos.tokensecret.itemkey")
+  .doc("Specify the item key of the data where " +
+"your existing delegation token is stored. This removes the need " 
+
+"for the job user to provide any keytab for launching a job")
+  .stringConf
+  .createOptional
+
--- End diff --

Too many empty lines.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215434425
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.security
+
+import java.io.File
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Clock, SystemClock}
+
+ /**
+  * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop 
delegation tokens
+  * on the behalf of the Kubernetes submission client. It is modeled after 
the YARN
+  * AMCredentialRenewer, renewals in Kubernetes happen in a seperate 
microservice that will
+  * automatically update the Tokens via Kubernetes Secrets. The principal 
difference is that
+  * instead of writing the new credentials to HDFS and incrementing the 
timestamp of the file,
--- End diff --

This is not what the YARN backend does anymore.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215435116
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala
 ---
@@ -17,7 +17,7 @@
 package org.apache.spark.deploy.k8s.submit
 
 import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpec, 
KubernetesDriverSpecificConf, KubernetesRoleSpecificConf}
-import org.apache.spark.deploy.k8s.features.{BasicDriverFeatureStep, 
DriverKubernetesCredentialsFeatureStep, DriverServiceFeatureStep, 
EnvSecretsFeatureStep, LocalDirsFeatureStep, MountSecretsFeatureStep, 
MountVolumesFeatureStep}
+import org.apache.spark.deploy.k8s.features.{BasicDriverFeatureStep, 
DriverKubernetesCredentialsFeatureStep, DriverServiceFeatureStep, 
EnvSecretsFeatureStep, HadoopGlobalFeatureDriverStep, 
KubernetesFeatureConfigStep, LocalDirsFeatureStep, MountSecretsFeatureStep, 
MountVolumesFeatureStep}
--- End diff --

At this point, just import `.features._`.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215435834
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala
 ---
@@ -35,10 +35,26 @@ private[spark] class KubernetesExecutorBuilder(
   new LocalDirsFeatureStep(_),
 provideVolumesStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf]
   => MountVolumesFeatureStep) =
-  new MountVolumesFeatureStep(_)) {
+new MountVolumesFeatureStep(_),
--- End diff --

Same thing about this code being hard to parse visually.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215426740
  
--- Diff: docs/security.md ---
@@ -722,6 +722,62 @@ with encryption, at least.
 The Kerberos login will be periodically renewed using the provided 
credentials, and new delegation
 tokens for supported will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it was noted that 
Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in 
Kubernetes are stored in Secrets that are 
+shared by the Driver and its Executors. As such, there are three ways of 
submitting a kerberos job: 
+
+1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache:
+```bash
+/usr/bin/kinit -kt  /
+/opt/spark/bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.HdfsTest \
+--master k8s:// \
+--conf spark.executor.instances=1 \
+--conf spark.app.name=spark-hdfs \
+--conf spark.kubernetes.container.image=spark:latest \
+--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
+
+```
+3. Submitting with a local keytab and principle
--- End diff --

principal


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215426993
  
--- Diff: docs/security.md ---
@@ -722,6 +722,62 @@ with encryption, at least.
 The Kerberos login will be periodically renewed using the provided 
credentials, and new delegation
 tokens for supported will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it was noted that 
Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in 
Kubernetes are stored in Secrets that are 
+shared by the Driver and its Executors. As such, there are three ways of 
submitting a kerberos job: 
+
+1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache:
+```bash
+/usr/bin/kinit -kt  /
+/opt/spark/bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.HdfsTest \
+--master k8s:// \
+--conf spark.executor.instances=1 \
+--conf spark.app.name=spark-hdfs \
+--conf spark.kubernetes.container.image=spark:latest \
+--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
+
+```
+3. Submitting with a local keytab and principle
+```bash
+/opt/spark/bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.HdfsTest \
+--master k8s:// \
+--conf spark.executor.instances=1 \
+--conf spark.app.name=spark-hdfs \
+--conf spark.kubernetes.container.image=spark:latest \
+--conf spark.kubernetes.kerberos.keytab= \
+--conf spark.kubernetes.kerberos.principal= \
+--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \
+
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \
+
+```
+
+3. Submitting with pre-populated secrets already existing within the 
namespace
--- End diff --

What are these secrets? Are they the ticket cache, a delegation token 
cache, something else?


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215427665
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/HdfsTest.scala 
---
@@ -41,6 +41,8 @@ object HdfsTest {
   val end = System.currentTimeMillis()
   println(s"Iteration $iter took ${end-start} ms")
 }
+println(s"File contents: ${file.map(s => 
s.toString).collect().mkString(",")}")
--- End diff --

Spark's convention is either `.map { s => s.toString }` or 
`.map(_.toString)`. I noticed this in a few other places too.
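For illustration, both accepted forms produce the same result; this standalone sketch (object and value names are mine, not from the PR) shows them side by side:

```scala
// Demonstrates the two lambda styles accepted by Spark's Scala conventions:
// underscore shorthand for a simple call, braces for a named parameter.
object MapStyleDemo {
  def main(args: Array[String]): Unit = {
    val nums = Seq(1, 2, 3)
    val viaUnderscore = nums.map(_.toString)      // preferred for a single simple call
    val viaBraces = nums.map { s => s.toString }  // preferred when the body is longer
    println(viaUnderscore.mkString(","))          // prints 1,2,3
    println(viaBraces.mkString(","))              // prints 1,2,3
  }
}
```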


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215429280
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
 roleSecretEnvNamesToKeyRefs: Map[String, String],
 roleEnvs: Map[String, String],
 roleVolumes: Iterable[KubernetesVolumeSpec[_ <: 
KubernetesVolumeSpecificConf]],
-sparkFiles: Seq[String]) {
+sparkFiles: Seq[String],
+hadoopConfDir: Option[String]) {
+
+  def getHadoopConfigMapName: String = 
s"$appResourceNamePrefix-hadoop-config"
+
+  def getKRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file"
+
+  def getHadoopStepsOrchestrator : Option[HadoopStepsOrchestrator] = 
hadoopConfDir.map {
+hConf => new HadoopStepsOrchestrator(
--- End diff --

nit: `hConf =>` stays together with the `{`; if it doesn't fit, use a full 
method declaration with enclosing braces.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215434745
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.security
+
+import java.io.File
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Clock, SystemClock}
+
+ /**
+  * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop 
delegation tokens
+  * on the behalf of the Kubernetes submission client. It is modeled after 
the YARN
+  * AMCredentialRenewer, renewals in Kubernetes happen in a seperate 
microservice that will
+  * automatically update the Tokens via Kubernetes Secrets. The principal 
difference is that
+  * instead of writing the new credentials to HDFS and incrementing the 
timestamp of the file,
+  * the new credentials (called Tokens when they are serialized) are 
stored in Secrets accessible
+  * to the driver and executors, when new Tokens are received they 
overwrite the current Secrets.
+  */
+private[spark] class KubernetesHadoopDelegationTokenManager extends 
Logging {
+
+   // HadoopUGI Util methods
+   private val clock: Clock = new SystemClock()
+   def getCurrentUser: UserGroupInformation = 
UserGroupInformation.getCurrentUser
+   def getShortUserName : String = getCurrentUser.getShortUserName
+   def getFileSystem(hadoopConf: Configuration) : FileSystem = 
FileSystem.get(hadoopConf)
+   def isSecurityEnabled: Boolean = UserGroupInformation.isSecurityEnabled
+   def loginUserFromKeytabAndReturnUGI(principal: String, keytab: String): 
UserGroupInformation =
+ UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, 
keytab)
+   def getCurrentTime: Long = clock.getTimeMillis()
+   def serializeCreds(creds: Credentials): Array[Byte] = 
SparkHadoopUtil.get.serialize(creds)
+   def nextRT(rt: Long, conf: SparkConf): Long = 
SparkHadoopUtil.nextCredentialRenewalTime(rt, conf)
+
+   // Grab files in the HADOOP_CONF directory
+   def getHadoopConfFiles(path: String) : Seq[File] = {
+ val dir = new File(path)
+ if (dir.isDirectory) {
+   dir.listFiles.flatMap { file => Some(file).filter(_.isFile) }.toSeq
+ } else {
+   Seq.empty[File]
+ }
+   }
+
+   // Principle method in charge of retrieving new Delegation Tokens
--- End diff --

Main method. Or just drop the comment altogether, it's not really adding 
anything.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215428117
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -225,6 +225,60 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_PROXY_USER =
--- End diff --

I see this config being set in the code, but never read?


---




[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95730/
Test FAILed.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215427909
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -225,6 +225,60 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_PROXY_USER =
+ConfigBuilder("spark.kubernetes.kerberos.proxyUser")
+  .doc("Specify the proxy user " +
--- End diff --

Fits in one line.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215431579
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, 
ContainerBuilder, KeyToPathBuilder, PodBuilder}
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param fileLocation Location of the krb5 file
+* @param krb5ConfName Name of the ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: String,
+  krb5ConfName: String,
+  pod: SparkPod) : SparkPod = {
+  val krb5File = new File(fileLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  val kerberizedPod = new PodBuilder(pod.pod)
+.editOrNewSpec()
+  .addNewVolume()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withNewSecret()
+  .withSecretName(dtSecretName)
+  .endSecret()
+.endVolume()
+  .addNewVolume()
+.withName(KRB_FILE_VOLUME)
+  .withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+  .endVolume()
+  .endSpec()
+.build()
+  val kerberizedContainer = new ContainerBuilder(pod.container)
+.addNewVolumeMount()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+  .endVolumeMount()
+.addNewVolumeMount()
+  .withName(KRB_FILE_VOLUME)
+  .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
+  .withSubPath("krb5.conf")
+  .endVolumeMount()
+.addNewEnv()
+  .withName(ENV_HADOOP_TOKEN_FILE_LOCATION)
+  
.withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey")
+  .endEnv()
+.addNewEnv()
+  .withName(ENV_SPARK_USER)
+  .withValue(userName)
+  .endEnv()
+.build()
+SparkPod(kerberizedPod, kerberizedContainer)
+  }
+
+   /**
+* setting ENV_SPARK_USER when HADOOP_FILES are detected
+*
+* @param sparkUserName Name of the SPARK_USER
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapSparkUserPod(
+ sparkUserName: String,
+ pod: SparkPod) : SparkPod = {
+ val envModifiedContainer = new ContainerBuilder(pod.container)
+   .addNewEnv()
--- End diff --

indent


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215431371
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/OptionRequirements.scala
 ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+private[spark] object OptionRequirements {
--- End diff --

These methods seem to fit just fine in `KubernetesUtils`, instead of adding 
a new object for them.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215435605
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala
 ---
@@ -51,7 +51,11 @@ private[spark] class KubernetesDriverBuilder(
 provideJavaStep: (
   KubernetesConf[KubernetesDriverSpecificConf]
 => JavaDriverFeatureStep) =
-new JavaDriverFeatureStep(_)) {
+new JavaDriverFeatureStep(_),
--- End diff --

Indentation here is super odd, but it is also odd for all the existing 
code. But it's kinda hard to tell what's going on.

Seems like ` new JavaDriverFeatureStep(_)` is the body of a closure, in 
which case it might be clearer to either keep in the same line as the `=`, or 
just use braces and indent the block properly.
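As a hypothetical sketch (class and parameter names are made up, not the PR's) of the two clearer styles the reviewer suggests for function-valued default arguments:

```scala
// Two readable ways to give a constructor parameter a function-valued default:
// (a) keep the closure body on the same line as `=`,
// (b) use braces and indent the block under the parameter.
object ClosureDefaultDemo {
  class StepBuilder(
      val provideStep: String => String = s => s.toUpperCase,  // style (a)
      val provideOther: String => String = { s =>              // style (b)
        s.reverse
      })

  def main(args: Array[String]): Unit = {
    val b = new StepBuilder()
    println(b.provideStep("abc"))   // prints ABC
    println(b.provideOther("abc"))  // prints cba
  }
}
```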


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215427411
  
--- Diff: docs/security.md ---
@@ -722,6 +722,62 @@ with encryption, at least.
 The Kerberos login will be periodically renewed using the provided 
credentials, and new delegation
 tokens for supported will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it was noted that 
Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in 
Kubernetes are stored in Secrets that are 
+shared by the Driver and its Executors. As such, there are three ways of 
submitting a kerberos job: 
+
+1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache:
+```bash
+/usr/bin/kinit -kt  /
+/opt/spark/bin/spark-submit \
--- End diff --

Instead of using a command line example with some configs that are really 
unrelated to the feature being explained, how about only explaining the configs 
that need to be set to enable the feature? Preferably using a table like other 
config-related documents use.


---




[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #95730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95730/testReport)**
 for PR 22305 at commit 
[`09a79c2`](https://github.com/apache/spark/commit/09a79c216a42b3f7a20e4aaef6f56bcf79e405cb).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215429047
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
 roleSecretEnvNamesToKeyRefs: Map[String, String],
 roleEnvs: Map[String, String],
 roleVolumes: Iterable[KubernetesVolumeSpec[_ <: 
KubernetesVolumeSpecificConf]],
-sparkFiles: Seq[String]) {
+sparkFiles: Seq[String],
+hadoopConfDir: Option[String]) {
+
+  def getHadoopConfigMapName: String = 
s"$appResourceNamePrefix-hadoop-config"
--- End diff --

Please avoid `get` in the property names (see existing methods in this 
class).


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215432887
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosKeytabResolverStep.scala
 ---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io._
+import java.security.PrivilegedExceptionAction
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+
+ /**
+  * This step does all the heavy lifting for Delegation Token logic. This 
step
+  * assumes that the job user has either specified a principal and keytab 
or ran
+  * $kinit before running spark-submit. With a TGT stored locally, by 
running
+  * UGI.getCurrentUser you are able to obtain the current user, 
alternatively
+  * you can run UGI.loginUserFromKeytabAndReturnUGI and by running .doAs 
run
+  * as the logged into user instead of the current user. With the Job User 
principal
+  * you then retrieve the delegation token from the NameNode and store 
values in
+  * DelegationToken. Lastly, the class puts the data into a secret. All 
this is
+  * appended to the current HadoopSpec which in turn will append to the 
current
+  * DriverSpec.
+  */
+private[spark] class HadoopKerberosKeytabResolverStep(
+submissionSparkConf: SparkConf,
+kubernetesResourceNamePrefix : String,
+maybePrincipal: Option[String],
+maybeKeytab: Option[File],
+maybeRenewerPrincipal: Option[String],
+tokenManager: KubernetesHadoopDelegationTokenManager)
+   extends HadoopConfigurationStep with Logging {
+
+override def configureHadoopSpec(hSpec: HadoopConfigSpec): 
HadoopConfigSpec = {
+  val hadoopConf = 
SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+  if (!tokenManager.isSecurityEnabled) {
+throw new SparkException("Hadoop not configured with Kerberos")
+  }
+  val maybeJobUserUGI =
--- End diff --

`SparkSubmit` already logs in the user if a keytab is provided, so you 
could reuse that. Only issue is that it uses the existing configs which have 
"yarn" in their name.

I think it would be better to create common names for the principal and 
keytab configs, and deprecate the YARN-specific ones. Then you can just use 
`UGI.getCurrentUser` here. You could even simplify the logic below (no need for 
`doAs` in that case).


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215430810
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopGlobalFeatureDriverStep.scala
 ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, 
ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.{KUBERNETES_KERBEROS_KRB5_FILE, 
KUBERNETES_KERBEROS_PROXY_USER}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import 
org.apache.spark.deploy.k8s.features.hadoopsteps.{HadoopBootstrapUtil, 
HadoopConfigSpec, HadoopConfigurationStep}
+import org.apache.spark.internal.Logging
+
+ /**
+  * This is the main method that runs the hadoopConfigurationSteps defined
+  * by the HadoopStepsOrchestrator. These steps are run to modify the
+  * SparkPod and Kubernetes Resources using the additive method of the 
feature steps
+  */
+private[spark] class HadoopGlobalFeatureDriverStep(
+  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+  extends KubernetesFeatureConfigStep with Logging {
+   private val hadoopTestOrchestrator =
+ kubernetesConf.getHadoopStepsOrchestrator
+   require(kubernetesConf.hadoopConfDir.isDefined &&
+ hadoopTestOrchestrator.isDefined, "Ensure that HADOOP_CONF_DIR is 
defined")
+   private val hadoopSteps =
+ hadoopTestOrchestrator
+   .map(hto => hto.getHadoopSteps(kubernetesConf.getTokenManager))
+ .getOrElse(Seq.empty[HadoopConfigurationStep])
+
+   var currentHadoopSpec = HadoopConfigSpec(
+ configMapProperties = Map.empty[String, String],
+ dtSecret = None,
+ dtSecretName = KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME,
+ dtSecretItemKey = None,
+ jobUserName = None)
+
+   for (nextStep <- hadoopSteps) {
+ currentHadoopSpec = nextStep.configureHadoopSpec(currentHadoopSpec)
+   }
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+val hadoopBasedSparkPod = HadoopBootstrapUtil.bootstrapHadoopConfDir(
+  kubernetesConf.hadoopConfDir.get,
+  kubernetesConf.getHadoopConfigMapName,
+  kubernetesConf.getTokenManager,
+  pod)
+
+val maybeKerberosModification =
+  for {
+secretItemKey <- currentHadoopSpec.dtSecretItemKey
+userName <- currentHadoopSpec.jobUserName
+krb5fileLocation <- 
kubernetesConf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+  } yield {
+HadoopBootstrapUtil.bootstrapKerberosPod(
+  currentHadoopSpec.dtSecretName,
+  secretItemKey,
+  userName,
+  krb5fileLocation,
+  kubernetesConf.getKRBConfigMapName,
+  hadoopBasedSparkPod)
+  }
+maybeKerberosModification.getOrElse(
+  HadoopBootstrapUtil.bootstrapSparkUserPod(
+kubernetesConf.getTokenManager.getCurrentUser.getShortUserName,
+hadoopBasedSparkPod))
+  }
+
+  override def getAdditionalPodSystemProperties(): Map[String, String] = {
+val maybeKerberosConfValues =
+  for {
+secretItemKey <- currentHadoopSpec.dtSecretItemKey
+userName <- currentHadoopSpec.jobUserName
+  } yield {
+Map(KERBEROS_KEYTAB_SECRET_NAME -> currentHadoopSpec.dtSecretName,
+  KERBEROS_KEYTAB_SECRET_KEY -> secretItemKey,
+  KERBEROS_SPARK_USER_NAME -> userName)
+}
--- End diff --

nit: more indent


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215435018
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -109,7 +112,8 @@ private[spark] class Client(
   def run(): Unit = {
 val resolvedDriverSpec = builder.buildFromFeatures(kubernetesConf)
 val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map"
-val configMap = buildConfigMap(configMapName, 
resolvedDriverSpec.systemProperties)
+val configMap =
--- End diff --

Fits in one line.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215431778
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadoopsteps
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, 
ContainerBuilder, KeyToPathBuilder, PodBuilder}
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param fileLocation Location of the krb5 file
+* @param krb5ConfName Name of the ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+  dtSecretName: String,
+  dtSecretItemKey: String,
+  userName: String,
+  fileLocation: String,
+  krb5ConfName: String,
+  pod: SparkPod) : SparkPod = {
+  val krb5File = new File(fileLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  val kerberizedPod = new PodBuilder(pod.pod)
+.editOrNewSpec()
+  .addNewVolume()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withNewSecret()
+  .withSecretName(dtSecretName)
+  .endSecret()
+.endVolume()
+  .addNewVolume()
+.withName(KRB_FILE_VOLUME)
+  .withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+  .endVolume()
+  .endSpec()
+.build()
+  val kerberizedContainer = new ContainerBuilder(pod.container)
+.addNewVolumeMount()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+  .endVolumeMount()
+.addNewVolumeMount()
+  .withName(KRB_FILE_VOLUME)
+  .withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
+  .withSubPath("krb5.conf")
+  .endVolumeMount()
+.addNewEnv()
+  .withName(ENV_HADOOP_TOKEN_FILE_LOCATION)
+  
.withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey")
+  .endEnv()
+.addNewEnv()
+  .withName(ENV_SPARK_USER)
+  .withValue(userName)
+  .endEnv()
+.build()
+SparkPod(kerberizedPod, kerberizedContainer)
+  }
+
+   /**
+* setting ENV_SPARK_USER when HADOOP_FILES are detected
+*
+* @param sparkUserName Name of the SPARK_USER
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapSparkUserPod(
+ sparkUserName: String,
--- End diff --

this is actually indented 3 spaces from the `def`...


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215434370
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala
 ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.security
+
+import java.io.File
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Clock, SystemClock}
+
+ /**
+  * The KubernetesHadoopDelegationTokenManager fetches and updates Hadoop 
delegation tokens
+  * on the behalf of the Kubernetes submission client. It is modeled after 
the YARN
+  * AMCredentialRenewer, renewals in Kubernetes happen in a seperate 
microservice that will
--- End diff --

I commented on the spec, and I have issues with this microservice thingy, 
but since it's not being added in this PR the comment should not refer to it; 
instead just say renewals are not yet supported.


---




[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #95730 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95730/testReport)**
 for PR 22305 at commit 
[`09a79c2`](https://github.com/apache/spark/commit/09a79c216a42b3f7a20e4aaef6f56bcf79e405cb).


---




[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2884/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95726/
Test PASSed.


---




[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22325
  
**[Test build #95726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95726/testReport)**
 for PR 22325 at commit 
[`ec069b3`](https://github.com/apache/spark/commit/ec069b3d269b6ec2e6ca9edcc0f6981ccecdad3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22221: [SPARK-25231] : Fix synchronization of executor h...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1


---




[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-09-05 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22112
  
testing so far looks good.  I'm +1 for this.


---




[GitHub] spark issue #22221: [SPARK-25231] : Fix synchronization of executor heartbea...

2018-09-05 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/1
  
merged to master and 2.3, thanks @pgandhi999 


---




[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95725/
Test PASSed.


---




[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #95725 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95725/testReport)**
 for PR 20999 at commit 
[`148f477`](https://github.com/apache/spark/commit/148f47742cae892260c46f9ffa97bb2d0422701d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...

2018-09-05 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22209
  
would be nice to put into 2.3 as well. I realize we are close to the rc, but I don't think we should stop backporting fixes, since I don't expect 2.4 to be stable for a while. If we stop for a bit for this rc, we should have some way to track pulling these back after the rc. Thoughts?



---




[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...

2018-09-05 Thread arunmahadevan
Github user arunmahadevan closed the pull request at:

https://github.com/apache/spark/pull/22299


---




[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r215402140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -921,12 +924,18 @@ class Analyzer(
 failAnalysis("Invalid usage of '*' in explode/json_tuple/UDTF")
 
   // To resolve duplicate expression IDs for Join and Intersect
-  case j @ Join(left, right, _, _) if !j.duplicateResolved =>
-j.copy(right = dedupRight(left, right))
+  case j @ Join(left, right, _, condition) if !j.duplicateResolved =>
+val (dedupedRight, attributeRewrites) = dedupRight(left, right)
+val changedCondition = condition.map(_.transform {
+  case attr: Attribute if attr.resolved => 
attributeRewrites.getOrElse(attr, attr)
--- End diff --

This looks fishy. We will not rewrite attributes in the join condition when 
they are not resolved? What will happen then?
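
The rewrite under discussion can be sketched with a toy expression tree (hypothetical classes, much simplified from Catalyst's): after `dedupRight` gives the right side of a self-join fresh expression ids, the join condition must be rewritten through the old-to-new map, or it keeps pointing at the stale ids.

```scala
// Toy expression tree (hypothetical names, not Catalyst's classes).
sealed trait Expr
case class Attr(name: String, exprId: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

// Mirrors condition.map(_.transform { case a => rewrites.getOrElse(a, a) }).
def rewrite(e: Expr, rewrites: Map[Attr, Attr]): Expr = e match {
  case a: Attr       => rewrites.getOrElse(a, a)
  case EqualTo(l, r) => EqualTo(rewrite(l, rewrites), rewrite(r, rewrites))
}

val leftAttr = Attr("a", 1)
val oldRight = Attr("b", 1)  // exprId duplicated with the left side
val newRight = Attr("b", 2)  // fresh exprId produced for the deduped right side

val cond  = EqualTo(leftAttr, oldRight)
val fixed = rewrite(cond, Map(oldRight -> newRight))
// fixed == EqualTo(Attr("a", 1), Attr("b", 2)): only the right-side
// attribute moves; attributes absent from the map are left untouched.
```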


---




[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-05 Thread NiharS
Github user NiharS commented on the issue:

https://github.com/apache/spark/pull/22192
  
You're right, I ran in local-cluster and it exited very quickly citing 
executors shutting down after not being able to find my test plugin. Although, 
the logs say that it does use a CoarseGrainedExecutorBackend:

`18/09/05 12:03:20 INFO CoarseGrainedExecutorBackend: Connecting to driver: 
spark://CoarseGrainedScheduler@nihar-xxx:45767`

unless you mean YARN only uses that one command-line option. I'm looking into how it reacts in regular standalone mode with different thread counts. Thanks for pointing this out!

I'm looking up other spark-submit options that can be used to provide the jar to other nodes, but I'm not super hopeful it's going to work out. If it indeed doesn't, I'll start exploring other options once I figure out why the pyspark tests are failing.


---




[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-05 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r215396772
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
+  var wrapCorruption: Boolean = false
+  try {
+input = streamWrapper(blockId, in)
+  } catch {
+case e: IOException =>
+  buf.release()
+  logWarning(s"got a corrupted block $blockId from $address 
while wrapping it" +
+s" locally, fetch again", e)
+  corruptedBlocks += blockId
+  fetchRequests += FetchRequest(address, Array((blockId, 
size)))
+  wrapCorruption = true
+  result = null
+  in.close
+  }
   // Only copy the stream if it's wrapped by compression or 
encryption, also the size of
   // block is small (the decompressed block is smaller than 
maxBytesInFlight)
-  if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 
3) {
+  if (detectCorrupt && !wrapCorruption && !input.eq(in) && size < 
maxBytesInFlight / 3) {
 val originalInput = input
--- End diff --

@davies would appreciate your comments about this change as well. Also I have a quick question: what is the use of `originalInput`? It seems that this var isn't used anywhere except in the finally clause. I think it can be removed.
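
The failure mode this patch guards against can be reproduced with a plain JDK codec (a sketch, not Spark's actual codec path): wrapping a corrupted compressed block can throw `IOException` immediately, while the stream is being constructed, which is why `streamWrapper(blockId, in)` itself needs a try/catch.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

def gzipBytes(s: String): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val gz = new GZIPOutputStream(bos)
  gz.write(s.getBytes("UTF-8"))
  gz.close()
  bos.toByteArray
}

val good = gzipBytes("block payload")
val corrupt = good.clone()
corrupt(0) = 0  // clobber the GZIP magic byte to simulate corruption

// GZIPInputStream reads the header in its constructor, so a corrupted
// block fails at wrap time -- before any caller reads from the stream.
def tryWrap(bytes: Array[Byte]): Either[IOException, InputStream] =
  try Right(new GZIPInputStream(new ByteArrayInputStream(bytes)))
  catch { case e: IOException => Left(e) }

assert(tryWrap(good).isRight)
assert(tryWrap(corrupt).isLeft)  // detected during wrapping, as in the patch
```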


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged into master. Thanks @WeichenXu123 for the implementation and 
everyone for the review! I created the following JIRAs as follow-ups:

* deprecate ImageSchema: https://issues.apache.org/jira/browse/SPARK-25345
* list built-in data sources in doc site: 
https://issues.apache.org/jira/browse/SPARK-25346
* doc for image data source: 
https://issues.apache.org/jira/browse/SPARK-25347
* data source for binary files: 
https://issues.apache.org/jira/browse/SPARK-25348
* support sample pushdown: https://issues.apache.org/jira/browse/SPARK-25349


---




[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-05 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r215391696
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
+  var wrapCorruption: Boolean = false
--- End diff --

yeah that is right. I will change that


---




[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r215391569
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -805,10 +807,10 @@ class Analyzer(
* that this rule cannot handle. When that is the case, there 
must be another rule
* that resolves these conflicts. Otherwise, the analysis will 
fail.
*/
-  right
+  (right, AttributeMap.empty)
--- End diff --

`AttributeMap.empty[Attribute]`


---




[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r215391402
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
 ---
@@ -23,12 +23,14 @@ package org.apache.spark.sql.catalyst.expressions
  * of the name, or the expected nullability).
  */
 object AttributeMap {
+  val empty = new AttributeMap(Map.empty)
--- End diff --

```Scala
def empty[A]: AttributeMap[A] = new AttributeMap(Map.empty)
```
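
The suggestion above — a polymorphic `def empty[A]` rather than a monomorphic `val empty` — gives a correctly typed empty map for every element type without making the class covariant. A minimal, self-contained sketch (a toy stand-in for Catalyst's `AttributeMap`, which is really keyed by `ExprId`; all names here are hypothetical):

```scala
// Toy invariant map wrapper, keyed by String to stay self-contained.
class AttrMap[A](val baseMap: Map[String, A]) {
  def getOrElse(k: String, default: => A): A = baseMap.getOrElse(k, default)
}

object AttrMap {
  // Polymorphic empty: one definition serves every element type A,
  // so the class can stay invariant (no `+A` needed).
  def empty[A]: AttrMap[A] = new AttrMap(Map.empty)

  def apply[A](kvs: Seq[(String, A)]): AttrMap[A] = new AttrMap(kvs.toMap)
}

val noStrings: AttrMap[String] = AttrMap.empty[String]
val noInts: AttrMap[Int] = AttrMap.empty[Int]
println(noStrings.getOrElse("a", "missing"))  // missing
println(noInts.getOrElse("a", -1))            // -1
```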


---




[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22318#discussion_r215391459
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
 ---
@@ -23,12 +23,14 @@ package org.apache.spark.sql.catalyst.expressions
  * of the name, or the expected nullability).
  */
 object AttributeMap {
+  val empty = new AttributeMap(Map.empty)
+
   def apply[A](kvs: Seq[(Attribute, A)]): AttributeMap[A] = {
 new AttributeMap(kvs.map(kv => (kv._1.exprId, kv)).toMap)
   }
 }
 
-class AttributeMap[A](val baseMap: Map[ExprId, (Attribute, A)])
+class AttributeMap[+A](val baseMap: Map[ExprId, (Attribute, A)])
--- End diff --

revert the change here. 


---




[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22328


---




[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r215390096
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
+  var wrapCorruption: Boolean = false
--- End diff --

The reason for the error is the same (corruption in the source data), so a 
different error message doesn't really add any value. The exception will 
contain the line where it happened so that's enough for debugging.
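
The approach being discussed — catching an exception thrown while wrapping the fetched stream and rethrowing it as a fetch failure, with one shared message for both failure points — can be sketched as below. All names here are hypothetical simplifications, not Spark's actual API.

```scala
import java.io.InputStream

// Simplified stand-in for Spark's FetchFailedException.
final case class FetchFailedSketch(blockId: String, cause: Throwable)
  extends Exception(s"Failed to fetch block $blockId", cause)

object ShuffleFetchSketch {
  // Apply the stream wrapper (e.g. decompression/decryption). If the wrapper
  // itself throws, the source data is corrupt just as if corruption were
  // detected while reading, so rethrow as a single fetch-failure type; the
  // stack trace still records where it happened, which suffices for debugging.
  def wrapSafely(
      blockId: String,
      in: InputStream,
      streamWrapper: (String, InputStream) => InputStream): InputStream =
    try streamWrapper(blockId, in)
    catch {
      case e: Exception =>
        in.close()
        throw FetchFailedSketch(blockId, e)
    }
}
```

A failing wrapper and a later read failure thus surface the same exception type upstream, letting the scheduler regenerate the shuffle data either way.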


---




[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-05 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r215388601
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,10 +444,23 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
+  var wrapCorruption: Boolean = false
--- End diff --

Thanks for the review. I didn't do that so I could give a more specific warning 
message and point out whether the problem happened during wrapping or after it, 
but if there isn't any value in separating them I can change it.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2883/
Test PASSed.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22288
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95723/
Test PASSed.


---




[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22318
  
**[Test build #95723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)**
 for PR 22318 at commit 
[`809b8a8`](https://github.com/apache/spark/commit/809b8a83b7ec3d62ba6d65f6aff6a7d3175bd3e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22288: [SPARK-22148][Scheduler] Acquire new executors to avoid ...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22288
  
**[Test build #95729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95729/testReport)**
 for PR 22288 at commit 
[`87c4e57`](https://github.com/apache/spark/commit/87c4e57bb966078c8a78eabc5a5e4b6f60c78f28).


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95727/
Test PASSed.


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95721/
Test PASSed.


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22328
  
**[Test build #95727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95727/testReport)**
 for PR 22328 at commit 
[`218ce4c`](https://github.com/apache/spark/commit/218ce4cf796308c8705a27889b25100e2b779365).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95721/testReport)**
 for PR 22338 at commit 
[`6cb94ed`](https://github.com/apache/spark/commit/6cb94ed5ed76e23e4fb775a2fd6f0e66e9c15abd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22340
  
Thank you for approval, @srowen .


---




[GitHub] spark pull request #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ig...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22336


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22336
  
Wow, great. Thanks, @gatorsmile!


---




[GitHub] spark issue #22335: [SPARK-25091][SQL] reduce the storage memory in Executor...

2018-09-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22335
  
Also, this should be `[core]` not `[sql]`.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22336
  
Sorry... I just merged it.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22336
  
Finally, it passed. Can I merge this, @cloud-fan and @HyukjinKwon? :)


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22336
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95722/
Test PASSed.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22336
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22336
  
**[Test build #95722 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95722/testReport)**
 for PR 22336 at commit 
[`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



