date:20181025

[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22845
  
**[Test build #98074 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98074/testReport)**
 for PR 22845 at commit 
[`a10eb1a`](https://github.com/apache/spark/commit/a10eb1aa8f06fc94fa097c2ab9023a67256d30c4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22844
  
**[Test build #98075 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98075/testReport)**
 for PR 22844 at commit 
[`62af4fd`](https://github.com/apache/spark/commit/62af4fd4182f9b63f529efbcb51c15535e200a5b).


---


[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Improve BenchmarkWideTab...

2018-10-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22823#discussion_r228409979
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
 ---
@@ -1,52 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.benchmark
-
-import org.apache.spark.benchmark.Benchmark
-
-/**
- * Benchmark to measure performance for wide table.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.BenchmarkWideTable"
- *
- * Benchmarks in this file are skipped in normal builds.
- */
-class BenchmarkWideTable extends BenchmarkWithCodegen {
-
-  ignore("project on wide table") {
-    val N = 1 << 20
-    val df = sparkSession.range(N)
-    val columns = (0 until 400).map{ i => s"id as id$i"}
-    val benchmark = new Benchmark("projection on wide table", N)
-    benchmark.addCase("wide table", numIters = 5) { iter =>
-      df.selectExpr(columns : _*).queryExecution.toRdd.count()
-    }
-    benchmark.run()
-
-    /**
-     * Here are some numbers with different split threshold:
-     *
-     *  Split threshold  methods   Rate(M/s)   Per Row(ns)
-     *  10               400       0.4         2279
-     *  100              200       0.6         1554
-     *  1k               37        0.9         1116
--- End diff --

I think we should have a PR to add this config officially. It should be 
useful for performance tuning.
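The two right-hand columns of the quoted table describe the same measurement in two ways: a cost of N nanoseconds per row corresponds to 1000 / N million rows per second. A minimal Scala check of that conversion (the object and method names are hypothetical, not part of Spark's `Benchmark` API) shows that the `1k` row is consistent with the others: 2279, 1554 and 1116 ns per row round to 0.4, 0.6 and 0.9 M/s.

```scala
object BenchmarkUnits {
  // Convert a per-row cost in nanoseconds to throughput in millions of rows per second.
  // 1 s = 1e9 ns, so rows/s = 1e9 / ns and M rows/s = 1e3 / ns.
  def rateMPerSec(nsPerRow: Double): Double = 1000.0 / nsPerRow

  def main(args: Array[String]): Unit = {
    // Per-row timings copied from the quoted BenchmarkWideTable comment.
    val perRowNs = Seq("10" -> 2279.0, "100" -> 1554.0, "1k" -> 1116.0)
    perRowNs.foreach { case (threshold, ns) =>
      println(f"$threshold%-4s ${rateMPerSec(ns)}%.1f M/s")
    }
  }
}
```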


---


[GitHub] spark pull request #22771: [SPARK-25773][Core]Cancel zombie tasks in a resul...

2018-10-25 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/22771#discussion_r228409787
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1364,6 +1385,21 @@ private[spark] class DAGScheduler(
   if (job.numFinished == job.numPartitions) {
 markStageAsFinished(resultStage)
 cleanupStateForJobAndIndependentStages(job)
+try {
+  // killAllTaskAttempts will fail if a SchedulerBackend does not implement
+  // killTask.
+  logInfo(s"Job ${job.jobId} is finished. Killing potential speculative or " +
+    s"zombie tasks for this job")
--- End diff --

I created https://issues.apache.org/jira/browse/SPARK-25849 to improve the 
document.
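The comment in the quoted `DAGScheduler` change notes that `killAllTaskAttempts` fails when a `SchedulerBackend` does not implement `killTask`, which is why the call sits inside a `try`. A minimal, self-contained sketch of that best-effort pattern follows; the `Backend` trait and `killRemainingTasks` helper are hypothetical, and the assumption that an unimplemented `killTask` throws `UnsupportedOperationException` is modelled on the comment rather than copied from Spark.

```scala
object BestEffortKill {
  // Hypothetical stand-in for a scheduler backend: killing tasks is optional,
  // so the default implementation refuses.
  trait Backend {
    def killTask(taskId: Long): Unit = throw new UnsupportedOperationException
  }

  // Try to kill every still-running task of a job that has already finished.
  // Returns false, instead of failing the job, when the backend cannot kill tasks.
  def killRemainingTasks(backend: Backend, taskIds: Seq[Long]): Boolean = {
    try {
      taskIds.foreach(backend.killTask)
      true
    } catch {
      case _: UnsupportedOperationException =>
        // The job's result is already complete; leftover speculative or zombie
        // tasks only waste resources, so giving up here is acceptable.
        false
    }
  }
}
```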


---


[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22771
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22771
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4525/
Test PASSed.


---


[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22771
  
**[Test build #98073 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98073/testReport)**
 for PR 22771 at commit 
[`2e03290`](https://github.com/apache/spark/commit/2e0329039435b7bc61ef0370490efe45ba8048c6).


---


[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-25 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/22771
  
retest this please


---


[GitHub] spark pull request #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs in...

2018-10-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22841


---


[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22841
  
thanks, merging to master/2.4!


---


[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Improve BenchmarkWideTab...

2018-10-25 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22823#discussion_r228407600
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -910,12 +910,14 @@ class CodegenContext {
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
 var length = 0
+    val splitThreshold =
+      SQLConf.get.getConfString("spark.testing.codegen.splitThreshold", "1024").toInt
--- End diff --

Personally I don't think this is a good solution:
1. The configuration contains "testing", which is super weird since it can be 
used in production.
2. We should start a new discussion about whether to make it configurable. The 
reason should not be to make the benchmark easier.
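For context on what the proposed `spark.testing.codegen.splitThreshold` value controls: the diff reads it into `splitThreshold` next to the `blocks`, `blockBuilder` and `length` accumulators in `CodegenContext`. Below is a simplified, self-contained sketch of that kind of length-based splitting, assuming a new block is started once the accumulated source length exceeds the threshold. The `SplitSketch` name is hypothetical, and the real `CodegenContext` code may measure length differently (for example after stripping comments); this sketch uses the raw string length.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSketch {
  // Group generated code snippets into blocks; each block would later become its own method.
  // A small threshold yields many small methods (more call overhead on wide tables); a large
  // one yields a few big methods, which the JVM may refuse to JIT-compile if they grow too large.
  def splitIntoBlocks(codes: Seq[String], threshold: Int): Seq[String] = {
    val blocks = ArrayBuffer.empty[String]
    val blockBuilder = new StringBuilder()
    var length = 0
    for (code <- codes) {
      // Close the current block before appending once it has grown past the threshold.
      if (length > threshold) {
        blocks += blockBuilder.toString()
        blockBuilder.clear()
        length = 0
      }
      blockBuilder.append(code)
      length += code.length
    }
    blocks += blockBuilder.toString()
    blocks.toSeq
  }
}
```

Lowering the threshold from 1024 to 10 in this sketch turns a handful of large blocks into one block per snippet, which mirrors the jump from 37 to 400 generated methods in the quoted benchmark table.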



---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22843
  
**[Test build #98070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98070/testReport)**
 for PR 22843 at commit 
[`60c5808`](https://github.com/apache/spark/commit/60c5808ddd72f0f41cb33208268dfac3da5baa03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98070/
Test PASSed.


---


[GitHub] spark pull request #22816: [SPARK-25822][PySpark]Fix a race condition when r...

2018-10-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22816


---


[GitHub] spark issue #22816: [SPARK-25822][PySpark]Fix a race condition when releasin...

2018-10-25 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22816
  
Thanks! merging to master/2.4/2.3.


---


[GitHub] spark issue #22816: [SPARK-25822][PySpark]Fix a race condition when releasin...

2018-10-25 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22816
  
LGTM.


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread functicons

Github user functicons commented on the issue:

https://github.com/apache/spark/pull/21588
  
Do we really want to switch to Hive 2.3? From this page 
https://hive.apache.org/downloads.html, Hive 2.3 works with Hadoop 2.x (Hive 
3.x works with Hadoop 3.x).


---


[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228400112
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/security/KubernetesHadoopDelegationTokenManager.scala
 ---
@@ -18,45 +18,20 @@
 package org.apache.spark.deploy.k8s.security
 
 import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.FileSystem
-import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.hadoop.security.UserGroupInformation
 
 import org.apache.spark.SparkConf
-import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.deploy.security.HadoopDelegationTokenManager
-import org.apache.spark.internal.Logging
 
 /**
- * The KubernetesHadoopDelegationTokenManager fetches Hadoop delegation tokens
- * on the behalf of the Kubernetes submission client. The new credentials
- * (called Tokens when they are serialized) are stored in Secrets accessible
- * to the driver and executors, when new Tokens are received they overwrite the current Secrets.
+ * Adds Kubernetes-specific functionality to HadoopDelegationTokenManager.
  */
 private[spark] class KubernetesHadoopDelegationTokenManager(
--- End diff --

this class doesn't really seem necessary anymore, but not a big deal


---


[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228398765
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ---
@@ -110,32 +209,105 @@ private[spark] class HadoopDelegationTokenManager(
   }
 
   /**
-   * Get delegation token provider for the specified service.
+   * List of file systems for which to obtain delegation tokens. The base implementation
+   * returns just the default file system in the given Hadoop configuration.
     */
-  def getServiceDelegationTokenProvider(service: String): Option[HadoopDelegationTokenProvider] = {
-    delegationTokenProviders.get(service)
+  protected def fileSystemsToAccess(): Set[FileSystem] = {
+    Set(FileSystem.get(hadoopConf))
+  }
+
+  private def scheduleRenewal(delay: Long): Unit = {
+    val _delay = math.max(0, delay)
+    logInfo(s"Scheduling login from keytab in ${UIUtils.formatDuration(delay)}.")
+
+    val renewalTask = new Runnable() {
+      override def run(): Unit = {
+        updateTokensTask()
+      }
+    }
+    renewalExecutor.schedule(renewalTask, _delay, TimeUnit.MILLISECONDS)
   }
 
   /**
-   * Writes delegation tokens to creds.  Delegation tokens are fetched from all registered
-   * providers.
-   *
-   * @param hadoopConf hadoop Configuration
-   * @param creds Credentials that will be updated in place (overwritten)
-   * @return Time after which the fetched delegation tokens should be renewed.
+   * Periodic task to login to the KDC and create new delegation tokens. Re-schedules itself
+   * to fetch the next set of tokens when needed.
     */
-  def obtainDelegationTokens(
-      hadoopConf: Configuration,
-      creds: Credentials): Long = {
-    delegationTokenProviders.values.flatMap { provider =>
-      if (provider.delegationTokensRequired(sparkConf, hadoopConf)) {
-        provider.obtainDelegationTokens(hadoopConf, sparkConf, creds)
+  private def updateTokensTask(): Unit = {
+    try {
+      val freshUGI = doLogin()
+      val creds = obtainTokensAndScheduleRenewal(freshUGI)
+      val tokens = SparkHadoopUtil.get.serialize(creds)
+
+      val driver = driverRef.get()
+      if (driver != null) {
+        logInfo("Updating delegation tokens.")
+        driver.send(UpdateDelegationTokens(tokens))
       } else {
-        logDebug(s"Service ${provider.serviceName} does not require a token." +
-          s" Check your configuration to see if security is disabled or not.")
-        None
+        // This shouldn't really happen, since the driver should register way before tokens expire.
+        logWarning("Delegation tokens close to expiration but no driver has registered yet.")
+        SparkHadoopUtil.get.addDelegationTokens(tokens, sparkConf)
       }
-    }.foldLeft(Long.MaxValue)(math.min)
+    } catch {
+      case e: Exception =>
+        val delay = TimeUnit.SECONDS.toMillis(sparkConf.get(CREDENTIALS_RENEWAL_RETRY_WAIT))
+        logWarning(s"Failed to update tokens, will try again in ${UIUtils.formatDuration(delay)}!" +
+          " If this happens too often tasks will fail.", e)
+        scheduleRenewal(delay)
+    }
+  }
+
+  /**
+   * Obtain new delegation tokens from the available providers. Schedules a new task to fetch
+   * new tokens before the new set expires.
+   *
+   * @return Credentials containing the new tokens.
+   */
+  private def obtainTokensAndScheduleRenewal(ugi: UserGroupInformation): Credentials = {
+    ugi.doAs(new PrivilegedExceptionAction[Credentials]() {
+      override def run(): Credentials = {
+        val creds = new Credentials()
+        val nextRenewal = obtainDelegationTokens(creds)
+
+        // Calculate the time when new credentials should be created, based on the configured
+        // ratio.
+        val now = System.currentTimeMillis
+        val ratio = sparkConf.get(CREDENTIALS_RENEWAL_INTERVAL_RATIO)
+        val adjustedNextRenewal = (now + (ratio * (nextRenewal - now))).toLong
+
+        scheduleRenewal(adjustedNextRenewal - now)
+        creds
+      }
+    })
+  }
+
+  private def doLogin(): UserGroupInformation = {
+    logInfo(s"Attempting to login to KDC using principal: $principal")
+    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
+    logInfo("Successfully logged into KDC.")
+    ugi
+  }
+
+  private def loadProviders(): Map[String, HadoopDelegationTokenProvider] = {
+    val providers = Seq(new HadoopFSDelegationTokenProvider(fileSystemsToAccess)) ++
+

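In `obtainTokensAndScheduleRenewal`, the next login is scheduled at `now + ratio * (nextRenewal - now)`, and `scheduleRenewal` clamps the resulting delay at zero. A pure-function sketch of that arithmetic follows (the object and method names are hypothetical). With a ratio of 0.75, matching the 75% figure in the class documentation, and tokens due for renewal in 24 hours, the next refresh is scheduled after 18 hours.

```scala
import java.util.concurrent.TimeUnit

object RenewalDelaySketch {
  // Milliseconds to wait before fetching new tokens, given the current time, the time at
  // which the current tokens should be renewed, and the fraction of that interval to wait.
  def renewalDelayMs(nowMs: Long, nextRenewalMs: Long, ratio: Double): Long = {
    val adjustedNextRenewal = (nowMs + ratio * (nextRenewalMs - nowMs)).toLong
    // A renewal time already in the past means "refresh immediately", never a negative delay.
    math.max(0L, adjustedNextRenewal - nowMs)
  }

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    val in24h = now + TimeUnit.HOURS.toMillis(24)
    val delay = renewalDelayMs(now, in24h, 0.75)
    println(s"next refresh in ${TimeUnit.MILLISECONDS.toHours(delay)} hours")
  }
}
```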
[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228400837
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ---
@@ -17,76 +17,175 @@
 
 package org.apache.spark.deploy.security
 
+import java.io.File
+import java.security.PrivilegedExceptionAction
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.AtomicReference
+
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.FileSystem
-import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
 
 import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.rpc.RpcEndpointRef
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.UpdateDelegationTokens
+import org.apache.spark.ui.UIUtils
+import org.apache.spark.util.ThreadUtils
 
 /**
- * Manages all the registered HadoopDelegationTokenProviders and offer APIs for other modules to
- * obtain delegation tokens and their renewal time. By default [[HadoopFSDelegationTokenProvider]],
- * [[HiveDelegationTokenProvider]] and [[HBaseDelegationTokenProvider]] will be loaded in if not
- * explicitly disabled.
+ * Manager for delegation tokens in a Spark application.
+ *
+ * This manager has two modes of operation:
+ *
+ * 1.  When configured with a principal and a keytab, it will make sure long-running apps can run
+ * without interruption while accessing secured services. It periodically logs in to the KDC with
+ * user-provided credentials, and contacts all the configured secure services to obtain delegation
+ * tokens to be distributed to the rest of the application.
+ *
+ * Because the Hadoop UGI API does not expose the TTL of the TGT, a configuration controls how often
+ * to check that a relogin is necessary. This is done reasonably often since the check is a no-op
+ * when the relogin is not yet needed. The check period can be overridden in the configuration.
  *
- * Also, each HadoopDelegationTokenProvider is controlled by
- * spark.security.credentials.{service}.enabled, and will not be loaded if this config is set to
- * false. For example, Hive's delegation token provider [[HiveDelegationTokenProvider]] can be
- * enabled/disabled by the configuration spark.security.credentials.hive.enabled.
+ * New delegation tokens are created once 75% of the renewal interval of the original tokens has
+ * elapsed. The new tokens are sent to the Spark driver endpoint once it's registered with the AM.
+ * The driver is tasked with distributing the tokens to other processes that might need them.
  *
- * @param sparkConf Spark configuration
- * @param hadoopConf Hadoop configuration
- * @param fileSystems Delegation tokens will be fetched for these Hadoop filesystems.
+ * 2. When operating without an explicit principal and keytab, token renewal will not be available.
+ * Starting the manager will distribute an initial set of delegation tokens to the provided Spark
+ * driver, but the app will not get new tokens when those expire.
+ *
+ * It can also be used just to create delegation tokens, by calling the `obtainDelegationTokens`
+ * method. This option does not require calling the `start` method, but leaves it up to the
+ * caller to distribute the tokens that were generated.
  */
 private[spark] class HadoopDelegationTokenManager(
-    sparkConf: SparkConf,
-    hadoopConf: Configuration,
-    fileSystems: Configuration => Set[FileSystem])
-  extends Logging {
+    protected val sparkConf: SparkConf,
+    protected val hadoopConf: Configuration) extends Logging {
 
   private val deprecatedProviderEnabledConfigs = List(
     "spark.yarn.security.tokens.%s.enabled",
     "spark.yarn.security.credentials.%s.enabled")
   private val providerEnabledConfig = "spark.security.credentials.%s.enabled"
 
-  // Maintain all the registered delegation token providers
-  private val delegationTokenProviders = getDelegationTokenProviders
+  private val principal = sparkConf.get(PRINCIPAL).orNull
+  private val keytab = sparkConf.get(KEYTAB).orNull
+
+  if (principal != null) {
+    require(keytab != null, "Kerberos principal specified without a keytab.")
+    require(new File(keytab).isFile(), s"Cannot find keytab at $keytab.")
+  }
+
+  private val delegationTokenProviders = loadProviders()
   logDebug("Using the following builtin delegation token providers: " +

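The constructor keeps both the current `spark.security.credentials.%s.enabled` key and the two deprecated `spark.yarn.security.*` patterns, and the removed Scaladoc explains that a provider is not loaded when its key is set to false. One plausible way to resolve such keys is sketched below over a plain `Map` instead of `SparkConf`; the object and method names are hypothetical, and the precedence rule (the current key wins when present, otherwise any deprecated key set to false disables the provider) is an assumption rather than a description of the merged code.

```scala
object ProviderConfigSketch {
  private val providerEnabledConfig = "spark.security.credentials.%s.enabled"
  private val deprecatedProviderEnabledConfigs = List(
    "spark.yarn.security.tokens.%s.enabled",
    "spark.yarn.security.credentials.%s.enabled")

  // A provider is enabled unless configuration explicitly disables it.
  def isServiceEnabled(conf: Map[String, String], service: String): Boolean = {
    // Every deprecated key must be either unset or "true".
    val deprecatedEnabled = deprecatedProviderEnabledConfigs.forall { pattern =>
      conf.get(pattern.format(service)).forall(_.toBoolean)
    }
    // The current key, when present, overrides the deprecated ones.
    conf.get(providerEnabledConfig.format(service)).map(_.toBoolean).getOrElse(deprecatedEnabled)
  }
}
```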
[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228400760
  

[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228397209
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ---
@@ -17,76 +17,175 @@
 
 package org.apache.spark.deploy.security
 
+import java.io.File
+import java.security.PrivilegedExceptionAction
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+import java.util.concurrent.atomic.AtomicReference
+
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.FileSystem
-import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
 
 import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.rpc.RpcEndpointRef
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.UpdateDelegationTokens
+import org.apache.spark.ui.UIUtils
+import org.apache.spark.util.ThreadUtils
 
 /**
- * Manages all the registered HadoopDelegationTokenProviders and offer APIs for other modules to
- * obtain delegation tokens and their renewal time. By default [[HadoopFSDelegationTokenProvider]],
- * [[HiveDelegationTokenProvider]] and [[HBaseDelegationTokenProvider]] will be loaded in if not
- * explicitly disabled.
+ * Manager for delegation tokens in a Spark application.
+ *
+ * This manager has two modes of operation:
+ *
+ * 1.  When configured with a principal and a keytab, it will make sure long-running apps can run
+ * without interruption while accessing secured services. It periodically logs in to the KDC with
+ * user-provided credentials, and contacts all the configured secure services to obtain delegation
+ * tokens to be distributed to the rest of the application.
+ *
+ * Because the Hadoop UGI API does not expose the TTL of the TGT, a configuration controls how often
+ * to check that a relogin is necessary. This is done reasonably often since the check is a no-op
+ * when the relogin is not yet needed. The check period can be overridden in the configuration.
  *
- * Also, each HadoopDelegationTokenProvider is controlled by
- * spark.security.credentials.{service}.enabled, and will not be loaded if this config is set to
- * false. For example, Hive's delegation token provider [[HiveDelegationTokenProvider]] can be
- * enabled/disabled by the configuration spark.security.credentials.hive.enabled.
+ * New delegation tokens are created once 75% of the renewal interval of the original tokens has
+ * elapsed. The new tokens are sent to the Spark driver endpoint once it's registered with the AM.
+ * The driver is tasked with distributing the tokens to other processes that might need them.
  *
- * @param sparkConf Spark configuration
- * @param hadoopConf Hadoop configuration
- * @param fileSystems Delegation tokens will be fetched for these Hadoop filesystems.
+ * 2. When operating without an explicit principal and keytab, token renewal will not be available.
+ * Starting the manager will distribute an initial set of delegation tokens to the provided Spark
+ * driver, but the app will not get new tokens when those expire.
+ *
+ * It can also be used just to create delegation tokens, by calling the `obtainDelegationTokens`
+ * method. This option does not require calling the `start` method, but leaves it up to the
+ * caller to distribute the tokens that were generated.
  */
 private[spark] class HadoopDelegationTokenManager(
-    sparkConf: SparkConf,
-    hadoopConf: Configuration,
-    fileSystems: Configuration => Set[FileSystem])
-  extends Logging {
+    protected val sparkConf: SparkConf,
+    protected val hadoopConf: Configuration) extends Logging {
 
   private val deprecatedProviderEnabledConfigs = List(
     "spark.yarn.security.tokens.%s.enabled",
     "spark.yarn.security.credentials.%s.enabled")
   private val providerEnabledConfig = "spark.security.credentials.%s.enabled"
 
-  // Maintain all the registered delegation token providers
-  private val delegationTokenProviders = getDelegationTokenProviders
+  private val principal = sparkConf.get(PRINCIPAL).orNull
+  private val keytab = sparkConf.get(KEYTAB).orNull
+
+  if (principal != null) {
+    require(keytab != null, "Kerberos principal specified without a keytab.")
--- End diff --

What if the keytab is specified but not the principal? Should this be the 
same check as in Client.scala?
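The question above asks whether a keytab without a principal should also be rejected. A sketch of a symmetric "both or neither" check follows; `KerberosConfSketch.validate` is a hypothetical helper, the rule itself is an assumption about what the reviewer is asking for, and the error message is illustrative rather than quoted from Client.scala. It keeps the quoted diff's keytab-file existence check.

```scala
import java.io.File

object KerberosConfSketch {
  // Validate an optional principal/keytab pair: both must be set, or neither.
  // When a keytab is given, it must point at an existing regular file.
  def validate(principal: Option[String], keytab: Option[String]): Unit = {
    require(principal.isDefined == keytab.isDefined,
      "Both principal and keytab must be defined, or neither.")
    keytab.foreach { path =>
      require(new File(path).isFile(), s"Cannot find keytab at $path.")
    }
  }
}
```

Both `require` calls throw `IllegalArgumentException`, so a keytab-only configuration fails fast in the same way as a principal-only one.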

[GitHub] spark pull request #22624: [SPARK-23781][CORE] Merge token renewer functiona...

2018-10-25 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/22624#discussion_r228398489
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
 ---
@@ -110,32 +209,105 @@ private[spark] class HadoopDelegationTokenManager(
   }
 
   /**
-   * Get delegation token provider for the specified service.
+   * List of file systems for which to obtain delegation tokens. The base 
implementation
+   * returns just the default file system in the given Hadoop 
configuration.
*/
-  def getServiceDelegationTokenProvider(service: String): 
Option[HadoopDelegationTokenProvider] = {
-delegationTokenProviders.get(service)
+  protected def fileSystemsToAccess(): Set[FileSystem] = {
+Set(FileSystem.get(hadoopConf))
+  }
+
+  private def scheduleRenewal(delay: Long): Unit = {
+val _delay = math.max(0, delay)
+logInfo(s"Scheduling login from keytab in ${UIUtils.formatDuration(delay)}.")
+
+val renewalTask = new Runnable() {
+  override def run(): Unit = {
+updateTokensTask()
+  }
+}
+renewalExecutor.schedule(renewalTask, _delay, TimeUnit.MILLISECONDS)
   }
 
   /**
-   * Writes delegation tokens to creds.  Delegation tokens are fetched from all registered
-   * providers.
-   *
-   * @param hadoopConf hadoop Configuration
-   * @param creds Credentials that will be updated in place (overwritten)
-   * @return Time after which the fetched delegation tokens should be renewed.
+   * Periodic task to login to the KDC and create new delegation tokens. Re-schedules itself
+   * to fetch the next set of tokens when needed.
*/
-  def obtainDelegationTokens(
-  hadoopConf: Configuration,
-  creds: Credentials): Long = {
-delegationTokenProviders.values.flatMap { provider =>
-  if (provider.delegationTokensRequired(sparkConf, hadoopConf)) {
-provider.obtainDelegationTokens(hadoopConf, sparkConf, creds)
+  private def updateTokensTask(): Unit = {
+try {
+  val freshUGI = doLogin()
+  val creds = obtainTokensAndScheduleRenewal(freshUGI)
+  val tokens = SparkHadoopUtil.get.serialize(creds)
+
+  val driver = driverRef.get()
+  if (driver != null) {
+logInfo("Updating delegation tokens.")
+driver.send(UpdateDelegationTokens(tokens))
   } else {
-logDebug(s"Service ${provider.serviceName} does not require a token." +
-  s" Check your configuration to see if security is disabled or not.")
-None
+// This shouldn't really happen, since the driver should register way before tokens expire.
+logWarning("Delegation tokens close to expiration but no driver has registered yet.")
+SparkHadoopUtil.get.addDelegationTokens(tokens, sparkConf)
   }
-}.foldLeft(Long.MaxValue)(math.min)
+} catch {
+  case e: Exception =>
+val delay = TimeUnit.SECONDS.toMillis(sparkConf.get(CREDENTIALS_RENEWAL_RETRY_WAIT))
+logWarning(s"Failed to update tokens, will try again in ${UIUtils.formatDuration(delay)}!" +
+  " If this happens too often tasks will fail.", e)
+scheduleRenewal(delay)
+}
+  }
+
+  /**
+   * Obtain new delegation tokens from the available providers. Schedules a new task to fetch
+   * new tokens before the new set expires.
+   *
+   * @return Credentials containing the new tokens.
+   */
+  private def obtainTokensAndScheduleRenewal(ugi: UserGroupInformation): Credentials = {
+ugi.doAs(new PrivilegedExceptionAction[Credentials]() {
+  override def run(): Credentials = {
+val creds = new Credentials()
+val nextRenewal = obtainDelegationTokens(creds)
+
+// Calculate the time when new credentials should be created, based on the configured
+// ratio.
+val now = System.currentTimeMillis
+val ratio = sparkConf.get(CREDENTIALS_RENEWAL_INTERVAL_RATIO)
+val adjustedNextRenewal = (now + (ratio * (nextRenewal - now))).toLong
+
+scheduleRenewal(adjustedNextRenewal - now)
--- End diff --

You're adding `now` and then subtracting it again; instead you could do:

```scala
val adjustedRenewalDelay = (ratio * (nextRenewal - now)).toLong
scheduleRenewal(adjustedRenewalDelay)
```
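A small standalone check of the suggestion above, assuming `now` and `nextRenewal` are epoch milliseconds and `ratio` is the configured renewal ratio. `RenewalDelay` and its methods are hypothetical names for illustration; they are not part of the PR.

```scala
object RenewalDelay {
  // Form used in the diff: build an absolute timestamp, then subtract `now` again.
  def viaAbsoluteTime(now: Long, nextRenewal: Long, ratio: Double): Long = {
    val adjustedNextRenewal = (now + (ratio * (nextRenewal - now))).toLong
    adjustedNextRenewal - now
  }

  // Suggested form: compute the relative delay directly.
  def direct(now: Long, nextRenewal: Long, ratio: Double): Long =
    (ratio * (nextRenewal - now)).toLong
}
```

The two forms should agree to within about a millisecond of truncation, and the direct form avoids the round trip through an absolute timestamp; `scheduleRenewal` already clamps negative delays with `math.max(0, delay)`.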


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22845
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98071/
Test FAILed.


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22844
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22844
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98072/
Test FAILed.


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22845
  
**[Test build #98071 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98071/testReport)**
 for PR 22845 at commit 
[`9ddb847`](https://github.com/apache/spark/commit/9ddb8476544fa34b15fbe15387e1b4983d4d76d4).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22844
  
**[Test build #98072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98072/testReport)**
 for PR 22844 at commit 
[`937111f`](https://github.com/apache/spark/commit/937111f7f53744c8fe1a6b4fd0559643743eefae).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22845
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22844
  
**[Test build #98072 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98072/testReport)**
 for PR 22844 at commit 
[`937111f`](https://github.com/apache/spark/commit/937111f7f53744c8fe1a6b4fd0559643743eefae).


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22845
  
**[Test build #98071 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98071/testReport)**
 for PR 22845 at commit 
[`9ddb847`](https://github.com/apache/spark/commit/9ddb8476544fa34b15fbe15387e1b4983d4d76d4).


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21588
  
Sounds like we should try this then




---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22844
  
ok to test


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22845
  
ok to test


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22845
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22845
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22845
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/22844
  
cc @dongjoon-hyun, @wangyum


---


[GitHub] spark issue #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use m...

2018-10-25 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/22845
  
cc @dongjoon-hyun, @wangyum


---


[GitHub] spark pull request #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks t...

2018-10-25 Thread heary-cao

GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/22845

[SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use main method

## What changes were proposed in this pull request?

Use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
Generate benchmark results:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks"
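To illustrate what "use main method" means here, the following is a minimal, hypothetical sketch of a benchmark driven by a plain `main` entry point rather than a test suite. It does not use Spark's benchmark utilities; `ToyBenchmark`, `bestTimeMs`, the toy workload, and the output file name are all assumptions for illustration. Only the `SPARK_GENERATE_BENCHMARK_FILES=1` switch comes from the description above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object ToyBenchmark {
  // Runs `body` `iters` times and returns the fastest wall-clock time in milliseconds.
  def bestTimeMs(iters: Int)(body: => Any): Double = {
    require(iters > 0, "iters must be positive")
    (1 to iters).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }.min
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val best = bestTimeMs(iters = 3) {
      // Toy workload standing in for the CSV/JSON reads a real benchmark would time.
      (0 until n).map(_.toString).mkString(",").length
    }
    val report = f"toy workload: best of 3 runs = $best%.1f ms%n"
    // Print by default; write a results file only when the env variable is set
    // (the file name is hypothetical).
    if (sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")) {
      Files.write(Paths.get("ToyBenchmark-results.txt"), report.getBytes(StandardCharsets.UTF_8))
    } else {
      print(report)
    }
  }
}
```

A `main` entry point lets the same class run under `spark-submit` or `runMain` without a test framework.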

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark CSVBenchmarks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22845


commit 9ddb8476544fa34b15fbe15387e1b4983d4d76d4
Author: caoxuewen 
Date:   2018-10-26T04:07:48Z

Refactor CSVBenchmarks to use main method




---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22843
  
**[Test build #98070 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98070/testReport)**
 for PR 22843 at commit 
[`60c5808`](https://github.com/apache/spark/commit/60c5808ddd72f0f41cb33208268dfac3da5baa03).


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22844
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4524/
Test PASSed.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22844
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22844
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-25 Thread heary-cao

GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/22844

[SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method

## What changes were proposed in this pull request?

Refactor JSONBenchmarks to use main method

Use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
Generate benchmark results:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmarks"

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark JSONBenchmarks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22844


commit 937111f7f53744c8fe1a6b4fd0559643743eefae
Author: caoxuewen 
Date:   2018-10-26T03:52:31Z

Refactor JSONBenchmarks to use main method




---


[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22840
  
Oops. My bad. I'll monitor the branch.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22843
  
**[Test build #98069 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98069/testReport)**
 for PR 22843 at commit 
[`3864490`](https://github.com/apache/spark/commit/3864490f9bcb2f30e6508b4ae8a98f5faf910b47).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98069/
Test FAILed.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Improve BenchmarkWideTab...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22823#discussion_r228400706
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala
 ---
@@ -1,52 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.benchmark
-
-import org.apache.spark.benchmark.Benchmark
-
-/**
- * Benchmark to measure performance for wide table.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.BenchmarkWideTable"
- *
- * Benchmarks in this file are skipped in normal builds.
- */
-class BenchmarkWideTable extends BenchmarkWithCodegen {
-
-  ignore("project on wide table") {
-val N = 1 << 20
-val df = sparkSession.range(N)
-val columns = (0 until 400).map{ i => s"id as id$i"}
-val benchmark = new Benchmark("projection on wide table", N)
-benchmark.addCase("wide table", numIters = 5) { iter =>
-  df.selectExpr(columns : _*).queryExecution.toRdd.count()
-}
-benchmark.run()
-
-/**
- * Here are some numbers with different split threshold:
- *
- *  Split threshold  methods   Rate(M/s)   Per Row(ns)
- *  10   400   0.4 2279
- *  100  200   0.6 1554
- *  1k   370.9 1116
--- End diff --

Hi, @davies, @cloud-fan, and @kiszk.

This benchmark was added in [Spark 2.1.0](https://github.com/apache/spark/commit/8d35a6f68d6d733212674491cbf31bed73fada0f#diff-71964129f49db97eb030a6d7320af314). The value `1k` was determined by **manually** changing the split threshold.

This PR wants to [add a configuration in CodeGenerator.scala](https://github.com/apache/spark/pull/22823/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR914) for testing purposes only.

1. Is the configuration useful for general purposes?
2. If so, can we make a separate PR for that first?
3. If not, is it acceptable to add this testing-only parameter?
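To show why the split threshold changes the number of generated methods in the benchmark comment, here is a simplified, hypothetical sketch of threshold-based splitting. It is not Spark's `CodegenContext` code. It only assumes, as the diff's `length` counter suggests, that source length serves as a rough proxy for method size and that a new block starts once the current one exceeds the threshold.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSketch {
  // Groups code snippets into blocks; each block would become one generated method.
  def splitIntoBlocks(snippets: Seq[String], threshold: Int): Seq[String] = {
    val blocks = ArrayBuffer[String]()
    val blockBuilder = new StringBuilder()
    var length = 0
    for (code <- snippets) {
      // Start a new block once the current one has grown past the threshold.
      if (length > threshold) {
        blocks += blockBuilder.toString()
        blockBuilder.clear()
        length = 0
      }
      blockBuilder.append(code)
      length += code.length
    }
    blocks += blockBuilder.toString()
    blocks.toList
  }
}
```

With 400 snippets of 10 characters each, a threshold of 10 yields 200 blocks, while a threshold of 1024 yields 4. A smaller threshold means more, smaller methods and more call overhead.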


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4523/
Test PASSed.


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22843
  
**[Test build #98069 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98069/testReport)**
 for PR 22843 at commit 
[`3864490`](https://github.com/apache/spark/commit/3864490f9bcb2f30e6508b4ae8a98f5faf910b47).


---


[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22843
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-25 Thread felixcheung

GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/22843

[SPARK-16693][SPARKR] Remove methods deprecated

## What changes were proposed in this pull request?

Remove deprecated functions, which include:
- SQLContext/HiveContext stuff
- sparkR.init
- jsonFile
- parquetFile
- registerTempTable
- saveAsParquetFile
- unionAll
- createExternalTable
- dropTempTable

## How was this patch tested?

jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rrddapi

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22843.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22843


commit 3864490f9bcb2f30e6508b4ae8a98f5faf910b47
Author: Felix Cheung 
Date:   2018-10-26T04:04:58Z

remove deprecated




---


[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Improve BenchmarkWideTab...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22823#discussion_r228399871
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -910,12 +910,14 @@ class CodegenContext {
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
 var length = 0
+val splitThreshold =
+  SQLConf.get.getConfString("spark.testing.codegen.splitThreshold", "1024").toInt
--- End diff --

In this case, we need advice from the right person. :)


---


[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22666
  
**[Test build #98068 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98068/testReport)**
 for PR 22666 at commit 
[`d876b92`](https://github.com/apache/spark/commit/d876b9270afa9b30defea6d4621bcc63dc61f3e0).


---


[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22840
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22840
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98051/
Test FAILed.


---


[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22840
  
**[Test build #98051 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98051/testReport)**
 for PR 22840 at commit 
[`0950bb8`](https://github.com/apache/spark/commit/0950bb86ac028655c665687398c7dcfce1853f04).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22775
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4522/
Test PASSed.


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22775
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22775
  
**[Test build #98067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98067/testReport)**
 for PR 22775 at commit 
[`03f34d9`](https://github.com/apache/spark/commit/03f34d9d86a8087c3de4d5580e2ddf9fba8a8407).


---


[GitHub] spark pull request #22814: [SPARK-25819][SQL] Support parse mode option for ...

2018-10-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22814


---


[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22815
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4521/
Test PASSed.


---


[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22815
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22775
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22775
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4520/
Test PASSed.


---


[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22820
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4519/
Test PASSed.


---


[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22820
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22820
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4519/



---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21588
  
So, let's say we decide to only support Hive 2.3.x+, as a precursor to 
this. We could already eliminate a lot of the Hive tests, right? That might be
useful in its own right, as they take time and are a little flaky.


---


[GitHub] spark pull request #22837: [MINOR][TEST][BRANCH-2.4] Regenerate golden file ...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/22837


---


[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22814
  
Merged to master


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
Yup, it supports Hadoop 3, plus the other fixes @wangyum mentioned.


---


[GitHub] spark issue #22837: [MINOR][TEST][BRANCH-2.4] Regenerate golden file `dateti...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22837
  
Merged to `branch-2.4`.


---


[GitHub] spark issue #22837: [MINOR][TEST][BRANCH-2.4] Regenerate golden file `dateti...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22837
  
Thank you for the review and approval, @HyukjinKwon!


---


[GitHub] spark issue #22815: [SPARK-25821][SQL] Remove SQLContext methods deprecated ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22815
  
**[Test build #98065 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98065/testReport)**
 for PR 22815 at commit 
[`8199362`](https://github.com/apache/spark/commit/81993625218818a2b9444e5ba11588713eda557f).


---


[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22775
  
**[Test build #98066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98066/testReport)**
 for PR 22775 at commit 
[`e2ca651`](https://github.com/apache/spark/commit/e2ca6517098adc093f957a6158ed760fb0826f4d).


---


[GitHub] spark pull request #22815: [SPARK-25821][SQL] Remove SQLContext methods depr...

2018-10-25 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22815#discussion_r228396856
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -434,6 +388,7 @@ read.orc <- function(path, ...) {
 #' Loads a Parquet file, returning the result as a SparkDataFrame.
 #'
 #' @param path path of file to read. A vector of multiple paths is allowed.
+#' @param ... additional external data source specific named properties.
--- End diff --

Oops, I missed that, sorry. I'll incorporate both changes.


---


[GitHub] spark pull request #22840: [SPARK-25840][BUILD] `make-distribution.sh` shoul...

2018-10-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22840


---


[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-25 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21588
  
Does Apache Hive 2.3.2 have all the fixes we need?


---


[GitHub] spark issue #22837: [MINOR][TEST][BRANCH-2.4] Regenerate golden file `dateti...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22837
  
Could you review this, @HyukjinKwon?


---


[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22820
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4519/



---


[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22840
  
Thank you, @srowen and @HyukjinKwon.
Merged to master/branch-2.4.


---


[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22814
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98063/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22814
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22814: [SPARK-25819][SQL] Support parse mode option for the fun...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22814
  
**[Test build #98063 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98063/testReport)**
 for PR 22814 at commit 
[`b33a5ad`](https://github.com/apache/spark/commit/b33a5ade4b3f091d5e67d3f3bdc47e87f9b37eee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22841
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98059/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22820
  
**[Test build #98064 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98064/testReport)**
 for PR 22820 at commit 
[`728d70a`](https://github.com/apache/spark/commit/728d70af1d0917745879362abef2209a760d4f22).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22841
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22841
  
**[Test build #98059 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98059/testReport)**
 for PR 22841 at commit 
[`0a49c85`](https://github.com/apache/spark/commit/0a49c859049a376872053dcfaacba81d47070d77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client version to ...

2018-10-25 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22820
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22838: [SPARK-25835][K8s] Create kubernetes-tests profile and u...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22838
  
**[Test build #4397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4397/testReport)**
 for PR 22838 at commit 
[`9b8f6b4`](https://github.com/apache/spark/commit/9b8f6b41cdb0e3139d76a0cfd281c094bcc91469).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22841
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98060/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for class...

2018-10-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22790


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22841
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22841: [SPARK-25842][SQL] Deprecate rangeBetween APIs introduce...

2018-10-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22841
  
**[Test build #98060 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98060/testReport)**
 for PR 22841 at commit 
[`45ef16b`](https://github.com/apache/spark/commit/45ef16bac363979d1626824673a41360a3c9648a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

