date:20190107

[GitHub] LantaoJin commented on a change in pull request #22874: [SPARK-25865][CORE] Add GC information to ExecutorMetrics

2019-01-07 Thread GitBox

LantaoJin commented on a change in pull request #22874: [SPARK-25865][CORE] Add 
GC information to ExecutorMetrics
URL: https://github.com/apache/spark/pull/22874#discussion_r245867274
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/metrics/ExecutorMetricType.scala
 ##
 @@ -99,6 +102,60 @@ case object ProcessTreeMetrics extends ExecutorMetricType {
   }
 }
 
+case object GarbageCollectionMetrics extends ExecutorMetricType with Logging {
+  import GC_TYPE._
+  override val names = Seq(
+    "MinorGCCount",
+    "MinorGCTime",
+    "MajorGCCount",
+    "MajorGCTime"
+  )
+
+  private lazy val youngGenGarbageCollector: Seq[String] = {
+    Seq(`copy`, `psScavenge`, `parNew`, `g1Young`) ++ /* additional young gc we added */
+      SparkEnv.get.conf.get(config.EVENT_LOG_ADDITIONAL_YOUNG_GENERATION_GARBAGE_COLLECTORS)
+  }
+
+  private lazy val oldGenGarbageCollector: Seq[String] = {
+    Seq(`markSweepCompact`, `psMarkSweep`, `cms`, `g1Old`) ++ /* additional old gc we added */
+      SparkEnv.get.conf.get(config.EVENT_LOG_ADDITIONAL_OLD_GENERATION_GARBAGE_COLLECTORS)
+  }
+
+  override private[spark] def getMetricValues(memoryManager: MemoryManager): Array[Long] = {
+    val gcMetrics = Array[Long](names.length) // minorCount, minorTime, majorCount, majorTime
+    if (SparkEnv.get.conf.get(config.EVENT_LOG_GARBAGE_COLLECTION_METRICS)) {
+      ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { mxBean =>
+        if (youngGenGarbageCollector.contains(mxBean.getName)) {
+          gcMetrics(0) = mxBean.getCollectionCount
+          gcMetrics(1) = mxBean.getCollectionTime
+        } else if (oldGenGarbageCollector.contains(mxBean.getName)) {
+          gcMetrics(2) = mxBean.getCollectionCount
+          gcMetrics(3) = mxBean.getCollectionTime
+        } else {
+          logDebug(s"${mxBean.getName} is an unsupported garbage collector." +
+            s"Add it to ${config.EVENT_LOG_ADDITIONAL_YOUNG_GENERATION_GARBAGE_COLLECTORS} " +
+            s"or ${config.EVENT_LOG_ADDITIONAL_OLD_GENERATION_GARBAGE_COLLECTORS} to enable.")
+        }
+      }
+    }
+    gcMetrics
+  }
+}
+
+private[spark] object GC_TYPE {
+  // Young Gen
+  val copy = "Copy"
+  val psScavenge = "PS Scavenge"
+  val parNew = "ParNew"
+  val g1Young = "G1 Young Generation"
+
+  // Old Gen
+  val markSweepCompact = "MarkSweepCompact"
+  val psMarkSweep = "PS MarkSweep"
+  val cms = "ConcurrentMarkSweep"
+  val g1Old = "G1 Old Generation"
 
 Review comment:
   Yes, I will use two Seqs instead.
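
A minimal sketch of what "two Seqs" could look like, assuming the collector names from the diff above are simply inlined into two sequences instead of living in a separate `GC_TYPE` object. The object name `GcMetricsSketch` and the helper `classify` are hypothetical and not part of the PR, and the configurable additional collectors are left out for brevity.

```scala
import java.lang.management.ManagementFactory

object GcMetricsSketch {
  // Collector names copied from the diff above.
  val youngGenCollectors: Seq[String] =
    Seq("Copy", "PS Scavenge", "ParNew", "G1 Young Generation")
  val oldGenCollectors: Seq[String] =
    Seq("MarkSweepCompact", "PS MarkSweep", "ConcurrentMarkSweep", "G1 Old Generation")

  // Hypothetical helper: offset of the (count, time) pair for a known collector.
  def classify(name: String): Option[Int] =
    if (youngGenCollectors.contains(name)) Some(0)
    else if (oldGenCollectors.contains(name)) Some(2)
    else None

  // Returns Array(minorCount, minorTime, majorCount, majorTime).
  def gcMetricValues(): Array[Long] = {
    val metrics = new Array[Long](4) // four zero-initialised slots
    val beans = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    while (beans.hasNext) {
      val bean = beans.next()
      // Collectors in neither Seq are skipped.
      classify(bean.getName).foreach { offset =>
        metrics(offset) = bean.getCollectionCount
        metrics(offset + 1) = bean.getCollectionTime
      }
    }
    metrics
  }
}
```

The sketch allocates the result with `new Array[Long](4)`. In Scala, `Array[Long](4)` without `new` would instead build a one-element array containing the value 4.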


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] AmplabJenkins removed a comment on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452200388
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6809/
   Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452200385
 
 
   Merged build finished. Test PASSed.



[GitHub] SparkQA commented on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

SparkQA commented on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452200447
 
 
   **[Test build #100919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100919/testReport)** for PR 23462 at commit [`68ec759`](https://github.com/apache/spark/commit/68ec759b1fae6eae19b73dceedcd8b271a730fbf).



[GitHub] AmplabJenkins commented on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452200388
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6809/
   Test PASSed.



[GitHub] AmplabJenkins commented on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452200385
 
 
   Merged build finished. Test PASSed.



[GitHub] viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452200077
 
 
   @ueshin Ideally we should provide a config to turn off this check. But I'm wondering whether there is a proper way to obtain the config value in the serializers (https://github.com/apache/spark/pull/22807#issuecomment-452177528). Do you have any suggestions?



[GitHub] MaxGekk commented on issue #23462: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-07 Thread GitBox

MaxGekk commented on issue #23462: [SPARK-26546][SQL] Caching of 
java.time.format.DateTimeFormatter
URL: https://github.com/apache/spark/pull/23462#issuecomment-452199786
 
 
   jenkins, retest this, please



[GitHub] MaxGekk commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled

2019-01-07 Thread GitBox

MaxGekk commented on issue #23411: [SPARK-26503][CORE] Get rid of 
spark.sql.legacy.timeParser.enabled
URL: https://github.com/apache/spark/pull/23411#issuecomment-452199451
 
 
   > Is that what you're interested in testing?
   
   Yes, thank you.
   
   > If so are we back to favoring removing the old legacy parser?
   
   I would continue with the PR. @hvanhovell WDYT?



[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452197625
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100915/
   Test FAILed.



[GitHub] AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] 
Keep track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452197622
 
 
   Merged build finished. Test FAILed.



[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452197622
 
 
   Merged build finished. Test FAILed.



[GitHub] LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to ExecutorMetrics

2019-01-07 Thread GitBox

LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to 
ExecutorMetrics
URL: https://github.com/apache/spark/pull/22874#issuecomment-452197644
 
 
   @squito Replaced with a small one:
   
   > 285K Jan  8 12:42 
core/src/test/resources/spark-events/application_1536831636016_59384_1 



[GitHub] AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452197625
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100915/
   Test FAILed.



[GitHub] SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

SparkQA removed a comment on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep 
track of nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452160908
 
 
   **[Test build #100915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100915/testReport)** for PR 19045 at commit [`92f4289`](https://github.com/apache/spark/commit/92f4289a1dd971dadb34e227d532bd43014fa055).



[GitHub] SparkQA commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown

2019-01-07 Thread GitBox

SparkQA commented on issue #19045: [WIP][SPARK-20628][CORE][K8S] Keep track of 
nodes (/ spot instances) which are going to be shutdown
URL: https://github.com/apache/spark/pull/19045#issuecomment-452197407
 
 
   **[Test build #100915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100915/testReport)** for PR 19045 at commit [`92f4289`](https://github.com/apache/spark/commit/92f4289a1dd971dadb34e227d532bd43014fa055).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] ueshin commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

ueshin commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452193929
 
 
   So since pyarrow 0.11, we won't allow nullable integral types by default?
   `ScalarPandasUDFTests.test_vectorized_udf_null_(byte|short|int|long)` will 
fail as per #23305, IIUC.



[GitHub] HyukjinKwon commented on issue #23488: [SPARK-26564] Fix misleading error message about spark.network.timeout and spark.executor.heartbeatInterval

2019-01-07 Thread GitBox

HyukjinKwon commented on issue #23488: [SPARK-26564] Fix misleading error 
message about spark.network.timeout and spark.executor.heartbeatInterval
URL: https://github.com/apache/spark/pull/23488#issuecomment-452185898
 
 
   It's okay, but would you mind looking for similar instances? I think there are more.



[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The worker dir is created when the worker starts. If the directory creation fails, the worker fails to start.
   
   private def createWorkDir() {
     workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
     try {
       // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
       // So attempting to create and then check if directory was created or not.
       workDir.mkdirs()
       if ( !workDir.exists() || !workDir.isDirectory) {
         logError("Failed to create work directory " + workDir)
         System.exit(1)
       }
       assert (workDir.isDirectory)
     } catch {
       case e: Exception =>
         logError("Failed to create work directory " + workDir, e)
         System.exit(1)
     }
   }
   
   2. When the worker assigns an executor to an application, the application directory is created under the worker dir. If creating the application directory fails, the executor is not assigned, and the worker returns a failure message to the Master. Up to this point, the handling is correct.
   
   case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
     if (masterUrl != activeMasterUrl) {
       logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
     } else {
       try {
         logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
   
         // Create the executor's working directory
         val executorDir = new File(workDir, appId + "/" + execId)
         if (!executorDir.mkdirs()) {
           throw new IOException("Failed to create directory " + executorDir)
         }
   
         // Create local dirs for the executor. These are passed to the executor via the
         // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
         // application finishes.
         val appLocalDirs = appDirectories.getOrElse(appId, {
           val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
           val dirs = localRootDirs.flatMap { dir =>
             try {
               val appDir = Utils.createDirectory(dir, namePrefix = "executor")
               Utils.chmod700(appDir)
               Some(appDir.getAbsolutePath())
             } catch {
               case e: IOException =>
                 logWarning(s"${e.getMessage}. Ignoring this directory.")
                 None
             }
           }.toSeq
           if (dirs.isEmpty) {
             throw new IOException("No subfolder can be created in " +
               s"${localRootDirs.mkString(",")}.")
           }
           dirs
         })
         appDirectories(appId) = appLocalDirs
         val manager = new ExecutorRunner(
           appId,
           execId,
           appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
           cores_,
           memory_,
           self,
           workerId,
           host,
           webUi.boundPort,
           publicAddress,
           sparkHome,
           executorDir,
           workerUri,
           conf,
           appLocalDirs, ExecutorState.RUNNING)
         executors(appId + "/" + execId) = manager
         manager.start()
         coresUsed += cores_
         memoryUsed += memory_
         sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
       } catch {
         case e: Exception =>
           logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
           if (executors.contains(appId + "/" + execId)) {
             executors(appId + "/" + execId).kill()
             executors -= appId + "/" + execId
           }
           sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
             Some(e.toString), None))
       }
     }
   
   3. After the Master receives the message, it processes it. If the application has no successfully assigned executors, it is forced to quit once the number of failed executors reaches the maximum.
   However, worker dir corruption does not meet this exit condition: on worker nodes whose worker dir is in a normal state, the executor is allocated successfully, so the Master does nothing. The Master then continues to assign executors to
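
The exit condition described in step 3 can be sketched as follows. This is a simplified model under stated assumptions, not the Master's actual code: `AppState`, `onExecutorFailed`, `removedAfter`, and the `maxExecutorRetries` parameter are hypothetical names used only for illustration. It shows why a worker with a broken work directory never trips the condition while another worker keeps an executor running.

```scala
object MasterRetrySketch {
  // Hypothetical, simplified view of what a master tracks per application.
  final case class AppState(failedLaunches: Int, runningExecutors: Int)

  // Records one failed launch and reports whether the application should be removed.
  // Removal needs both: the retry budget is spent AND no executor is running.
  def onExecutorFailed(app: AppState, maxExecutorRetries: Int): (AppState, Boolean) = {
    val updated = app.copy(failedLaunches = app.failedLaunches + 1)
    val remove =
      updated.failedLaunches >= maxExecutorRetries && updated.runningExecutors == 0
    (updated, remove)
  }

  // Replays `failures` failed launches and reports whether the application was removed.
  def removedAfter(failures: Int, runningExecutors: Int, maxExecutorRetries: Int): Boolean = {
    var app = AppState(failedLaunches = 0, runningExecutors = runningExecutors)
    var removed = false
    var i = 0
    while (i < failures && !removed) {
      val (next, remove) = onExecutorFailed(app, maxExecutorRetries)
      app = next
      removed = remove
      i += 1
    }
    removed
  }
}
```

In this model, `removedAfter(1000, 1, 10)` stays `false`: a single healthy executor elsewhere keeps the application alive no matter how many launches fail on the broken worker.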

[GitHub] HyukjinKwon commented on issue #23487: [SPARK-26563]Fix java `SimpleApp` documentation

2019-01-07 Thread GitBox

HyukjinKwon commented on issue #23487: [SPARK-26563]Fix java `SimpleApp` 
documentation
URL: https://github.com/apache/spark/pull/23487#issuecomment-452185288
 
 
   The master is specified in the spark-submit command in the example: `then use the spark-submit script to run our program.`
   
   ```bash
   # Use spark-submit to run your application
   $ YOUR_SPARK_HOME/bin/spark-submit \
 --class "SimpleApp" \
 --master local[4] \
 target/scala-2.11/simple-project_2.11-1.0.jar
   ```
   
   That's also explained in the Stack Overflow link you pointed out:
   
   > To be clear, this is not what you should do in a production environment. 
In a production environment, spark.master should be specified in one of a 
couple other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is 
where cloudera manager will put it), or on the command line when you submit the 
app. (ex spark-submit --master yarn).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
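The create-then-check pattern in `createWorkDir` can be tried in isolation. `java.io.File.mkdirs()` returns `false` both when creation fails and when the directory already exists, so its return value alone is not a reliable failure signal. The sketch below is standalone and is not Spark code; `WorkDirCheck` and `ensureDirectory` are hypothetical names used only for illustration.

```scala
import java.io.File
import java.nio.file.Files

object WorkDirCheck {
  /** Creates `dir` if needed and reports whether a usable directory exists afterwards. */
  def ensureDirectory(dir: File): Boolean = {
    // Ignore the return value: mkdirs() is also false when the directory already exists.
    dir.mkdirs()
    dir.exists() && dir.isDirectory
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("workdir-demo").toFile
    val work = new File(root, "work")
    println(ensureDirectory(work)) // first call creates the directory
    println(work.mkdirs())         // a second mkdirs() finds it already present
    println(ensureDirectory(work)) // the existence check still succeeds
  }
}
```

Checking `exists()` and `isDirectory` after the call treats "already there" and "just created" the same way, which is what the Worker needs at startup.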
  
   
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
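To see how the directory check in the handler above turns into a failure report, the following standalone sketch reproduces only the `mkdirs()` test. It is not Spark code: `ExecutorDirDemo` and `createExecutorDir` are hypothetical names, and the "corruption" is simulated by making the work directory path a regular file, which is just one assumed way the directory could become unusable.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

object ExecutorDirDemo {
  /** Mirrors the quoted check: create <workDir>/<appId>/<execId> or throw. */
  def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
    val executorDir = new File(workDir, appId + "/" + execId)
    if (!executorDir.mkdirs()) {
      throw new IOException("Failed to create directory " + executorDir)
    }
    executorDir
  }

  def main(args: Array[String]): Unit = {
    val healthy = Files.createTempDirectory("healthy-work").toFile
    println(createExecutorDir(healthy, "app-1", 0).isDirectory)

    // Simulated corruption (assumption for this demo): the path is a file, not a directory.
    val broken = Files.createTempFile("broken-work", ".tmp").toFile
    try createExecutorDir(broken, "app-1", 0)
    catch { case e: IOException => println(e.getMessage) }
  }
}
```

In the quoted handler the same `IOException` is caught by the surrounding `catch`, which then sends `ExecutorStateChanged` with `ExecutorState.FAILED` to the Master.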
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
  
   
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
  
   
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
  
   
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
   
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully allocated.
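The exit condition described in step 3 can be written as a small model to show why the application keeps running. This is a simplified sketch based only on the description above, not the Master's actual code; `RetryModel`, `AppState`, `onExecutorFailed`, `shouldRemoveApp`, and the `maxRetries` value are all hypothetical.

```scala
object RetryModel {
  /** Hypothetical, simplified view of what the Master tracks per application. */
  final case class AppState(failedExecutors: Int, runningExecutors: Int)

  /** Records one failed executor launch. */
  def onExecutorFailed(app: AppState): AppState =
    app.copy(failedExecutors = app.failedExecutors + 1)

  /** Exit condition as described in step 3: too many failures and nothing running. */
  def shouldRemoveApp(app: AppState, maxRetries: Int): Boolean =
    app.failedExecutors >= maxRetries && app.runningExecutors == 0

  def main(args: Array[String]): Unit = {
    val maxRetries = 10 // illustrative value only
    // One worker has a corrupted work dir and keeps failing; another worker runs an executor.
    val mixed = (1 to 50).foldLeft(AppState(0, runningExecutors = 1))((s, _) => onExecutorFailed(s))
    println(shouldRemoveApp(mixed, maxRetries))
    // Same number of failures, but no executor is running anywhere.
    val allFailed = mixed.copy(runningExecutors = 0)
    println(shouldRemoveApp(allFailed, maxRetries))
  }
}
```

Under this model, failures on the broken worker can accumulate without limit as long as one executor is running elsewhere, which matches the behaviour reported in the comment.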

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully allocated.

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3. The Master then processes the message. If the application has no 
successfully assigned executors and the number of failed executors has reached 
the maximum, the application is forced to quit. 
   However, a corrupted work directory does not meet this exit condition: on a 
worker node whose work directory is healthy, the executor is 
successfully

[GitHub] httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter edited a comment on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work directory (`workDir`) is created when the Worker starts. If 
creating it fails, the Worker exits immediately instead of starting. The code is 
as follows:

       private def createWorkDir() {
         workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
         try {
           // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
           // So attempting to create and then check if directory was created or not.
           workDir.mkdirs()
           if ( !workDir.exists() || !workDir.isDirectory) {
             logError("Failed to create work directory " + workDir)
             System.exit(1)
           }
           assert (workDir.isDirectory)
         } catch {
           case e: Exception =>
             logError("Failed to create work directory " + workDir, e)
             System.exit(1)
         }
       }
   
   2. When the Worker launches an executor for an application, it creates the 
application directory under the work directory. If that fails, the executor 
launch fails and the Worker reports the failure to the Master. Up to this 
point, the handling is correct. 
   The code is as follows:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
 if (masterUrl != activeMasterUrl) {
   logWarning("Invalid Master (" + masterUrl + ") attempted to launch 
executor.")
 } else {
   try {
 logInfo("Asked to launch executor %s/%d for %s".format(appId, 
execId, appDesc.name))
   
 // Create the executor's working directory
 val executorDir = new File(workDir, appId + "/" + execId)
 if (!executorDir.mkdirs()) {
   throw new IOException("Failed to create directory " + 
executorDir)
 }
   
 // Create local dirs for the executor. These are passed to the 
executor via the
 // SPARK_EXECUTOR_DIRS environment variable, and deleted by the 
Worker when the
 // application finishes.
 val appLocalDirs = appDirectories.getOrElse(appId, {
   val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
   val dirs = localRootDirs.flatMap { dir =>
 try {
   val appDir = Utils.createDirectory(dir, namePrefix = 
"executor")
   Utils.chmod700(appDir)
   Some(appDir.getAbsolutePath())
 } catch {
   case e: IOException =>
 logWarning(s"${e.getMessage}. Ignoring this directory.")
 None
 }
   }.toSeq
   if (dirs.isEmpty) {
 throw new IOException("No subfolder can be created in " +
   s"${localRootDirs.mkString(",")}.")
   }
   dirs
 })
 appDirectories(appId) = appLocalDirs
 val manager = new ExecutorRunner(
   appId,
   execId,
   appDesc.copy(command = 
Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
   cores_,
   memory_,
   self,
   workerId,
   host,
   webUi.boundPort,
   publicAddress,
   sparkHome,
   executorDir,
   workerUri,
   conf,
   appLocalDirs, ExecutorState.RUNNING)
 executors(appId + "/" + execId) = manager
 manager.start()
 coresUsed += cores_
 memoryUsed += memory_
 sendToMaster(ExecutorStateChanged(appId, execId, manager.state, 
None, None))
   } catch {
 case e: Exception =>
   logError(s"Failed to launch executor $appId/$execId for 
${appDesc.name}.", e)
   if (executors.contains(appId + "/" + execId)) {
 executors(appId + "/" + execId).kill()
 executors -= appId + "/" + execId
   }
   sendToMaster(ExecutorStateChanged(appId, execId, 
ExecutorState.FAILED,
 Some(e.toString), None))
   }
 }
   
   3、After the Master receives the message, it will do the processing. If the 
application does not have any executors that have been successfully assigned, 
the application will be forced to quit if the number of failed executors has 
reached the maximum number of times. 
   However, the worker dir corruption does not meet the above application exit 
condition, on the worker node with the normal worker dir state, the executor is 
successfully allocated.

[GitHub] httfighter commented on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter commented on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452184077
 
 
   @srowen The main process leading to this problem is analyzed below.
   
   1. The work dir is created when the worker starts. If the directory creation 
fails, the worker fails to start immediately.
   The code is as follows:
   private def createWorkDir() {
     workDir = Option(workDirPath).map(new File(_)).getOrElse(new File(sparkHome, "work"))
     try {
       // This sporadically fails - not sure why ... !workDir.exists() && !workDir.mkdirs()
       // So attempting to create and then check if directory was created or not.
       workDir.mkdirs()
       if ( !workDir.exists() || !workDir.isDirectory) {
         logError("Failed to create work directory " + workDir)
         System.exit(1)
       }
       assert (workDir.isDirectory)
     } catch {
       case e: Exception =>
         logError("Failed to create work directory " + workDir, e)
         System.exit(1)
     }
   }
   
   2. When the worker assigns an executor to an application, the application 
directory is created under the work dir. If creating the application directory 
fails, the executor cannot be assigned, and the worker returns a failure 
message to the Master. Up to this point, the handling is correct.
   The code is as follows:
   case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
     if (masterUrl != activeMasterUrl) {
       logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
     } else {
       try {
         logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

         // Create the executor's working directory
         val executorDir = new File(workDir, appId + "/" + execId)
         if (!executorDir.mkdirs()) {
           throw new IOException("Failed to create directory " + executorDir)
         }

         // Create local dirs for the executor. These are passed to the executor via the
         // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
         // application finishes.
         val appLocalDirs = appDirectories.getOrElse(appId, {
           val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
           val dirs = localRootDirs.flatMap { dir =>
             try {
               val appDir = Utils.createDirectory(dir, namePrefix = "executor")
               Utils.chmod700(appDir)
               Some(appDir.getAbsolutePath())
             } catch {
               case e: IOException =>
                 logWarning(s"${e.getMessage}. Ignoring this directory.")
                 None
             }
           }.toSeq
           if (dirs.isEmpty) {
             throw new IOException("No subfolder can be created in " +
               s"${localRootDirs.mkString(",")}.")
           }
           dirs
         })
         appDirectories(appId) = appLocalDirs
         val manager = new ExecutorRunner(
           appId,
           execId,
           appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
           cores_,
           memory_,
           self,
           workerId,
           host,
           webUi.boundPort,
           publicAddress,
           sparkHome,
           executorDir,
           workerUri,
           conf,
           appLocalDirs, ExecutorState.RUNNING)
         executors(appId + "/" + execId) = manager
         manager.start()
         coresUsed += cores_
         memoryUsed += memory_
         sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
       } catch {
         case e: Exception =>
           logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
           if (executors.contains(appId + "/" + execId)) {
             executors(appId + "/" + execId).kill()
             executors -= appId + "/" + execId
           }
           sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
             Some(e.toString), None))
       }
     }
   
   3. After the Master receives the message, it processes it. If the 
application has no successfully assigned executors and the number of failed 
executors has reached the maximum number of retries, the application is forced 
to quit.
   However, a corrupted work dir does not meet this exit condition, because 
executors are still allocated successfully on worker nodes whose work dir is in 
a normal state. So the Master does not do anything. Then the Master 
continues to assign executors to
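
To illustrate the failure path in step 2 above, here is a minimal, self-contained sketch (not the Worker code itself). It assumes the corruption shows up as the work dir path being a regular file instead of a directory, which is only one possible reason for `mkdirs()` to return false; `ExecutorDirDemo` and `createExecutorDir` are hypothetical names.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

object ExecutorDirDemo {
  // Same check as in the LaunchExecutor handler quoted above:
  // a false result from mkdirs() is turned into an IOException.
  def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
    val executorDir = new File(workDir, appId + "/" + execId)
    if (!executorDir.mkdirs()) {
      throw new IOException("Failed to create directory " + executorDir)
    }
    executorDir
  }

  def main(args: Array[String]): Unit = {
    // Healthy work dir: the executor directory is created.
    val healthy = Files.createTempDirectory("work").toFile
    println(createExecutorDir(healthy, "app-1", 0).isDirectory)

    // Assumed form of corruption: the work dir path is a regular file.
    val broken = Files.createTempFile("work", ".broken").toFile
    try {
      createExecutorDir(broken, "app-1", 0)
    } catch {
      case e: IOException => println(e.getMessage)
    }
  }
}
```

In the Worker, the corresponding exception is caught and reported to the Master as `ExecutorState.FAILED`, as the quoted handler shows.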

[GitHub] HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python 
worker reuse take no effect for parallelize xrange
URL: https://github.com/apache/spark/pull/23470#issuecomment-452183761
 
 
   Also, let's fix the PR description and title to say range (as a generator) 
instead of xrange. range in Python 3 is already lazy, like a generator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] httfighter commented on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…

2019-01-07 Thread GitBox

httfighter commented on issue #23437: [SPARK-26524] If the application 
directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437#issuecomment-452183521
 
 
   @srowen Thank you for your review. I am adding the new config and state to 
track whether a Worker node keeps failing to assign executors to the 
application. When that happens, the executor number reaches tens of thousands 
in a short time. This keeps the Master and Worker services busy with RPC 
communication and affects service status.
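
A rough model of why the executor number keeps growing, assuming the Master only removes an application once its failed-executor count has reached the retry limit and none of its executors is running. This is a simplified sketch, not the Master's actual code; `RetryPolicyDemo`, `AppInfo`, `onExecutorFailed`, and `maxRetries` are hypothetical names.

```scala
object RetryPolicyDemo {
  sealed trait ExecState
  case object Running extends ExecState
  case object Failed extends ExecState

  // Hypothetical stand-in for the Master's per-application bookkeeping.
  final class AppInfo(
      var retryCount: Int = 0,
      var executors: Map[Int, ExecState] = Map.empty)

  /** Records a failed executor and returns true if the application should be removed. */
  def onExecutorFailed(app: AppInfo, execId: Int, maxRetries: Int): Boolean = {
    app.executors = app.executors.updated(execId, Failed)
    app.retryCount += 1
    val anyRunning = app.executors.values.exists(_ == Running)
    // The application is only given up on when nothing is running anywhere.
    app.retryCount >= maxRetries && !anyRunning
  }

  def main(args: Array[String]): Unit = {
    // Executor 0 runs on a healthy worker; every launch on the broken worker fails.
    val app = new AppInfo(executors = Map(0 -> Running))
    val removed = (1 to 1000).exists(id => onExecutorFailed(app, id, maxRetries = 10))
    println(s"removed=$removed, failures recorded=${app.retryCount}")
  }
}
```

Under these assumptions, a single running executor on a healthy worker is enough to keep the application alive, so launches on the broken worker can fail indefinitely.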



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452183401
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100918/
   Test PASSed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452183401
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100918/
   Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452183399
 
 
   Merged build finished. Test PASSed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452183399
 
 
   Merged build finished. Test PASSed.



[GitHub] SparkQA removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178488
 
 
   **[Test build #100918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100918/testReport)** for PR 22807 at commit [`409569b`](https://github.com/apache/spark/commit/409569b5f3327a7e22afaeda5078f425fe8e72e2).



[GitHub] SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452183236
 
 
   **[Test build #100918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100918/testReport)** for PR 22807 at commit [`409569b`](https://github.com/apache/spark/commit/409569b5f3327a7e22afaeda5078f425fe8e72e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] AmplabJenkins removed a comment on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182378
 
 
   Can one of the admins verify this patch?



[GitHub] AmplabJenkins removed a comment on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182912
 
 
   Can one of the admins verify this patch?



[GitHub] AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182975
 
 
   Can one of the admins verify this patch?



[GitHub] AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182912
 
 
   Can one of the admins verify this patch?



[GitHub] southernriver commented on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

southernriver commented on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182686
 
 
   @cloud-fan @carsonwang  Could you please have a look at this?



[GitHub] AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490#issuecomment-452182378
 
 
   Can one of the admins verify this patch?



[GitHub] southernriver opened a new pull request #23490: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

southernriver opened a new pull request #23490: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23490
 
 
   ## What changes were proposed in this pull request?
   
   In Spark SQL, when adaptive execution (AE) is enabled with 
`set spark.sql.adaptive.enabled=true`, the ExchangeCoordinator is introduced to 
determine the number of post-shuffle partitions. Under certain conditions, 
however, the coordinator does not perform well: some tasks are always retained 
even though their Shuffle Read Size / Records is 0.0B/0. We could increase 
spark.sql.adaptive.shuffle.targetPostShuffleInputSize to work around this, but 
that is unreasonable because targetPostShuffleInputSize should not be set too 
large. For example:
   
![image](https://user-images.githubusercontent.com/20614350/50747129-5519cc00-126d-11e9-8511-8cdb324366c9.png)
   
   We can reproduce this problem easily with the following SQL:
   
   `set spark.sql.adaptive.enabled=true;`
   `set spark.sql.shuffle.partitions=100;`
   `set spark.sql.adaptive.shuffle.targetPostShuffleInputSize=33554432;`
   `SELECT a, COUNT(1) FROM TABLE GROUP BY a DISTRIBUTE BY cast(rand() * 10 as bigint)`
   
   Before the fix:
   
![image](https://user-images.githubusercontent.com/20614350/50747540-577d2580-126f-11e9-80b0-1b36fc2fd692.png)
   After the fix:
   
![image](https://user-images.githubusercontent.com/20614350/50747608-c65a7e80-126f-11e9-9b10-32494232f0f9.png)
   
   
   ## How was this patch tested?
   Manual and unit tests.
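
For reference, the start-index estimation and the effect of the proposed guards can be sketched as follows. This is a simplified, self-contained model that handles a single shuffle input only; `PartitionEstimateDemo`, `estimateStartIndices`, and the `skipEmpty` flag are hypothetical names and not part of ExchangeCoordinator.

```scala
import scala.collection.mutable.ArrayBuffer

object PartitionEstimateDemo {
  /**
   * Returns the start indices of the post-shuffle partitions.
   * skipEmpty = false follows the existing loop; skipEmpty = true adds the two
   * guards proposed in this PR (assumed here to be the only behavioral change).
   */
  def estimateStartIndices(
      bytesByPartitionId: Array[Long],
      target: Long,
      skipEmpty: Boolean): Array[Int] = {
    val startIndices = ArrayBuffer[Int](0)
    val n = bytesByPartitionId.length
    var postShuffleInputSize = 0L
    var i = 0
    while (i < n) {
      val next = bytesByPartitionId(i)
      if (i > 0 && postShuffleInputSize + next > target) {
        // Start a new partition at i, unless nothing has accumulated since the last start.
        if (!skipEmpty || postShuffleInputSize != 0) startIndices += i
        postShuffleInputSize = next
      } else {
        postShuffleInputSize += next
      }
      // Drop the last start index if the final partition would hold no data.
      if (skipEmpty && postShuffleInputSize == 0 && i == n - 1) {
        startIndices.remove(startIndices.size - 1)
      }
      i += 1
    }
    startIndices.toArray
  }

  def main(args: Array[String]): Unit = {
    val bytes = Array[Long](110, 10, 100, 110, 0)
    println(estimateStartIndices(bytes, 100L, skipEmpty = false).mkString(", "))
    println(estimateStartIndices(bytes, 100L, skipEmpty = true).mkString(", "))
  }
}
```

With a target of 100 bytes, the trailing empty pre-shuffle partition gets its own start index when `skipEmpty = false` and is dropped when `skipEmpty = true`, which is what removes the 0.0B/0 task.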
   



[GitHub] srowen commented on issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

2019-01-07 Thread GitBox

srowen commented on issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep 
alive on the master RPC connections.
URL: https://github.com/apache/spark/pull/20512#issuecomment-452181557
 
 
   I see. @peshopetrov, could you maybe narrow the scope of the change to only 
affect master RPCs? That seems OK, and seems like your intent.



[GitHub] southernriver closed pull request #23478: [SPARK-26543][SQL] Support the coordinator to determine post-shuffle partitions more reasonably

2019-01-07 Thread GitBox

southernriver closed pull request #23478: [SPARK-26543][SQL] Support the 
coordinator to determine post-shuffle partitions more reasonably
URL: https://github.com/apache/spark/pull/23478
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala
index 78f11ca8d8c78..59418c4fbd81f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala
@@ -177,11 +177,17 @@ class ExchangeCoordinator(
       // If including the nextShuffleInputSize would exceed the target partition size, then start a
       // new partition.
       if (i > 0 && postShuffleInputSize + nextShuffleInputSize > targetPostShuffleInputSize) {
-        partitionStartIndices += i
+        if (postShuffleInputSize != 0) {
+          partitionStartIndices += i
+        }
         // reset postShuffleInputSize.
         postShuffleInputSize = nextShuffleInputSize
       } else postShuffleInputSize += nextShuffleInputSize
 
+      // filter the last indice which will split the postShuffleInputSize=0
+      if (postShuffleInputSize == 0 && i == numPreShufflePartitions - 1) {
+        partitionStartIndices.remove(partitionStartIndices.size - 1)
+      }
       i += 1
     }
 
 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala
index 737eeb0af586e..2f9fd870cf56f 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala
@@ -63,7 +63,7 @@ class ExchangeCoordinatorSuite extends SparkFunSuite with BeforeAndAfterAll {
 {
   // All bytes per partition are 0.
   val bytesByPartitionId = Array[Long](0, 0, 0, 0, 0)
-  val expectedPartitionStartIndices = Array[Int](0)
+  val expectedPartitionStartIndices = Array[Int]()
   checkEstimation(coordinator, Array(bytesByPartitionId), expectedPartitionStartIndices)
 }
 
@@ -85,7 +85,14 @@ class ExchangeCoordinatorSuite extends SparkFunSuite with BeforeAndAfterAll {
 {
   // There are a few large pre-shuffle partitions.
   val bytesByPartitionId = Array[Long](110, 10, 100, 110, 0)
-  val expectedPartitionStartIndices = Array[Int](0, 1, 2, 3, 4)
+  val expectedPartitionStartIndices = Array[Int](0, 1, 2, 3)
+  checkEstimation(coordinator, Array(bytesByPartitionId), expectedPartitionStartIndices)
+}
+
+{
+  // There are a few large pre-shuffle partitions and some bytes per partition are 0.
+  val bytesByPartitionId = Array[Long](110, 120, 0, 130, 0, 0, 100, 110, 0)
+  val expectedPartitionStartIndices = Array[Int](0, 1, 2, 4, 7)
   checkEstimation(coordinator, Array(bytesByPartitionId), expectedPartitionStartIndices)
 }
 
@@ -123,7 +130,7 @@ class ExchangeCoordinatorSuite extends SparkFunSuite with BeforeAndAfterAll {
   // All bytes per partition are 0.
   val bytesByPartitionId1 = Array[Long](0, 0, 0, 0, 0)
   val bytesByPartitionId2 = Array[Long](0, 0, 0, 0, 0)
-  val expectedPartitionStartIndices = Array[Int](0)
+  val expectedPartitionStartIndices = Array[Int]()
   checkEstimation(
 coordinator,
 Array(bytesByPartitionId1, bytesByPartitionId2),
@@ -206,7 +213,7 @@ class ExchangeCoordinatorSuite extends SparkFunSuite with BeforeAndAfterAll {
   // the size of data is 0.
   val bytesByPartitionId1 = Array[Long](0, 0, 0, 0, 0)
   val bytesByPartitionId2 = Array[Long](0, 0, 0, 0, 0)
-  val expectedPartitionStartIndices = Array[Int](0)
+  val expectedPartitionStartIndices = Array[Int]()
   checkEstimation(
 coordinator,
 Array(bytesByPartitionId1, bytesByPartitionId2),
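
The patched loop can be hard to follow from the hunk alone. Below is a minimal Python sketch (a port for illustration, not the Scala code from the PR) of the start-index estimation with the two changes applied. The function name `estimate_partition_start_indices` and the target size of 100 bytes used in the usage lines are assumptions; the inputs and expected results are the ones from the test hunks above.

```python
def estimate_partition_start_indices(bytes_by_partition_id, target_size):
    """Coalesce pre-shuffle partitions until target_size would be exceeded."""
    n = len(bytes_by_partition_id[0])
    start_indices = [0]  # the first post-shuffle partition starts at index 0
    post_size = 0
    for i in range(n):
        # Total size of the i-th pre-shuffle partition across all stages.
        next_size = sum(stage[i] for stage in bytes_by_partition_id)
        if i > 0 and post_size + next_size > target_size:
            # Patch: only start a new partition if the current one holds data.
            if post_size != 0:
                start_indices.append(i)
            post_size = next_size
        else:
            post_size += next_size
        # Patch: drop the last start index when the accumulated size is 0
        # at the final pre-shuffle partition.
        if post_size == 0 and i == n - 1:
            start_indices.pop()
    return start_indices


# Usage, assuming a target post-shuffle input size of 100 bytes.
few_large = estimate_partition_start_indices([[110, 10, 100, 110, 0]], 100)
all_zero = estimate_partition_start_indices([[0, 0, 0, 0, 0]], 100)
```

Under that assumed target, the sketch reproduces the expectations in the updated tests, including the empty result when every partition is empty.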


 



[GitHub] coderplay edited a comment on issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

2019-01-07 Thread GitBox

coderplay edited a comment on issue #20512: [SPARK-23182][CORE] Allow enabling 
TCP keep alive on the master RPC connections.
URL: https://github.com/apache/spark/pull/20512#issuecomment-452178010
 
 
   Clarification: I am not a Spark expert; I just got an invitation from @rxin 
because he thinks I have some knowledge of Linux TCP.
   
   Generally, the patch looks good to me for the purpose of preventing 
inactivity from disconnecting the channel. But from the diff, it looks like this 
commit will impact other RPCs or the shuffle transport as well. It is not only for 
master RPCs, as the title declares.
   
   Min
   
   



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178556
 
 
   Merged build finished. Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178558
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6808/
   Test PASSed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178558
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6808/
   Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-45213
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6807/
   Test FAILed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178556
 
 
   Merged build finished. Test PASSed.



[GitHub] SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452178488
 
 
   **[Test build #100918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100918/testReport)**
 for PR 22807 at commit 
[`409569b`](https://github.com/apache/spark/commit/409569b5f3327a7e22afaeda5078f425fe8e72e2).



[GitHub] coderplay commented on issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the master RPC connections.

2019-01-07 Thread GitBox

coderplay commented on issue #20512: [SPARK-23182][CORE] Allow enabling TCP 
keep alive on the master RPC connections.
URL: https://github.com/apache/spark/pull/20512#issuecomment-452178010
 
 
   Clarification: I am not a Spark expert; I just got an invitation from @rxin 
because he thinks I have some knowledge of Linux TCP.
   
   Generally, the patch looks good to me for the purpose of preventing 
inactivity from disconnecting the channel. But from the diff, it looks like this 
commit will impact other RPCs or the shuffle transport as well. It is not only for 
master RPCs, as the title declares. The worst case for TCP keep-alive is one 
message per client connection, which is probably the case for data shuffling or 
some specific RPC types. If we maintain only one switch that rules everything, 
the config will have a negative impact on performance in those cases.
   
   
http://blog.catchpoint.com/2011/03/25/relying_on_web_performance_monitoring_to_discover_release_problems/
   
   Min
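   
   One way to picture the "one switch per transport" idea raised above is the small Python sketch below. It is only an illustration at the socket level, not Spark's transport code: the module names `rpc` and `shuffle`, the `KEEPALIVE_BY_MODULE` table, and both helper functions are hypothetical.

```python
import socket

# Hypothetical per-module switches; these names are not Spark config keys.
KEEPALIVE_BY_MODULE = {"rpc": True, "shuffle": False}


def new_client_socket(module):
    """Create a TCP socket, enabling SO_KEEPALIVE only for opted-in modules."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if KEEPALIVE_BY_MODULE.get(module, False):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is set (getsockopt yields non-zero)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

   With a table like this, long-lived control connections can opt in to keep-alive probes while short-lived, one-message connections stay unaffected.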
   
   



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-45211
 
 
   Merged build finished. Test FAILed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-45213
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6807/
   Test FAILed.



[GitHub] HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python 
worker reuse take no effect for parallelize xrange
URL: https://github.com/apache/spark/pull/23470#issuecomment-452177786
 
 
   LGTM too considering it's a quick bandaid fix.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-45211
 
 
   Merged build finished. Test FAILed.



[GitHub] viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177528
 
 
   > Were you thinking of putting in a spark config to toggle the safe flag 
here or in a follow up? I think we really need that because prior to v0.11.0 it 
allowed unsafe casts, but now raises an error by default.
   
   Do we have a proper way to get a Spark config value on the executor side, for 
example in the serializers here? I found that in `read_udfs` some configs for 
pandas_udf evaluation are read from the stream. Is that the only way we have?



[GitHub] HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on issue #23470: [SPARK-26549][PySpark] Fix for python 
worker reuse take no effect for parallelize xrange
URL: https://github.com/apache/spark/pull/23470#issuecomment-452177584
 
 
   re: https://github.com/apache/spark/pull/23470#issuecomment-452173964
   
   Yea, I think so. I looked into fixing the root cause, but from what I saw it 
would be quite invasive. Maybe there's another way I missed. So the current fix 
is a band-aid fix, but I think it's good enough.



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177099
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100917/
   Test FAILed.



[GitHub] SparkQA removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452176923
 
 
   **[Test build #100917 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100917/testReport)**
 for PR 22807 at commit 
[`a645a90`](https://github.com/apache/spark/commit/a645a90565545235fbe910dcc5698f636cf4b1c1).



[GitHub] HyukjinKwon commented on a change in pull request #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on a change in pull request #23470: 
[SPARK-26549][PySpark] Fix for python worker reuse take no effect for 
parallelize xrange
URL: https://github.com/apache/spark/pull/23470#discussion_r245878687
 
 

 ##
 File path: python/pyspark/tests/test_worker.py
 ##
 @@ -144,6 +144,15 @@ def test_with_different_versions_of_python(self):
 finally:
 self.sc.pythonVer = version
 
+def test_reuse_worker_of_parallelize_xrange(self):
 
 Review comment:
   I mean, separate class. It's best to avoid reusing Spark context.



[GitHub] AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #22807: [WIP][SPARK-25811][PySpark] 
Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177096
 
 
   Merged build finished. Test FAILed.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177099
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100917/
   Test FAILed.



[GitHub] SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177092
 
 
   **[Test build #100917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100917/testReport)**
 for PR 22807 at commit 
[`a645a90`](https://github.com/apache/spark/commit/a645a90565545235fbe910dcc5698f636cf4b1c1).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

viirya commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177047
 
 
   Sorry for not pushing this for a while. Hope we can make it soon.



[GitHub] AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a 
proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452177096
 
 
   Merged build finished. Test FAILed.



[GitHub] viirya commented on a change in pull request #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

viirya commented on a change in pull request #22807: 
[WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected 
by PyArrow
URL: https://github.com/apache/spark/pull/22807#discussion_r245878431
 
 

 ##
 File path: python/pyspark/sql/tests.py
 ##
 @@ -4961,6 +4961,31 @@ def foofoo(x, y):
 ).collect
 )
 
+def test_pandas_udf_detect_unsafe_type_conversion(self):
+from distutils.version import LooseVersion
+from pyspark.sql.functions import pandas_udf
+import pandas as pd
+import numpy as np
+import pyarrow as pa
+
+values = [1.0] * 3
+pdf = pd.DataFrame({'A': values})
+df = self.spark.createDataFrame(pdf).repartition(1)
+
+@pandas_udf(returnType="int")
+def udf(column):
+return pd.Series(np.linspace(0, 1, 3))
+
+udf_boolean = df.select(['A']).withColumn('udf', udf('A'))
+
+# Since 0.11.0, PyArrow supports the feature to raise an error for unsafe cast.
+if LooseVersion(pa.__version__) >= LooseVersion("0.11.0"):
 
 Review comment:
   @BryanCutler Do you mean this same udf raises an error like overflows? I 
tried with 0.8.0 but it doesn't? Am I missing something?



[GitHub] HyukjinKwon commented on a change in pull request #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on a change in pull request #23470: 
[SPARK-26549][PySpark] Fix for python worker reuse take no effect for 
parallelize xrange
URL: https://github.com/apache/spark/pull/23470#discussion_r245878479
 
 

 ##
 File path: python/pyspark/tests/test_worker.py
 ##
 @@ -144,6 +144,15 @@ def test_with_different_versions_of_python(self):
 finally:
 self.sc.pythonVer = version
 
+def test_reuse_worker_of_parallelize_xrange(self):
 
 Review comment:
   Can you please make a separate test for such tests?



[GitHub] SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

SparkQA commented on issue #22807: [WIP][SPARK-25811][PySpark] Raise a proper 
error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-452176923
 
 
   **[Test build #100917 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100917/testReport)**
 for PR 22807 at commit 
[`a645a90`](https://github.com/apache/spark/commit/a645a90565545235fbe910dcc5698f636cf4b1c1).



[GitHub] viirya commented on a change in pull request #22807: [WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

2019-01-07 Thread GitBox

viirya commented on a change in pull request #22807: 
[WIP][SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected 
by PyArrow
URL: https://github.com/apache/spark/pull/22807#discussion_r245878286
 
 

 ##
 File path: python/pyspark/serializers.py
 ##
 @@ -284,7 +284,14 @@ def create_array(s, t):
 elif LooseVersion(pa.__version__) < LooseVersion("0.11.0"):
 # TODO: see ARROW-1949. Remove when the minimum PyArrow version becomes 0.11.0.
 return pa.Array.from_pandas(s, mask=mask, type=t)
-return pa.Array.from_pandas(s, mask=mask, type=t, safe=False)
+try:
+array = pa.Array.from_pandas(s, mask=mask, type=t, safe=True)
+except pa.ArrowException as e:
 
 Review comment:
   @BryanCutler Now it catches `ArrowException`.
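   
   For readers unfamiliar with what a "safe" cast checks, here is a library-free Python sketch of the idea. It is a stand-in for illustration only and does not reproduce PyArrow's implementation; the function name `cast_floats_to_int` and its error message are hypothetical.

```python
def cast_floats_to_int(values, safe=True):
    """Cast floats to ints; with safe=True, refuse values that would lose data."""
    result = []
    for v in values:
        # A float with a fractional part (or NaN/inf) has no exact int value.
        if safe and not float(v).is_integer():
            raise ValueError("unsafe cast: %r has no exact int representation" % (v,))
        result.append(int(v))
    return result


# Values mirroring the test above: all-integral input vs. linspace(0, 1, 3).
integral = cast_floats_to_int([1.0, 1.0, 1.0])
truncated = cast_floats_to_int([0.0, 0.5, 1.0], safe=False)
```

   With `safe=False` the fractional value is silently truncated, which is the behaviour the PR wants to surface as an error instead.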



[GitHub] HyukjinKwon commented on a change in pull request #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

HyukjinKwon commented on a change in pull request #23470: 
[SPARK-26549][PySpark] Fix for python worker reuse take no effect for 
parallelize xrange
URL: https://github.com/apache/spark/pull/23470#discussion_r245877915
 
 

 ##
 File path: python/pyspark/context.py
 ##
 @@ -493,6 +493,10 @@ def getStart(split):
 return start0 + int((split * size / numSlices)) * step
 
 def f(split, iterator):
+# it's an empty iterator here but we need this line for triggering the logic of
+# checking END_OF_DATA_SECTION during load iterator in runtime, thus make sure
 
 Review comment:
   Can you please elaborate on `during load iterator in runtime`? For instance:
   
   `FramedSerializer.load_stream` handles signals such as 
`SpecialLengths.END_OF_DATA_SECTION` in `_read_with_length`. Since 
`FramedSerializer.load_stream` produces a generator, control should enter that 
function at least once. Here we do it by explicitly converting the empty 
iterator to a list.
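   
   The generator point can be seen in isolation with the sketch below. It is a simplified stand-in, not PySpark's serializer: the framing format, the sentinel value, and the `load_stream` and `run` helpers are assumptions made for this example.

```python
import io
import struct

END_OF_DATA_SECTION = -1  # assumed sentinel value for this sketch


def load_stream(stream, log):
    """Yield length-prefixed frames until the end-of-data marker is read."""
    while True:
        (length,) = struct.unpack("!i", stream.read(4))
        if length == END_OF_DATA_SECTION:
            log.append("saw END_OF_DATA_SECTION")
            return
        yield stream.read(length)


def run(consume):
    """Build a stream holding only the marker; optionally drain the iterator."""
    log = []
    stream = io.BytesIO(struct.pack("!i", END_OF_DATA_SECTION))
    iterator = load_stream(stream, log)  # no generator code has run yet
    if consume:
        list(iterator)  # forces the generator body to execute once
    return log, stream.tell()
```

   Calling `run(False)` leaves the marker unread in the stream, while `run(True)` reads it, which is the effect the added `list(...)` call in the patch relies on.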



[GitHub] BryanCutler commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

BryanCutler commented on issue #23470: [SPARK-26549][PySpark] Fix for python 
worker reuse take no effect for parallelize xrange
URL: https://github.com/apache/spark/pull/23470#issuecomment-452173964
 
 
   Does this mean that the user could also map a function that doesn't consume 
the iterator and inadvertently cause the worker to not be reused? If so, should 
the fix be in `PythonRunner` or worker.py?



[GitHub] ueshin commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker reuse take no effect for parallelize xrange

2019-01-07 Thread GitBox

ueshin commented on issue #23470: [SPARK-26549][PySpark] Fix for python worker 
reuse take no effect for parallelize xrange
URL: https://github.com/apache/spark/pull/23470#issuecomment-452172770
 
 
   LGTM, too.



[GitHub] linehrr edited a comment on issue #23324: [SPARK-26267][SS]Retry when detecting incorrect offsets from Kafka

2019-01-07 Thread GitBox

linehrr edited a comment on issue #23324: [SPARK-26267][SS]Retry when detecting 
incorrect offsets from Kafka
URL: https://github.com/apache/spark/pull/23324#issuecomment-452172192
 
 
   @zsxwing not too sure if this is actually a bug related to this fix, but I 
can share my stacktrace here: 
   
   ```
   19/01/07 22:56:15 ERROR streaming.MicroBatchExecution: Query [id = c46c67ee-3514-4788-8370-a696837b21b1, runId = bb52783e-33d6-460b-9aa2-cc5da414531e] terminated with error
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 164, qa2-hdp-3.acuityads.org, executor 2): java.lang.AssertionError: assertion failed: latest offset -9223372036854775808 does not equal -1
   at scala.Predef$.assert(Predef.scala:170)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartitionReader.resolveRange(KafkaMicroBatchReader.scala:371)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartitionReader.<init>(KafkaMicroBatchReader.scala:329)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartition.createPartitionReader(KafkaMicroBatchReader.scala:314)
   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
   at org.apache.spark.scheduler.Task.run(Task.scala:121)
   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
   
   ```
   For some reason, it looks like `fetchLatestOffset` returned `Long.MIN_VALUE` for one of the partitions. I checked the Structured Streaming checkpoint and it was correct; it is the `currentAvailableOffset` that was set to `Long.MIN_VALUE`.
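
As a quick arithmetic check of the claim above, the offset printed in the assertion message can be compared against the smallest value of a signed 64-bit integer, which is what `Long.MIN_VALUE` is on the JVM. The variable names below are illustrative only.

```python
# Smallest value of a signed 64-bit integer (Long.MIN_VALUE on the JVM).
LONG_MIN_VALUE = -(2 ** 63)

# Offset copied from the assertion message in the stack trace.
reported_offset = -9223372036854775808

is_long_min = reported_offset == LONG_MIN_VALUE
```

The comparison holds, so the reported "latest offset" is exactly `Long.MIN_VALUE` rather than a plausible Kafka offset.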
   
   Kafka broker version: 1.1.0.
   Library we used:
   ```
   // https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10
   libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0"
   ```
   
   How to reproduce:
   We started a Structured Streaming job subscribed to a topic with 4 partitions, then produced some messages into the topic. The job crashed and logged the stack trace above.



[GitHub] linehrr commented on issue #23324: [SPARK-26267][SS]Retry when detecting incorrect offsets from Kafka

2019-01-07 Thread GitBox

linehrr commented on issue #23324: [SPARK-26267][SS]Retry when detecting 
incorrect offsets from Kafka
URL: https://github.com/apache/spark/pull/23324#issuecomment-452172192
 
 
   @zsxwing not too sure if this is actually a bug related to this fix, but I 
can share my stacktrace here: 
   
   ```
   19/01/07 22:56:15 ERROR streaming.MicroBatchExecution: Query [id = c46c67ee-3514-4788-8370-a696837b21b1, runId = bb52783e-33d6-460b-9aa2-cc5da414531e] terminated with error
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 164, qa2-hdp-3.acuityads.org, executor 2): java.lang.AssertionError: assertion failed: latest offset -9223372036854775808 does not equal -1
   at scala.Predef$.assert(Predef.scala:170)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartitionReader.resolveRange(KafkaMicroBatchReader.scala:371)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartitionReader.<init>(KafkaMicroBatchReader.scala:329)
   at org.apache.spark.sql.kafka010.KafkaMicroBatchInputPartition.createPartitionReader(KafkaMicroBatchReader.scala:314)
   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
   at org.apache.spark.scheduler.Task.run(Task.scala:121)
   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
   
   ```



[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should 
still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171811
 
 
   Merged build finished. Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF 
should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171811
 
 
   Merged build finished. Test PASSed.



[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should 
still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171817
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6806/
   Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF 
should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171817
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6806/
   Test PASSed.



[GitHub] SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still 
check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171621
 
 
   **[Test build #100916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100916/testReport)** for PR 23275 at commit [`cec3c0f`](https://github.com/apache/spark/commit/cec3c0f0c109c803bc18bff0d77d2fbc2bd53441).



[GitHub] cloud-fan commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

cloud-fan commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still 
check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171014
 
 
   HiveClientSuite is really flaky...



[GitHub] cloud-fan commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

cloud-fan commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still 
check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452171033
 
 
   retest this please



[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF 
should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452170292
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100914/
   Test FAILed.



[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF 
should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452170289
 
 
   Merged build finished. Test FAILed.



[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should 
still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452170289
 
 
   Merged build finished. Test FAILed.



[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should 
still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452170292
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100914/
   Test FAILed.



[GitHub] SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should 
still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452151758
 
 
   **[Test build #100914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100914/testReport)** for PR 23275 at commit [`cec3c0f`](https://github.com/apache/spark/commit/cec3c0f0c109c803bc18bff0d77d2fbc2bd53441).



[GitHub] SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2019-01-07 Thread GitBox

SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still 
check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-452170168
 
 
   **[Test build #100914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100914/testReport)** for PR 23275 at commit [`cec3c0f`](https://github.com/apache/spark/commit/cec3c0f0c109c803bc18bff0d77d2fbc2bd53441).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] zhengruifeng commented on a change in pull request #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensitivity in ML

2019-01-07 Thread GitBox

zhengruifeng commented on a change in pull request #23144: 
[SPARK-26172][ML][WIP] Unify String Params' case-insensitivity in ML
URL: https://github.com/apache/spark/pull/23144#discussion_r245871966
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/param/params.scala
 ##
 @@ -524,6 +525,65 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
   }
 }
 
+/**
+ * :: DeveloperApi ::
+ * Specialized version of `Param[String]` for Java.
+ */
+@DeveloperApi
 
 Review comment:
   This `normalize` is only added to `StringParam`, which extends `Param[String]`, so it will not affect other types of params. 
   OK, I also feel it is somewhat complex.



[GitHub] AmplabJenkins removed a comment on issue #23302: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23302: [SPARK-24522][UI] Create 
filter to apply HTTP security checks consistently.
URL: https://github.com/apache/spark/pull/23302#issuecomment-452167401
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100911/
   Test PASSed.



[GitHub] AmplabJenkins commented on issue #23302: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23302: [SPARK-24522][UI] Create filter to 
apply HTTP security checks consistently.
URL: https://github.com/apache/spark/pull/23302#issuecomment-452167399
 
 
   Merged build finished. Test PASSed.



[GitHub] AmplabJenkins commented on issue #23302: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.

2019-01-07 Thread GitBox

AmplabJenkins commented on issue #23302: [SPARK-24522][UI] Create filter to 
apply HTTP security checks consistently.
URL: https://github.com/apache/spark/pull/23302#issuecomment-452167401
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100911/
   Test PASSed.



[GitHub] AmplabJenkins removed a comment on issue #23302: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.

2019-01-07 Thread GitBox

AmplabJenkins removed a comment on issue #23302: [SPARK-24522][UI] Create 
filter to apply HTTP security checks consistently.
URL: https://github.com/apache/spark/pull/23302#issuecomment-452167399
 
 
   Merged build finished. Test PASSed.



[GitHub] SparkQA commented on issue #23302: [SPARK-24522][UI] Create filter to apply HTTP security checks consistently.

2019-01-07 Thread GitBox

SparkQA commented on issue #23302: [SPARK-24522][UI] Create filter to apply 
HTTP security checks consistently.
URL: https://github.com/apache/spark/pull/23302#issuecomment-452166995
 
 
   **[Test build #100911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100911/testReport)** for PR 23302 at commit [`d185e12`](https://github.com/apache/spark/commit/d185e126a8f06cb3b577fd7622de5a87126e7f81).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


