[GitHub] [spark] AmplabJenkins commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-659176704







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455525039



##
File path: 
core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
##
@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
 if (ExecutorState.isFinished(state)) {
   listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
 } else if (state == ExecutorState.DECOMMISSIONED) {
-  listener.executorDecommissioned(fullId, message.getOrElse(""))
+  listener.executorDecommissioned(fullId,
+    ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment:
   oh I see https://github.com/apache/spark/pull/29032#discussion_r455401121








[GitHub] [spark] c21 commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


c21 commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-659174967


   cc @maropu, @cloud-fan, @gatorsmile and @sameeragarwal: could you help take a look? Thanks!






[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455524790



##
File path: 
core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
##
@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
 if (ExecutorState.isFinished(state)) {
   listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
 } else if (state == ExecutorState.DECOMMISSIONED) {
-  listener.executorDecommissioned(fullId, message.getOrElse(""))
+  listener.executorDecommissioned(fullId,
+    ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment:
   how is the flag `isHostDecommissioned` actually used?

##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
##
@@ -101,7 +101,8 @@ private[spark] trait TaskScheduler {
   /**
* Process a decommissioning executor.
*/
-  def executorDecommission(executorId: String): Unit
+  def executorDecommission(

Review comment:
   nit: don't leave an empty implementation here.

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##
@@ -191,9 +191,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
 executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecutor))
 removeExecutor(executorId, reason)
 
-  case DecommissionExecutor(executorId) =>
+  case DecommissionExecutor(executorId, decommissionInfo) =>
 logError(s"Received decommission executor message ${executorId}.")

Review comment:
   do we want to also include the decommissionInfo in the error msg?








[GitHub] [spark] c21 opened a new pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


c21 opened a new pull request #29130:
URL: https://github.com/apache/spark/pull/29130


   
   
   ### What changes were proposed in this pull request?
   
   
   Currently `ShuffledHashJoin.outputPartitioning` inherits from `HashJoin.outputPartitioning`, which only preserves the stream side's partitioning (`HashJoin.scala`):
   
   ```
   override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning
   ```
   
   This loses the build side's partitioning information and causes an extra shuffle if another join or group-by follows this join.
   
   Example:
   
   ```
   withSQLConf(
       SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50",
       SQLConf.SHUFFLE_PARTITIONS.key -> "2",
       SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
     val df1 = spark.range(10).select($"id".as("k1"))
     val df2 = spark.range(30).select($"id".as("k2"))
     Seq("inner", "cross").foreach(joinType => {
       val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count()
         .queryExecution.executedPlan
       assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1)
       // No extra shuffle before aggregate
       assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2)
     })
   }
   ```
   
   Current physical plan (having an extra shuffle on `k1` before aggregate)
   
   ``` 
   *(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
   +- Exchange hashpartitioning(k1#220L, 2), true, [id=#117]
      +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
         +- *(3) Project [k1#220L]
            +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
               :- Exchange hashpartitioning(k1#220L, 2), true, [id=#109]
               :  +- *(1) Project [id#218L AS k1#220L]
               :     +- *(1) Range (0, 10, step=1, splits=2)
               +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111]
                  +- *(2) Project [id#222L AS k2#224L]
                     +- *(2) Range (0, 30, step=1, splits=2)
   ``` 
   
   Ideal physical plan (no shuffle on `k1` before aggregate)
   
   ```
   *(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
   +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
      +- *(3) Project [k1#220L]
         +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
            :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107]
            :  +- *(1) Project [id#218L AS k1#220L]
            :     +- *(1) Range (0, 10, step=1, splits=2)
            +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109]
               +- *(2) Project [id#222L AS k2#224L]
                  +- *(2) Range (0, 30, step=1, splits=2)
   ``` 
   
   This can be fixed by overriding the `outputPartitioning` method in `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`.
   In addition, this PR fixes one typo in `HashJoin`, as that code path is shared between broadcast hash join and shuffled hash join.
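
The idea can be illustrated outside Spark with a toy model. The sketch below is not the PR's code: `PartitioningSketch`, its `Partitioning` classes, `streamOnly`, `bothSides`, and `satisfiesClustering` are simplified, hypothetical stand-ins whose names only echo Spark's, and treating an inner join's output as carrying both sides' partitioning is an assumption based on the description above. It shows why reporting both sides lets a later group-by on `k1` reuse the existing partitioning.

```scala
object PartitioningSketch {
  // Toy stand-ins for partitioning metadata; NOT Spark's real classes.
  sealed trait Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  final case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Partitioning

  // Behavior described above: only the stream side's partitioning is reported.
  def streamOnly(build: Partitioning, stream: Partitioning): Partitioning = stream

  // Assumed behavior after the fix, for an inner join: both sides are reported.
  def bothSides(build: Partitioning, stream: Partitioning): Partitioning =
    PartitioningCollection(Seq(build, stream))

  // A group-by on `groupingKeys` can skip a shuffle if rows are already
  // hash-partitioned on a non-empty subset of those keys.
  def satisfiesClustering(p: Partitioning, groupingKeys: Seq[String]): Boolean = p match {
    case HashPartitioning(keys, _) => keys.nonEmpty && keys.forall(k => groupingKeys.contains(k))
    case PartitioningCollection(ps) => ps.exists(satisfiesClustering(_, groupingKeys))
  }

  def main(args: Array[String]): Unit = {
    val build = HashPartitioning(Seq("k1"), 2)   // BuildLeft side, shuffled on k1
    val stream = HashPartitioning(Seq("k2"), 2)  // stream side, shuffled on k2
    // false: the group-by on k1 would need another shuffle
    println(satisfiesClustering(streamOnly(build, stream), Seq("k1")))
    // true: the build side's partitioning on k1 is still visible
    println(satisfiesClustering(bothSides(build, stream), Seq("k1")))
  }
}
```

In this toy model the build side is the left input, matching `BuildLeft` in the plans above; the real rules for outer and semi joins are not covered here.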
   
   
   ### Why are the changes needed?
   
   To avoid an unnecessary shuffle in queries with multiple joins or a group-by, saving CPU and IO.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added a unit test in `JoinSuite`.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455










[GitHub] [spark] SparkQA commented on pull request #28961: [SPARK-32143][SQL] Prevent a skewed join from producing too many partition splits

2020-07-15 Thread GitBox


SparkQA commented on pull request #28961:
URL: https://github.com/apache/spark/pull/28961#issuecomment-659172738


   **[Test build #125937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125937/testReport)** for PR 28961 at commit [`3811ae9`](https://github.com/apache/spark/commit/3811ae93c2966d87619624670e338b7c6d34b7d4).






[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455










[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827










[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659112300


   **[Test build #125925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)** for PR 29032 at commit [`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).






[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659171262


   @viirya This PR only optimizes CPU time.






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659171509


   **[Test build #125925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)** for PR 29032 at commit [`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-15 Thread GitBox


viirya commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r455521016



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -1099,6 +1101,64 @@ object TypeCoercion {
 DateSub(l, Literal(days))
 }
   }
+
+  /**
+   * Coerces different children of Union to a common set of columns. Note that this must be
+   * run before `WidenSetOperationTypes`, because `WidenSetOperationTypes` should be run on
+   * correctly resolved column by name.
+   */
+  object UnionCoercion extends TypeCoercionRule {
+    private def unionTwoSides(
+        left: LogicalPlan, right: LogicalPlan, allowMissingCol: Boolean): LogicalPlan = {
+      val resolver = SQLConf.get.resolver
+      val leftOutputAttrs = left.output
+      val rightOutputAttrs = right.output
+
+      // Builds a project list for `right` based on `left` output names
+      val rightProjectList = leftOutputAttrs.map { lattr =>
+        rightOutputAttrs.find { rattr => resolver(lattr.name, rattr.name) }.getOrElse {
+          if (allowMissingCol) {
+            Alias(Literal(null, lattr.dataType), lattr.name)()
+          } else {
+            throw new AnalysisException(
+              s"""Cannot resolve column name "${lattr.name}" among """ +
+                s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""")
+          }
+        }
+      }
+
+      // Delegates failure checks to `CheckAnalysis`
+      val notFoundAttrs = rightOutputAttrs.diff(rightProjectList)
+      val rightChild = Project(rightProjectList ++ notFoundAttrs, right)
+
+      // Builds a project for `logicalPlan` based on `right` output names, if allowing
+      // missing columns.
+      val leftChild = if (allowMissingCol) {
+        val missingAttrs = notFoundAttrs.map { attr =>
+          Alias(Literal(null, attr.dataType), attr.name)()
+        }
+        if (missingAttrs.nonEmpty) {
+          Project(leftOutputAttrs ++ missingAttrs, left)
+        } else {
+          left
+        }
+      } else {
+        left
+      }
+      Union(leftChild, rightChild)
+    }
+
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+      case e if !e.childrenResolved => e
+
+      case Union(children, byName, allowMissingCol)
+          if byName =>
+        val union = children.reduceLeft { (left: LogicalPlan, right: LogicalPlan) =>
+          unionTwoSides(left, right, allowMissingCol)

Review comment:
   If this does not look proper after rethinking, we can also move it to another rule or create a new rule.








[GitHub] [spark] viirya commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


viirya commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659170171


   Is this also a memory optimization? From the description, it looks like a CPU time optimization.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320










[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320










[GitHub] [spark] jiangxb1987 commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r455517786



##
File path: 
core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
##
@@ -49,6 +55,26 @@ class MasterWebUI(
   "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = 
Set("POST")))
 attachHandler(createRedirectHandler(
   "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = 
Set("POST")))
+attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+  override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+    val hostnames: Seq[String] = Option(req.getParameterValues("host"))
+      .getOrElse(Array[String]()).toSeq
+    if (!isDecommissioningRequestAllowed(req)) {
+      resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
+    } else {
+      val removedWorkers = masterEndpointRef.askSync[Integer](DecommissionHosts(hostnames))
+      logInfo(s"Decommissioning of hosts $hostnames decommissioned ${removedWorkers} workers")

Review comment:
   nit: `${removedWorkers}` -> `$removedWorkers`
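
For context on this nit, a minimal standalone sketch (not code from the PR; `InterpolationSketch` and its method names are made up for illustration): in Scala's `s` interpolator, braces are only needed when interpolating an expression rather than a plain identifier, so the two log-message forms below build the same string.

```scala
object InterpolationSketch {
  // Braces around a plain identifier are allowed but redundant.
  def withBraces(hostnames: Seq[String], removedWorkers: Int): String =
    s"Decommissioning of hosts $hostnames decommissioned ${removedWorkers} workers"

  // Equivalent form suggested by the review comment.
  def withoutBraces(hostnames: Seq[String], removedWorkers: Int): String =
    s"Decommissioning of hosts $hostnames decommissioned $removedWorkers workers"

  // Braces are required once the interpolated part is an expression.
  def withExpression(hostnames: Seq[String]): String =
    s"Received ${hostnames.size} hostnames"
}
```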

##
File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
##
@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite
 }
   }
 
+  def testWorkerDecommissioning(
+  numWorkers: Int,
+  numWorkersExpectedToDecom: Int,
+  hostnames: Seq[String]): Unit = {
+val conf = new SparkConf()
+val master = makeAliveMaster(conf)
+val workerRegs = (1 to numWorkers).map{idx =>
+  val worker = new MockWorker(master.self, conf)
+  worker.rpcEnv.setupEndpoint("worker", worker)
+  val workerReg = RegisterWorker(
+worker.id,
+"localhost",
+worker.self.address.port,
+worker.self,
+10,
+1024,
+"http://localhost:8080",
+RpcAddress("localhost", 1))
+  master.self.send(workerReg)
+  workerReg
+}
+
+eventually(timeout(10.seconds)) {
+  val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
+  assert(masterState.workers.length === numWorkers)
+  assert(masterState.workers.forall(_.state == WorkerState.ALIVE))
+  assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet)
+  masterState.workers

Review comment:
   nit: this is not needed

##
File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
##
@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite
 }
   }
 
+  def testWorkerDecommissioning(
+  numWorkers: Int,
+  numWorkersExpectedToDecom: Int,
+  hostnames: Seq[String]): Unit = {
+val conf = new SparkConf()
+val master = makeAliveMaster(conf)
+val workerRegs = (1 to numWorkers).map{idx =>
+  val worker = new MockWorker(master.self, conf)
+  worker.rpcEnv.setupEndpoint("worker", worker)
+  val workerReg = RegisterWorker(
+worker.id,
+"localhost",
+worker.self.address.port,
+worker.self,
+10,
+1024,
+"http://localhost:8080",
+RpcAddress("localhost", 1))
+  master.self.send(workerReg)
+  workerReg
+}
+
+eventually(timeout(10.seconds)) {
+  val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
+  assert(masterState.workers.length === numWorkers)
+  assert(masterState.workers.forall(_.state == WorkerState.ALIVE))
+  assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet)
+  masterState.workers
+}
+
+val decomWorkersCount = master.self.askSync[Integer](DecommissionHosts(hostnames))
+assert(decomWorkersCount === numWorkersExpectedToDecom)
+
+// Decommissioning is actually async ... wait for the workers to actually be decommissioned by
+// polling the master's state.
+eventually(timeout(10.seconds)) {

Review comment:
   nit: we may want to give a longer timeout to avoid flakiness.








[GitHub] [spark] viirya commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

2020-07-15 Thread GitBox


viirya commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455519563



##
File path: 
mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
##
@@ -85,7 +85,6 @@ class FMClassifier @Since("3.0.0") (
*/
   @Since("3.0.0")
   def setFactorSize(value: Int): this.type = set(factorSize, value)
-  setDefault(factorSize -> 8)

Review comment:
   Where do the default params of `FMClassifier` move to?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327










[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327










[GitHub] [spark] SparkQA commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-15 Thread GitBox


SparkQA commented on pull request #29126:
URL: https://github.com/apache/spark/pull/29126#issuecomment-659168426


   **[Test build #125936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125936/testReport)** for PR 29126 at commit [`f943465`](https://github.com/apache/spark/commit/f943465514453ccc7c2ff23965d82baa687cdf9e).






[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659118412


   **[Test build #125926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)** for PR 29032 at commit [`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659167771


   **[Test build #125926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)** for PR 29032 at commit [`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] zhengruifeng commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

2020-07-15 Thread GitBox


zhengruifeng commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455517498



##
File path: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
##
@@ -68,6 +68,12 @@ private[clustering] trait BisectingKMeansParams extends Params with HasMaxIter
 "The minimum number of points (if >= 1.0) or the minimum proportion " +
   "of points (if < 1.0) of a divisible cluster.", ParamValidators.gt(0.0))
 
+
+  setDefault(

Review comment:
   total nit: put these params on a single line, as in the places above








[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659165588


   friendly ping @huaxingao @srowen @viirya 
   
   Unlike the other attempt to save RAM, this should be a clear optimization. I found that those methods cannot be marked `@tailrec`, so I use a while-loop instead.
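
As a rough illustration of the while-loop approach (a sketch under assumptions, not the PR's code: `TreePredictSketch`, `Leaf`, `Internal`, and both `predict*` functions are hypothetical and much simpler than Spark ML's node classes), the recursive descent of a decision tree can be rewritten as an iterative one that returns the same prediction:

```scala
object TreePredictSketch {
  // Toy decision-tree nodes; NOT Spark ML's real classes.
  sealed trait Node
  final case class Leaf(prediction: Double) extends Node
  final case class Internal(featureIndex: Int, threshold: Double, left: Node, right: Node)
    extends Node

  // Recursive descent: one call per tree level.
  def predictRecursive(node: Node, features: Array[Double]): Double = node match {
    case Leaf(p) => p
    case Internal(i, t, l, r) =>
      if (features(i) <= t) predictRecursive(l, features) else predictRecursive(r, features)
  }

  // Iterative descent: walks down with a mutable cursor until it reaches a leaf.
  def predictLoop(root: Node, features: Array[Double]): Double = {
    var node = root
    while (node.isInstanceOf[Internal]) {
      val n = node.asInstanceOf[Internal]
      node = if (features(n.featureIndex) <= n.threshold) n.left else n.right
    }
    node.asInstanceOf[Leaf].prediction
  }
}
```

Both functions visit the same nodes; the loop version simply avoids a method call per level.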






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582










[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162307


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125912/
   Test PASSed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291


   Merged build finished. Test PASSed.






[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


SparkQA commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659162409


   **[Test build #125935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125935/testReport)** for PR 29015 at commit [`3ee87f3`](https://github.com/apache/spark/commit/3ee87f376fe499df8aa710863f5bf6d9648f).






[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291










[GitHub] [spark] SparkQA removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659048554


   **[Test build #125912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)** for PR 27366 at commit [`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).






[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


SparkQA commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659161699


   **[Test build #125912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)** for PR 27366 at commit [`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481










[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481










[GitHub] [spark] venkata91 edited a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spa

2020-07-15 Thread GitBox


venkata91 edited a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658977388


   > You can make a common function that holds most of the code and is called 
from 2 separate tests: one test runs with dynamic allocation on, the other 
with it off. That will reduce code duplication.
   
   Never mind, I made some changes to the test so that it exercises the dynamic 
allocation block of code properly.






[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659159126


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284










[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659158107


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810


   Can one of the admins verify this patch?






[GitHub] [spark] frankyin-factual opened a new pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


frankyin-factual opened a new pull request #29129:
URL: https://github.com/apache/spark/pull/29129


   
   
   ### What changes were proposed in this pull request?
   
   Put version-dependent Hive mocks into their own subdirectories.
   
   ### Why are the changes needed?
   
   Fix broken Hive builds.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   This is a fix for tests. 






[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659157488


   @HeartSaVioR @dongjoon-hyun https://github.com/apache/spark/pull/29129






[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


agrawaldevesh commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659156968


   Retest this please.
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-654385667


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


cloud-fan commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659156705


   ok to test






[GitHub] [spark] agrawaldevesh commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


agrawaldevesh commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455424422



##
File path: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -168,7 +168,10 @@ private[spark] class NettyBlockTransferService(
 // Everything else is encoded using our binary protocol.
 val metadata = 
JavaUtils.bufferToArray(serializer.newInstance().serialize((level, classTag)))
 
-val asStream = blockData.size() > 
conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM)
+// We always transfer shuffle blocks as a stream for simplicity with the 
receiving code since
+// they are always written to disk. Otherwise we check the block size.
+val asStream = (blockData.size() > 
conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) ||

Review comment:
   nit: The parentheses aren't really needed, but even if they are, would it 
be easier to read this as:
   
   val asStream = (blockData.size() > 
conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) || blockId.isShuffle)
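
To illustrate why the outer parentheses are optional here, a small standalone sketch follows. The names `size`, `maxInMem`, and `isShuffle` are hypothetical stand-ins for `blockData.size()`, the configured fetch-to-memory threshold, and `blockId.isShuffle`. In Scala, `>` binds tighter than `||`, and a line that ends in `||` continues on the next line, so both forms should evaluate identically.

```scala
object PrecedenceSketch {
  // Without parentheses: `>` binds tighter than `||`, and the trailing `||`
  // tells the parser that the expression continues on the next line.
  def asStream(size: Long, maxInMem: Long, isShuffle: Boolean): Boolean =
    size > maxInMem ||
      isShuffle

  // With the redundant outer parentheses, in the style of the diff.
  def asStreamParens(size: Long, maxInMem: Long, isShuffle: Boolean): Boolean =
    (size > maxInMem ||
      isShuffle)
}
```

The parentheses would only start to matter if the `||` were moved to the beginning of the continuation line, which is not the case in this diff.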

##
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##
@@ -38,7 +38,10 @@ sealed abstract class BlockId {
   // convenience methods
   def asRDDId: Option[RDDBlockId] = if (isRDD) Some(asInstanceOf[RDDBlockId]) 
else None
   def isRDD: Boolean = isInstanceOf[RDDBlockId]
-  def isShuffle: Boolean = isInstanceOf[ShuffleBlockId] || 
isInstanceOf[ShuffleBlockBatchId]
+  def isShuffle: Boolean = {
+(isInstanceOf[ShuffleBlockId] || isInstanceOf[ShuffleBlockBatchId] ||

Review comment:
   nit: Are the parentheses needed?

##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##
@@ -0,0 +1,330 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.ExecutorService
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.shuffle.{MigratableResolver, ShuffleBlockInfo}
+import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Class to handle block manager decommissioning retries.
+ * It creates a Thread to retry offloading all RDD cache and Shuffle blocks
+ */
+private[storage] class BlockManagerDecommissioner(
+  conf: SparkConf,
+  bm: BlockManager) extends Logging {
+
+  private val maxReplicationFailuresForDecommission =
+conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
+
+  /**
+   * This runnable consumes any shuffle blocks in the queue for migration. 
This part of a
+   * producer/consumer where the main migration loop updates the queue of 
blocks to be migrated
+   * periodically. On migration failure, the current thread will reinsert the 
block for another
+   * thread to consume. Each thread migrates blocks to a different particular 
executor to avoid
+   * distribute the blocks as quickly as possible without overwhelming any 
particular executor.
+   *
+   * There is no preference for which peer a given block is migrated to.
+   * This is notable different than the RDD cache block migration (further 
down in this file)
+   * which uses the existing priority mechanism for determining where to 
replicate blocks to.
+   * Generally speaking cache blocks are less impactful as they normally 
represent narrow
+   * transformations and we normally have less cache present than shuffle data.
+   *
+   * The producer/consumer model is chosen for shuffle block migration to 
maximize
+   * the chance of migrating all shuffle blocks before the executor is forced 
to exit.
+   */
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends 
Runnable {
+@volatile var running = true
+override def run(): Unit = {
+  var migrating: Option[(ShuffleBlockInfo, Int)] = None
+  logInfo(s"Starting migration thread for ${peer}")
+  // Once a block fails to transfer to an executor stop trying to transfer 
more blocks
+  try {
+
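
The producer/consumer model described in the `ShuffleMigrationRunnable` doc comment above can be sketched, very roughly, as follows. This is not the PR's implementation: blocks are plain strings, `transfer` is a hypothetical function standing in for the real block upload, and the retry cap `maxFailures` is an assumption about how a per-block failure limit might be applied. The sketch only shows the two behaviours the comments describe: a failed block is put back on the shared queue for another consumer, and a consumer stops using its peer after the first failed transfer.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object MigrationSketch {
  // One consumer per peer. Queue entries are (block name, failed attempts so far).
  // `transfer(peer, block)` is hypothetical and returns true on success.
  final class Consumer(
      peer: String,
      queue: ConcurrentLinkedQueue[(String, Int)],
      maxFailures: Int,
      transfer: (String, String) => Boolean) extends Runnable {

    @volatile var running = true
    val migrated = scala.collection.mutable.ArrayBuffer.empty[String]

    override def run(): Unit = {
      while (running) {
        val next = queue.poll()
        if (next == null) {
          // Simplification: stop when the queue is empty. A real consumer
          // would wait for the producer to add more blocks.
          running = false
        } else {
          val (block, failures) = next
          if (transfer(peer, block)) {
            migrated += block
          } else {
            // Reinsert the block for another consumer, then give up on this peer.
            if (failures + 1 < maxFailures) queue.add((block, failures + 1))
            running = false
          }
        }
      }
    }
  }
}
```

In this sketch each `Consumer` would normally be submitted to its own thread; calling `run()` directly is enough to observe the requeue behaviour.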

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659156074


   Can one of the admins verify this patch?






[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


agrawaldevesh commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659155613


   jenkins retest this please






[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710


   Can one of the admins verify this patch?






[GitHub] [spark] williamhyun opened a new pull request #29128: [SPARK-XXX][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


williamhyun opened a new pull request #29128:
URL: https://github.com/apache/spark/pull/29128


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659154829


   I am working on a combination of 1) and 2). Will push shortly. 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211










[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211










[GitHub] [spark] imback82 commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


imback82 commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r455505839



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -60,6 +62,67 @@ case class BroadcastHashJoinExec(
 }
   }
 
+  override lazy val outputPartitioning: Partitioning = {
+joinType match {
+  case _: InnerLike =>
+streamedPlan.outputPartitioning match {
+  case h: HashPartitioning => expandOutputPartitioning(h)
+  case c: PartitioningCollection => expandOutputPartitioning(c)
+  case other => other
+}
+  case _ => streamedPlan.outputPartitioning
+}
+  }
+
+  // An one-to-many mapping from a streamed key to build keys.
+  private lazy val streamedKeyToBuildKeyMapping = {
+val mapping = mutable.Map.empty[Expression, Seq[Expression]]
+streamedKeys.zip(buildKeys).foreach {
+  case (streamedKey, buildKey) =>
+val key = streamedKey.canonicalized
+mapping.get(key) match {
+  case Some(v) => mapping.put(key, v :+ buildKey)
+  case None => mapping.put(key, Seq(buildKey))
+}
+}
+mapping.toMap
+  }
+
+  // Expands the given partitioning collection recursively.
+  private def expandOutputPartitioning(
+  partitioning: PartitioningCollection): PartitioningCollection = {
+PartitioningCollection(partitioning.partitionings.flatMap {
+  case h: HashPartitioning => expandOutputPartitioning(h).partitionings
+  case c: PartitioningCollection => Seq(expandOutputPartitioning(c))
+  case other => Seq(other)
+})
+  }
+
+  // Expands the given hash partitioning by substituting streamed keys with 
build keys.
+  // For example, if the expressions for the given partitioning are Seq("a", 
"b", "c")
+  // where the streamed keys are Seq("b", "c") and the build keys are Seq("x", 
"y"),
+  // the expanded partitioning will have the following expressions:
+  // Seq("a", "b", "c"), Seq("a", "b", "y"), Seq("a", "x", "c"), Seq("a", "x", 
"y").
+  // The expanded expressions are returned as PartitioningCollection.
+  private def expandOutputPartitioning(partitioning: HashPartitioning): 
PartitioningCollection = {
+def generateExprCombinations(
+current: Seq[Expression],
+accumulated: Seq[Expression]): Seq[Seq[Expression]] = {
+  if (current.isEmpty) {
+Seq(accumulated)
+  } else {
+val buildKeys = 
streamedKeyToBuildKeyMapping.get(current.head.canonicalized)
+generateExprCombinations(current.tail, accumulated :+ current.head) ++
+  buildKeys.map { _.flatMap(b => 
generateExprCombinations(current.tail, accumulated :+ b))

Review comment:
   I added a config to limit the expansion. (Please let me know if 
introducing a new config knob is too much. If so, I can just use a 
non-configurable constant in this class.)
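
As a rough illustration of the expansion and of what a cap on it might look like, here is a standalone sketch. It uses plain strings instead of Catalyst expressions, and `limit` is a hypothetical parameter; the name and exact semantics of the config added in the PR are not shown in this thread. The iterator-based recursion is an assumption chosen so that combinations past the cap are never materialized.

```scala
object ExpandSketch {
  // One-to-many mapping from a streamed key to its build keys.
  def mapping(streamed: Seq[String], build: Seq[String]): Map[String, Seq[String]] =
    streamed.zip(build).groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }

  // Generates key combinations lazily and keeps at most `limit` of them.
  def expand(
      exprs: Seq[String],
      streamedToBuild: Map[String, Seq[String]],
      limit: Int): Seq[Seq[String]] = {
    def go(current: Seq[String], accumulated: Seq[String]): Iterator[Seq[String]] = {
      if (current.isEmpty) {
        Iterator.single(accumulated)
      } else {
        val builds = streamedToBuild.getOrElse(current.head, Seq.empty)
        // Keep the streamed key first, then try each build key in its place.
        go(current.tail, accumulated :+ current.head) ++
          builds.iterator.flatMap(b => go(current.tail, accumulated :+ b))
      }
    }
    go(exprs, Seq.empty).take(limit).toList
  }
}
```

With the example from the code comment (partitioning `Seq("a", "b", "c")`, streamed keys `Seq("b", "c")`, build keys `Seq("x", "y")`), a limit of 8 leaves all four combinations in place, while a limit of 2 keeps only the first two.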








[GitHub] [spark] dongjoon-hyun commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659151759


   Thank you. I'm fine with any combination (including Hive 2.3-only testing). 
Please feel free to choose an option. From my side, this does not look urgent 
either, since it is blocking neither GitHub Actions nor PRBuilder. It has 
already been broken for over 3 days. I hope `Hive 1.2` will eventually be 
removed in the near future, once we build a consensus. Sooner is better.
   
   In short, please proceed toward what you think is right, @HeartSaVioR .






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287










[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287










[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-15 Thread GitBox


HeartSaVioR commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-659149320


   Technically it's a private API, not even tagged as a developer API, which 
means the change doesn't break anything from Spark's perspective. If there is confusion 
about whether the `org.apache.spark.sql.execution` package is available outside of Spark, 
then I'd say we may need to reconsider adding `private[execution]` 
everywhere in the package.
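
For readers less familiar with Scala's qualified access modifiers, here is a small sketch of what `private[execution]` does. It uses nested objects rather than real packages so that it fits in one snippet; `demo`, `MetadataLog`, `compact`, and `Compactor` are hypothetical names, not Spark classes.

```scala
object demo {
  object execution {
    class MetadataLog {
      // Visible only to code enclosed by `execution`.
      private[execution] def compact(): String = "compacted"

      // Public entry point that delegates to the restricted method.
      def latest(): String = compact()
    }

    object Compactor {
      // Allowed: Compactor is defined inside `execution`.
      def run(log: MetadataLog): String = log.compact()
    }
  }

  // Outside `execution` only the public surface is reachable.
  // Calling `log.compact()` here would be rejected by the compiler.
  def describe(log: execution.MetadataLog): String = log.latest()
}
```

With the modifier in place, misuse from outside the scope fails at compile time instead of relying on a convention that the package is internal.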






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148768


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125918/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-15 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-659148901


   **[Test build #125934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125934/testReport)**
 for PR 27694 at commit 
[`2559928`](https://github.com/apache/spark/commit/2559928be2d7981c2c1c2d9b6111c4449e721310).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369










[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369










[GitHub] [spark] SparkQA removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spar

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659083280


   **[Test build #125918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125918/testReport)**
 for PR 28287 at commit 
[`e167da5`](https://github.com/apache/spark/commit/e167da53b0b8ffde746c86e0564d7971363f3746).






[GitHub] [spark] SparkQA commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blac

2020-07-15 Thread GitBox


SparkQA commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148089


   **[Test build #125918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125918/testReport)**
 for PR 28287 at commit 
[`e167da5`](https://github.com/apache/spark/commit/e167da53b0b8ffde746c86e0564d7971363f3746).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659147381


   Thank you for the quick update, @aokolnychyi . Also, thank you all for your 
opinions.






[GitHub] [spark] HyukjinKwon commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659146783


   retest this please






[GitHub] [spark] HeartSaVioR edited a comment on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


HeartSaVioR edited a comment on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659144203


   I guess we have several possible approaches here:
   
   1. Place the suite in a Hive-version-specific directory (with new config 
in pom.xml to add the test source based on version).
   2. Create a shim and place the shim in a Hive-version-specific directory 
(same as above).
   3. Revert and try adding the suite to the list of suites that run in a separate JVM 
(less chance of hitting the classloader issue).
   
   I guess the most straightforward approach would be 1. It should work 
smoothly, though if we want to ensure the suite runs on both versions we'll end 
up duplicating code. If that doesn't matter much we can just do that, or 
we can even enable the test on Hive 2.3 only (as the suite is technically 
irrelevant to Hive 1.2 vs Hive 2.3).
   
   Less redundant but more complicated would be 2. I'm not 100% sure how much 
complexity is needed to make a shim for the suite, and I'm not sure it is worth doing 
instead of simply allowing a bit of redundant code.
   
   The simplest approach would be 3, but it's not guaranteed to fix the 
flakiness. I'd say it'd be better to leave it as the last resort.
   
   WDYT?






[GitHub] [spark] HeartSaVioR commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


HeartSaVioR commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659144203


   I guess we have several possible approaches here:
   
   1. Place the suite in a Hive-version-specific directory (with new config 
in pom.xml to add the test source based on version).
   2. Create a shim and place the shim in a Hive-version-specific directory 
(same as above).
   3. Revert and try adding the suite to the list of suites that run in a separate JVM 
(less chance of hitting the classloader issue).
   
   I guess the most straightforward approach would be 1. It should work 
smoothly, though if we want to ensure the suite runs on both versions we'll end 
up duplicating code. If that doesn't matter much we can just do that, or 
we can even enable the test on Hive 2.3 only (as the suite is technically 
irrelevant to Hive 1.2 vs Hive 2.3).
   
   Less redundant but more complicated would be 2. I'm not 100% sure how much 
complexity is needed to make a shim for the suite, or whether it is worth doing 
instead of simply allowing a bit of redundant code.
   
   The simplest approach would be 3, but it's not guaranteed to fix the 
flakiness. I'd say it'd be better to leave it as the last resort.
   
   WDYT?






[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize

2020-07-15 Thread GitBox


leanken commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659144171


   > For example.
   > -- Case 4
   > -- (one column null, other column matches a row in the subquery result -> row not returned)
   > SELECT *
   > FROM m
   > WHERE b = 1.0 -- Matches (null, 1.0)
   > AND (a, b) NOT IN (SELECT *
   > FROM s
   > WHERE c IS NOT NULL) -- Matches (0, 1.0), (2, 3.0), (4, null)
   > ;
   > 
   > In this case, I can't use InternalRow(null, 1.0) to look up in 
HashedRelation. I need to exclude every null column and try to find a match on 
the non-null columns, so I think HashedRelation is not a suitable structure 
for multi-column support. If we extend this to multiple columns and have to deal with 
null columns, I can't use a hash lookup, so it will still be M*N, 
which is not going to help.
   
   ping @maropu on the multi column support conclusion.
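
To make the quoted case concrete, here is a minimal sketch of multi-column `NOT IN` under SQL three-valued logic, written as a plain nested scan. `NullAwareNotIn`, `tupleEquals`, and `notIn` are hypothetical names for illustration; this is not Spark's `HashedRelation` code path, and nulls are modelled as `None`.

```scala
object NullAwareNotIn {
  // A row is a sequence of nullable doubles; None models SQL NULL.
  type Row = Seq[Option[Double]]

  /**
   * Tuple equality under three-valued logic:
   * Some(false) if any non-null column pair differs,
   * None (UNKNOWN) if no pair differs but some pair involves a null,
   * Some(true) if every pair is non-null and equal.
   */
  def tupleEquals(left: Row, right: Row): Option[Boolean] = {
    val cmp: Seq[Option[Boolean]] = left.zip(right).map {
      case (Some(l), Some(r)) => Some(l == r)
      case _ => None
    }
    if (cmp.contains(Some(false))) Some(false)
    else if (cmp.contains(None)) None
    else Some(true)
  }

  /** A probe row passes NOT IN only if it is definitely unequal to every subquery row. */
  def notIn(probe: Row, subquery: Seq[Row]): Boolean =
    subquery.forall(row => tupleEquals(probe, row).contains(false))
}
```

With the probe row `(null, 1.0)` and the subquery rows `(0, 1.0)`, `(2, 3.0)`, `(4, null)` from Case 4, the comparison against `(0, 1.0)` is UNKNOWN, so the row is filtered out. Because each probe row may have nulls in different columns, the check above scans all subquery rows, which is the M*N cost mentioned in the quote.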






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659142626


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125916/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659142963










[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659142963










[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-15 Thread GitBox


SparkQA commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-659143085


   **[Test build #125933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125933/testReport)**
 for PR 28904 at commit 
[`247a0a1`](https://github.com/apache/spark/commit/247a0a1259f7a87701b8213d33810d1c63ff1e7b).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659142621


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread

2020-07-15 Thread GitBox


SparkQA commented on pull request #29002:
URL: https://github.com/apache/spark/pull/29002#issuecomment-659142838


   **[Test build #125932 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125932/testReport)**
 for PR 29002 at commit 
[`d768385`](https://github.com/apache/spark/commit/d768385caac9c79c456de87a4afd72298dda46db).






[GitHub] [spark] SparkQA removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spar

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659088466


   **[Test build #125921 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125921/testReport)**
 for PR 28287 at commit 
[`0d07845`](https://github.com/apache/spark/commit/0d07845459e6e0f606e5f0920c973de256a309e1).






[GitHub] [spark] AmplabJenkins commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659142621










[GitHub] [spark] SparkQA commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blac

2020-07-15 Thread GitBox


SparkQA commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659142331


   **[Test build #125921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125921/testReport)**
 for PR 28287 at commit 
[`0d07845`](https://github.com/apache/spark/commit/0d07845459e6e0f606e5f0920c973de256a309e1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659075914


   **[Test build #125916 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125916/testReport)**
 for PR 29125 at commit 
[`4518513`](https://github.com/apache/spark/commit/451851373f6eb2db8adffe43669b51be7a30e8c1).






[GitHub] [spark] SparkQA commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


SparkQA commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659142220


   **[Test build #125916 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125916/testReport)**
 for PR 29125 at commit 
[`4518513`](https://github.com/apache/spark/commit/451851373f6eb2db8adffe43669b51be7a30e8c1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] xuanyuanking commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-15 Thread GitBox


xuanyuanking commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-659141316


   `Well, I guess I already explained why compactLogs is the culprit of the 
memory issue, right? (#28904 (comment))`
   
   Yep, that's right. I'm also looking at the code in detail and trying to find a 
way to both keep this API and get the improvement. If that's hard to achieve, 
the improvement of course has higher priority. I'll take a closer look today.






[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659141023










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659141023










[GitHub] [spark] HyukjinKwon commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-659141249


   @maropu, per the documentation [Spark Project Improvement Proposals 
(SPIP)](http://spark.apache.org/improvement-proposals.html), if you feel like 
it needs an SPIP, it does. I trust your judgement.
   
   I will read it more closely today and provide more feedback. cc @kiszk, 
@srowen, @viirya, @ueshin too.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659139670


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125922/
   Test FAILed.






[GitHub] [spark] HyukjinKwon commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-659139846


   I just saw the comment. Thanks for summarizing @revans2.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659139663


   Merged build finished. Test FAILed.






[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659139939


   I will also take a look. 






[GitHub] [spark] SparkQA commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


SparkQA commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659139822


   **[Test build #125931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125931/testReport)** for PR 29089 at commit [`21a84ad`](https://github.com/apache/spark/commit/21a84adb3561788eea0e98c62129127b5bc9d5d5).






[GitHub] [spark] AmplabJenkins commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659139663










[GitHub] [spark] SparkQA removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659093134


   **[Test build #125922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125922/testReport)** for PR 29015 at commit [`9ea178b`](https://github.com/apache/spark/commit/9ea178bb5091651f590017cb2b86225ec6f648c0).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-659139079










[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


SparkQA commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659139325


   **[Test build #125922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125922/testReport)** for PR 29015 at commit [`9ea178b`](https://github.com/apache/spark/commit/9ea178bb5091651f590017cb2b86225ec6f648c0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-659139079









