[GitHub] [spark] AmplabJenkins commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning
AmplabJenkins commented on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-659176704
[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
jiangxb1987 commented on a change in pull request #29032: URL: https://github.com/apache/spark/pull/29032#discussion_r455525039

## File path: core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala ##

@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
       if (ExecutorState.isFinished(state)) {
         listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
       } else if (state == ExecutorState.DECOMMISSIONED) {
-        listener.executorDecommissioned(fullId, message.getOrElse(""))
+        listener.executorDecommissioned(fullId,
+          ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment: oh I see https://github.com/apache/spark/pull/29032#discussion_r455401121
[GitHub] [spark] c21 commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning
c21 commented on pull request #29130: URL: https://github.com/apache/spark/pull/29130#issuecomment-659174967 cc @maropu, @cloud-fan, @gatorsmile and @sameeragarwal if you guys can help take a look. Thanks!
[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
jiangxb1987 commented on a change in pull request #29032: URL: https://github.com/apache/spark/pull/29032#discussion_r455524790

## File path: core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala ##

@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
       if (ExecutorState.isFinished(state)) {
         listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
       } else if (state == ExecutorState.DECOMMISSIONED) {
-        listener.executorDecommissioned(fullId, message.getOrElse(""))
+        listener.executorDecommissioned(fullId,
+          ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment: how is the flag `isHostDecommissioned` actually used?

## File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ##

@@ -101,7 +101,8 @@ private[spark] trait TaskScheduler {
   /**
    * Process a decommissioning executor.
    */
-  def executorDecommission(executorId: String): Unit
+  def executorDecommission(

Review comment: nit: don't leave an empty implementation here.

## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ##

@@ -191,9 +191,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecutor))
       removeExecutor(executorId, reason)
-    case DecommissionExecutor(executorId) =>
+    case DecommissionExecutor(executorId, decommissionInfo) =>
       logError(s"Received decommission executor message ${executorId}.")

Review comment: do we want to also include the decommissionInfo in the error msg?
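On that last point, a minimal sketch of what carrying `decommissionInfo` into the log message might look like. The case classes below are simplified stand-ins so the snippet is self-contained; they are not the actual Spark scheduler types, and the real handler does more than log:

```scala
// Hypothetical, simplified message shapes, for illustration only.
case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)
case class DecommissionExecutor(executorId: String, decommissionInfo: ExecutorDecommissionInfo)

def handle(msg: Any): Unit = msg match {
  case DecommissionExecutor(executorId, decommissionInfo) =>
    // The reviewer's suggestion: include the decommission info alongside the executor id.
    println(s"Received decommission executor message $executorId (info: $decommissionInfo).")
  case _ => ()
}
```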
[GitHub] [spark] c21 opened a new pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning
c21 opened a new pull request #29130: URL: https://github.com/apache/spark/pull/29130

### What changes were proposed in this pull request?

Currently `ShuffledHashJoin.outputPartitioning` inherits from `HashJoin.outputPartitioning`, which only preserves stream side partitioning (`HashJoin.scala`):

```
override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning
```

This loses build side partitioning information, and causes extra shuffle if there's another join / group-by after this join. Example:

```
withSQLConf(
    SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50",
    SQLConf.SHUFFLE_PARTITIONS.key -> "2",
    SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
  val df1 = spark.range(10).select($"id".as("k1"))
  val df2 = spark.range(30).select($"id".as("k2"))
  Seq("inner", "cross").foreach(joinType => {
    val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count()
      .queryExecution.executedPlan
    assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1)
    // No extra shuffle before aggregate
    assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2)
  })
}
```

Current physical plan (having an extra shuffle on `k1` before aggregate):

```
*(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
+- Exchange hashpartitioning(k1#220L, 2), true, [id=#117]
   +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
      +- *(3) Project [k1#220L]
         +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
            :- Exchange hashpartitioning(k1#220L, 2), true, [id=#109]
            :  +- *(1) Project [id#218L AS k1#220L]
            :     +- *(1) Range (0, 10, step=1, splits=2)
            +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111]
               +- *(2) Project [id#222L AS k2#224L]
                  +- *(2) Range (0, 30, step=1, splits=2)
```

Ideal physical plan (no shuffle on `k1` before aggregate):

```
*(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
+- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
   +- *(3) Project [k1#220L]
      +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
         :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107]
         :  +- *(1) Project [id#218L AS k1#220L]
         :     +- *(1) Range (0, 10, step=1, splits=2)
         +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109]
            +- *(2) Project [id#222L AS k2#224L]
               +- *(2) Range (0, 30, step=1, splits=2)
```

This can be fixed by overriding `outputPartitioning` method in `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`. In addition, also fix one typo in `HashJoin`, as that code path is shared between broadcast hash join and shuffled hash join.

### Why are the changes needed?

To avoid shuffle (for queries having multiple joins or group-by), for saving CPU and IO.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit test in `JoinSuite`.
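As an aside for readers following the description: a rough sketch of the kind of override it points at, modeled on how `SortMergeJoinExec` exposes both children's partitioning for inner-like joins. This is an illustration of the idea, not the exact patch, and the handling of non-inner join types is simplified:

```scala
// Sketch only: let ShuffledHashJoinExec advertise the build side's partitioning too,
// so a following join/aggregate on the build-side keys does not need another shuffle.
override def outputPartitioning: Partitioning = joinType match {
  case _: InnerLike =>
    PartitioningCollection(Seq(left.outputPartitioning, right.outputPartitioning))
  case LeftOuter => left.outputPartitioning
  case RightOuter => right.outputPartitioning
  case _ => UnknownPartitioning(left.outputPartitioning.numPartitions)
}
```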
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455
[GitHub] [spark] SparkQA commented on pull request #28961: [SPARK-32143][SQL] Prevent a skewed join from producing too many partition splits
SparkQA commented on pull request #28961: URL: https://github.com/apache/spark/pull/28961#issuecomment-659172738 **[Test build #125937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125937/testReport)** for PR 28961 at commit [`3811ae9`](https://github.com/apache/spark/commit/3811ae93c2966d87619624670e338b7c6d34b7d4).
[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455
[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching
AmplabJenkins commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching
AmplabJenkins removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827
[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659112300 **[Test build #125925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)** for PR 29032 at commit [`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-659171262 @viirya This PR is only on cpu.
[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659171509 **[Test build #125925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)** for PR 29032 at commit [`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
viirya commented on a change in pull request #29107: URL: https://github.com/apache/spark/pull/29107#discussion_r455521016

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ##

@@ -1099,6 +1101,64 @@ object TypeCoercion { DateSub(l, Literal(days)) } } + + /** + * Coerces different children of Union to a common set of columns. Note that this must be + * run before `WidenSetOperationTypes`, because `WidenSetOperationTypes` should be run on + * correctly resolved column by name. + */ + object UnionCoercion extends TypeCoercionRule { +private def unionTwoSides( +left: LogicalPlan, right: LogicalPlan, allowMissingCol: Boolean): LogicalPlan = { + val resolver = SQLConf.get.resolver + val leftOutputAttrs = left.output + val rightOutputAttrs = right.output + + // Builds a project list for `right` based on `left` output names + val rightProjectList = leftOutputAttrs.map { lattr => +rightOutputAttrs.find { rattr => resolver(lattr.name, rattr.name) }.getOrElse { + if (allowMissingCol) { +Alias(Literal(null, lattr.dataType), lattr.name)() + } else { +throw new AnalysisException( + s"""Cannot resolve column name "${lattr.name}" among """ + +s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""") + } +} + } + + // Delegates failure checks to `CheckAnalysis` + val notFoundAttrs = rightOutputAttrs.diff(rightProjectList) + val rightChild = Project(rightProjectList ++ notFoundAttrs, right) + + // Builds a project for `logicalPlan` based on `right` output names, if allowing + // missing columns. + val leftChild = if (allowMissingCol) { +val missingAttrs = notFoundAttrs.map { attr => + Alias(Literal(null, attr.dataType), attr.name)() +} +if (missingAttrs.nonEmpty) { + Project(leftOutputAttrs ++ missingAttrs, left) +} else { + left +} + } else { +left + } + Union(leftChild, rightChild) +} + +override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp { + case e if !e.childrenResolved => e + + case Union(children, byName, allowMissingCol) + if byName => +val union = children.reduceLeft { (left: LogicalPlan, right: LogicalPlan) => + unionTwoSides(left, right, allowMissingCol)

Review comment: If it looks not proper after rethinking, we can also move it to another rule or create another rule.
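To make the by-name alignment in `unionTwoSides` concrete, here is a small, Spark-free model of what it produces when `allowMissingCol` is enabled. The column names are invented, and plain strings stand in for the attribute and `Alias(Literal(null, ...))` projections the real rule builds:

```scala
// Left has (id, name), right has (id, score). Align the right side to the left side's
// names (padding misses with null), then append the right-only columns and pad the left.
val leftCols  = Seq("id", "name")
val rightCols = Seq("id", "score")

val rightAligned = leftCols.map(c => if (rightCols.contains(c)) c else s"NULL AS $c")
val rightOnly    = rightCols.diff(leftCols)

println((leftCols ++ rightOnly.map(c => s"NULL AS $c")).mkString(", ")) // id, name, NULL AS score
println((rightAligned ++ rightOnly).mkString(", "))                     // id, NULL AS name, score
```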
[GitHub] [spark] viirya commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
viirya commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-659170171 Is this also memory optimization? But looks like cpu time optimization from the description?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching
AmplabJenkins removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320
[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching
AmplabJenkins commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320
[GitHub] [spark] jiangxb1987 commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
jiangxb1987 commented on a change in pull request #29015: URL: https://github.com/apache/spark/pull/29015#discussion_r455517786

## File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala ##

@@ -49,6 +55,26 @@ class MasterWebUI( "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST"))) attachHandler(createRedirectHandler( "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST"))) +attachHandler(createServletHandler("/workers/kill", new HttpServlet { + override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = { +val hostnames: Seq[String] = Option(req.getParameterValues("host")) + .getOrElse(Array[String]()).toSeq +if (!isDecommissioningRequestAllowed(req)) { + resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED) +} else { + val removedWorkers = masterEndpointRef.askSync[Integer](DecommissionHosts(hostnames)) + logInfo(s"Decommissioning of hosts $hostnames decommissioned ${removedWorkers} workers")

Review comment: nit: `${removedWorkers}` -> `$removedWorkers`

## File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ##

@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite } } + def testWorkerDecommissioning( + numWorkers: Int, + numWorkersExpectedToDecom: Int, + hostnames: Seq[String]): Unit = { +val conf = new SparkConf() +val master = makeAliveMaster(conf) +val workerRegs = (1 to numWorkers).map{idx => + val worker = new MockWorker(master.self, conf) + worker.rpcEnv.setupEndpoint("worker", worker) + val workerReg = RegisterWorker( +worker.id, +"localhost", +worker.self.address.port, +worker.self, +10, +1024, +"http://localhost:8080", +RpcAddress("localhost", 1)) + master.self.send(workerReg) + workerReg +} + +eventually(timeout(10.seconds)) { + val masterState = master.self.askSync[MasterStateResponse](RequestMasterState) + assert(masterState.workers.length === numWorkers) + assert(masterState.workers.forall(_.state == WorkerState.ALIVE)) + assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet) + masterState.workers

Review comment: nit: this is not needed

## File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ##

@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite } } + def testWorkerDecommissioning( + numWorkers: Int, + numWorkersExpectedToDecom: Int, + hostnames: Seq[String]): Unit = { +val conf = new SparkConf() +val master = makeAliveMaster(conf) +val workerRegs = (1 to numWorkers).map{idx => + val worker = new MockWorker(master.self, conf) + worker.rpcEnv.setupEndpoint("worker", worker) + val workerReg = RegisterWorker( +worker.id, +"localhost", +worker.self.address.port, +worker.self, +10, +1024, +"http://localhost:8080", +RpcAddress("localhost", 1)) + master.self.send(workerReg) + workerReg +} + +eventually(timeout(10.seconds)) { + val masterState = master.self.askSync[MasterStateResponse](RequestMasterState) + assert(masterState.workers.length === numWorkers) + assert(masterState.workers.forall(_.state == WorkerState.ALIVE)) + assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet) + masterState.workers +} + +val decomWorkersCount = master.self.askSync[Integer](DecommissionHosts(hostnames)) +assert(decomWorkersCount === numWorkersExpectedToDecom) + +// Decommissioning is actually async ... wait for the workers to actually be decommissioned by +// polling the master's state. +eventually(timeout(10.seconds)) {

Review comment: nit: we may want to give a longer timeout to avoid flakiness.
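For context on how the endpoint under review would be exercised: something along these lines would POST a decommission request from outside. The master host/port and the absence of extra authentication are assumptions; per the servlet code above, only POST is accepted and each target host is passed as a `host` query parameter:

```scala
import java.net.{HttpURLConnection, URL}

// Hypothetical client call: ask the master to decommission the workers on two hosts.
val url = new URL(
  "http://master.example.com:8080/workers/kill?host=worker1.example.com&host=worker2.example.com")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
println(s"Decommission request returned HTTP ${conn.getResponseCode}")
conn.disconnect()
```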
[GitHub] [spark] viirya commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm
viirya commented on a change in pull request #29112: URL: https://github.com/apache/spark/pull/29112#discussion_r455519563

## File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala ##

@@ -85,7 +85,6 @@ class FMClassifier @Since("3.0.0") (
    */
   @Since("3.0.0")
   def setFactorSize(value: Int): this.type = set(factorSize, value)
-  setDefault(factorSize -> 8)

Review comment: Where do the default params of `FMClassifier` move?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327
[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
AmplabJenkins commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327
[GitHub] [spark] SparkQA commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view
SparkQA commented on pull request #29126: URL: https://github.com/apache/spark/pull/29126#issuecomment-659168426 **[Test build #125936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125936/testReport)** for PR 29126 at commit [`f943465`](https://github.com/apache/spark/commit/f943465514453ccc7c2ff23965d82baa687cdf9e).
[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA removed a comment on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659118412 **[Test build #125926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)** for PR 29032 at commit [`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).
[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659167771 **[Test build #125926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)** for PR 29032 at commit [`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] zhengruifeng commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm
zhengruifeng commented on a change in pull request #29112: URL: https://github.com/apache/spark/pull/29112#discussion_r455517498

## File path: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ##

@@ -68,6 +68,12 @@ private[clustering] trait BisectingKMeansParams extends Params with HasMaxIter
     "The minimum number of points (if >= 1.0) or the minimum proportion " +
     "of points (if < 1.0) of a divisible cluster.", ParamValidators.gt(0.0))
+
+  setDefault(

Review comment: total nit: make these params in single line, like above places
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-659165588 friendly ping @huaxingao @srowen @viirya Different from another attempt to save RAM, this should be a clear optimization. I found that those methods cannot be marked `@tailrec`, so I use a while-loop instead.
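Since the PR replaces recursion with iteration, here is a minimal sketch of the pattern being described, using simplified stand-in node types rather than the actual `ml.tree` classes:

```scala
sealed trait Node
final case class LeafNode(prediction: Double) extends Node
final case class InternalNode(left: Node, right: Node, goLeft: Array[Double] => Boolean) extends Node

// Iterative descent: no recursion, so @tailrec is not needed and deep trees cannot
// grow the call stack.
def predict(root: Node, features: Array[Double]): Double = {
  var node = root
  while (node.isInstanceOf[InternalNode]) {
    val n = node.asInstanceOf[InternalNode]
    node = if (n.goLeft(features)) n.left else n.right
  }
  node.asInstanceOf[LeafNode].prediction
}
```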
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582
[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
AmplabJenkins removed a comment on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659162307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125912/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
AmplabJenkins removed a comment on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
SparkQA commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659162409 **[Test build #125935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125935/testReport)** for PR 29015 at commit [`3ee87f3`](https://github.com/apache/spark/commit/3ee87f376fe499df8aa710863f5bf6d9648f).
[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
AmplabJenkins commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291
[GitHub] [spark] SparkQA removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
SparkQA removed a comment on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659048554 **[Test build #125912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)** for PR 27366 at commit [`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).
[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
SparkQA commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659161699 **[Test build #125912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)** for PR 27366 at commit [`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481
[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481
[GitHub] [spark] venkata91 edited a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spa
venkata91 edited a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-658977388

> you can make a common function that has most of the code that gets called from 2 separate tests. one test passes with dynamic allocation on, the other with it off. that will reduce code duplication.

nevermind, I made some changes to the test so that it tests the dynamic allocation block of code properly.
[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659159126 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284
[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory
AmplabJenkins commented on pull request #29129: URL: https://github.com/apache/spark/pull/29129#issuecomment-659158107 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory
AmplabJenkins removed a comment on pull request #29129: URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory
AmplabJenkins commented on pull request #29129: URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810 Can one of the admins verify this patch?
[GitHub] [spark] frankyin-factual opened a new pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory
frankyin-factual opened a new pull request #29129: URL: https://github.com/apache/spark/pull/29129

### What changes were proposed in this pull request?
Put version-dependent Hive mocks into their own subdirectories.

### Why are the changes needed?
Fix broken Hive builds.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This is a fix for tests.
[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite
frankyin-factual commented on pull request #29069: URL: https://github.com/apache/spark/pull/29069#issuecomment-659157488 @HeartSaVioR @dongjoon-hyun https://github.com/apache/spark/pull/29129
[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
agrawaldevesh commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659156968 Retest this please.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-654385667 Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
cloud-fan commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-659156705 ok to test
[GitHub] [spark] agrawaldevesh commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
agrawaldevesh commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r455424422

## File path: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala ##

@@ -168,7 +168,10 @@ private[spark] class NettyBlockTransferService( // Everything else is encoded using our binary protocol. val metadata = JavaUtils.bufferToArray(serializer.newInstance().serialize((level, classTag))) -val asStream = blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) +// We always transfer shuffle blocks as a stream for simplicity with the receiving code since +// they are always written to disk. Otherwise we check the block size. +val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) ||

Review comment: nit: A parentheses isn't quite needed, but even if it is, then would it be easier to read this as: val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) || blockId.isShuffle

## File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala ##

@@ -38,7 +38,10 @@ sealed abstract class BlockId { // convenience methods def asRDDId: Option[RDDBlockId] = if (isRDD) Some(asInstanceOf[RDDBlockId]) else None def isRDD: Boolean = isInstanceOf[RDDBlockId] - def isShuffle: Boolean = isInstanceOf[ShuffleBlockId] || isInstanceOf[ShuffleBlockBatchId] + def isShuffle: Boolean = { +(isInstanceOf[ShuffleBlockId] || isInstanceOf[ShuffleBlockBatchId] ||

Review comment: nit: Are the parentheses needed ?

## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala ##

@@ -0,0 +1,330 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.util.concurrent.ExecutorService + +import scala.collection.JavaConverters._ +import scala.collection.mutable +import scala.util.control.NonFatal + +import org.apache.spark._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.shuffle.{MigratableResolver, ShuffleBlockInfo} +import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock +import org.apache.spark.util.ThreadUtils + +/** + * Class to handle block manager decommissioning retries. + * It creates a Thread to retry offloading all RDD cache and Shuffle blocks + */ +private[storage] class BlockManagerDecommissioner( + conf: SparkConf, + bm: BlockManager) extends Logging { + + private val maxReplicationFailuresForDecommission = +conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK) + + /** + * This runnable consumes any shuffle blocks in the queue for migration. This part of a + * producer/consumer where the main migration loop updates the queue of blocks to be migrated + * periodically.
On migration failure, the current thread will reinsert the block for another + * thread to consume. Each thread migrates blocks to a different particular executor to avoid + * distribute the blocks as quickly as possible without overwhelming any particular executor. + * + * There is no preference for which peer a given block is migrated to. + * This is notable different than the RDD cache block migration (further down in this file) + * which uses the existing priority mechanism for determining where to replicate blocks to. + * Generally speaking cache blocks are less impactful as they normally represent narrow + * transformations and we normally have less cache present than shuffle data. + * + * The producer/consumer model is chosen for shuffle block migration to maximize + * the chance of migrating all shuffle blocks before the executor is forced to exit. + */ + private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable { +@volatile var running = true +override def run(): Unit = { + var migrating: Option[(ShuffleBlockInfo, Int)] = None + logInfo(s"Starting migration thread for ${peer}") + // Once a block fails to transfer to an executor stop trying to transfer more blocks + try { +
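The scaladoc above describes a producer/consumer design; a stripped-down model of that shape is below. The types are placeholders (the real code uses `ShuffleBlockInfo` and `BlockManagerId`), and the retry and failure bookkeeping the comment mentions is omitted:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Placeholder types for illustration only.
case class ShuffleBlock(id: String)
case class Peer(host: String)

// Producer side: the main migration loop enqueues blocks to migrate.
val shufflesToMigrate = new LinkedBlockingQueue[ShuffleBlock]()

// Consumer side: one runnable per peer drains the shared queue, so blocks spread across
// peers without overwhelming any single executor.
class ShuffleMigrationRunnable(peer: Peer) extends Runnable {
  @volatile var running = true
  override def run(): Unit = while (running) {
    val block = shufflesToMigrate.take() // blocks until work is available
    // The real implementation transfers the block to `peer` and, on failure, re-enqueues
    // it so another thread can retry.
    println(s"migrating ${block.id} to ${peer.host}")
  }
}
```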
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
AmplabJenkins removed a comment on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
AmplabJenkins commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659156074 Can one of the admins verify this patch?
[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
agrawaldevesh commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659155613 jenkins retest this please
[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
AmplabJenkins commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710 Can one of the admins verify this patch?
[GitHub] [spark] williamhyun opened a new pull request #29128: [SPARK-XXX][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES
williamhyun opened a new pull request #29128: URL: https://github.com/apache/spark/pull/29128

…

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite
frankyin-factual commented on pull request #29069: URL: https://github.com/apache/spark/pull/29069#issuecomment-659154829 I am working on a combination of 1) and 2). Will push shortly.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
AmplabJenkins removed a comment on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211
[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
AmplabJenkins commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211
[GitHub] [spark] imback82 commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning
imback82 commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455505839

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ##

@@ -60,6 +62,67 @@ case class BroadcastHashJoinExec( } } + override lazy val outputPartitioning: Partitioning = { +joinType match { + case _: InnerLike => +streamedPlan.outputPartitioning match { + case h: HashPartitioning => expandOutputPartitioning(h) + case c: PartitioningCollection => expandOutputPartitioning(c) + case other => other +} + case _ => streamedPlan.outputPartitioning +} + } + + // An one-to-many mapping from a streamed key to build keys. + private lazy val streamedKeyToBuildKeyMapping = { +val mapping = mutable.Map.empty[Expression, Seq[Expression]] +streamedKeys.zip(buildKeys).foreach { + case (streamedKey, buildKey) => +val key = streamedKey.canonicalized +mapping.get(key) match { + case Some(v) => mapping.put(key, v :+ buildKey) + case None => mapping.put(key, Seq(buildKey)) +} +} +mapping.toMap + } + + // Expands the given partitioning collection recursively. + private def expandOutputPartitioning( + partitioning: PartitioningCollection): PartitioningCollection = { +PartitioningCollection(partitioning.partitionings.flatMap { + case h: HashPartitioning => expandOutputPartitioning(h).partitionings + case c: PartitioningCollection => Seq(expandOutputPartitioning(c)) + case other => Seq(other) +}) + } + + // Expands the given hash partitioning by substituting streamed keys with build keys. + // For example, if the expressions for the given partitioning are Seq("a", "b", "c") + // where the streamed keys are Seq("b", "c") and the build keys are Seq("x", "y"), + // the expanded partitioning will have the following expressions: + // Seq("a", "b", "c"), Seq("a", "b", "y"), Seq("a", "x", "c"), Seq("a", "x", "y"). + // The expanded expressions are returned as PartitioningCollection. + private def expandOutputPartitioning(partitioning: HashPartitioning): PartitioningCollection = { +def generateExprCombinations( +current: Seq[Expression], +accumulated: Seq[Expression]): Seq[Seq[Expression]] = { + if (current.isEmpty) { +Seq(accumulated) + } else { +val buildKeys = streamedKeyToBuildKeyMapping.get(current.head.canonicalized) +generateExprCombinations(current.tail, accumulated :+ current.head) ++ + buildKeys.map { _.flatMap(b => generateExprCombinations(current.tail, accumulated :+ b))

Review comment: I added a config to limit the expansion. (Please let me know if introducing a new config knob is too much. Then I can just have a constant in this class - not configurable).
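A Spark-free sketch of the expansion described in the inline comment above, which also shows why the limit mentioned in the reply is useful: every streamed key that maps to build keys multiplies the number of emitted key sequences:

```scala
// Toy version of generateExprCombinations over strings instead of expressions.
def generateCombinations(keys: Seq[String], mapping: Map[String, Seq[String]]): Seq[Seq[String]] =
  keys.foldLeft(Seq(Seq.empty[String])) { (acc, key) =>
    val choices = key +: mapping.getOrElse(key, Seq.empty)
    for (prefix <- acc; choice <- choices) yield prefix :+ choice
  }

// Streamed keys b and c map to build keys x and y, matching the comment's example:
// List(a, b, c), List(a, b, y), List(a, x, c), List(a, x, y)
println(generateCombinations(Seq("a", "b", "c"), Map("b" -> Seq("x"), "c" -> Seq("y"))))
```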
[GitHub] [spark] dongjoon-hyun commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite
dongjoon-hyun commented on pull request #29069: URL: https://github.com/apache/spark/pull/29069#issuecomment-659151759 Thank you. I'm fine with any combination (including Hive 2.3-only testing). Please feel free to choose an option. From my side, this also doesn't look urgent, since it is not blocking either GitHub Actions or the PR builder; it has already been broken for over 3 days. I hope `Hive 1.2` will eventually be removed in the near future once we build a consensus; the sooner the better. In short, please proceed with whatever you think is right, @HeartSaVioR . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
AmplabJenkins removed a comment on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
AmplabJenkins commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
HeartSaVioR commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659149320 Technically it's a private API, not even tagged as a developer API - that said, it doesn't break anything from Spark's perspective. If there is confusion about the availability of the `org.apache.spark.sql.execution` package outside of Spark, then I'd rather say we may need to reconsider adding `private[execution]` everywhere in the package. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
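For context on the `private[execution]` suggestion, a minimal sketch of what that access modifier does - the class name here is hypothetical, not an actual Spark source file:

package org.apache.spark.sql.execution

// private[execution] makes the class visible anywhere under the
// org.apache.spark.sql.execution package (including its subpackages)
// but invisible to code outside of it, even though the package itself
// is nominally public.
private[execution] class MetadataLogHelper {
  def compact(entries: Seq[String]): Seq[String] = entries.distinct
}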
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t
AmplabJenkins removed a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659148768 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125918/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-659148901 **[Test build #125934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125934/testReport)** for PR 27694 at commit [`2559928`](https://github.com/apache/spark/commit/2559928be2d7981c2c1c2d9b6111c4449e721310). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t
AmplabJenkins removed a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'
AmplabJenkins commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spar
SparkQA removed a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659083280 **[Test build #125918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125918/testReport)** for PR 28287 at commit [`e167da5`](https://github.com/apache/spark/commit/e167da53b0b8ffde746c86e0564d7971363f3746). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blac
SparkQA commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659148089 **[Test build #125918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125918/testReport)** for PR 28287 at commit [`e167da5`](https://github.com/apache/spark/commit/e167da53b0b8ffde746c86e0564d7971363f3746). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-659147381 Thank you for the quick update, @aokolnychyi . Also, thank you all for your opinions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
HyukjinKwon commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659146783 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR edited a comment on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite
HeartSaVioR edited a comment on pull request #29069: URL: https://github.com/apache/spark/pull/29069#issuecomment-659144203 I guess we have several possible approaches here: 1. place the suite in a Hive-version-specific directory (with a new config in pom.xml to add the test source based on the version) 2. create a shim and place it in the Hive-version-specific directory (same) 3. revert and try to place the suite on the list of suites run in a separate JVM (less chance of hitting the classloader issue) I guess the most straightforward approach would be 1 - it should work smoothly, though if we want to ensure the suite runs against both versions we'll end up duplicating code. If that doesn't matter much we can just do that, or we can even enable the test on Hive 2.3 only (as the suite is technically irrelevant to Hive 1.2 vs Hive 2.3). Less redundant but more complicated is 2. I'm not 100% sure how much complexity is needed to make a shim for the suite, and not sure it's worth doing instead of simply allowing a bit of redundant code. The simplest approach is 3, but it's not guaranteed to fix the flakiness, so I'd rather keep it as a last resort. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
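A rough sketch of what the shim in approach 2 could look like - trait and class names, and the source path, are hypothetical, not the real test code:

// The suite would depend only on this trait.
trait HiveSessionMockShim {
  // Builds whatever mocked/subclassed HiveSession the suite drives in its tests.
  def newMockSession(): AnyRef
}

// Compiled only when the corresponding Hive profile is active, e.g. from a
// hypothetical v2.3/src/test/scala directory added via a pom.xml profile.
class Hive23SessionMockShim extends HiveSessionMockShim {
  override def newMockSession(): AnyRef =
    new Object // the real shim would construct the Hive 2.3 mock subclass here
}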
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] NotInSubquery SingleColumn Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-659144171 > For example. > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) > AND (a, b) NOT IN (SELECT * > FROM s > WHERE c IS NOT NULL) -- Matches (0, 1.0), (2, 3.0), (4, null) > ; In this case, I can't use InternalRow(null, 1.0) to look up in the HashedRelation. I need to exclude all null columns and try to find a match within the non-null columns, so I think HashedRelation is not a suitable structure for multi-column support. And if we change to multi-column support and have to deal with null columns, we can't use a hash lookup, so it will still be M*N, which isn't going to help. Pinging @maropu for a conclusion on the multi-column support. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
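To make the M*N point above concrete, a toy Scala sketch (hypothetical code, not Spark's HashedRelation): once the streamed key contains a null column, an exact-key hash lookup no longer applies, and the fallback is to scan the build side, comparing only the columns that are non-null on both sides.

object NotInNullSketch {
  type Row = Seq[Option[Double]]

  def main(args: Array[String]): Unit = {
    // Build-side rows from the quoted Case 4: (0, 1.0), (2, 3.0), (4, null)
    val buildRows: Seq[Row] = Seq(
      Seq(Some(0.0), Some(1.0)),
      Seq(Some(2.0), Some(3.0)),
      Seq(Some(4.0), None))
    // Exact-key index, usable only when the probe key has no nulls.
    val buildIndex: Map[Row, Seq[Row]] = buildRows.groupBy(identity)

    val streamedKey: Row = Seq(None, Some(1.0)) // the streamed row (null, 1.0)

    val potentialMatches =
      if (streamedKey.forall(_.isDefined)) {
        buildIndex.getOrElse(streamedKey, Nil) // O(1) hash lookup on the full key
      } else {
        // A null column means "unknown": fall back to scanning every build row
        // and comparing only the columns that are defined on both sides.
        buildRows.filter { buildRow =>
          streamedKey.zip(buildRow).forall {
            case (Some(s), Some(b)) => s == b
            case _ => true // a null on either side cannot rule the row out
          }
        }
      }

    // Any potential match means the NOT IN predicate is not satisfied for this row.
    println(potentialMatches)
  }
}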
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
AmplabJenkins removed a comment on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659142626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125916/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t
AmplabJenkins removed a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659142963 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'
AmplabJenkins commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659142963 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
SparkQA commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659143085 **[Test build #125933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125933/testReport)** for PR 28904 at commit [`247a0a1`](https://github.com/apache/spark/commit/247a0a1259f7a87701b8213d33810d1c63ff1e7b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
AmplabJenkins removed a comment on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659142621 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29002: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
SparkQA commented on pull request #29002: URL: https://github.com/apache/spark/pull/29002#issuecomment-659142838 **[Test build #125932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125932/testReport)** for PR 29002 at commit [`d768385`](https://github.com/apache/spark/commit/d768385caac9c79c456de87a4afd72298dda46db). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spar
SparkQA removed a comment on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659088466 **[Test build #125921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125921/testReport)** for PR 28287 at commit [`0d07845`](https://github.com/apache/spark/commit/0d07845459e6e0f606e5f0920c973de256a309e1). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
AmplabJenkins commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659142621 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blac
SparkQA commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659142331 **[Test build #125921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125921/testReport)** for PR 28287 at commit [`0d07845`](https://github.com/apache/spark/commit/0d07845459e6e0f606e5f0920c973de256a309e1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
SparkQA removed a comment on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659075914 **[Test build #125916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125916/testReport)** for PR 29125 at commit [`4518513`](https://github.com/apache/spark/commit/451851373f6eb2db8adffe43669b51be7a30e8c1). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
SparkQA commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659142220 **[Test build #125916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125916/testReport)** for PR 29125 at commit [`4518513`](https://github.com/apache/spark/commit/451851373f6eb2db8adffe43669b51be7a30e8c1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] xuanyuanking commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
xuanyuanking commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659141316 `Well, I guess I already explained why compactLogs is the culprit of the memory issue, right? (#28904 (comment))` Yep, that's right. I'm also looking at the code in detail and trying to find a way to both keep this API and get the improvement. If that's hard to achieve, the improvement of course has higher priority. I'll take a closer look today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659141023 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659141023 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
HyukjinKwon commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659141249 @maropu, per the documentation [Spark Project Improvement Proposals (SPIP)](http://spark.apache.org/improvement-proposals.html), if you feel like it needs an SPIP, it does. I trust your judgement. I will read it more closely today and provide more feedback. cc @kiszk, @srowen, @viirya, @ueshin too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
AmplabJenkins removed a comment on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659139670 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125922/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
HyukjinKwon commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659139846 I just saw the comment. Thanks for summarizing @revans2. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
AmplabJenkins removed a comment on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659139663 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite
frankyin-factual commented on pull request #29069: URL: https://github.com/apache/spark/pull/29069#issuecomment-659139939 I will also take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
SparkQA commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-659139822 **[Test build #125931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125931/testReport)** for PR 29089 at commit [`21a84ad`](https://github.com/apache/spark/commit/21a84adb3561788eea0e98c62129127b5bc9d5d5). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
AmplabJenkins commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659139663 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
SparkQA removed a comment on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659093134 **[Test build #125922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125922/testReport)** for PR 29015 at commit [`9ea178b`](https://github.com/apache/spark/commit/9ea178bb5091651f590017cb2b86225ec6f648c0). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group
AmplabJenkins removed a comment on pull request #28977: URL: https://github.com/apache/spark/pull/28977#issuecomment-659139079 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
SparkQA commented on pull request #29015: URL: https://github.com/apache/spark/pull/29015#issuecomment-659139325 **[Test build #125922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125922/testReport)** for PR 29015 at commit [`9ea178b`](https://github.com/apache/spark/commit/9ea178bb5091651f590017cb2b86225ec6f648c0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group
AmplabJenkins commented on pull request #28977: URL: https://github.com/apache/spark/pull/28977#issuecomment-659139079 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org