[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
SparkQA commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621686236 **[Test build #122121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122121/testReport)** for PR 28379 at commit [`232be9c`](https://github.com/apache/spark/commit/232be9c5e6021996e45b3217d8ed78e468b11106). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621686317
[GitHub] [spark] gatorsmile commented on a change in pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
gatorsmile commented on a change in pull request #28407:
URL: https://github.com/apache/spark/pull/28407#discussion_r417823026
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -159,17 +146,36 @@ object CTESubstitution extends Rule[LogicalPlan] {
     }
   }
+  private def resolveCTERelations(
+      relations: Seq[(String, SubqueryAlias)],
+      isLegacy: Boolean): Seq[(String, LogicalPlan)] = {
+    val resolvedCTERelations = new mutable.ArrayBuffer[(String, LogicalPlan)](relations.size)
+    for ((name, relation) <- relations) {
+      val innerCTEResolved = if (isLegacy) {
+        // In legacy mode, outer CTE relations take precedence, so substitute relations later.
+        relation
+      } else {
+        // A CTE definition might contain an inner CTE that has priority, so traverse and
Review comment:
"has priority" -> "has a higher priority"
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
AmplabJenkins removed a comment on pull request #28420: URL: https://github.com/apache/spark/pull/28420#issuecomment-621675201
[GitHub] [spark] yaooqinn commented on a change in pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
yaooqinn commented on a change in pull request #28420:
URL: https://github.com/apache/spark/pull/28420#discussion_r417827135
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ##
@@ -1206,7 +1207,7 @@ case class DatetimeSub(
     interval: Expression,
     child: Expression) extends RuntimeReplaceable {
   override def toString: String = s"$start - $interval"
-  override def sql: String = s"${start.sql} - ${interval.sql}"
+  override def sql: String = s"${toPrettySQL(start)} - ${toPrettySQL(interval)}"
Review comment:
OK, let me see
[GitHub] [spark] SparkQA commented on pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
SparkQA commented on pull request #28420: URL: https://github.com/apache/spark/pull/28420#issuecomment-621678519 **[Test build #122123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122123/testReport)** for PR 28420 at commit [`3e66c25`](https://github.com/apache/spark/commit/3e66c25c82ca2408139e38466410717a212e737d).
[GitHub] [spark] cloud-fan commented on a change in pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
cloud-fan commented on a change in pull request #28420:
URL: https://github.com/apache/spark/pull/28420#discussion_r417825867
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ##
@@ -1206,7 +1207,7 @@ case class DatetimeSub(
     interval: Expression,
     child: Expression) extends RuntimeReplaceable {
   override def toString: String = s"$start - $interval"
-  override def sql: String = s"${start.sql} - ${interval.sql}"
+  override def sql: String = s"${toPrettySQL(start)} - ${toPrettySQL(interval)}"
Review comment:
This means `RuntimeReplaceable.sql` always uses pretty SQL, while other expressions only use pretty SQL when the caller side needs it (by calling `toPrettySQL`). Is it possible to make them consistent? For example, make `start` and `interval` `innerChildren` and handle them in `usePrettyExpression`.
[GitHub] [spark] viirya commented on a change in pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
viirya commented on a change in pull request #28407:
URL: https://github.com/apache/spark/pull/28407#discussion_r417825103
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -159,17 +146,36 @@ object CTESubstitution extends Rule[LogicalPlan] {
     }
   }
+  private def resolveCTERelations(
+      relations: Seq[(String, SubqueryAlias)],
+      isLegacy: Boolean): Seq[(String, LogicalPlan)] = {
+    val resolvedCTERelations = new mutable.ArrayBuffer[(String, LogicalPlan)](relations.size)
+    for ((name, relation) <- relations) {
+      val innerCTEResolved = if (isLegacy) {
+        // In legacy mode, outer CTE relations take precedence, so substitute relations later.
+        relation
+      } else {
+        // A CTE definition might contain an inner CTE that has priority, so traverse and
+        // substitute CTE defined in `relation` first.
+        traverseAndSubstituteCTE(relation)
+      }
+      // CTE definition can reference a previous one
+      resolvedCTERelations += (name -> substituteCTE(innerCTEResolved, resolvedCTERelations))
Review comment:
In the legacy case, `innerCTEResolved` might contain an inner `WITH`, but it seems `substituteCTE` doesn't remove `WITH`. Then, in later `substituteCTE` calls, will we end up with some untouched `WITH`s in the final query plan?
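The `// CTE definition can reference a previous one` step in the quoted hunk can be illustrated with a minimal sketch. It assumes a toy plan type; `Plan`, `Table`, `Ref`, `Join` and `ToyCTE` are hypothetical names for illustration only, not Catalyst classes, and the sketch does not model nested `WITH` nodes or the legacy mode raised in the review comment.

```scala
// Toy logical plan (hypothetical; NOT Catalyst's LogicalPlan).
sealed trait Plan
final case class Table(name: String) extends Plan          // a concrete base relation
final case class Ref(name: String) extends Plan            // an unresolved relation name
final case class Join(left: Plan, right: Plan) extends Plan

object ToyCTE {
  // Replace every Ref whose name matches an already-resolved CTE definition.
  def substitute(plan: Plan, resolved: Seq[(String, Plan)]): Plan = plan match {
    case Ref(n)     => resolved.find(_._1 == n).map(_._2).getOrElse(plan)
    case Join(l, r) => Join(substitute(l, resolved), substitute(r, resolved))
    case t: Table   => t
  }

  // Resolve definitions in declaration order, so that each definition
  // can reference the ones that were declared before it.
  def resolveAll(defs: Seq[(String, Plan)]): Vector[(String, Plan)] =
    defs.foldLeft(Vector.empty[(String, Plan)]) { case (acc, (name, body)) =>
      acc :+ (name -> substitute(body, acc))
    }
}
```

With `Seq("a" -> Table("t1"), "b" -> Join(Ref("a"), Ref("t2")))` as input, the body of `b` becomes `Join(Table("t1"), Ref("t2"))`: the reference to `a` is replaced, while `t2` is left alone because it is not a CTE name.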
[GitHub] [spark] AmplabJenkins commented on pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
AmplabJenkins commented on pull request #28420: URL: https://github.com/apache/spark/pull/28420#issuecomment-621675201
[GitHub] [spark] SparkQA commented on pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
SparkQA commented on pull request #28420: URL: https://github.com/apache/spark/pull/28420#issuecomment-621674699 **[Test build #122122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122122/testReport)** for PR 28420 at commit [`c0fd2a6`](https://github.com/apache/spark/commit/c0fd2a620d86aa1977db008e313bfaccdf54cff9).
[GitHub] [spark] yaooqinn opened a new pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeRepl…
yaooqinn opened a new pull request #28420:
URL: https://github.com/apache/spark/pull/28420
…aceable expressions
### What changes were proposed in this pull request?
`RuntimeReplaceable` expressions are replaced at runtime, so their original parameters are not resolved to `PrettyAttribute` and remain debug-style strings if we implement their `sql` methods directly with their parameters' `sql` methods.
This PR follows suggestions by @maropu and @cloud-fan in https://github.com/apache/spark/pull/28402/files#r417656589.
In this PR, we re-implement the `sql` methods of the `RuntimeReplaceable` expressions with `toPrettySQL`.
### Why are the changes needed?
Consistency of schema output between `RuntimeReplaceable` expressions and normal ones.
For example, compare `date_format` and `to_timestamp`: before this PR, their schema output differs.
Before
```sql
select date_format(timestamp '2019-10-06', '-MM-dd ')
struct
select to_timestamp("2019-10-06S10:11:12.12345", "-MM-dd'S'HH:mm:ss.SS")
struct
```
After
```sql
select date_format(timestamp '2019-10-06', '-MM-dd ')
struct
select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''")
struct
```
### Does this PR introduce _any_ user-facing change?
Yes, the schema output style changes for the runtime replaceable expressions, as shown in the example above.
### How was this patch tested?
Regenerated all related tests.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins removed a comment on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621664257
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins removed a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-621664205
[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-621664205
[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621664257
[GitHub] [spark] dilipbiswal commented on a change in pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
dilipbiswal commented on a change in pull request #28407:
URL: https://github.com/apache/spark/pull/28407#discussion_r417810157
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -159,17 +146,36 @@ object CTESubstitution extends Rule[LogicalPlan] {
     }
   }
+  private def resolveCTERelations(
+      relations: Seq[(String, SubqueryAlias)],
+      isLegacy: Boolean): Seq[(String, LogicalPlan)] = {
+    val resolvedCTERelations = new mutable.ArrayBuffer[(String, LogicalPlan)](relations.size)
+    for ((name, relation) <- relations) {
+      val innerCTEResolved = if (isLegacy) {
Review comment:
@cloud-fan Just trying to understand: does `innerCTEResolved` indicate an already resolved CTE, or the one we are going to resolve in the subsequent call to `substituteCTE`?
[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
SparkQA commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-621663547 **[Test build #122120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122120/testReport)** for PR 28395 at commit [`bdd77fe`](https://github.com/apache/spark/commit/bdd77fecc886a7b94a66c8c3dfd27f8923c22015).
[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
SparkQA commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621663562 **[Test build #122121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122121/testReport)** for PR 28379 at commit [`232be9c`](https://github.com/apache/spark/commit/232be9c5e6021996e45b3217d8ed78e468b11106).
[GitHub] [spark] zhengruifeng commented on a change in pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
zhengruifeng commented on a change in pull request #28349:
URL: https://github.com/apache/spark/pull/28349#discussion_r417808922
## File path: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ##
@@ -154,31 +156,56 @@ class LinearSVC @Since("2.2.0") (
   def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
   setDefault(aggregationDepth -> 2)
+  /**
+   * Set block size for stacking input data in matrices.
Review comment:
The choice of size needs tuning; it depends on dataset sparsity and numFeatures. Increasing it may not always increase performance.
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r417809101
## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@ package org.apache.spark.util
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 private[spark] object ThreadUtils {
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        1,
+        1,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+  }
+
+  class MDCAwareRunnable(proxy: Runnable) extends Runnable {
+    val callerThreadMDC: util.Map[String, String] = getMDCMap
+
+    @inline
+    private def getMDCMap: util.Map[String, String] = {
+      org.slf4j.MDC.getCopyOfContextMap match {
+        case null => new util.HashMap[String, String]()
+        case m => m
+      }
+    }
+
+    override def run(): Unit = {
+      val threadMDC = getMDCMap
+      org.slf4j.MDC.setContextMap(callerThreadMDC)
+      try {
+        proxy.run()
+      } finally {
+        org.slf4j.MDC.setContextMap(threadMDC)
Review comment:
Yep
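The capture-and-restore pattern in the quoted `MDCAwareRunnable` can be tried without any logging dependency. The following is a minimal sketch, assuming a plain `ThreadLocal` map as a stand-in for `org.slf4j.MDC`; `ToyMDC`, `ContextAwareRunnable` and `ContextDemo` are hypothetical names and are not part of the PR.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for org.slf4j.MDC: one immutable map per thread.
object ToyMDC {
  private val ctx = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def get: Map[String, String] = ctx.get()
  def set(m: Map[String, String]): Unit = ctx.set(m)
}

// Captures the submitting thread's context when constructed, installs it
// around run(), and restores the worker thread's own context afterwards.
class ContextAwareRunnable(proxy: Runnable) extends Runnable {
  private val callerContext: Map[String, String] = ToyMDC.get

  override def run(): Unit = {
    val workerContext = ToyMDC.get
    ToyMDC.set(callerContext)
    try proxy.run()
    finally ToyMDC.set(workerContext)
  }
}

object ContextDemo {
  // Returns (context seen inside the wrapped task,
  //          context left on the worker thread afterwards).
  def demo(): (Map[String, String], Map[String, String]) = {
    val pool = Executors.newSingleThreadExecutor()
    val seenInTask = new AtomicReference[Map[String, String]](Map.empty)
    val leftOnWorker = new AtomicReference[Map[String, String]](Map.empty)
    try {
      ToyMDC.set(Map("taskName" -> "task-1"))
      // Wrapped task: should observe the caller's context.
      pool.submit(new ContextAwareRunnable(new Runnable {
        override def run(): Unit = seenInTask.set(ToyMDC.get)
      })).get()
      // Unwrapped task on the same worker: should observe no leaked context.
      pool.submit(new Runnable {
        override def run(): Unit = leftOnWorker.set(ToyMDC.get)
      }).get()
    } finally {
      pool.shutdown()
      ToyMDC.set(Map.empty)
    }
    (seenInTask.get(), leftOnWorker.get())
  }
}
```

Restoring the worker's previous context in `finally` matters for pooled threads: without it, the second task in the demo would still see `taskName` from the first one.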
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r417808971
## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@ package org.apache.spark.util
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 private[spark] object ThreadUtils {
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
Review comment:
Done
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r417808501
## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@ package org.apache.spark.util
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 private[spark] object ThreadUtils {
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
Review comment:
Done
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r417808048
## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ##
@@ -104,7 +104,7 @@ private[spark] class Executor(
       .setNameFormat("Executor task launch worker-%d")
       .setThreadFactory((r: Runnable) => new UninterruptibleThread(r, "unused"))
       .build()
-    Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor]
+    ThreadUtils.newCachedThreadPool(threadFactory)
Review comment:
1. I think it is better in any case to reuse the same code in all places.
2. If someone later sets MDC in the caller thread, it will not be passed on without this, so I think it is good to use it here.
[GitHub] [spark] xuanyuanking commented on a change in pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string
xuanyuanking commented on a change in pull request #28393:
URL: https://github.com/apache/spark/pull/28393#discussion_r417806531
## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ##
@@ -519,13 +520,13 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
     for (index <- 0 until line.length) {
       if (line.charAt(index) == '\'' && !insideComment) {
         // take a look to see if it is escaped
Review comment:
Yep, maybe we can rephrase [this comment](https://github.com/apache/spark/pull/28393/files#diff-f7aac41bf732c1ba1edbac436d331a55R510) here.
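The quote handling under review (per the PR title, allowing an unescaped quote mark inside a quoted string) can be sketched in isolation. This is a minimal illustration, not the `SparkSQLCLIDriver` code: `ToySplit.splitStatements` is a hypothetical helper, it ignores SQL comments (the quoted hunk also tracks `insideComment`), and it assumes a backslash escapes the next character inside a quoted string.

```scala
import scala.collection.mutable.ListBuffer

object ToySplit {
  // Split `line` on ';' characters that sit outside quoted strings.
  // Inside a string opened with ' only ' closes it (and likewise for "),
  // so the other quote character may appear unescaped.
  def splitStatements(line: String): List[String] = {
    val out = ListBuffer.empty[String]
    val cur = new StringBuilder
    var quote: Option[Char] = None // the quote character we are inside, if any
    var escaped = false            // true when the previous char was a backslash in a string

    for (c <- line) {
      if (escaped) {
        cur.append(c)
        escaped = false
      } else {
        quote match {
          case Some(q) =>
            cur.append(c)
            if (c == '\\') escaped = true
            else if (c == q) quote = None
          case None =>
            if (c == ';') {
              out += cur.toString
              cur.clear()
            } else {
              cur.append(c)
              if (c == '\'' || c == '"') quote = Some(c)
            }
        }
      }
    }
    if (cur.nonEmpty) out += cur.toString
    out.toList
  }
}
```

For the input `select 'a"b;c'; select 1`, the semicolon and the double quote inside the single-quoted string are kept, and the line splits only at the second semicolon.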
[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-621660613
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
AmplabJenkins removed a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-621660613
[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster
SparkQA commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-621660096 **[Test build #122119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122119/testReport)** for PR 28412 at commit [`7e3c39e`](https://github.com/apache/spark/commit/7e3c39eda716d51c13e594487443262c515b07d0).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621657311
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621657311
[GitHub] [spark] AmplabJenkins commented on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
AmplabJenkins commented on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621657317
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
AmplabJenkins removed a comment on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621657317
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621656650 **[Test build #122118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122118/testReport)** for PR 28123 at commit [`148a6fc`](https://github.com/apache/spark/commit/148a6fced050bdf138634c589608396ccdd430ff).
[GitHub] [spark] SparkQA commented on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
SparkQA commented on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621656609 **[Test build #122117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122117/testReport)** for PR 28408 at commit [`e966895`](https://github.com/apache/spark/commit/e966895be7044b7ad8bf7e4582497efcb60d113d).
[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621655308 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28419: [R] small tidying of sh scripts for R
AmplabJenkins removed a comment on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621653502 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122116/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621653771 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122102/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units
AmplabJenkins removed a comment on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-621653466 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122113/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
AmplabJenkins removed a comment on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621653656 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122108/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins removed a comment on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621653443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122115/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins removed a comment on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621653434 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
cloud-fan commented on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621654109 retest this please
[GitHub] [spark] SparkQA removed a comment on pull request #28419: [R] small tidying of sh scripts for R
SparkQA removed a comment on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621647256 **[Test build #122116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122116/testReport)** for PR 28419 at commit [`fb355ea`](https://github.com/apache/spark/commit/fb355ea9f385df33c83fb84388eead89867d2724).
[GitHub] [spark] SparkQA removed a comment on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
SparkQA removed a comment on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621605413 **[Test build #122108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122108/testReport)** for PR 28408 at commit [`e966895`](https://github.com/apache/spark/commit/e966895be7044b7ad8bf7e4582497efcb60d113d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621653766 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units
AmplabJenkins removed a comment on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-621653451 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28419: [R] small tidying of sh scripts for R
AmplabJenkins removed a comment on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621653497 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units
SparkQA removed a comment on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-621631387 **[Test build #122113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122113/testReport)** for PR 28418 at commit [`cf42504`](https://github.com/apache/spark/commit/cf425045cc83a31af45ec325c9bf1c5cb5b73eee).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
AmplabJenkins removed a comment on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621653648 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan edited a comment on pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
cloud-fan edited a comment on pull request #28407: URL: https://github.com/apache/spark/pull/28407#issuecomment-621642692
> Just a side note that due to its eager way of substitution it can also cause performance degradation with queries where a CTE is defined but never actually used.

Yea I thought about it as well. It's still doable if I change the map type to `Map[String, PlanHolder]` where `PlanHolder` can lazily calculate the plan. However, I feel it's too rare to have CTE relations defined but not used, so it may not be worth it. A CTE relation itself should not be very complex, so even if we do a substitution unnecessarily, it mostly doesn't matter.
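To make the lazy alternative mentioned above concrete, here is a minimal sketch of what a `Map[String, PlanHolder]` could look like. It is not code from the pull request: `Plan` is a hypothetical stand-in for Spark's `LogicalPlan`, and the shape of `PlanHolder` (a wrapper around a `lazy val`) is an assumption about how "lazily calculate the plan" might be done.

```scala
// Hypothetical stand-in for Spark's LogicalPlan; only here to keep the sketch self-contained.
final case class Plan(description: String)

// Assumed shape of the PlanHolder named in the comment: the plan is computed
// on first access and cached afterwards (standard `lazy val` semantics).
final class PlanHolder(compute: () => Plan) {
  lazy val plan: Plan = compute()
}

object LazyCteDemo {
  def main(args: Array[String]): Unit = {
    var substitutions = 0

    // Pretend that resolving a CTE relation is the expensive step.
    def resolve(name: String): Plan = {
      substitutions += 1
      Plan(s"resolved $name")
    }

    val relations: Map[String, PlanHolder] =
      Seq("used", "unused").map(name => name -> new PlanHolder(() => resolve(name))).toMap

    // Only the relation that is actually referenced gets resolved, and only once.
    println(relations("used").plan.description)
    println(relations("used").plan.description)
    println(s"substitutions performed: $substitutions") // prints 1
  }
}
```

The trade-off is the one raised in the comment: the holder adds an allocation and an indirection per CTE relation, which only pays off when relations are defined but never referenced.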
[GitHub] [spark] SparkQA removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
SparkQA removed a comment on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621644153 **[Test build #122115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122115/testReport)** for PR 28379 at commit [`3e3f68a`](https://github.com/apache/spark/commit/3e3f68ae4d62adeb775aa404dcb13332f567cb5e).
[GitHub] [spark] SparkQA removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
SparkQA removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621581634 **[Test build #122102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122102/testReport)** for PR 28123 at commit [`148a6fc`](https://github.com/apache/spark/commit/148a6fced050bdf138634c589608396ccdd430ff).
[GitHub] [spark] AmplabJenkins commented on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
AmplabJenkins commented on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621653648
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621653766
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-621653362 **[Test build #122102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122102/testReport)** for PR 28123 at commit [`148a6fc`](https://github.com/apache/spark/commit/148a6fced050bdf138634c589608396ccdd430ff).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28419: [R] small tidying of sh scripts for R
AmplabJenkins commented on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621653497
[GitHub] [spark] SparkQA commented on pull request #28408: [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
SparkQA commented on pull request #28408: URL: https://github.com/apache/spark/pull/28408#issuecomment-621653368 **[Test build #122108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122108/testReport)** for PR 28408 at commit [`e966895`](https://github.com/apache/spark/commit/e966895be7044b7ad8bf7e4582497efcb60d113d).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
SparkQA commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621653370 **[Test build #122115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122115/testReport)** for PR 28379 at commit [`3e3f68a`](https://github.com/apache/spark/commit/3e3f68ae4d62adeb775aa404dcb13332f567cb5e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621653434
[GitHub] [spark] AmplabJenkins commented on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units
AmplabJenkins commented on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-621653451
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r417797723

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        1,
+        1,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+  }
+
+  class MDCAwareRunnable(proxy: Runnable) extends Runnable {
+    val callerThreadMDC: util.Map[String, String] = getMDCMap
+
+    @inline
+    private def getMDCMap: util.Map[String, String] = {
+      org.slf4j.MDC.getCopyOfContextMap match {
+        case null => new util.HashMap[String, String]()
+        case m => m
+      }
+    }
+
+    override def run(): Unit = {
+      val threadMDC = getMDCMap
+      org.slf4j.MDC.setContextMap(callerThreadMDC)
+      try {
+        proxy.run()
+      } finally {
+        org.slf4j.MDC.setContextMap(threadMDC)

Review comment:
so `setContextMap` doesn't accept null?
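The question above turns on how a missing MDC context is represented. The diff normalizes a `null` from `getCopyOfContextMap` to an empty map before handing it back to `setContextMap`. As far as I know, whether `setContextMap(null)` is safe depends on the logging backend bound to SLF4J, so the sketch below sidesteps the question by calling `MDC.clear()` when there is nothing to restore. It is only an illustration: the class name `MdcPropagatingRunnable` is hypothetical and is not the class from the pull request, and the code assumes `slf4j-api` is on the classpath.

```scala
import java.util.{Map => JMap}

import org.slf4j.MDC

// Hypothetical name; an alternative sketch, not the MDCAwareRunnable from the diff.
final class MdcPropagatingRunnable(delegate: Runnable) extends Runnable {
  // Captured on the submitting thread. getCopyOfContextMap may return null
  // when no context has been set, so null is kept as "no context".
  private val callerContext: JMap[String, String] = MDC.getCopyOfContextMap()

  // Install a context without ever passing null to setContextMap.
  private def install(context: JMap[String, String]): Unit =
    if (context == null) MDC.clear() else MDC.setContextMap(context)

  override def run(): Unit = {
    // Remember whatever the worker thread had, so pooled threads do not leak
    // one task's context into the next.
    val previous = MDC.getCopyOfContextMap()
    install(callerContext)
    try delegate.run()
    finally install(previous)
  }
}
```

Compared with substituting an empty `HashMap`, this avoids an allocation per task, at the cost of a null check in one helper.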
[GitHub] [spark] xuanyuanking commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests
xuanyuanking commented on pull request #28390: URL: https://github.com/apache/spark/pull/28390#issuecomment-621653199 Thanks for the review.
[GitHub] [spark] SparkQA commented on pull request #28419: [R] small tidying of sh scripts for R
SparkQA commented on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621653372 **[Test build #122116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122116/testReport)** for PR 28419 at commit [`fb355ea`](https://github.com/apache/spark/commit/fb355ea9f385df33c83fb84388eead89867d2724).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units
SparkQA commented on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-621653364 **[Test build #122113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122113/testReport)** for PR 28418 at commit [`cf42504`](https://github.com/apache/spark/commit/cf425045cc83a31af45ec325c9bf1c5cb5b73eee).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r417797121

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(

Review comment:
ditto

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,101 @@
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      new MDCAwareThreadPoolExecutor(

Review comment:
ditto
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r417796843 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -17,21 +17,101 @@ package org.apache.spark.util +import java.util import java.util.concurrent._ import java.util.concurrent.locks.ReentrantLock +import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder} import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future} import scala.concurrent.duration.{Duration, FiniteDuration} import scala.language.higherKinds import scala.util.control.NonFatal -import com.google.common.util.concurrent.ThreadFactoryBuilder - import org.apache.spark.SparkException import org.apache.spark.rpc.RpcAbortException private[spark] object ThreadUtils { + object MDCAwareThreadPoolExecutor { +def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = { + new MDCAwareThreadPoolExecutor( Review comment: Can we add a comment saying that this needs to be kept in sync with `Executors.newCachedThreadPool`?
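To illustrate what the requested code comment would be guarding, here is a minimal sketch of a context-propagating cached thread pool whose constructor arguments mirror `java.util.concurrent.Executors.newCachedThreadPool`. It is not the code from the PR: `LogContext`, `ContextAwareThreadPoolExecutor`, and `ContextAwarePools` are hypothetical names, and `LogContext` is a plain `ThreadLocal` stand-in for `org.slf4j.MDC` so that the example needs only the JDK.

```scala
import java.util.concurrent.{BlockingQueue, SynchronousQueue, ThreadFactory, ThreadPoolExecutor, TimeUnit}

// Hypothetical stand-in for org.slf4j.MDC: a per-thread map of logging context.
object LogContext {
  private val ctx = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def get: Map[String, String] = ctx.get()
  def set(m: Map[String, String]): Unit = ctx.set(m)
}

// Hypothetical executor: captures the submitting thread's context and
// installs it in the worker thread for the duration of the task.
class ContextAwareThreadPoolExecutor(
    corePoolSize: Int,
    maximumPoolSize: Int,
    keepAliveTime: Long,
    unit: TimeUnit,
    workQueue: BlockingQueue[Runnable],
    threadFactory: ThreadFactory)
  extends ThreadPoolExecutor(
    corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory) {

  override def execute(command: Runnable): Unit = {
    val captured = LogContext.get // read on the submitting thread
    super.execute(new Runnable {
      override def run(): Unit = {
        val previous = LogContext.get
        LogContext.set(captured)
        try command.run() finally LogContext.set(previous) // restore the worker's own context
      }
    })
  }
}

object ContextAwarePools {
  // NOTE: these arguments need to stay in sync with Executors.newCachedThreadPool.
  def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor =
    new ContextAwareThreadPoolExecutor(
      0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue[Runnable], threadFactory)
}
```

Because the factory duplicates the JDK's constructor arguments instead of delegating to `Executors`, nothing forces the two to stay aligned, which is presumably why the reviewer asks for an explicit note.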
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r417795307 ## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ## @@ -674,6 +677,18 @@ private[spark] class Executor( } } + private def setMDCForTask(taskDescription: TaskDescription): Unit = { +val properties = taskDescription.properties + +org.slf4j.MDC.put("appId", properties.getProperty("spark.app.id")) +org.slf4j.MDC.put("appName", properties.getProperty("spark.app.name")) Review comment: +1
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][core] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r417795158 ## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ## @@ -104,7 +104,7 @@ private[spark] class Executor( .setNameFormat("Executor task launch worker-%d") .setThreadFactory((r: Runnable) => new UninterruptibleThread(r, "unused")) .build() - Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor] +ThreadUtils.newCachedThreadPool(threadFactory) Review comment: Do we need this change? This thread pool is used to execute `TaskRunner`, which already sets the MDC by itself; see https://github.com/apache/spark/pull/26624/files#diff-5a0de266c82b95adb47d9bca714e1f1bR380
[GitHub] [spark] dilipbiswal commented on a change in pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string
dilipbiswal commented on a change in pull request #28393: URL: https://github.com/apache/spark/pull/28393#discussion_r417794434 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -519,13 +520,13 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { for (index <- 0 until line.length) { if (line.charAt(index) == '\'' && !insideComment) { // take a look to see if it is escaped -if (!escape) { +if (!escape && !insideDoubleQuote) { // flip the boolean variable insideSingleQuote = !insideSingleQuote } } else if (line.charAt(index) == '\"' && !insideComment) { // take a look to see if it is escaped Review comment: @adrian-wang Same.
[GitHub] [spark] dilipbiswal commented on a change in pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string
dilipbiswal commented on a change in pull request #28393: URL: https://github.com/apache/spark/pull/28393#discussion_r417794207 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -519,13 +520,13 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { for (index <- 0 until line.length) { if (line.charAt(index) == '\'' && !insideComment) { // take a look to see if it is escaped Review comment: @adrian-wang Should we update the comment to reflect the newly added condition?
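The condition discussed in the two review comments above (a single quote only toggles state when it is neither escaped nor inside a double-quoted string, and vice versa) can be sketched in isolation. This is a simplified illustration, not the actual `SparkSQLCLIDriver` code: `QuoteAwareSplit` and `splitStatements` are hypothetical names, and comment handling (`insideComment`) is deliberately left out.

```scala
import scala.collection.mutable.ArrayBuffer

object QuoteAwareSplit {
  // Hypothetical helper: splits a line on ';' while ignoring semicolons that
  // appear inside single- or double-quoted strings.
  def splitStatements(line: String): Seq[String] = {
    var insideSingleQuote = false
    var insideDoubleQuote = false
    var escape = false
    var begin = 0
    val statements = ArrayBuffer.empty[String]
    for (index <- 0 until line.length) {
      val c = line.charAt(index)
      if (c == '\'') {
        // Flip only if the quote is neither escaped nor inside a double-quoted string.
        if (!escape && !insideDoubleQuote) insideSingleQuote = !insideSingleQuote
      } else if (c == '"') {
        // Flip only if the quote is neither escaped nor inside a single-quoted string.
        if (!escape && !insideSingleQuote) insideDoubleQuote = !insideDoubleQuote
      } else if (c == ';' && !insideSingleQuote && !insideDoubleQuote) {
        statements += line.substring(begin, index)
        begin = index + 1
      }
      // A backslash escapes the next character; an escaped backslash escapes nothing.
      escape = if (escape) false else c == '\\'
    }
    statements += line.substring(begin)
    statements.toSeq
  }
}
```

With the extra `!insideDoubleQuote` check, an apostrophe inside a double-quoted string such as `"it's"` no longer opens a single-quoted region, so a following `;` is still treated as a statement separator.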
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28419: [R] small tidying of sh scripts for R
AmplabJenkins removed a comment on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621647788
[GitHub] [spark] AmplabJenkins commented on pull request #28419: [R] small tidying of sh scripts for R
AmplabJenkins commented on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621647788
[GitHub] [spark] SparkQA commented on pull request #28419: [R] small tidying of sh scripts for R
SparkQA commented on pull request #28419: URL: https://github.com/apache/spark/pull/28419#issuecomment-621647256 **[Test build #122116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122116/testReport)** for PR 28419 at commit [`fb355ea`](https://github.com/apache/spark/commit/fb355ea9f385df33c83fb84388eead89867d2724).
[GitHub] [spark] MichaelChirico opened a new pull request #28419: [R] small tidying of sh scripts for R
MichaelChirico opened a new pull request #28419: URL: https://github.com/apache/spark/pull/28419 ### What changes were proposed in this pull request? Some tidying of `sh` scripts in `R/` ### Why are the changes needed? Not strictly needed, but the `'devtools' %in% installed.packages()` line in particular is "improper" / probably slow ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not tested
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins removed a comment on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621644697
[GitHub] [spark] stczwd commented on a change in pull request #28280: [SPARK-31438][CORE][WIP] Support JobCleaned Status in SparkListener
stczwd commented on a change in pull request #28280: URL: https://github.com/apache/spark/pull/28280#discussion_r417788796 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ## @@ -99,6 +99,12 @@ case class InsertIntoHiveTable( try { processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc, tmpLocation, child) +} catch { + case e: Throwable => +// Make sure tmp path deleted while getting Exception before sc.runJob +deleteExternalTmpPath(hadoopConf) +throw new SparkException( + s"Failed inserting ubti table ${table.identifier.quotedString}", e) Review comment: sorry, wrong word
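The pattern in the diff above (clean up the temporary path if the insert fails, then rethrow with context) can be sketched generically. This is an illustration under stated assumptions, not the PR's code: `TmpPathCleanup` and `withTmpCleanup` are hypothetical names, and `RuntimeException` stands in for `SparkException` so that the example is self-contained.

```scala
object TmpPathCleanup {
  // Hypothetical helper: runs `body`; on any failure it first runs `cleanup`
  // and then rethrows, wrapping the original exception as the cause.
  def withTmpCleanup[T](cleanup: () => Unit)(body: => T): T =
    try body catch {
      case e: Throwable =>
        cleanup() // e.g. delete the external tmp path
        throw new RuntimeException("Failed inserting into table", e)
    }
}
```

A caller would wrap the insert step, for example `TmpPathCleanup.withTmpCleanup(() => deleteTmp()) { runInsert() }`, where `deleteTmp` and `runInsert` are placeholders for the real operations.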
[GitHub] [spark] stczwd commented on a change in pull request #28280: [SPARK-31438][CORE][WIP] Support JobCleaned Status in SparkListener
stczwd commented on a change in pull request #28280: URL: https://github.com/apache/spark/pull/28280#discussion_r417788682 ## File path: core/src/main/scala/org/apache/spark/scheduler/JobCleanedHookListener.scala ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import scala.collection.mutable.HashMap + +import org.apache.spark.internal.Logging + +/** + * JobCleanedHookListener is a basic job cleaned listener. It holds jobCleanedHooks for + * jobs and run cleaned hook after a job is cleaned. + */ +class JobCleanedHookListener extends SparkListener with Logging { Review comment: Yes, I am doing it.
[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
AmplabJenkins commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621644697
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
AmplabJenkins removed a comment on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-621644053
[GitHub] [spark] AmplabJenkins commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
AmplabJenkins commented on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-621644053
[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3
SparkQA commented on pull request #28379: URL: https://github.com/apache/spark/pull/28379#issuecomment-621644153 **[Test build #122115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122115/testReport)** for PR 28379 at commit [`3e3f68a`](https://github.com/apache/spark/commit/3e3f68ae4d62adeb775aa404dcb13332f567cb5e).
[GitHub] [spark] SparkQA removed a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
SparkQA removed a comment on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-621585838 **[Test build #122103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122103/testReport)** for PR 28349 at commit [`e02a86e`](https://github.com/apache/spark/commit/e02a86e3182028a0f1278079eb9c6cfa8d73f916).
[GitHub] [spark] SparkQA commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
SparkQA commented on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-621643162 **[Test build #122103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122103/testReport)** for PR 28349 at commit [`e02a86e`](https://github.com/apache/spark/commit/e02a86e3182028a0f1278079eb9c6cfa8d73f916). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan edited a comment on pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
cloud-fan edited a comment on pull request #28407: URL: https://github.com/apache/spark/pull/28407#issuecomment-621642692 > Just a side note that due to its eager way of substitution it can also cause performance degradation with queries where a CTE is defined but never actually used. Yea I thought about it as well. It's still doable if I change the map type to `Map[String, PlanHolder]` where `PlanHolder` can lazily calculate the plan. However, I feel it's too rare to have CTE relations defined but not used, so it may not be worth it. And a CTE relation itself should not be very complex, so even if we do a substitution unnecessarily, it mostly doesn't matter.
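The `Map[String, PlanHolder]` idea mentioned in the comment above could look roughly like the following. This is only a sketch of the laziness pattern: the PR does not show a `PlanHolder`, so its shape here is an assumption, and `Plan` is a hypothetical stand-in for `LogicalPlan`.

```scala
object LazyCteSketch {
  // Hypothetical stand-in for LogicalPlan; only the laziness matters here.
  final case class Plan(description: String)

  // Hypothetical PlanHolder: the substitution runs at most once, on first access.
  final class PlanHolder(compute: () => Plan) {
    lazy val plan: Plan = compute()
  }

  // Counts how many substitutions were actually performed.
  var substitutions = 0

  private def substitute(name: String): Plan = {
    substitutions += 1
    Plan(s"substituted($name)")
  }

  // Building the map performs no substitution; each holder defers it.
  val relations: Map[String, PlanHolder] =
    Seq("used", "unused").map(name => name -> new PlanHolder(() => substitute(name))).toMap
}
```

Reading `LazyCteSketch.relations("used").plan` any number of times triggers a single substitution, and the `"unused"` relation is never substituted at all. The trade-off raised in the comment is whether that saving justifies the extra indirection when unused CTE relations are rare.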
[GitHub] [spark] cloud-fan commented on pull request #28407: [SPARK-31607][SQL] Improve the perf of CTESubstitution
cloud-fan commented on pull request #28407: URL: https://github.com/apache/spark/pull/28407#issuecomment-621642692 > Just a side note that due to its eager way of substitution it can also cause performance degradation with queries where a CTE is defined but never actually used. Yea I thought about it as well. It's still doable if I change the map type to `Map[String, PlanHolder]` where `PlanHolder` can lazily calculate the plan. However, I feel it's too rare to have CTE relations defined but not used, so it may not be worth it.
[GitHub] [spark] MaxGekk commented on pull request #28329: [SPARK-31554][SQL][TESTS] Retry flaky tests from CliSuite
MaxGekk commented on pull request #28329: URL: https://github.com/apache/spark/pull/28329#issuecomment-621642370 It seems that @juliuszsompolski's PR https://github.com/apache/spark/pull/28156 fixed the issue. I am closing this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
AmplabJenkins removed a comment on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621637681
[GitHub] [spark] cloud-fan commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests
cloud-fan commented on pull request #28390: URL: https://github.com/apache/spark/pull/28390#issuecomment-621638010 thanks, merging to master/3.0!
[GitHub] [spark] SparkQA commented on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
SparkQA commented on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621637596 **[Test build #122114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122114/testReport)** for PR 28417 at commit [`5b8b221`](https://github.com/apache/spark/commit/5b8b221745cea747e087ccdcc870296d8b88bda5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
SparkQA removed a comment on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621633791 **[Test build #122114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122114/testReport)** for PR 28417 at commit [`5b8b221`](https://github.com/apache/spark/commit/5b8b221745cea747e087ccdcc870296d8b88bda5).
[GitHub] [spark] AmplabJenkins commented on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
AmplabJenkins commented on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621637681
[GitHub] [spark] yaooqinn commented on a change in pull request #28402: [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations
yaooqinn commented on a change in pull request #28402: URL: https://github.com/apache/spark/pull/28402#discussion_r417778568 ## File path: sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out ## @@ -689,7 +689,7 @@ select interval '2-2' year to month + dateval from interval_arithmetic -- !query schema -struct +struct Review comment: thank you all, I will fix this ASAP.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27978: [SPARK-31127][ML] Implement abstract Selector
AmplabJenkins removed a comment on pull request #27978: URL: https://github.com/apache/spark/pull/27978#issuecomment-621634148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122106/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
AmplabJenkins removed a comment on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621634224
[GitHub] [spark] AmplabJenkins commented on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
AmplabJenkins commented on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621634224
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27978: [SPARK-31127][ML] Implement abstract Selector
AmplabJenkins removed a comment on pull request #27978: URL: https://github.com/apache/spark/pull/27978#issuecomment-621634142 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector
AmplabJenkins commented on pull request #27978: URL: https://github.com/apache/spark/pull/27978#issuecomment-621634142
[GitHub] [spark] SparkQA removed a comment on pull request #27978: [SPARK-31127][ML] Implement abstract Selector
SparkQA removed a comment on pull request #27978: URL: https://github.com/apache/spark/pull/27978#issuecomment-621596096 **[Test build #122106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122106/testReport)** for PR 27978 at commit [`363d382`](https://github.com/apache/spark/commit/363d382c82cc28037e38c71d7d804cc3c325be98).
[GitHub] [spark] SparkQA commented on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up
SparkQA commented on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-621633791 **[Test build #122114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122114/testReport)** for PR 28417 at commit [`5b8b221`](https://github.com/apache/spark/commit/5b8b221745cea747e087ccdcc870296d8b88bda5).
[GitHub] [spark] SparkQA commented on pull request #27978: [SPARK-31127][ML] Implement abstract Selector
SparkQA commented on pull request #27978: URL: https://github.com/apache/spark/pull/27978#issuecomment-621633725 **[Test build #122106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122106/testReport)** for PR 27978 at commit [`363d382`](https://github.com/apache/spark/commit/363d382c82cc28037e38c71d7d804cc3c325be98). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.