[GitHub] [spark] AmplabJenkins removed a comment on pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause
AmplabJenkins removed a comment on pull request #28836: URL: https://github.com/apache/spark/pull/28836#issuecomment-644553430 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause
SparkQA removed a comment on pull request #28836: URL: https://github.com/apache/spark/pull/28836#issuecomment-644538711 **[Test build #124097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124097/testReport)** for PR 28836 at commit [`2555c49`](https://github.com/apache/spark/commit/2555c497b501f371d2531deb5d8e8f2d0311d5cf).
[GitHub] [spark] SparkQA commented on pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec
SparkQA commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-644557745 **[Test build #124100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124100/testReport)** for PR 28715 at commit [`a85f511`](https://github.com/apache/spark/commit/a85f51121ede087d6a943275e832ba290caa0966).
[GitHub] [spark] maropu commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
maropu commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644563828 retest this please
[GitHub] [spark] dbtsai commented on a change in pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
dbtsai commented on a change in pull request #28788: URL: https://github.com/apache/spark/pull/28788#discussion_r440618141
## File path: docs/running-on-yarn.md
@@ -82,6 +82,19 @@ In `cluster` mode, the driver runs on a different machine than the client, so `S
 Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
 Binary distributions can be downloaded from the [downloads page](https://spark.apache.org/downloads.html) of the project website.
+There are two variants of Spark binary distributions you can download. One is pre-built with a certain
+version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it `with-hadoop` Spark
+distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution
+doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately.
+We call this variant `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution, since
+it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not
+populate Yarn's classpath into Spark. To override this behavior, you can set spark.yarn.populateHadoopClasspath=true.
+For `no-hadoop` Spark distribution, Spark will populate Yarn's classpath by default in order to get Hadoop runtime. Note that some features such
+as Hive support are not available in `no-hadoop` Spark distribution. For `with-hadoop` Spark distribution,
Review comment:
Maybe I'm wrong, but I got the impression from @dongjoon-hyun that no-hadoop Spark distribution doesn't support Hive.
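To make the defaults described in the proposed doc text concrete, here is a minimal Scala sketch of the decision as that text states it. `YarnClasspathDefaults`, `Distribution`, and `populateHadoopClasspath` are hypothetical names for illustration only, not Spark APIs; the one real identifier is the config key `spark.yarn.populateHadoopClasspath`, which would normally be supplied as `--conf spark.yarn.populateHadoopClasspath=true` on the `spark-submit` command line.

```scala
// Hypothetical model of the defaults described in the proposed docs/running-on-yarn.md text.
// This is NOT Spark's implementation; it only encodes the documented behavior.
object YarnClasspathDefaults {
  sealed trait Distribution
  case object WithHadoop extends Distribution // bundles its own Hadoop runtime
  case object NoHadoop extends Distribution   // relies on the cluster's Hadoop installation

  /** Returns whether YARN's classpath is added to Spark's.
    * `userSetting` models an explicit spark.yarn.populateHadoopClasspath value, if one was given.
    */
  def populateHadoopClasspath(dist: Distribution, userSetting: Option[Boolean]): Boolean =
    userSetting.getOrElse {
      dist match {
        case WithHadoop => false // default off, to avoid jar conflicts with the bundled Hadoop
        case NoHadoop   => true  // default on, since YARN is the only source of a Hadoop runtime
      }
    }
}
```

Under this reading an explicit setting always wins over the distribution default, which is why the doc suggests setting the key to `true` to opt a `with-hadoop` build back in.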
[GitHub] [spark] AmplabJenkins commented on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
AmplabJenkins commented on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644566910
[GitHub] [spark] AmplabJenkins commented on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext
AmplabJenkins commented on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-644567025
[GitHub] [spark] AmplabJenkins commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
AmplabJenkins commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644566932
[GitHub] [spark] AmplabJenkins commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644566919
[GitHub] [spark] SparkQA removed a comment on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext
SparkQA removed a comment on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-68725 **[Test build #124079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124079/testReport)** for PR 28784 at commit [`1177e9a`](https://github.com/apache/spark/commit/1177e9a2ab79d90cbf5a494f581ad650c5a21ed3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins removed a comment on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644566919
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
AmplabJenkins removed a comment on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644566932
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext
AmplabJenkins removed a comment on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-644567025 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
cloud-fan commented on a change in pull request #28824: URL: https://github.com/apache/spark/pull/28824#discussion_r440626757
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
@@ -416,38 +418,39 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
 // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
 // In this way, the overlap happened w/o Daylight Saving Time.
 val hkZid = getZoneId("Asia/Hong_Kong")
+var expected = "1945-11-18 01:30:00.0"
+var ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+var earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+var laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+var overlapInterval = MICROS_PER_HOUR
+if (earlierMicros + overlapInterval != laterMicros) {
+ // Old JDK might have an outdated time zone database.
+ // See https://bugs.openjdk.java.net/browse/JDK-8228469: "Hong Kong ... Its 1945 transition
+ // from JST to HKT was on 11-18 at 02:00, not 09-15 at 00:00"
+ expected = "1945-09-14 23:30:00.0"
+ ldt = LocalDateTime.of(1945, 9, 14, 23, 30, 0)
+ earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+ laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+ // If time zone db doesn't have overlapping at all, set the overlap interval to zero.
+ overlapInterval = laterMicros - earlierMicros
+}
+val hkTz = TimeZone.getTimeZone(hkZid)
+val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkTz, earlierMicros)
+val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkTz, laterMicros)
+assert(rebasedEarlierMicros + overlapInterval === rebasedLaterMicros)
 withDefaultTimeZone(hkZid) {
- var expected = "1945-11-18 01:30:00.0"
- var ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
- var earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
- var laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
- var overlapInterval = MICROS_PER_HOUR
- if (earlierMicros + overlapInterval != laterMicros) {
-// Old JDK might have an outdated time zone database.
-// See https://bugs.openjdk.java.net/browse/JDK-8228469: "Hong Kong ... Its 1945 transition
-// from JST to HKT was on 11-18 at 02:00, not 09-15 at 00:00"
-expected = "1945-09-14 23:30:00.0"
-ldt = LocalDateTime.of(1945, 9, 14, 23, 30, 0)
-earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
-laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
-// If time zone db doesn't have overlapping at all, set the overlap interval to zero.
-overlapInterval = laterMicros - earlierMicros
- }
- val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
- val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
 def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
 assert(toTsStr(rebasedEarlierMicros) === expected)
 assert(toTsStr(rebasedLaterMicros) === expected)
- assert(rebasedEarlierMicros + overlapInterval === rebasedLaterMicros)
 // Check optimized rebasing
 assert(rebaseGregorianToJulianMicros(earlierMicros) === rebasedEarlierMicros)
 assert(rebaseGregorianToJulianMicros(laterMicros) === rebasedLaterMicros)
 // Check reverse rebasing
 assert(rebaseJulianToGregorianMicros(rebasedEarlierMicros) === earlierMicros)
 assert(rebaseJulianToGregorianMicros(rebasedLaterMicros) === laterMicros)
 // Check reverse not-optimized rebasing
Review comment:
does this rely on JVM timezone?
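The review asks whether the hoisted assertion depends on the JVM default time zone. The overlap interval itself need not: the sketch below computes it from an explicit `ZoneId` using only `java.time`, so `TimeZone.getDefault` never enters. `OverlapCheck` and `overlapMicros` are hypothetical names, and `instantToMicros` here is a local stand-in assumed to behave like Spark's helper (epoch seconds times 1,000,000 plus nanos / 1000). Whether `rebaseGregorianToJulianMicros(hkTz, ...)` itself consults the default zone depends on Spark's implementation, which the diff does not show.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.util.concurrent.TimeUnit

object OverlapCheck {
  // Local stand-in for Spark's instantToMicros (assumed semantics, overflow-checked).
  def instantToMicros(i: Instant): Long =
    Math.addExact(
      Math.multiplyExact(i.getEpochSecond, 1000000L),
      TimeUnit.NANOSECONDS.toMicros(i.getNano))

  /** Length in microseconds of the wall-clock overlap containing `ldt` in `zone`;
    * 0 when `ldt` is not inside a backward clock transition.
    * Depends only on the explicit `zone`, never on the JVM default time zone.
    */
  def overlapMicros(zone: ZoneId, ldt: LocalDateTime): Long = {
    val zdt = ldt.atZone(zone)
    val earlier = instantToMicros(zdt.withEarlierOffsetAtOverlap().toInstant)
    val later = instantToMicros(zdt.withLaterOffsetAtOverlap().toInstant)
    later - earlier
  }
}
```

With a tz database that includes the JDK-8228469 fix, `OverlapCheck.overlapMicros(ZoneId.of("Asia/Hong_Kong"), LocalDateTime.of(1945, 11, 18, 1, 30))` should be one hour; with an older database it may be 0, which is the case the test's fallback branch is written to tolerate.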
[GitHub] [spark] gengliangwang commented on a change in pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
gengliangwang commented on a change in pull request #28810: URL: https://github.com/apache/spark/pull/28810#discussion_r440629435
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -1334,13 +1340,13 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 (rightFilterConditions ++ commonFilterCondition).
 reduceLeftOption(And).map(Filter(_, newJoin)).getOrElse(newJoin)
-case FullOuter => f // DO Nothing for Full Outer Join
-case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
-case UsingJoin(_, _) => sys.error("Untransformed Using join node")
+
+case jt =>
Review comment:
Nit: jt => other

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushCNFPredicateThroughJoin.scala
@@ -53,9 +60,8 @@ object PushCNFPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelpe
 Join(newLeft, right, RightOuter, Some(joinCondition), hint)
 case LeftOuter | LeftAnti | ExistenceJoin(_) =>
 Join(left, newRight, joinType, Some(joinCondition), hint)
- case FullOuter => j
- case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
- case UsingJoin(_, _) => sys.error("Untransformed Using join node")
+ case jt =>
Review comment:
ditto
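The nit concerns the name of the catch-all arm that replaces the explicit `FullOuter`, `NaturalJoin`, and `UsingJoin` cases. A minimal sketch of that match shape follows; `JoinPushdownSketch`, `pushableSides`, and the join-type objects are hypothetical, heavily simplified stand-ins, not Catalyst's real `JoinType` hierarchy or the actual pushdown rule.

```scala
object JoinPushdownSketch {
  // Hypothetical, simplified stand-ins for Catalyst's join types.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType
  final case class NaturalJoin(tpe: JoinType) extends JoinType

  /** Which child a filter above the join may be pushed into (toy model).
    * None means "leave the plan unchanged".
    */
  def pushableSides(joinType: JoinType): Option[Set[String]] = joinType match {
    case Inner      => Some(Set("left", "right"))
    case LeftOuter  => Some(Set("left"))
    case RightOuter => Some(Set("right"))
    case other      => None // catch-all, named `other` per the nit: e.g. FullOuter, NaturalJoin(_)
  }
}
```

With a catch-all arm, any join type not listed (such as a full outer join) simply falls through to "do nothing" instead of needing its own case. The trade-off is that the compiler can no longer warn when a newly added join type is missing from the match.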
[GitHub] [spark] AmplabJenkins commented on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
AmplabJenkins commented on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644574932
[GitHub] [spark] SparkQA commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
SparkQA commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644574869 **[Test build #124104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124104/testReport)** for PR 28810 at commit [`1251b68`](https://github.com/apache/spark/commit/1251b68044611148dcb840d3efa3c39ed9aee18e).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
SparkQA removed a comment on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644566345 **[Test build #124103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124103/testReport)** for PR 28824 at commit [`381e950`](https://github.com/apache/spark/commit/381e950d6447238a41af6c766f1c7eae81e585a7).
[GitHub] [spark] SparkQA removed a comment on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
SparkQA removed a comment on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644566350 **[Test build #124104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124104/testReport)** for PR 28810 at commit [`1251b68`](https://github.com/apache/spark/commit/1251b68044611148dcb840d3efa3c39ed9aee18e).
[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype
SparkQA commented on pull request #28833: URL: https://github.com/apache/spark/pull/28833#issuecomment-644574858 **[Test build #124098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124098/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28799: [SPARK-31871][CORE][WEBUI][2.4] Display the canvas element icon for sorting column
SparkQA commented on pull request #28799: URL: https://github.com/apache/spark/pull/28799#issuecomment-644574872 **[Test build #124099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124099/testReport)** for PR 28799 at commit [`7969e44`](https://github.com/apache/spark/commit/7969e44a0c1de6155d5183717cb2c740094b41e7).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause
AmplabJenkins commented on pull request #28836: URL: https://github.com/apache/spark/pull/28836#issuecomment-644574849
[GitHub] [spark] SparkQA commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
SparkQA commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644574873 **[Test build #124101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124101/testReport)** for PR 28838 at commit [`03bf55f`](https://github.com/apache/spark/commit/03bf55fe38a2a70b1b04326fd053c98e82b92d09).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
SparkQA commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644574855
[GitHub] [spark] SparkQA commented on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array types of c
SparkQA commented on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644574859
[GitHub] [spark] SparkQA commented on pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec
SparkQA commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-644574864 **[Test build #124100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124100/testReport)** for PR 28715 at commit [`a85f511`](https://github.com/apache/spark/commit/a85f51121ede087d6a943275e832ba290caa0966).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause
AmplabJenkins removed a comment on pull request #28836: URL: https://github.com/apache/spark/pull/28836#issuecomment-644574849 Build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28746: [SPARK-31922][CORE] Fix "RpcEnv already stopped" error when exit spark-shell with local-cluster mode
SparkQA commented on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-644574870 **[Test build #124084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124084/testReport)** for PR 28746 at commit [`a3ed5c1`](https://github.com/apache/spark/commit/a3ed5c1e213e67b6ba84a2e9b3d487cf766c2704).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
AmplabJenkins commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644574926
[GitHub] [spark] AmplabJenkins commented on pull request #28799: [SPARK-31871][CORE][WEBUI][2.4] Display the canvas element icon for sorting column
AmplabJenkins commented on pull request #28799: URL: https://github.com/apache/spark/pull/28799#issuecomment-644575068
[GitHub] [spark] AmplabJenkins commented on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array type
AmplabJenkins commented on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644575049
[GitHub] [spark] AmplabJenkins commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644574884
[GitHub] [spark] AmplabJenkins commented on pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec
AmplabJenkins commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-644574965
[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype
AmplabJenkins commented on pull request #28833: URL: https://github.com/apache/spark/pull/28833#issuecomment-644574966
[GitHub] [spark] AmplabJenkins commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
AmplabJenkins commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644574984
[GitHub] [spark] sarutak commented on pull request #28837: [SPARK-31996][BUILD] Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1
sarutak commented on pull request #28837: URL: https://github.com/apache/spark/pull/28837#issuecomment-644575986 retest this please.
[GitHub] [spark] maropu commented on a change in pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec
maropu commented on a change in pull request #28715: URL: https://github.com/apache/spark/pull/28715#discussion_r440636308 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ## @@ -51,6 +51,86 @@ class WholeStageCodegenSuite extends QueryTest with SharedSparkSession assert(df.collect() === Array(Row(9, 4.5))) } + test("SPARK-31897: GenerateExec should be included in WholeStageCodegen") { +import testImplicits._ +val arrayData = Seq(("James", Seq("Java", "Scala"), Map("hair" -> "black", "eye" -> "brown"))) +val df = arrayData.toDF("name", "knownLanguages", "properties") + +// Array - explode +var expDF = df.select($"name", explode($"knownLanguages"), $"properties") +var plan = expDF.queryExecution.executedPlan +assert(plan.find { + case stage: WholeStageCodegenExec => +stage.find(_.isInstanceOf[GenerateExec]).isDefined + case _ => false +}.isDefined) +checkAnswer(expDF, Array(Row("James", "Java", Map("hair" -> "black", "eye" -> "brown")), + Row("James", "Scala", Map("hair" -> "black", "eye" -> "brown" + + +// Map - explode +expDF = df.select($"name", $"knownLanguages", explode($"properties")) +plan = expDF.queryExecution.executedPlan +assert(plan.find { + case stage: WholeStageCodegenExec => +stage.find(_.isInstanceOf[GenerateExec]).isDefined + case _ => false +}.isDefined) +checkAnswer(expDF, + Array(Row("James", List("Java", "Scala"), "hair", "black"), +Row("James", List("Java", "Scala"), "eye", "brown"))) + +// Array - posexplode +expDF = df.select($"name", posexplode($"knownLanguages")) +plan = expDF.queryExecution.executedPlan +assert(plan.find { + case stage: WholeStageCodegenExec => +stage.find(_.isInstanceOf[GenerateExec]).isDefined + case _ => false +}.isDefined) +checkAnswer(expDF, + Array(Row("James", 0, "Java"), Row("James", 1, "Scala"))) + +// Map - posexplode +expDF = df.select($"name", posexplode($"properties")) +plan = expDF.queryExecution.executedPlan +assert(plan.find { + case stage: 
WholeStageCodegenExec => +stage.find(_.isInstanceOf[GenerateExec]).isDefined + case _ => false +}.isDefined) +checkAnswer(expDF, + Array(Row("James", 0, "hair", "black"), Row("James", 1, "eye", "brown"))) + Review comment: nit: is a single blank line enough here?
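For readers checking the expected rows in the test above, here is a minimal plain-Scala model of what `explode` and `posexplode` produce for the single input row. It is only a sketch: it does not use Spark, `ExplodeSketch` is a hypothetical name, and the map column is modelled as an ordered `Seq` of pairs so the entry order is explicit.

```scala
// Plain-Scala model (no Spark dependency) of the rows the WholeStageCodegen test expects.
// Input row: ("James", Seq("Java", "Scala"), {hair -> black, eye -> brown}).
object ExplodeSketch {
  val name: String = "James"
  val langs: Seq[String] = Seq("Java", "Scala")
  // The map column, modelled as an ordered list of (key, value) pairs.
  val props: Seq[(String, String)] = Seq("hair" -> "black", "eye" -> "brown")

  // explode(array): one output row per element; other selected columns are repeated.
  val explodeArray: Seq[(String, String, Seq[(String, String)])] =
    langs.map(lang => (name, lang, props))

  // explode(map): one output row per entry, with separate key and value columns.
  val explodeMap: Seq[(String, Seq[String], String, String)] =
    props.map { case (k, v) => (name, langs, k, v) }

  // posexplode(array): like explode, plus the element's 0-based position before it.
  val posexplodeArray: Seq[(String, Int, String)] =
    langs.zipWithIndex.map { case (lang, i) => (name, i, lang) }

  // posexplode(map): position, key and value for each entry.
  val posexplodeMap: Seq[(String, Int, String, String)] =
    props.zipWithIndex.map { case ((k, v), i) => (name, i, k, v) }

  def main(args: Array[String]): Unit = {
    explodeArray.foreach(println)
    explodeMap.foreach(println)
    posexplodeArray.foreach(println)
    posexplodeMap.foreach(println)
  }
}
```

The tuples line up column for column with the `Row(...)` values passed to `checkAnswer` in the test.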
[GitHub] [spark] LuciferYang commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
LuciferYang commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644611045 retest this please
[GitHub] [spark] AmplabJenkins commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644629157
[GitHub] [spark] HyukjinKwon commented on pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numRowsDroppedByWatermark"
HyukjinKwon commented on pull request #28828: URL: https://github.com/apache/spark/pull/28828#issuecomment-644591847 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numRowsDroppedByWatermark"
HyukjinKwon closed pull request #28828: URL: https://github.com/apache/spark/pull/28828
[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
GuoPhilipse commented on a change in pull request #28593: URL: https://github.com/apache/spark/pull/28593#discussion_r440651295 ## File path: sql/hive/src/test/resources/golden/timestamp cast #3-0-732ed232ac592c5e7f7c913a88874fd2 ## @@ -1 +0,0 @@ -1.2 Review comment: Ah, I removed it early this morning, just a few minutes after your comment. It is gone as of the last commit, and the subsequent test run has also finished.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins removed a comment on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644625214
[GitHub] [spark] AmplabJenkins commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644625214
[GitHub] [spark] prakharjain09 commented on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors
prakharjain09 commented on pull request #28619: URL: https://github.com/apache/spark/pull/28619#issuecomment-644630254 @holdenk @cloud-fan @Dooyoung-Hwang Please review the changes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add refresh function command
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-644641581
[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add refresh function command
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-644641581
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins removed a comment on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644663829
[GitHub] [spark] AmplabJenkins commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644663829
[GitHub] [spark] cchighman opened a new pull request #28841: [SPARK-31962] - Provide option to load files after a specified date when reading from a folder path
cchighman opened a new pull request #28841: URL: https://github.com/apache/spark/pull/28841 ### What changes were proposed in this pull request? A new option, _fileModifiedDate_, is provided, expecting a value in '-MM-DD HH:mm:ss' format. _InMemoryFileIndex_ considers this option while checking for files, just before applying any _PathFilters_. To filter file results, a new PathFilter class was derived. General housekeeping was performed on the classes extending PathFilter. Because it became apparent that multiple path filters may need to be applied together, logic was introduced to handle them, and the associated tests were written. A new method signature was created to maintain backward compatibility and avoid affecting other features. This PR aims to keep complexity low across various file data source loading scenarios. It is also compatible with Structured Streaming, requiring just a handful of small additions there, which I plan to complete in a separate PR. Example usage: spark.read.format("csv").option("fileModifiedDate","2020-06-15T05:00:00") ### Why are the changes needed? When loading files from a data source, there can often be thousands of files within a single file path. In many cases I've seen, we want to start loading from a folder path and load only the files whose modification dates are past a certain point. Out of thousands of potential files, only those with modification dates later than the specified timestamp would then be considered. This saves a lot of time and removes significant complexity from managing this in code. ### Does this PR introduce _any_ user-facing change? This PR introduces an option that can be used with Spark file data sources, similar to the _latestFirst_ option in Structured Streaming.
A documentation update was made to show an example and usage of the new data source option. ### How was this patch tested? A handful of new unit tests were written and are passing. The package was also tested locally and in a live Databricks environment.
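The PR description combines a "modified after" check with existing path filters. The following is a minimal sketch of that idea in plain Scala, not the PR's implementation: `ListedFile`, `modifiedAfter` and `allOf` are hypothetical names, and the sketch assumes the option value is an ISO-8601 local date-time interpreted as UTC.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical stand-in for a listed file's status; not a Spark or Hadoop class.
final case class ListedFile(path: String, modificationTimeMillis: Long)

object ModifiedAfterSketch {
  // Assumption: the option value is an ISO-8601 local date-time, interpreted as UTC.
  def toEpochMillis(value: String): Long =
    LocalDateTime.parse(value).toInstant(ZoneOffset.UTC).toEpochMilli

  // Keep only files modified strictly after the given timestamp.
  def modifiedAfter(value: String): ListedFile => Boolean = {
    val cutoff = toEpochMillis(value)
    file => file.modificationTimeMillis > cutoff
  }

  // Combine several filters: a file is kept only if every filter accepts it.
  def allOf(filters: Seq[ListedFile => Boolean]): ListedFile => Boolean =
    file => filters.forall(accept => accept(file))

  val sample: Seq[ListedFile] = Seq(
    ListedFile("a.csv", toEpochMillis("2020-06-14T00:00:00")),
    ListedFile("b.csv", toEpochMillis("2020-06-16T00:00:00")),
    ListedFile("_tmp.csv", toEpochMillis("2020-06-17T00:00:00")))

  // The second predicate stands in for some pre-existing path-based filter.
  val keep: ListedFile => Boolean = allOf(Seq(
    modifiedAfter("2020-06-15T05:00:00"),
    (f: ListedFile) => !f.path.startsWith("_")))

  def main(args: Array[String]): Unit =
    println(sample.filter(keep).map(_.path)) // List(b.csv)
}
```

Composing the filters with `forall` means adding the date check does not change how other filters behave; a file must pass all of them.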
[GitHub] [spark] yaooqinn commented on a change in pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
yaooqinn commented on a change in pull request #28835: URL: https://github.com/apache/spark/pull/28835#discussion_r440743596 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala ## @@ -94,6 +95,12 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC initCompositeService(hiveConf) } + /** + * the super class [[CLIService#start]] starts a useless dummy metastore client, skip it and call + * the ancestor [[CompositeService#start]] directly. + */ + override def start(): Unit = startCompositeService() Review comment: Here we bypass it and start the registered services as CompositeService does.
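The override above skips the parent's `start` and runs the grandparent's behaviour instead. Scala has no `super.super`, so a subclass needs some helper that replays the ancestor's logic. The sketch below models that pattern with hypothetical classes; only the name `startCompositeService` is taken from the diff, and its body here is an assumption, not the Spark or Hive implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical classes modelling the pattern; they are not Hive's or Spark's service classes.
class CompositeLike {
  protected val services: ArrayBuffer[String] = ArrayBuffer.empty[String]
  val started: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def addService(name: String): Unit = services += name
  // Grandparent behaviour: start every registered service in order.
  protected def startCompositeService(): Unit = services.foreach(s => started += s)
  def start(): Unit = startCompositeService()
}

class CliServiceLike extends CompositeLike {
  // Parent behaviour: extra work (standing in for the dummy metastore client) before the services.
  override def start(): Unit = {
    started += "dummy-metastore-client"
    super.start()
  }
}

class SparkCliServiceLike extends CliServiceLike {
  // No `super.super` in Scala, so call the grandparent's logic through the helper directly.
  override def start(): Unit = startCompositeService()
}

object StartBypassDemo {
  def main(args: Array[String]): Unit = {
    val svc = new SparkCliServiceLike
    svc.addService("a")
    svc.addService("b")
    svc.start()
    println(svc.started.mkString(", ")) // a, b
  }
}
```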
[GitHub] [spark] AmplabJenkins commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644674346
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
AmplabJenkins removed a comment on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644674346
[GitHub] [spark] HyukjinKwon commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
HyukjinKwon commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644674956 Merged to master, branch-3.0 and branch-2.4.
[GitHub] [spark] SparkQA commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
SparkQA commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644683567 **[Test build #124120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124120/testReport)** for PR 28810 at commit [`f4497a6`](https://github.com/apache/spark/commit/f4497a66375531989e4f6c9c7c38457c2753e799).
[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype
SparkQA commented on pull request #28833: URL: https://github.com/apache/spark/pull/28833#issuecomment-644683524 **[Test build #124121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124121/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).
[GitHub] [spark] cloud-fan commented on pull request #28780: [SPARK-31952][SQL]Fix incorrect memory spill metric when doing Aggregate
cloud-fan commented on pull request #28780: URL: https://github.com/apache/spark/pull/28780#issuecomment-644697788 I'm not familiar with this part. What does "spill (memory)" mean?
[GitHub] [spark] cloud-fan commented on pull request #28778: [SPARK-31949][SQL] Add spark.default.parallelism in SQLConf for isolated across session
cloud-fan commented on pull request #28778: URL: https://github.com/apache/spark/pull/28778#issuecomment-644710864 Parallelism is already a physical concept. Can you explain more about how you are going to tune the file partition split? What problems did you hit?
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440809637 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -539,3 +541,82 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces field in struct by name. + */ +case class WithFields( + structExpr: Expression, + nameExprs: Seq[Expression], + valExprs: Seq[Expression]) extends Unevaluable { + + override def checkInputDataTypes(): TypeCheckResult = { +val expectedStructType = StructType(Nil).typeName +if (structExpr.dataType.typeName != expectedStructType) { + TypeCheckResult.TypeCheckFailure( +"struct argument should be struct type, got: " + structExpr.dataType.catalogString) +} else if (!nameExprs.forall(e => e.foldable && e.dataType == StringType)) { + TypeCheckResult.TypeCheckFailure( +s"nameExprs argument should contain only foldable ${StringType.catalogString} expressions") +} else if (nameExprs.length != valExprs.length) { Review comment: Can we add an assert at the beginning of the class body? This is not an error that users can trigger.
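The review suggestion separates two kinds of checks: internal invariants that only a programming bug can violate, which fail fast with an `assert` in the class body, and user-facing type errors, which are reported through a check result. A minimal sketch of that split follows; `WithFieldsSketch`, `CheckResult` and its cases are hypothetical simplifications, not Catalyst's `WithFields` or `TypeCheckResult`.

```scala
// Hypothetical simplified types; not Catalyst's expression or TypeCheckResult classes.
sealed trait CheckResult
case object CheckOk extends CheckResult
final case class CheckFailure(message: String) extends CheckResult

final case class WithFieldsSketch(isStruct: Boolean, names: Seq[String], values: Seq[Int]) {
  // Internal invariant: only the API constructs this node with matching lengths, so a
  // mismatch is a programming bug and construction fails fast with an AssertionError.
  assert(names.length == values.length, "names and values must have the same length")

  // User-facing problems are still reported as a failure result rather than thrown.
  def checkInputDataTypes(): CheckResult =
    if (!isStruct) CheckFailure("struct argument should be struct type")
    else CheckOk
}
```

With this split, `checkInputDataTypes` only has to handle conditions a user query can actually produce.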
[GitHub] [spark] SparkQA removed a comment on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
SparkQA removed a comment on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644576078 **[Test build #124105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124105/testReport)** for PR 28824 at commit [`9f4f286`](https://github.com/apache/spark/commit/9f4f2863212ec85dcc89de85d65aa9f40217c4be).
[GitHub] [spark] AmplabJenkins commented on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
AmplabJenkins commented on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644731927
[GitHub] [spark] SparkQA removed a comment on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array ty
SparkQA removed a comment on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644583204 **[Test build #124111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124111/testReport)** for PR 28831 at commit [`d75ad5a`](https://github.com/apache/spark/commit/d75ad5aee7d24f4037b950137d529ed2ed3fb1cb).
[GitHub] [spark] AmplabJenkins commented on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array type
AmplabJenkins commented on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644737298
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/ar
AmplabJenkins removed a comment on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644737298
[GitHub] [spark] SparkQA commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
SparkQA commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644636496 **[Test build #124112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124112/testReport)** for PR 28838 at commit [`03bf55f`](https://github.com/apache/spark/commit/03bf55fe38a2a70b1b04326fd053c98e82b92d09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
SparkQA removed a comment on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644616402 **[Test build #124112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124112/testReport)** for PR 28838 at commit [`03bf55f`](https://github.com/apache/spark/commit/03bf55fe38a2a70b1b04326fd053c98e82b92d09).
[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait
AmplabJenkins commented on pull request #28710: URL: https://github.com/apache/spark/pull/28710#issuecomment-644650499
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait
AmplabJenkins removed a comment on pull request #28710: URL: https://github.com/apache/spark/pull/28710#issuecomment-644650499 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
SparkQA commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644663229 **[Test build #124118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124118/testReport)** for PR 28839 at commit [`5dc2886`](https://github.com/apache/spark/commit/5dc28867fa7f1b943acfe4a23a277f987b177af6).
[GitHub] [spark] HyukjinKwon commented on pull request #28824: [SPARK-31984][SQL] Make micros rebasing functions via local timestamps pure
HyukjinKwon commented on pull request #28824: URL: https://github.com/apache/spark/pull/28824#issuecomment-644680044 Merged to master and branch-3.0.
[GitHub] [spark] cloud-fan commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable
cloud-fan commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r440783817

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoin.scala ##

```diff
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule coalesces one side of the `SortMergeJoin` if the following conditions are met:
+ * - Two bucketed tables are joined.
+ * - The larger bucket number is divisible by the smaller bucket number.
+ * - COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED is set to true.
+ * - The ratio of the number of buckets is less than the value set in
+ *   COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO.
+ */
+case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan] {
+  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = {
+    assert(numBuckets1 != numBuckets2)
+    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
+    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
+    // number of buckets because bucket id is calculated by modding the total number of buckets.
+    if (large % small == 0 &&
+      large / small <= conf.getConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO)) {
+      Some(small)
+    } else {
+      None
+    }
+  }
+
+  private def updateNumCoalescedBuckets(plan: SparkPlan, numCoalescedBuckets: Int): SparkPlan = {
+    plan.transformUp {
+      case f: FileSourceScanExec =>
+        f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets))
+    }
+  }
+
+  def apply(plan: SparkPlan): SparkPlan = {
+    if (!conf.getConf(SQLConf.COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED)) {
+      return plan
+    }
+
+    plan transform {
+      case ExtractSortMergeJoinWithBuckets(smj, numLeftBuckets, numRightBuckets)
+        if numLeftBuckets != numRightBuckets =>
+        mayCoalesce(numLeftBuckets, numRightBuckets, conf).map { numCoalescedBuckets =>
+          if (numCoalescedBuckets != numLeftBuckets) {
+            smj.copy(left = updateNumCoalescedBuckets(smj.left, numCoalescedBuckets))
+          } else {
+            smj.copy(right = updateNumCoalescedBuckets(smj.right, numCoalescedBuckets))
+          }
+        }.getOrElse(smj)
+      case other => other
+    }
+  }
+}
+
+/**
+ * An extractor that extracts `SortMergeJoinExec` where both sides of the join have the bucketed
+ * tables and are consisted of only the scan operation.
+ */
+object ExtractSortMergeJoinWithBuckets {
+  private def isScanOperation(plan: SparkPlan): Boolean = plan match {
+    case f: FilterExec => isScanOperation(f.child)
+    case p: ProjectExec => isScanOperation(p.child)
```

Review comment: @viirya this is a good point! We should apply this optimizer rule more conservatively.

For a sort-merge join with join keys `[k1, k2, ...]`, we should coalesce the buckets only if the bucket keys are also `[k1, k2, ...]`. The keys can be renamed by a Project, and we should take care of that. Examples:
- `t1(bucket by a, b) JOIN t2(bucket by c, d) ON a = c AND b = d`: should apply
- `t1(bucket by a, b) JOIN (SELECT c AS x, d AS y FROM t2(bucket by c, d)) ON a = x AND b = y`: should apply
- `t1(bucket by a) JOIN t2(bucket by c) ON b = d`: should not apply
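The two conditions under discussion can be modeled in plain Scala. The sketch below is hypothetical and not part of the PR. `mayCoalesce` mirrors the arithmetic in the diff but takes the maximum ratio as a parameter instead of reading `SQLConf`. `keysMatch` and its `renames` map (output alias to underlying bucket column) are illustrative names for the reviewer's proposed check. For simplicity, the sketch requires the keys to appear in the same order.

```scala
// Hypothetical, simplified model of the checks discussed above; not Spark's API.
object BucketCoalesceSketch {
  // Mirrors mayCoalesce from the diff, with the max ratio passed in instead of read from SQLConf.
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxRatio: Int): Option[Int] = {
    require(numBuckets1 != numBuckets2, "bucket counts must differ")
    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
    // Coalescing is only valid when the larger count is a multiple of the smaller one,
    // since bucket ids are computed modulo the number of buckets.
    if (large % small == 0 && large / small <= maxRatio) Some(small) else None
  }

  // Reviewer's extra condition (simplified): after undoing Project renames, one side's
  // join keys must equal that side's bucket keys, in the same order.
  // `renames` maps an output alias back to the underlying bucketed column name.
  def keysMatch(
      bucketKeys: Seq[String],
      joinKeys: Seq[String],
      renames: Map[String, String] = Map.empty): Boolean =
    joinKeys.map(k => renames.getOrElse(k, k)) == bucketKeys
}
```

With these definitions, the reviewer's three examples come out as stated. `keysMatch(Seq("a", "b"), Seq("a", "b"))` and `keysMatch(Seq("c", "d"), Seq("x", "y"), Map("x" -> "c", "y" -> "d"))` are `true`, while `keysMatch(Seq("a"), Seq("b"))` is `false`.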
[GitHub] [spark] SparkQA commented on pull request #28799: [SPARK-31871][CORE][WEBUI][2.4] Display the canvas element icon for sorting column
SparkQA commented on pull request #28799: URL: https://github.com/apache/spark/pull/28799#issuecomment-644717976 **[Test build #124117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124117/testReport)** for PR 28799 at commit [`7969e44`](https://github.com/apache/spark/commit/7969e44a0c1de6155d5183717cb2c740094b41e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] bart-samwel commented on pull request #28841: [SPARK-31962][SQL] Provide option to load files after a specified date when reading from a folder path
bart-samwel commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-644719642

The option `fileModifiedDate` doesn't say at all that it's a minimum modified date. I can imagine use cases for lower bounds, upper bounds, and ranges, which require at least two options, e.g. `filesModifiedAfter` and `filesModifiedBefore`.

There's also the `pathGlobFilter` option, which only supports globs, but there, too, other use cases are conceivable, e.g. "files with path names lexicographically larger than a given file name", or "files whose names, after parsing, satisfy some interesting condition". It seems to me that this calls for more generic filtering functionality, e.g. something like `.fileFilter(lambda)`, where the lambda receives an object that carries not only the path but also metadata such as the modification date.

That said, specific options may be pushed down into the data source (e.g. S3 supports prefix filters and `start-after`), so it makes sense to keep things as options where pushdown might be possible. Weighing these alternatives, I would suggest using two options, one for the minimum and one for the maximum.
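Here is a minimal sketch of the two-option design and of the generic-filter alternative, assuming a simple in-memory file listing. `FileEntry`, `filterByModifiedTime`, and `filterFiles` are hypothetical names used only for illustration. They are not Spark data source options or APIs.

```scala
import java.time.Instant

// Hypothetical model of the suggestion above; names are illustrative, not Spark APIs.
object ModifiedTimeFilterSketch {
  final case class FileEntry(path: String, modifiedAt: Instant)

  // Two independent, optional bounds: keep files modified strictly after `after`
  // (if set) and strictly before `before` (if set). Both unset keeps everything.
  def filterByModifiedTime(
      files: Seq[FileEntry],
      after: Option[Instant],
      before: Option[Instant]): Seq[FileEntry] =
    files.filter { f =>
      after.forall(t => f.modifiedAt.isAfter(t)) && before.forall(t => f.modifiedAt.isBefore(t))
    }

  // The more generic alternative: an arbitrary predicate over file metadata.
  def filterFiles(files: Seq[FileEntry])(pred: FileEntry => Boolean): Seq[FileEntry] =
    files.filter(pred)
}
```

Both bounds are exclusive in this sketch. Whether real options should be inclusive is a design decision the sketch does not settle. Unlike the generic predicate, the two bounds are plain values, which is what would make them candidates for pushdown.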
[GitHub] [spark] SparkQA commented on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array types of c
SparkQA commented on pull request #28831: URL: https://github.com/apache/spark/pull/28831#issuecomment-644736363 **[Test build #124111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124111/testReport)** for PR 28831 at commit [`d75ad5a`](https://github.com/apache/spark/commit/d75ad5aee7d24f4037b950137d529ed2ed3fb1cb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
SparkQA commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644741067 **[Test build #124124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124124/testReport)** for PR 28839 at commit [`5dc2886`](https://github.com/apache/spark/commit/5dc28867fa7f1b943acfe4a23a277f987b177af6).
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440823257

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ##

```diff
@@ -539,3 +541,82 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
+ */
+case class WithFields(
+  structExpr: Expression,
+  nameExprs: Seq[Expression],
```

Review comment: can this be `Seq[String]`?
[GitHub] [spark] AmplabJenkins commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644741981
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440823047

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ##

```diff
@@ -539,3 +541,82 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
```

Review comment: IIUC, now this expression can only add/replace top-level fields?
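To make the top-level-only question concrete, here is a hypothetical model of the "adds/replaces field in struct by name" semantics. A struct is represented as an ordered `Vector` of name/value pairs. The model assumes exact, case-sensitive name matching and replaces only the first matching field. The PR's optimizer rule compares names with `SQLConf.get.resolver`, so this is a simplification. The sketch uses plain `String` names, which also shows what the `Seq[String]` suggestion would look like.

```scala
// Hypothetical model, not Catalyst: a struct as an ordered list of (fieldName, value) pairs.
object WithFieldsSketch {
  type Struct = Vector[(String, Any)]

  // Apply (name, value) pairs left to right: replace the first top-level field whose name
  // matches exactly, otherwise append a new field at the end. Only top-level names are
  // considered; a name such as "a.b" is a literal field name here, not a nested path.
  def withFields(struct: Struct, updates: Seq[(String, Any)]): Struct =
    updates.foldLeft(struct) { case (acc, (name, value)) =>
      val idx = acc.indexWhere(_._1 == name)
      if (idx >= 0) acc.updated(idx, name -> value) else acc :+ (name -> value)
    }
}
```

For example, applying `Seq("b" -> 20, "c" -> 3)` to `Vector("a" -> 1, "b" -> 2)` replaces `b` in place and appends `c`. Supporting nested fields would need a path-aware variant, which this sketch deliberately does not attempt.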
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins removed a comment on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644646541 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
SparkQA removed a comment on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644645074 **[Test build #124116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124116/testReport)** for PR 28839 at commit [`5dc2886`](https://github.com/apache/spark/commit/5dc28867fa7f1b943acfe4a23a277f987b177af6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
AmplabJenkins removed a comment on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644646553 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124116/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28837: [SPARK-31996][BUILD] Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1
SparkQA commented on pull request #28837: URL: https://github.com/apache/spark/pull/28837#issuecomment-644648104 **[Test build #124106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124106/testReport)** for PR 28837 at commit [`6a1e8c0`](https://github.com/apache/spark/commit/6a1e8c0c4f861d52a7408d0f72373099a59455c0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] sarutak commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
sarutak commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644662270
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide option to load files after a specified date when reading from a folder path
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-644667095 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide option to load files after a specified date when reading from a folder path
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-64455 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #28810: [SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins
SparkQA commented on pull request #28810: URL: https://github.com/apache/spark/pull/28810#issuecomment-644686945 **[Test build #124122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124122/testReport)** for PR 28810 at commit [`6dbff0c`](https://github.com/apache/spark/commit/6dbff0c941df5935103ba2979e4b88723569b3f9).
[GitHub] [spark] SparkQA commented on pull request #28842: [WIP][SQL] Create date/timestamp formatters once before collect in `hiveResultString()`
SparkQA commented on pull request #28842: URL: https://github.com/apache/spark/pull/28842#issuecomment-644699370 **[Test build #124123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124123/testReport)** for PR 28842 at commit [`a152d94`](https://github.com/apache/spark/commit/a152d94485b9d53521175b5b6574c8233d25e744).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28842: [WIP][SQL] Create date/timestamp formatters once before collect in `hiveResultString()`
AmplabJenkins removed a comment on pull request #28842: URL: https://github.com/apache/spark/pull/28842#issuecomment-644696793
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440806575

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ##

```diff
@@ -539,3 +541,82 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
+ */
+case class WithFields(
+  structExpr: Expression,
```

Review comment: nit: use 4-space indentation for class constructor parameters.
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440806288

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ##

```diff
@@ -22,7 +22,9 @@ import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion}
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.{FUNC_ALIAS, FunctionBuilder}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
```

Review comment: nit: this import is not used.
[GitHub] [spark] SparkQA commented on pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.
SparkQA commented on pull request #28839: URL: https://github.com/apache/spark/pull/28839#issuecomment-644733596 **[Test build #124118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124118/testReport)** for PR 28839 at commit [`5dc2886`](https://github.com/apache/spark/commit/5dc28867fa7f1b943acfe4a23a277f987b177af6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column
cloud-fan commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r440826242

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ##

```diff
@@ -39,7 +41,18 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] {
       // Remove redundant field extraction.
       case GetStructField(createNamedStruct: CreateNamedStruct, ordinal, _) =>
         createNamedStruct.valExprs(ordinal)
-
+      case GetStructField(WithFields(struct, nameExprs, valExprs), ordinal, maybeName) =>
+        val extractFieldName = maybeName.getOrElse(
+          struct.dataType.asInstanceOf[StructType](ordinal).name)
+        val resolver = SQLConf.get.resolver
+        val names = nameExprs.map(e => e.eval().toString)
+        if (names.exists(n => resolver(n, extractFieldName))) {
```

Review comment: to be more conservative:
```
val matches = names.zip(valExprs).filter { case (name, _) => resolver(name, extractFieldName) }
if (matches.length == 1) {
  matches.head._2
} else {
  GetStructField(struct, ordinal, Some(extractFieldName))
}
```
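The reviewer's single-match rule can be checked in isolation. In the hypothetical sketch below, value expressions are stand-in `String`s and `Resolver` is a plain function type rather than Catalyst's. `simplify` returns `Some(value)` only when exactly one added or replaced name resolves to the extracted field. It returns `None` when the shortcut does not apply, which is the case where the reviewer's snippet keeps a `GetStructField` on the underlying struct.

```scala
// Hypothetical model of the single-match rule suggested above; plain Scala, not Catalyst.
object ExtractFieldSketch {
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Pair each added/replaced name with its value expression (a placeholder String here)
  // and keep the pairs whose name resolves to the field being extracted.
  def simplify(
      names: Seq[String],
      valExprs: Seq[String],
      extractFieldName: String,
      resolver: Resolver): Option[String] = {
    val matches = names.zip(valExprs).filter { case (name, _) => resolver(name, extractFieldName) }
    // Only an unambiguous, single match is simplified to its value expression.
    if (matches.length == 1) Some(matches.head._2) else None
  }
}
```

Filtering with the resolver, rather than calling `exists`, makes ambiguous cases visible. For instance, `b` and `B` under a case-insensitive resolver yield two matches and are left alone.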
[GitHub] [spark] AmplabJenkins commented on pull request #28838: [SPARK-31997][SQL][TESTS] Drop test_udtf table when SingleSessionSuite test completed
AmplabJenkins commented on pull request #28838: URL: https://github.com/apache/spark/pull/28838#issuecomment-644636724
[GitHub] [spark] SparkQA commented on pull request #28799: [SPARK-31871][CORE][WEBUI][2.4] Display the canvas element icon for sorting column
SparkQA commented on pull request #28799: URL: https://github.com/apache/spark/pull/28799#issuecomment-644639301 **[Test build #124109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124109/testReport)** for PR 28799 at commit [`7969e44`](https://github.com/apache/spark/commit/7969e44a0c1de6155d5183717cb2c740094b41e7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28837: [SPARK-31996][BUILD] Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1
SparkQA removed a comment on pull request #28837: URL: https://github.com/apache/spark/pull/28837#issuecomment-644579363 **[Test build #124106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124106/testReport)** for PR 28837 at commit [`6a1e8c0`](https://github.com/apache/spark/commit/6a1e8c0c4f861d52a7408d0f72373099a59455c0).
[GitHub] [spark] AmplabJenkins commented on pull request #28837: [SPARK-31996][BUILD] Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1
AmplabJenkins commented on pull request #28837: URL: https://github.com/apache/spark/pull/28837#issuecomment-644648779
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28837: [SPARK-31996][BUILD] Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1
AmplabJenkins removed a comment on pull request #28837: URL: https://github.com/apache/spark/pull/28837#issuecomment-644648779 Merged build finished. Test FAILed.
[GitHub] [spark] yaooqinn commented on a change in pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
yaooqinn commented on a change in pull request #28835: URL: https://github.com/apache/spark/pull/28835#discussion_r440743061

## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala ##

```diff
@@ -94,6 +95,12 @@ private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, sqlContext: SQLC
     initCompositeService(hiveConf)
   }
 
+  /**
+   * the super class [[CLIService#start]] starts a useless dummy metastore client, skip it and call
+   * the ancestor [[CompositeService#start]] directly.
+   */
+  override def start(): Unit = startCompositeService()
```

Review comment: `CLIService` creates a metastore connection during `start()`. That connection is useless with our dummy execution Hive conf, and it causes a class cast issue because the classes are loaded through a different classloader.
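The override needs a dedicated `startCompositeService()` helper because Scala, like Java, offers no way to call a grandparent's implementation of a method once the parent has overridden it. Inside `SparkSQLCLIService`, `super.start()` would run `CLIService.start`. The sketch below uses made-up classes (`Composite`, `Cli`, `SparkCli`) as stand-ins for the Hive and Spark ones. Its `startCompositeLike` method is a hypothetical stand-in for whatever the real helper does.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical three-level hierarchy standing in for CompositeService -> CLIService ->
// SparkSQLCLIService; none of these classes are the real Hive or Spark ones.
object SkipParentStartSketch {
  val log = ListBuffer.empty[String]

  class Composite {
    // Grandparent behaviour we want to keep: start every child service.
    protected def startCompositeLike(): Unit = log += "start children"
    def start(): Unit = startCompositeLike()
  }

  class Cli extends Composite {
    // Parent behaviour we want to skip: opens an unneeded metastore client first.
    override def start(): Unit = {
      log += "open metastore client"
      super.start()
    }
  }

  class SparkCli extends Cli {
    // `super.start()` here would run Cli.start. There is no syntax to jump straight to
    // Composite.start, so the grandparent's logic is reached through a separate method.
    override def start(): Unit = startCompositeLike()
  }
}
```

In this model, `new Cli().start()` logs both steps, while `new SparkCli().start()` logs only `"start children"`. That matches the intent stated in the doc comment of the diff.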
[GitHub] [spark] SparkQA commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore
SparkQA commented on pull request #28835: URL: https://github.com/apache/spark/pull/28835#issuecomment-644673755 **[Test build #124119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124119/testReport)** for PR 28835 at commit [`9cdd7fa`](https://github.com/apache/spark/commit/9cdd7fac27148308a78bd59d39954b8f86984493).
[GitHub] [spark] AmplabJenkins commented on pull request #28842: [WIP][SQL] Create date/timestamp formatters once before collect in `hiveResultString()`
AmplabJenkins commented on pull request #28842: URL: https://github.com/apache/spark/pull/28842#issuecomment-644696793