[GitHub] [spark] JkSelf commented on a change in pull request #31994: [SPARK-34899][SQL] Use origin plan if we can not coalesce shuffle partition
JkSelf commented on a change in pull request #31994:
URL: https://github.com/apache/spark/pull/31994#discussion_r603801610

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
## @@ -1543,4 +1543,27 @@ class AdaptiveQueryExecSuite
     assert(materializeLogs(0).startsWith("Materialize query stage BroadcastQueryStageExec"))
     assert(materializeLogs(1).startsWith("Materialize query stage ShuffleQueryStageExec"))
   }
+
+  test("SPARK-34899: Use origin plan if we can not coalesce shuffle partition") {
+    def check(ds: Dataset[Row], origin: ShuffleOrigin): Unit = {
+      ds.collect()
+      val plan = ds.queryExecution.executedPlan.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
+      assert(collect(plan) {
+        case c: CustomShuffleReaderExec => c
+      }.isEmpty)
+      assert(collect(plan) {
+        case s: ShuffleExchangeExec if s.shuffleOrigin == origin && s.numPartitions == 3 => s
+      }.size == 1)
+      checkAnswer(ds, testData)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "10",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "3") {
+      check(testData.repartition(), REPARTITION)

Review comment:
It is better to add the partition size of each partition in the comment.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
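Editor's note: the test above sets a 10-byte advisory partition size with 3 shuffle partitions and expects no coalescing to happen. The following is a minimal sketch of why that can occur, assuming a simple greedy packing of adjacent partitions; it is not Spark's implementation, and the partition sizes used below are hypothetical (the thread does not state them). `CoalesceSketch`, `coalesce`, and `canCoalesce` are illustrative names only.

```scala
// Simplified sketch, not Spark's code: greedily packs adjacent shuffle
// partitions until adding the next one would exceed the target size.
object CoalesceSketch {
  /** Returns the original partition indices, grouped into coalesced partitions. */
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[Seq[Int]] = {
    val groups = scala.collection.mutable.ArrayBuffer.empty[Vector[Int]]
    var current = Vector.empty[Int]
    var currentSize = 0L
    for ((size, index) <- sizes.zipWithIndex) {
      // Close the current group if this partition would push it past the target.
      if (current.nonEmpty && currentSize + size > targetSize) {
        groups += current
        current = Vector.empty[Int]
        currentSize = 0L
      }
      current = current :+ index
      currentSize += size
    }
    if (current.nonEmpty) groups += current
    groups.toSeq
  }

  /** True only when at least two partitions were merged. */
  def canCoalesce(sizes: Seq[Long], targetSize: Long): Boolean =
    coalesce(sizes, targetSize).length < sizes.length
}
```

With hypothetical sizes of 20 bytes each and a 10-byte target, every partition stays in its own group, so `canCoalesce` is false and keeping the original plan is the sensible outcome. With sizes 3, 4, and 5 the first two partitions merge.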
[GitHub] [spark] AmplabJenkins commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
AmplabJenkins commented on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809933240 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136694/
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AngersZhuuuu commented on a change in pull request #30212:
URL: https://github.com/apache/spark/pull/30212#discussion_r603801288

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
## @@ -907,29 +907,74 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }

   /**
-   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
+   * Add an [[Aggregate]] to a logical plan.
    */
   private def withAggregationClause(
       ctx: AggregationClauseContext,
       selectExpressions: Seq[NamedExpression],
       query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
-    val groupByExpressions = expressionList(ctx.groupingExpressions)
-
-    if (ctx.GROUPING != null) {
-      // GROUP BY GROUPING SETS (...)
-      val selectedGroupByExprs =
-        ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-      GroupingSets(selectedGroupByExprs.toSeq, groupByExpressions, query, selectExpressions)
-    } else {
-      // GROUP BY (WITH CUBE | WITH ROLLUP)?
-      val mappedGroupByExpressions = if (ctx.CUBE != null) {
-        Seq(Cube(groupByExpressions))
-      } else if (ctx.ROLLUP != null) {
-        Seq(Rollup(groupByExpressions))
+    if (ctx.groupingExpressionsWithGroupingAnalytics.isEmpty) {
+      val groupByExpressions = expressionList(ctx.groupingExpressions)
+      if (ctx.GROUPING != null) {
+        // GROUP BY GROUPING SETS (...)
+        // `GROUP BY warehouse, product GROUPING SETS((warehouse, product), (warehouse))` is
+        // semantically equivalent to `GROUP BY GROUPING SETS((warehouse, product), (warehouse))`.
+        // Under this grammar, the fields appearing in `GROUPING SETS`'s groupingSets must be a
+        // subset of the columns appearing in group by expression.
+        val groupingSets =
+          ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
+        Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+          selectExpressions, query)
       } else {
-        groupByExpressions
+        // GROUP BY (WITH CUBE | WITH ROLLUP)?
+        val mappedGroupByExpressions = if (ctx.CUBE != null) {
+          Seq(Cube(groupByExpressions.map(Seq(_))))
+        } else if (ctx.ROLLUP != null) {
+          Seq(Rollup(groupByExpressions.map(Seq(_))))
+        } else {
+          groupByExpressions
+        }
+        Aggregate(mappedGroupByExpressions, selectExpressions, query)
+      }
+    } else {
+      val groupByExpressions =
+        ctx.groupingExpressionsWithGroupingAnalytics.asScala
+          .map(groupByExpr => {
+            val groupingAnalytics = groupByExpr.groupingAnalytics
+            if (groupingAnalytics != null) {
+              val groupingSets = groupingAnalytics.groupingSet.asScala
+                .map(_.expression.asScala.map(e => expression(e)).toSeq)
+              if (groupingAnalytics.CUBE != null) {
+                // CUBE(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in CUBE grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Cube(groupingSets.toSeq)
+              } else if (groupingAnalytics.ROLLUP != null) {
+                // ROLLUP(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Rollup(groupingSets.toSeq)
+              } else {
+                assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
+                GroupingSets(groupingSets.toSeq,
+                  groupingSets.flatten.distinct.toSeq)
+              }
+            } else {
+              expression(groupByExpr.expression)
+            }
+          })
+      val (groupingSet, expressions) = groupByExpressions.partition(_.isInstanceOf[GroupingSet])
+      if ((expressions.nonEmpty && groupingSet.nonEmpty) || groupingSet.size > 1) {
+        throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " +

Review comment:
> could you handle the two error cases separately?

Done
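Editor's note: the diff above changes `Cube` and `Rollup` to take groups of expressions (`groupByExpressions.map(Seq(_))`). As a rough illustration of what those constructs mean, the sketch below expands grouped columns into the grouping sets that standard SQL ROLLUP and CUBE denote. It uses plain Scala collections and strings in place of Catalyst expressions; `GroupingSetsSketch`, `rollup`, and `cube` are hypothetical names, not Spark APIs.

```scala
// Illustrative only: plain Scala collections, no Spark classes involved.
object GroupingSetsSketch {
  // ROLLUP(g1, g2, ...) yields every prefix of the group list, longest first,
  // ending with the empty grouping set (the grand total).
  def rollup[A](groups: Seq[Seq[A]]): Seq[Seq[A]] =
    groups.inits.map(_.flatten).toSeq

  // CUBE(g1, g2, ...) yields every subset of the group list, largest first.
  // Assumes the groups are distinct from one another.
  def cube[A](groups: Seq[Seq[A]]): Seq[Seq[A]] =
    (groups.length to 0 by -1)
      .flatMap(n => groups.combinations(n).map(_.flatten))
      .toSeq
}
```

For example, `rollup(Seq(Seq("a"), Seq("b")))` gives the sets `(a, b)`, `(a)`, and `()`, while a grouped call such as `rollup(Seq(Seq("a", "b"), Seq("c")))` treats `(a, b)` as a single unit.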
[GitHub] [spark] SparkQA removed a comment on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
SparkQA removed a comment on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809930044 **[Test build #136694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).
[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809933218

**[Test build #136694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
AmplabJenkins commented on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809931402 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136677/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins removed a comment on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809930849 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136678/
[GitHub] [spark] SparkQA removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
SparkQA removed a comment on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809839890 **[Test build #136677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136677/testReport)** for PR 31986 at commit [`3e8dd5c`](https://github.com/apache/spark/commit/3e8dd5ccd8c2c136f5c3a4ff64269267edbdf81e).
[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
SparkQA commented on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809930044 **[Test build #136694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
AmplabJenkins removed a comment on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809931402 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136677/
[GitHub] [spark] SparkQA commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
SparkQA commented on pull request #32001: URL: https://github.com/apache/spark/pull/32001#issuecomment-809928523 **[Test build #136693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136693/testReport)** for PR 32001 at commit [`b1cc8ce`](https://github.com/apache/spark/commit/b1cc8ce7fdd345ff841c676afd33cf2e47984588).
[GitHub] [spark] AmplabJenkins commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809930849 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136678/
[GitHub] [spark] HyukjinKwon commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
HyukjinKwon commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809930745 @wangyum, don't worry. There are a few more issues to address (https://github.com/apache/spark/pull/31886#discussion_r603307756). It will take more time.
[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
SparkQA commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809930442

**[Test build #136677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136677/testReport)** for PR 31986 at commit [`3e8dd5c`](https://github.com/apache/spark/commit/3e8dd5ccd8c2c136f5c3a4ff64269267edbdf81e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class UpdatingSessionsExec(`
   * `class UpdatingSessionsIterator(`
[GitHub] [spark] SparkQA removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
SparkQA removed a comment on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809839949 **[Test build #136678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136678/testReport)** for PR 31886 at commit [`0a19585`](https://github.com/apache/spark/commit/0a195850bc0e7ad97efd245ebcfc438ea604e770).
[GitHub] [spark] SparkQA commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
SparkQA commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809929804

**[Test build #136678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136678/testReport)** for PR 31886 at commit [`0a19585`](https://github.com/apache/spark/commit/0a195850bc0e7ad97efd245ebcfc438ea604e770).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
AngersZhuuuu commented on a change in pull request #31680:
URL: https://github.com/apache/spark/pull/31680#discussion_r603797915

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
## @@ -34,7 +34,7 @@ class HiveContext private[hive](_sparkSession: SparkSession)
   self =>

   def this(sc: SparkContext) = {
-    this(SparkSession.builder().sparkContext(HiveUtils.withHiveExternalCatalog(sc)).getOrCreate())

Review comment:
> `withHiveExternalCatalog` is used only for tests after this is merged, so could you move it into a suitable place?

Done
[GitHub] [spark] AngersZhuuuu commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
AngersZhuuuu commented on pull request #32001: URL: https://github.com/apache/spark/pull/32001#issuecomment-809928048 FYI @MaxGekk
[GitHub] [spark] wangyum commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
wangyum commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809927983 Please give me a few hours. I will verify the result first.
[GitHub] [spark] AngersZhuuuu opened a new pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType
AngersZhuuuu opened a new pull request #32001:
URL: https://github.com/apache/spark/pull/32001

### What changes were proposed in this pull request?
Before this PR, casting LongType to DayTimeIntervalType fails with the following error:
```
[info] org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(a AS DAY-TIME INTERVAL)' due to data type mismatch: cannot cast bigint to day-time interval;
[info] 'Project [cast(a#4L as day-time interval) AS b#6]
[info] +- Project [value#1L AS a#4L]
[info]    +- LocalRelation [value#1L]
```
Since DayTimeIntervalType stores its value as a Long and YearMonthIntervalType stores its value as an Int, this PR adds support for casting between them.

### Why are the changes needed?
Users can cast between LongType & DayTimeIntervalType and between IntegerType & YearMonthIntervalType:
```
SELECT cast(123L AS day-time interval)
```

### Does this PR introduce _any_ user-facing change?
Support casting between LongType & DayTimeIntervalType and between IntegerType & YearMonthIntervalType.

### How was this patch tested?
Added UT
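Editor's note: the description above relies on day-time intervals being stored as a Long and year-month intervals as an Int. The sketch below illustrates one plausible reading of such a cast using `java.time` types, assuming the Long counts microseconds and the Int counts months, which matches Spark's internal storage for these types. Whether the PR's cast interprets the numeric value this way is an assumption here, and `IntervalCastSketch` with its methods is a hypothetical helper, not part of Spark.

```scala
import java.time.{Duration, Period}
import java.time.temporal.ChronoUnit

// Hypothetical helper: converts between raw numeric values and java.time types.
object IntervalCastSketch {
  // Assumption: the Long is a count of microseconds.
  def longToDayTime(micros: Long): Duration =
    Duration.of(micros, ChronoUnit.MICROS)

  // Inverse conversion; the exact-arithmetic calls throw on overflow
  // instead of wrapping silently.
  def dayTimeToLong(d: Duration): Long =
    Math.addExact(Math.multiplyExact(d.getSeconds, 1000000L), d.getNano / 1000L)

  // Assumption: the Int is a count of months.
  def intToYearMonth(months: Int): Period =
    Period.ofMonths(months).normalized()

  def yearMonthToInt(p: Period): Int =
    Math.toIntExact(p.toTotalMonths)
}
```

Under these assumptions, `longToDayTime(123L)` is a duration of 123 microseconds and `intToYearMonth(14)` is a period of 1 year and 2 months; both conversions round-trip.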
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
AmplabJenkins removed a comment on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809926279 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41267/
[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809926265 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41267/
[GitHub] [spark] AmplabJenkins commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
AmplabJenkins commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809926279 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41267/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
AmplabJenkins removed a comment on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-809926081 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41268/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
HyukjinKwon commented on a change in pull request #31886:
URL: https://github.com/apache/spark/pull/31886#discussion_r603795209

## File path: .github/workflows/build_and_test.yml
## @@ -428,3 +428,41 @@ jobs:
     - name: Build with SBT
       run: |
         ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Phadoop-2.7 compile test:compile
+
+  tpcds-1g:
+    name: Run TPC-DS queries with SF=1
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Checkout TPC-DS (SF=1) generated data repository
+      uses: actions/checkout@v2
+      with:
+        repository: maropu/spark-tpcds-sf-1

Review comment:
Thanks @maropu. Just to clarify, do you need https://github.com/databricks/spark-sql-perf/pull/196 too?
[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
SparkQA commented on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809926159 **[Test build #136692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136692/testReport)** for PR 31680 at commit [`858390b`](https://github.com/apache/spark/commit/858390be282e313af43c3ac7c7ef9f87d4972b0a).
[GitHub] [spark] AmplabJenkins commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
AmplabJenkins commented on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-809926081 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41268/
[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint
SparkQA commented on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-809925995 **[Test build #136691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136691/testReport)** for PR 31984 at commit [`7edd25c`](https://github.com/apache/spark/commit/7edd25c82e3919f4d7f738f39e3f147aa5a0a849).
[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
SparkQA commented on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-809926064 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41268/
[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809925942 **[Test build #136690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136690/testReport)** for PR 31987 at commit [`4d9c724`](https://github.com/apache/spark/commit/4d9c724a20cb799e229b80133a6bfd8887990d5a).
[GitHub] [spark] SparkQA commented on pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec
SparkQA commented on pull request #32000: URL: https://github.com/apache/spark/pull/32000#issuecomment-809925892 **[Test build #136689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136689/testReport)** for PR 32000 at commit [`a5b32b4`](https://github.com/apache/spark/commit/a5b32b46b54c5b021e8f50f0c3759f9669f43fe8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
AmplabJenkins removed a comment on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-809925234 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136680/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809925235 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41269/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
AmplabJenkins removed a comment on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809925236 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41270/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AmplabJenkins removed a comment on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809925237 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136684/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown
AmplabJenkins removed a comment on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-809925233 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136682/
[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809925235 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41269/
[GitHub] [spark] AmplabJenkins commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
AmplabJenkins commented on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809925236 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41270/
[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AmplabJenkins commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809925237 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136684/
[GitHub] [spark] AmplabJenkins commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
AmplabJenkins commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-809925234 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136680/
[GitHub] [spark] AmplabJenkins commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown
AmplabJenkins commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-809925233 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136682/
[GitHub] [spark] maropu commented on a change in pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
maropu commented on a change in pull request #31680:
URL: https://github.com/apache/spark/pull/31680#discussion_r603794244

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ##

@@ -34,7 +34,7 @@ class HiveContext private[hive](_sparkSession: SparkSession)
   self =>
   def this(sc: SparkContext) = {
-    this(SparkSession.builder().sparkContext(HiveUtils.withHiveExternalCatalog(sc)).getOrCreate())

Review comment:
`withHiveExternalCatalog` is used only for tests after this is merged, so could you move it into a suitable place?
[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
SparkQA commented on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-809922595 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41268/
[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809922449 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41267/
[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
SparkQA commented on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-80994 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41270/
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809920635
[GitHub] [spark] SparkQA removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA removed a comment on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809881217 **[Test build #136684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136684/testReport)** for PR 30212 at commit [`9b658b3`](https://github.com/apache/spark/commit/9b658b3ab4617b748b2e954a1ef30e67ea9abb22).
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809920231 **[Test build #136684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136684/testReport)** for PR 30212 at commit [`9b658b3`](https://github.com/apache/spark/commit/9b658b3ab4617b748b2e954a1ef30e67ea9abb22).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
SparkQA removed a comment on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-809860209 **[Test build #136680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136680/testReport)** for PR 31992 at commit [`02ad9de`](https://github.com/apache/spark/commit/02ad9de7ece49e294c618b4067d3cd5017ba68ca).
[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately
SparkQA commented on pull request #31992: URL: https://github.com/apache/spark/pull/31992#issuecomment-809918966 **[Test build #136680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136680/testReport)** for PR 31992 at commit [`02ad9de`](https://github.com/apache/spark/commit/02ad9de7ece49e294c618b4067d3cd5017ba68ca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] maropu commented on pull request #31983: [WIP][SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
maropu commented on pull request #31983: URL: https://github.com/apache/spark/pull/31983#issuecomment-809918862 cc: @viirya
[GitHub] [spark] maropu commented on pull request #31983: [WIP][SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates
maropu commented on pull request #31983: URL: https://github.com/apache/spark/pull/31983#issuecomment-809918608 Do branch-2.4/3.0/3.1 have the same issue?
[GitHub] [spark] SparkQA removed a comment on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown
SparkQA removed a comment on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-809862832 **[Test build #136682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136682/testReport)** for PR 29642 at commit [`2310a69`](https://github.com/apache/spark/commit/2310a69cc30dda338e4c5c7f4d1ca2ca03371c30).
[GitHub] [spark] SparkQA commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown
SparkQA commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-809917727 **[Test build #136682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136682/testReport)** for PR 29642 at commit [`2310a69`](https://github.com/apache/spark/commit/2310a69cc30dda338e4c5c7f4d1ca2ca03371c30).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] [spark] c21 commented on pull request #31958: [SPARK-34862][SQL] Support nested column in ORC vectorized reader
c21 commented on pull request #31958: URL: https://github.com/apache/spark/pull/31958#issuecomment-809915080 @cloud-fan and @viirya could you help take a look? Thanks.
[GitHub] [spark] maropu commented on a change in pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
maropu commented on a change in pull request #30212:
URL: https://github.com/apache/spark/pull/30212#discussion_r603786655

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##

@@ -907,29 +907,74 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }
   /**
-   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
+   * Add an [[Aggregate]] to a logical plan.
    */
   private def withAggregationClause(
       ctx: AggregationClauseContext,
       selectExpressions: Seq[NamedExpression],
       query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
-    val groupByExpressions = expressionList(ctx.groupingExpressions)
-
-    if (ctx.GROUPING != null) {
-      // GROUP BY GROUPING SETS (...)
-      val selectedGroupByExprs =
-        ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-      GroupingSets(selectedGroupByExprs.toSeq, groupByExpressions, query, selectExpressions)
-    } else {
-      // GROUP BY (WITH CUBE | WITH ROLLUP)?
-      val mappedGroupByExpressions = if (ctx.CUBE != null) {
-        Seq(Cube(groupByExpressions))
-      } else if (ctx.ROLLUP != null) {
-        Seq(Rollup(groupByExpressions))
+    if (ctx.groupingExpressionsWithGroupingAnalytics.isEmpty) {
+      val groupByExpressions = expressionList(ctx.groupingExpressions)
+      if (ctx.GROUPING != null) {
+        // GROUP BY GROUPING SETS (...)
+        // `GROUP BY warehouse, product GROUPING SETS((warehouse, producets), (warehouse))` is
+        // semantically equivalent to `GROUP BY GROUPING SETS((warehouse, produce), (warehouse))`.
+        // Under this grammar, the fields appearing in `GROUPING SETS`'s groupingSets must be a
+        // subset of the columns appearing in group by expression.
+        val groupingSets =
+          ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
+        Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+          selectExpressions, query)
       } else {
-        groupByExpressions
+        // GROUP BY (WITH CUBE | WITH ROLLUP)?
+        val mappedGroupByExpressions = if (ctx.CUBE != null) {
+          Seq(Cube(groupByExpressions.map(Seq(_))))
+        } else if (ctx.ROLLUP != null) {
+          Seq(Rollup(groupByExpressions.map(Seq(_))))
+        } else {
+          groupByExpressions
+        }
+        Aggregate(mappedGroupByExpressions, selectExpressions, query)
+      }
+    } else {
+      val groupByExpressions =
+        ctx.groupingExpressionsWithGroupingAnalytics.asScala
+          .map(groupByExpr => {
+            val groupingAnalytics = groupByExpr.groupingAnalytics
+            if (groupingAnalytics != null) {
+              val groupingSets = groupingAnalytics.groupingSet.asScala
+                .map(_.expression.asScala.map(e => expression(e)).toSeq)
+              if (groupingAnalytics.CUBE != null) {
+                // CUBE(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in CUBE grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Cube(groupingSets.toSeq)
+              } else if (groupingAnalytics.ROLLUP != null) {
+                // ROLLUP(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Rollup(groupingSets.toSeq)
+              } else {
+                assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
+                GroupingSets(groupingSets.toSeq,
+                  groupingSets.flatten.distinct.toSeq)
+              }
+            } else {
+              expression(groupByExpr.expression)
+            }
+          })
+      val (groupingSet, expressions) = groupByExpressions.partition(_.isInstanceOf[GroupingSet])
+      if ((expressions.nonEmpty && groupingSet.nonEmpty) || groupingSet.size > 1) {
+        throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " +

Review comment:
could you handle the two error cases separately?
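One way to read the request above is to replace the single combined condition with two checks, each with its own message. The following is only a minimal sketch of that idea: `GroupByItem`, `PlainExpr`, `GroupingAnalyticsItem`, `GroupByValidation`, and both error messages are hypothetical stand-ins invented for illustration, not Catalyst classes or the wording used in the PR.

```scala
// Hypothetical, simplified stand-ins for the parsed GROUP BY items;
// these are NOT Spark's Expression / GroupingSet classes.
sealed trait GroupByItem
final case class PlainExpr(name: String) extends GroupByItem
// kind is one of "CUBE", "ROLLUP", "GROUPING SETS" in this sketch.
final case class GroupingAnalyticsItem(kind: String) extends GroupByItem

object GroupByValidation {
  /** Returns Left(message) for an unsupported clause, Right(items) otherwise. */
  def validate(items: Seq[GroupByItem]): Either[String, Seq[GroupByItem]] = {
    val (analytics, plain) = items.partition(_.isInstanceOf[GroupingAnalyticsItem])
    if (analytics.size > 1) {
      // Error case 1: more than one CUBE/ROLLUP/GROUPING SETS in the same GROUP BY.
      Left("Multiple CUBE/ROLLUP/GROUPING SETS in one GROUP BY clause are not supported.")
    } else if (analytics.nonEmpty && plain.nonEmpty) {
      // Error case 2: a grouping analytics item mixed with plain expressions.
      Left("Mixing plain expressions with CUBE/ROLLUP/GROUPING SETS is not supported.")
    } else {
      Right(items)
    }
  }
}
```

When both conditions hold at once, this sketch reports the first one; which message should win in that case is a design choice the thread does not settle.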
[GitHub] [spark] wangyum commented on a change in pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation
wangyum commented on a change in pull request #31984:
URL: https://github.com/apache/spark/pull/31984#discussion_r603786414

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala ##

@@ -247,13 +246,15 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper with Join
             // otherwise the pruning will not trigger
             var partScan = getPartitionTableScan(l, left)
             if (partScan.isDefined && canPruneLeft(joinType) &&
-                hasPartitionPruningFilter(right)) {
+                hasPartitionPruningFilter(right) &&
+                (canBroadcastBySize(right, conf) || hintToBroadcastRight(hint))) {

Review comment:
Actually we already have a test: https://github.com/apache/spark/blob/56b18386ab71566480777660f52d4913aa319a2f/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala#L1117
Anyway, I added a new test.
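The condition added in the diff above can be read as a plain boolean predicate: prune the left side only when the right side has a pruning filter and can be broadcast, either by size or by hint. The sketch below restates that logic outside Spark. `OtherSide`, `PruningDecision`, and the size-within-threshold rule in `canBroadcastBySize` are assumptions made for illustration; the real helpers operate on logical plans and `SQLConf`.

```scala
object PruningDecision {
  // Hypothetical summary of the non-pruned join side; field names are illustrative only.
  final case class OtherSide(
      hasPruningFilter: Boolean,
      sizeInBytes: Long,
      hasBroadcastHint: Boolean)

  // Assumption: "can broadcast by size" means a known, non-negative size within a threshold.
  def canBroadcastBySize(side: OtherSide, thresholdBytes: Long): Boolean =
    side.sizeInBytes >= 0 && side.sizeInBytes <= thresholdBytes

  // Mirrors the shape of the condition in the diff: all original checks must hold,
  // and additionally the other side must be broadcastable by size or by hint.
  def shouldPrune(
      isPartitionScan: Boolean,
      joinTypeAllowsPruning: Boolean,
      other: OtherSide,
      thresholdBytes: Long): Boolean =
    isPartitionScan && joinTypeAllowsPruning && other.hasPruningFilter &&
      (canBroadcastBySize(other, thresholdBytes) || other.hasBroadcastHint)
}
```

With a 10 MB threshold, a filtered 1 KB side qualifies by size, while a filtered 1 TB side qualifies only if it carries a broadcast hint.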
[GitHub] [spark] MaxGekk commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
MaxGekk commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603786193

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2350,6 +2350,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
The question is more complex because the new intervals do not have feature parity with the existing one. We cannot disable `CalendarIntervalType` completely at the moment. Also, a legacy config means that we cannot expose it to users or mention it in our public docs. But I guess we will have to mention it in the docs.
[GitHub] [spark] c21 commented on pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec
c21 commented on pull request #32000: URL: https://github.com/apache/spark/pull/32000#issuecomment-809911684 cc @HyukjinKwon could you help take a look when you have time, thanks.
[GitHub] [spark] c21 opened a new pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec
c21 opened a new pull request #32000:
URL: https://github.com/apache/spark/pull/32000

### What changes were proposed in this pull request?
This PR is a followup change to address comments in https://github.com/apache/spark/pull/31413#discussion_r603280965 and https://github.com/apache/spark/pull/31413#discussion_r603296475 . Minor change in `FileSourceScanExec`. No actual logic change here.

### Why are the changes needed?
Better readability.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
AngersZhuuuu commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603781518

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2350,6 +2350,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
+1, it looks like only Spark has the type `CalendarIntervalType`.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
AngersZhuuuu commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603781518

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2350,6 +2350,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
+1, it looks like only Spark has the type `CalendarIntervalType`. Other engines such as Hive and Presto use day-time and year-month intervals too.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation
AmplabJenkins removed a comment on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-809905612 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41265/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins removed a comment on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809905627 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136675/
[GitHub] [spark] AmplabJenkins commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation
AmplabJenkins commented on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-809905612 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41265/
[GitHub] [spark] AmplabJenkins commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
AmplabJenkins commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809905627 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136675/
[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation
SparkQA commented on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-809905601 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41265/
[GitHub] [spark] cloud-fan commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
cloud-fan commented on a change in pull request #31996: URL: https://github.com/apache/spark/pull/31996#discussion_r603779653 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2350,6 +2350,13 @@ object SQLConf { .booleanConf .createWithDefault(false) + val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled") Review comment: I think the question is if we should use the new interval types by default or not. If not, I'd prefer to use the ansi flag directly. If yes, I'd prefer to create a new legacy config and disable the legacy behavior by default.
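The two alternatives in this review comment can be pictured with a small stand-alone sketch. It is not Spark code and does not use `SQLConf`: the type names `LegacyCalendarInterval` and `AnsiDayTimeInterval`, the object `IntervalConfigSketch`, and the parameters `ansiEnabled` and `legacyIntervalEnabled` are all hypothetical, chosen only to show how the default behavior differs between reusing the ANSI flag and adding a legacy flag that is off by default.

```scala
// Hypothetical model of the two config designs; none of these names exist in Spark.
sealed trait IntervalResult
case object LegacyCalendarInterval extends IntervalResult // stands in for the existing interval type
case object AnsiDayTimeInterval extends IntervalResult    // stands in for the new day-time interval type

object IntervalConfigSketch {
  // Option A: the new types are not the default, so the existing ANSI flag gates them.
  def resultWithAnsiFlag(ansiEnabled: Boolean): IntervalResult =
    if (ansiEnabled) AnsiDayTimeInterval else LegacyCalendarInterval

  // Option B: the new types are the default; a legacy flag (default false) restores the old type.
  def resultWithLegacyFlag(legacyIntervalEnabled: Boolean = false): IntervalResult =
    if (legacyIntervalEnabled) LegacyCalendarInterval else AnsiDayTimeInterval
}
```

Under option A a user has to opt in to get the new type; under option B a user has to opt out to keep the old one, which is the trade-off the comment asks the author to decide first.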
[GitHub] [spark] SparkQA removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
SparkQA removed a comment on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809818101 **[Test build #136675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136675/testReport)** for PR 31886 at commit [`2d891e9`](https://github.com/apache/spark/commit/2d891e925878c092eb24611899d6f0f2d13b9260).
[GitHub] [spark] SparkQA commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries
SparkQA commented on pull request #31886: URL: https://github.com/apache/spark/pull/31886#issuecomment-809904904 **[Test build #136675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136675/testReport)** for PR 31886 at commit [`2d891e9`](https://github.com/apache/spark/commit/2d891e925878c092eb24611899d6f0f2d13b9260). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on pull request #31937: [SPARK-10816][SS] Support session window natively
HeartSaVioR commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-809904742 UPDATE: I've added a test suite for MergingSessionsIterator and updated the relevant PR.
[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
SparkQA commented on pull request #31986: URL: https://github.com/apache/spark/pull/31986#issuecomment-809903940 **[Test build #136688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136688/testReport)** for PR 31986 at commit [`9c36bde`](https://github.com/apache/spark/commit/9c36bdea0ad10b59296ed1b1d002ad0cf8420867).
[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation
SparkQA commented on pull request #31984: URL: https://github.com/apache/spark/pull/31984#issuecomment-809903677 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41265/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809902845 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136687/
[GitHub] [spark] SparkQA removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809902096 **[Test build #136687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870).
[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
AmplabJenkins commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809902845 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136687/
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809902830 **[Test build #136687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA removed a comment on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809901935 **[Test build #136685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
AmplabJenkins removed a comment on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809902693 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136685/
[GitHub] [spark] AmplabJenkins commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
AmplabJenkins commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809902693 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136685/
[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809902679 **[Test build #136685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-809902096 **[Test build #136687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870).
[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
SparkQA commented on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-809901965 **[Test build #136686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136686/testReport)** for PR 31974 at commit [`275524a`](https://github.com/apache/spark/commit/275524a34324b0848cccd3764038d783ba4be901).
[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session
SparkQA commented on pull request #31987: URL: https://github.com/apache/spark/pull/31987#issuecomment-809901935 **[Test build #136685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AmplabJenkins removed a comment on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809901029 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41266/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs
AmplabJenkins removed a comment on pull request #31871: URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling
AmplabJenkins removed a comment on pull request #31932: URL: https://github.com/apache/spark/pull/31932#issuecomment-809901028 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136674/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
AmplabJenkins removed a comment on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809901032 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41263/
[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutorMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs
AmplabJenkins commented on pull request #31871: URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
[GitHub] [spark] AmplabJenkins commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession
AmplabJenkins commented on pull request #31680: URL: https://github.com/apache/spark/pull/31680#issuecomment-809901032 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41263/
[GitHub] [spark] AmplabJenkins commented on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling
AmplabJenkins commented on pull request #31932: URL: https://github.com/apache/spark/pull/31932#issuecomment-809901028 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136674/
[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
AmplabJenkins commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809901029 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41266/
[GitHub] [spark] ben-manes commented on a change in pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
ben-manes commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r603774600 ## File path: core/src/test/scala/org/apache/spark/LocalCacheBenchmark.scala ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.util.Random + +import com.github.benmanes.caffeine.cache.{CacheLoader => CaffeineCacheLoader, Caffeine} +import com.github.benmanes.caffeine.guava.CaffeinatedGuava +import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache} + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} + +/** + * Benchmark for Guava Cache vs Caffeine. + * To run this benchmark: + * {{{ + * 1. without sbt: + * bin/spark-submit --class --jars + * 2. build/sbt "core/test:runMain " + * 3. generate result: + * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/test:runMain " + * Results will be written to "benchmarks/KryoBenchmark-results.txt". 
+ * }}} + */ +object LocalCacheBenchmark extends BenchmarkBase { + + override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { +runBenchmark("Loading Cache") { + val size = 1 + val parallelism = 8 + val guavaCacheConcurrencyLevel = 8 + val dataset = (1 to parallelism) +.map(_ => Random.shuffle(List.range(0, size))) +.map(list => list.map(i => TestData(i))) Review comment: I think your code is fine as is. Maybe just document the simplification? I mostly wanted to let you know since writing a good benchmark is hard, not that you should change it. Your code served its purpose, and you might not get much more out of improving it.
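One way to follow the "document the simplification" suggestion is a doc comment on the dataset construction. The sketch below restates the quoted expression as a stand-alone function; this message does not spell out which simplification is meant, so the comment only describes what the expression itself does. `TestData` is called in the diff but its definition is not shown, so the case class here is an assumed shape, and `DatasetSketch`, `build`, and the seeded `rng` parameter are hypothetical additions for illustration.

```scala
import scala.util.Random

// Assumed shape: the quoted diff calls TestData(i) but does not show its definition.
final case class TestData(value: Int)

object DatasetSketch {
  /**
   * Builds one key list per worker. Every list holds the keys 0 until size,
   * each exactly once, in an independently shuffled order. Access frequency is
   * therefore uniform across keys; only the ordering differs between workers.
   */
  def build(
      parallelism: Int,
      size: Int,
      rng: Random = new Random(42)): IndexedSeq[List[TestData]] =
    (1 to parallelism)
      .map(_ => rng.shuffle(List.range(0, size)))
      .map(list => list.map(i => TestData(i)))
}
```

Calling `DatasetSketch.build(8, 100)` yields eight lists that are permutations of the same hundred keys, which makes the uniform access pattern visible to anyone reading the benchmark results.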
[GitHub] [spark] MaxGekk commented on pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction
MaxGekk commented on pull request #31996: URL: https://github.com/apache/spark/pull/31996#issuecomment-809900772 @cloud-fan @yaooqinn @AngersZh Could you review this PR, please?
[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache
LuciferYang commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r603773282 ## File path: core/src/test/scala/org/apache/spark/LocalCacheBenchmark.scala ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.util.Random + +import com.github.benmanes.caffeine.cache.{CacheLoader => CaffeineCacheLoader, Caffeine} +import com.github.benmanes.caffeine.guava.CaffeinatedGuava +import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache} + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} + +/** + * Benchmark for Guava Cache vs Caffeine. + * To run this benchmark: + * {{{ + * 1. without sbt: + * bin/spark-submit --class --jars + * 2. build/sbt "core/test:runMain " + * 3. generate result: + * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/test:runMain " + * Results will be written to "benchmarks/KryoBenchmark-results.txt". 
+ * }}} + */ +object LocalCacheBenchmark extends BenchmarkBase { + + override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { +runBenchmark("Loading Cache") { + val size = 1 + val parallelism = 8 + val guavaCacheConcurrencyLevel = 8 + val dataset = (1 to parallelism) +.map(_ => Random.shuffle(List.range(0, size))) +.map(list => list.map(i => TestData(i))) Review comment: Thank you for your advice. I think we should avoid introducing more dependencies, so I'll try to implement this data generator in spark code.
[GitHub] [spark] HyukjinKwon closed pull request #31738: [SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct
HyukjinKwon closed pull request #31738: URL: https://github.com/apache/spark/pull/31738
[GitHub] [spark] HyukjinKwon commented on pull request #31738: [SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct
HyukjinKwon commented on pull request #31738: URL: https://github.com/apache/spark/pull/31738#issuecomment-809898936 Merged to master.
[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics
SparkQA commented on pull request #30212: URL: https://github.com/apache/spark/pull/30212#issuecomment-809898473 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41266/
[GitHub] [spark] SparkQA removed a comment on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling
SparkQA removed a comment on pull request #31932: URL: https://github.com/apache/spark/pull/31932#issuecomment-809801061 **[Test build #136674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136674/testReport)** for PR 31932 at commit [`3c0e507`](https://github.com/apache/spark/commit/3c0e5077722d39b25680870ba9d435aafc62466c).
[GitHub] [spark] SparkQA commented on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling
SparkQA commented on pull request #31932: URL: https://github.com/apache/spark/pull/31932#issuecomment-809897848 **[Test build #136674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136674/testReport)** for PR 31932 at commit [`3c0e507`](https://github.com/apache/spark/commit/3c0e5077722d39b25680870ba9d435aafc62466c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class UnaryExpression extends Expression with UnaryLike[Expression] ` * `abstract class BinaryExpression extends Expression with BinaryLike[Expression] ` * `abstract class TernaryExpression extends Expression with TernaryLike[Expression] `