[GitHub] [spark] JkSelf commented on a change in pull request #31994: [SPARK-34899][SQL] Use origin plan if we can not coalesce shuffle partition

2021-03-29 Thread GitBox


JkSelf commented on a change in pull request #31994:
URL: https://github.com/apache/spark/pull/31994#discussion_r603801610



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1543,4 +1543,27 @@ class AdaptiveQueryExecSuite
       assert(materializeLogs(0).startsWith("Materialize query stage BroadcastQueryStageExec"))
       assert(materializeLogs(1).startsWith("Materialize query stage ShuffleQueryStageExec"))
   }
+
+  test("SPARK-34899: Use origin plan if we can not coalesce shuffle partition") {
+    def check(ds: Dataset[Row], origin: ShuffleOrigin): Unit = {
+      ds.collect()
+      val plan = ds.queryExecution.executedPlan.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
+      assert(collect(plan) {
+        case c: CustomShuffleReaderExec => c
+      }.isEmpty)
+      assert(collect(plan) {
+        case s: ShuffleExchangeExec if s.shuffleOrigin == origin && s.numPartitions == 3 => s
+      }.size == 1)
+      checkAnswer(ds, testData)
+    }
+
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true",
+      SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES.key -> "10",
+      SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM.key -> "1",
+      SQLConf.SHUFFLE_PARTITIONS.key -> "3") {
+      check(testData.repartition(), REPARTITION)

Review comment:
   It would be better to state the size of each partition in a comment.
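
   The test above expects the plan to keep its 3 shuffle partitions, presumably because every partition is already larger than the 10-byte advisory size, which is why the concrete sizes matter. A minimal sketch of that reasoning follows. It is a simplified greedy model written for illustration, not Spark's coalescing implementation, and the names `CoalesceSketch`, `coalesce` and `keepsOriginalPlan` as well as the sizes in the usage note are hypothetical.

```scala
// Simplified model of size-based partition coalescing (an assumption for
// illustration only; this is NOT Spark's implementation).
object CoalesceSketch {
  /** Greedily merges adjacent partitions while the merged size stays within
    * `targetSize`. Returns the start index of every resulting group. */
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
    val starts = scala.collection.mutable.ArrayBuffer.empty[Int]
    var current = 0L
    sizes.zipWithIndex.foreach { case (size, i) =>
      if (starts.isEmpty || current + size > targetSize) {
        starts += i       // open a new group at partition i
        current = size
      } else {
        current += size   // partition i joins the current group
      }
    }
    starts.toSeq
  }

  /** True when coalescing merged nothing, so the original plan can be kept. */
  def keepsOriginalPlan(sizes: Seq[Long], targetSize: Long): Boolean =
    coalesce(sizes, targetSize).size == sizes.size
}
```

   With hypothetical sizes such as `Seq(100L, 120L, 90L)` and a target of 10 bytes, no two partitions can be merged, so the group count stays at 3 and nothing is gained by adding a shuffle reader.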




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809933240


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136694/
   





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


AngersZhuuuu commented on a change in pull request #30212:
URL: https://github.com/apache/spark/pull/30212#discussion_r603801288



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -907,29 +907,74 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }
 
   /**
-   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
+   * Add an [[Aggregate]] to a logical plan.
    */
   private def withAggregationClause(
       ctx: AggregationClauseContext,
       selectExpressions: Seq[NamedExpression],
       query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
-    val groupByExpressions = expressionList(ctx.groupingExpressions)
-
-    if (ctx.GROUPING != null) {
-      // GROUP BY  GROUPING SETS (...)
-      val selectedGroupByExprs =
-        ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-      GroupingSets(selectedGroupByExprs.toSeq, groupByExpressions, query, selectExpressions)
-    } else {
-      // GROUP BY  (WITH CUBE | WITH ROLLUP)?
-      val mappedGroupByExpressions = if (ctx.CUBE != null) {
-        Seq(Cube(groupByExpressions))
-      } else if (ctx.ROLLUP != null) {
-        Seq(Rollup(groupByExpressions))
+    if (ctx.groupingExpressionsWithGroupingAnalytics.isEmpty) {
+      val groupByExpressions = expressionList(ctx.groupingExpressions)
+      if (ctx.GROUPING != null) {
+        // GROUP BY  GROUPING SETS (...)
+        // `GROUP BY warehouse, product GROUPING SETS((warehouse, product), (warehouse))` is
+        // semantically equivalent to `GROUP BY GROUPING SETS((warehouse, product), (warehouse))`.
+        // Under this grammar, the fields appearing in `GROUPING SETS`'s groupingSets must be a
+        // subset of the columns appearing in group by expression.
+        val groupingSets =
+          ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
+        Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+          selectExpressions, query)
       } else {
-        groupByExpressions
+        // GROUP BY  (WITH CUBE | WITH ROLLUP)?
+        val mappedGroupByExpressions = if (ctx.CUBE != null) {
+          Seq(Cube(groupByExpressions.map(Seq(_))))
+        } else if (ctx.ROLLUP != null) {
+          Seq(Rollup(groupByExpressions.map(Seq(_))))
+        } else {
+          groupByExpressions
+        }
+        Aggregate(mappedGroupByExpressions, selectExpressions, query)
+      }
+    } else {
+      val groupByExpressions =
+        ctx.groupingExpressionsWithGroupingAnalytics.asScala
+          .map(groupByExpr => {
+            val groupingAnalytics = groupByExpr.groupingAnalytics
+            if (groupingAnalytics != null) {
+              val groupingSets = groupingAnalytics.groupingSet.asScala
+                .map(_.expression.asScala.map(e => expression(e)).toSeq)
+              if (groupingAnalytics.CUBE != null) {
+                // CUBE(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in CUBE grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Cube(groupingSets.toSeq)
+              } else if (groupingAnalytics.ROLLUP != null) {
+                // ROLLUP(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Rollup(groupingSets.toSeq)
+              } else {
+                assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
+                GroupingSets(groupingSets.toSeq,
+                  groupingSets.flatten.distinct.toSeq)
+              }
+            } else {
+              expression(groupByExpr.expression)
+            }
+          })
+      val (groupingSet, expressions) = groupByExpressions.partition(_.isInstanceOf[GroupingSet])
+      if ((expressions.nonEmpty && groupingSet.nonEmpty) || groupingSet.size > 1) {
+        throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " +

Review comment:
   > could you handle the two error cases separately?
   
   Done
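
   For readers following the thread, here is a minimal sketch of what handling the two error cases separately could look like. It uses hypothetical stand-in types (`Expr`, `Column`, `GroupingSetExpr`) instead of Catalyst classes, and the error messages are invented for illustration; it is not the code that was pushed to the PR.

```scala
// Hypothetical stand-ins for Catalyst expressions; these are NOT Spark classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class GroupingSetExpr(kind: String, sets: Seq[Seq[Column]]) extends Expr

object GroupByValidationSketch {
  /** Returns an error message for an unsupported GROUP BY list, or None if it
    * is acceptable. The two unsupported shapes get separate messages. */
  def validate(groupBy: Seq[Expr]): Option[String] = {
    val (groupingSets, plain) = groupBy.partition(_.isInstanceOf[GroupingSetExpr])
    if (plain.nonEmpty && groupingSets.nonEmpty) {
      // Case 1: plain expressions mixed with a grouping analytic.
      Some("Mixing plain expressions with CUBE/ROLLUP/GROUPING SETS is not supported.")
    } else if (groupingSets.size > 1) {
      // Case 2: more than one grouping analytic in the same GROUP BY.
      Some("Multiple CUBE/ROLLUP/GROUPING SETS in one GROUP BY are not supported.")
    } else {
      None
    }
  }
}
```

   Splitting the single `||` condition into two branches lets each failure report the exact shape that was rejected.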







[GitHub] [spark] SparkQA removed a comment on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809930044


   **[Test build #136694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).





[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809933218


   **[Test build #136694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809931402


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136677/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809930849


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136678/
   





[GitHub] [spark] SparkQA removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809839890


   **[Test build #136677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136677/testReport)** for PR 31986 at commit [`3e8dd5c`](https://github.com/apache/spark/commit/3e8dd5ccd8c2c136f5c3a4ff64269267edbdf81e).





[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809930044


   **[Test build #136694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136694/testReport)** for PR 31680 at commit [`7030e5c`](https://github.com/apache/spark/commit/7030e5c8e94c35c4731fa9aaddea0a55b696d50d).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809931402


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136677/
   





[GitHub] [spark] SparkQA commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType

2021-03-29 Thread GitBox


SparkQA commented on pull request #32001:
URL: https://github.com/apache/spark/pull/32001#issuecomment-809928523


   **[Test build #136693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136693/testReport)** for PR 32001 at commit [`b1cc8ce`](https://github.com/apache/spark/commit/b1cc8ce7fdd345ff841c676afd33cf2e47984588).





[GitHub] [spark] AmplabJenkins commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809930849


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136678/
   





[GitHub] [spark] HyukjinKwon commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


HyukjinKwon commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809930745


   @wangyum, don't worry. There are a few more issues to address (https://github.com/apache/spark/pull/31886#discussion_r603307756). It will take more time.





[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


SparkQA commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809930442


   **[Test build #136677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136677/testReport)** for PR 31986 at commit [`3e8dd5c`](https://github.com/apache/spark/commit/3e8dd5ccd8c2c136f5c3a4ff64269267edbdf81e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class UpdatingSessionsExec(`
 * `class UpdatingSessionsIterator(`





[GitHub] [spark] SparkQA removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809839949


   **[Test build #136678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136678/testReport)** for PR 31886 at commit [`0a19585`](https://github.com/apache/spark/commit/0a195850bc0e7ad97efd245ebcfc438ea604e770).





[GitHub] [spark] SparkQA commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


SparkQA commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809929804


   **[Test build #136678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136678/testReport)** for PR 31886 at commit [`0a19585`](https://github.com/apache/spark/commit/0a195850bc0e7ad97efd245ebcfc438ea604e770).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


AngersZhuuuu commented on a change in pull request #31680:
URL: https://github.com/apache/spark/pull/31680#discussion_r603797915



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
##
@@ -34,7 +34,7 @@ class HiveContext private[hive](_sparkSession: SparkSession)
   self =>
 
   def this(sc: SparkContext) = {
-    this(SparkSession.builder().sparkContext(HiveUtils.withHiveExternalCatalog(sc)).getOrCreate())

Review comment:
   > `withHiveExternalCatalog` is used only for tests after this is merged, so could you move it into a suitable place?
   
   Done







[GitHub] [spark] AngersZhuuuu commented on pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType

2021-03-29 Thread GitBox


AngersZhuuuu commented on pull request #32001:
URL: https://github.com/apache/spark/pull/32001#issuecomment-809928048


   FYI @MaxGekk 





[GitHub] [spark] wangyum commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


wangyum commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809927983


   Please give me a few hours. I will verify the result first.





[GitHub] [spark] AngersZhuuuu opened a new pull request #32001: [SPARK-34902][SQL] Support cast between LongType & DayTimeIntervalType and IntegerType & YearMonthIntervalType

2021-03-29 Thread GitBox


AngersZhuuuu opened a new pull request #32001:
URL: https://github.com/apache/spark/pull/32001


   ### What changes were proposed in this pull request?
   Before this PR, casting LongType to DayTimeIntervalType fails with the following error:
   ```
   [info]   org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(a AS DAY-TIME INTERVAL)' due to data type mismatch: cannot cast bigint to day-time interval;
   [info] 'Project [cast(a#4L as day-time interval) AS b#6]
   [info] +- Project [value#1L AS a#4L]
   [info]    +- LocalRelation [value#1L]
   ```
   
   Since DayTimeIntervalType stores its value as a Long and YearMonthIntervalType stores its value as an Int, this PR supports casts between them.
   
   ### Why are the changes needed?
   Users can cast between LongType & DayTimeIntervalType and between IntegerType & YearMonthIntervalType.
   
   ```
   SELECT cast(123L AS day-time interval)
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Casts between LongType & DayTimeIntervalType and between IntegerType & YearMonthIntervalType are now supported.
   
   
   ### How was this patch tested?
   Added unit tests.
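
   As a rough illustration of why these casts are cheap, the sketch below converts between the stored numbers and `java.time` values. It assumes the Long counts microseconds and the Int counts months; the units, and whether the cast in this PR interprets the number that way, are assumptions rather than something stated above. The object and method names are hypothetical.

```scala
import java.time.{Duration, Period}
import java.time.temporal.ChronoUnit

// Assumption for illustration: a day-time interval is held as a Long count of
// microseconds and a year-month interval as an Int count of months.
object IntervalCastSketch {
  /** Long (microseconds) -> day-time interval value. */
  def longToDayTime(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)

  /** Day-time interval value -> Long (microseconds); throws on overflow. */
  def dayTimeToLong(d: Duration): Long =
    Math.addExact(Math.multiplyExact(d.getSeconds, 1000000L), (d.getNano / 1000).toLong)

  /** Int (months) -> year-month interval value, normalized to years and months. */
  def intToYearMonth(months: Int): Period = Period.ofMonths(months).normalized()

  /** Year-month interval value -> Int (months); throws on overflow. */
  def yearMonthToInt(p: Period): Int = Math.toIntExact(p.toTotalMonths)
}
```

   Under these assumptions the conversions are pure arithmetic on the stored number, so a round trip such as `dayTimeToLong(longToDayTime(x))` gives back `x`.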
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809926279


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41267/
   





[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809926265


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41267/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809926279


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41267/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-809926081


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41268/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


HyukjinKwon commented on a change in pull request #31886:
URL: https://github.com/apache/spark/pull/31886#discussion_r603795209



##
File path: .github/workflows/build_and_test.yml
##
@@ -428,3 +428,41 @@ jobs:
     - name: Build with SBT
       run: |
         ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Phadoop-2.7 compile test:compile
+
+  tpcds-1g:
+    name: Run TPC-DS queries with SF=1
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Checkout TPC-DS (SF=1) generated data repository
+      uses: actions/checkout@v2
+      with:
+        repository: maropu/spark-tpcds-sf-1

Review comment:
   Thanks @maropu. Just to clarify, do you need 
https://github.com/databricks/spark-sql-perf/pull/196 too?







[GitHub] [spark] SparkQA commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


SparkQA commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809926159


   **[Test build #136692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136692/testReport)** for PR 31680 at commit [`858390b`](https://github.com/apache/spark/commit/858390be282e313af43c3ac7c7ef9f87d4972b0a).





[GitHub] [spark] AmplabJenkins commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-809926081


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41268/
   





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint

2021-03-29 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809925995


   **[Test build #136691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136691/testReport)** for PR 31984 at commit [`7edd25c`](https://github.com/apache/spark/commit/7edd25c82e3919f4d7f738f39e3f147aa5a0a849).





[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-03-29 Thread GitBox


SparkQA commented on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-809926064


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41268/
   





[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809925942


   **[Test build #136690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136690/testReport)** for PR 31987 at commit [`4d9c724`](https://github.com/apache/spark/commit/4d9c724a20cb799e229b80133a6bfd8887990d5a).





[GitHub] [spark] SparkQA commented on pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec

2021-03-29 Thread GitBox


SparkQA commented on pull request #32000:
URL: https://github.com/apache/spark/pull/32000#issuecomment-809925892


   **[Test build #136689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136689/testReport)** for PR 32000 at commit [`a5b32b4`](https://github.com/apache/spark/commit/a5b32b46b54c5b021e8f50f0c3759f9669f43fe8).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31992:
URL: https://github.com/apache/spark/pull/31992#issuecomment-809925234


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136680/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809925235


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41269/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809925236


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41270/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809925237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136684/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-809925233


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136682/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809925235


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41269/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809925236


   
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41270/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809925237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136684/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31992:
URL: https://github.com/apache/spark/pull/31992#issuecomment-809925234


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136680/
   





[GitHub] [spark] AmplabJenkins commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-809925233


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136682/
   





[GitHub] [spark] maropu commented on a change in pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


maropu commented on a change in pull request #31680:
URL: https://github.com/apache/spark/pull/31680#discussion_r603794244



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
##
@@ -34,7 +34,7 @@ class HiveContext private[hive](_sparkSession: SparkSession)
   self =>
 
   def this(sc: SparkContext) = {
-    this(SparkSession.builder().sparkContext(HiveUtils.withHiveExternalCatalog(sc)).getOrCreate())

Review comment:
   `withHiveExternalCatalog` will be used only in tests once this is merged, so could you move it to a more suitable place?







[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-03-29 Thread GitBox


SparkQA commented on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-809922595


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41268/
   





[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809922449


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41267/
   





[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


SparkQA commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-80994


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41270/
   





[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809920635









[GitHub] [spark] SparkQA removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809881217


   **[Test build #136684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136684/testReport)** for PR 30212 at commit [`9b658b3`](https://github.com/apache/spark/commit/9b658b3ab4617b748b2e954a1ef30e67ea9abb22).





[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809920231


   **[Test build #136684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136684/testReport)** for PR 30212 at commit [`9b658b3`](https://github.com/apache/spark/commit/9b658b3ab4617b748b2e954a1ef30e67ea9abb22).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31992:
URL: https://github.com/apache/spark/pull/31992#issuecomment-809860209


   **[Test build #136680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136680/testReport)** for PR 31992 at commit [`02ad9de`](https://github.com/apache/spark/commit/02ad9de7ece49e294c618b4067d3cd5017ba68ca).





[GitHub] [spark] SparkQA commented on pull request #31992: [SPARK-34898][CORE] We should send SparkListenerExecutorMetricsUpdateEventLog of `driver` appropriately

2021-03-29 Thread GitBox


SparkQA commented on pull request #31992:
URL: https://github.com/apache/spark/pull/31992#issuecomment-809918966


   **[Test build #136680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136680/testReport)** for PR 31992 at commit [`02ad9de`](https://github.com/apache/spark/commit/02ad9de7ece49e294c618b4067d3cd5017ba68ca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] maropu commented on pull request #31983: [WIP][SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-29 Thread GitBox


maropu commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809918862


   cc: @viirya 





[GitHub] [spark] maropu commented on pull request #31983: [WIP][SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates

2021-03-29 Thread GitBox


maropu commented on pull request #31983:
URL: https://github.com/apache/spark/pull/31983#issuecomment-809918608


   Do branch-2.4/3.0/3.1 have the same issue?





[GitHub] [spark] SparkQA removed a comment on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-809862832


   **[Test build #136682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136682/testReport)** for PR 29642 at commit [`2310a69`](https://github.com/apache/spark/commit/2310a69cc30dda338e4c5c7f4d1ca2ca03371c30).





[GitHub] [spark] SparkQA commented on pull request #29642: [SPARK-32792][SQL] Improve InSet filter pushdown

2021-03-29 Thread GitBox


SparkQA commented on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-809917727


   **[Test build #136682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136682/testReport)** for PR 29642 at commit [`2310a69`](https://github.com/apache/spark/commit/2310a69cc30dda338e4c5c7f4d1ca2ca03371c30).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.





[GitHub] [spark] c21 commented on pull request #31958: [SPARK-34862][SQL] Support nested column in ORC vectorized reader

2021-03-29 Thread GitBox


c21 commented on pull request #31958:
URL: https://github.com/apache/spark/pull/31958#issuecomment-809915080


   @cloud-fan and @viirya could you help take a look? Thanks.





[GitHub] [spark] maropu commented on a change in pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


maropu commented on a change in pull request #30212:
URL: https://github.com/apache/spark/pull/30212#discussion_r603786655



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -907,29 +907,74 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }
 
   /**
-   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
+   * Add an [[Aggregate]] to a logical plan.
    */
   private def withAggregationClause(
       ctx: AggregationClauseContext,
       selectExpressions: Seq[NamedExpression],
       query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
-    val groupByExpressions = expressionList(ctx.groupingExpressions)
-
-    if (ctx.GROUPING != null) {
-      // GROUP BY  GROUPING SETS (...)
-      val selectedGroupByExprs =
-        ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-      GroupingSets(selectedGroupByExprs.toSeq, groupByExpressions, query, selectExpressions)
-    } else {
-      // GROUP BY  (WITH CUBE | WITH ROLLUP)?
-      val mappedGroupByExpressions = if (ctx.CUBE != null) {
-        Seq(Cube(groupByExpressions))
-      } else if (ctx.ROLLUP != null) {
-        Seq(Rollup(groupByExpressions))
+    if (ctx.groupingExpressionsWithGroupingAnalytics.isEmpty) {
+      val groupByExpressions = expressionList(ctx.groupingExpressions)
+      if (ctx.GROUPING != null) {
+        // GROUP BY  GROUPING SETS (...)
+        // `GROUP BY warehouse, product GROUPING SETS((warehouse, producets), (warehouse))` is
+        // semantically equivalent to `GROUP BY GROUPING SETS((warehouse, produce), (warehouse))`.
+        // Under this grammar, the fields appearing in `GROUPING SETS`'s groupingSets must be a
+        // subset of the columns appearing in group by expression.
+        val groupingSets =
+          ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
+        Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+          selectExpressions, query)
       } else {
-        groupByExpressions
+        // GROUP BY  (WITH CUBE | WITH ROLLUP)?
+        val mappedGroupByExpressions = if (ctx.CUBE != null) {
+          Seq(Cube(groupByExpressions.map(Seq(_))))
+        } else if (ctx.ROLLUP != null) {
+          Seq(Rollup(groupByExpressions.map(Seq(_))))
+        } else {
+          groupByExpressions
+        }
+        Aggregate(mappedGroupByExpressions, selectExpressions, query)
+      }
+    } else {
+      val groupByExpressions =
+        ctx.groupingExpressionsWithGroupingAnalytics.asScala
+          .map(groupByExpr => {
+            val groupingAnalytics = groupByExpr.groupingAnalytics
+            if (groupingAnalytics != null) {
+              val groupingSets = groupingAnalytics.groupingSet.asScala
+                .map(_.expression.asScala.map(e => expression(e)).toSeq)
+              if (groupingAnalytics.CUBE != null) {
+                // CUBE(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in CUBE grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Cube(groupingSets.toSeq)
+              } else if (groupingAnalytics.ROLLUP != null) {
+                // ROLLUP(A, B, (A, B), ()) is not supported.
+                if (groupingSets.exists(_.isEmpty)) {
+                  throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
+                    groupingAnalytics)
+                }
+                Rollup(groupingSets.toSeq)
+              } else {
+                assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
+                GroupingSets(groupingSets.toSeq,
+                  groupingSets.flatten.distinct.toSeq)
+              }
+            } else {
+              expression(groupByExpr.expression)
+            }
+          })
+      val (groupingSet, expressions) = groupByExpressions.partition(_.isInstanceOf[GroupingSet])
+      if ((expressions.nonEmpty && groupingSet.nonEmpty) || groupingSet.size > 1) {
+        throw new ParseException("Partial CUBE/ROLLUP/GROUPING SETS like " +

Review comment:
   Could you handle the two error cases separately?
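
   One possible reading of this request, as a minimal sketch: the combined condition in the quoted diff covers two different situations (grouping analytics mixed with plain grouping expressions, and more than one grouping analytics clause), so each could raise its own message. The types `Expr`, `PlainExpr`, `GroupingSetExpr`, the `ParseError` exception, and the message texts below are hypothetical stand-ins for illustration, not Spark's actual `GroupingSet` or `ParseException`.

```scala
// Hypothetical stand-ins for Catalyst expressions; not Spark's real classes.
sealed trait Expr
final case class PlainExpr(name: String) extends Expr
final case class GroupingSetExpr(kind: String) extends Expr

// Hypothetical exception standing in for Spark's ParseException.
final class ParseError(msg: String) extends RuntimeException(msg)

object GroupByValidation {
  // Validate the GROUP BY expressions, reporting each error case separately.
  def validate(groupByExpressions: Seq[Expr]): Unit = {
    val (groupingSets, plain) =
      groupByExpressions.partition(_.isInstanceOf[GroupingSetExpr])

    // Case 1: grouping analytics mixed with ordinary grouping expressions.
    if (plain.nonEmpty && groupingSets.nonEmpty) {
      throw new ParseError(
        "Mixing CUBE/ROLLUP/GROUPING SETS with plain GROUP BY expressions is not supported.")
    }
    // Case 2: more than one grouping analytics clause in the same GROUP BY.
    if (groupingSets.size > 1) {
      throw new ParseError(
        "Multiple CUBE/ROLLUP/GROUPING SETS clauses in one GROUP BY are not supported.")
    }
  }
}
```

   With this split, a query that trips both conditions reports the first case only; whether that ordering is desirable is a design choice for the PR author.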







[GitHub] [spark] wangyum commented on a change in pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-29 Thread GitBox


wangyum commented on a change in pull request #31984:
URL: https://github.com/apache/spark/pull/31984#discussion_r603786414



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##
@@ -247,13 +246,15 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper with Join
 // otherwise the pruning will not trigger
 var partScan = getPartitionTableScan(l, left)
 if (partScan.isDefined && canPruneLeft(joinType) &&
-hasPartitionPruningFilter(right)) {
+hasPartitionPruningFilter(right) &&
+(canBroadcastBySize(right, conf) || hintToBroadcastRight(hint))) {

Review comment:
   Actually, we already have a test:
   https://github.com/apache/spark/blob/56b18386ab71566480777660f52d4913aa319a2f/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala#L1117
   Anyway, I added a new test.
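
   For readers skimming the thread, the condition added in the quoted diff can be read as a plain boolean predicate. The sketch below models it outside Spark: `SidePlan`, its fields, and the 10 MB threshold are assumptions made for illustration, and the helper here only mimics the role of `canBroadcastBySize` and `hintToBroadcastRight`; it is not their real implementation.

```scala
// Hypothetical model of the filtering (right) side of the join; not Spark's LogicalPlan.
final case class SidePlan(sizeInBytes: Long, hasPruningFilter: Boolean, broadcastHint: Boolean)

object DppCheck {
  // Assumed broadcast threshold, for illustration only (10 MB).
  val broadcastThreshold: Long = 10L * 1024 * 1024

  // Stand-in for a size-based broadcast check.
  def canBroadcastBySize(p: SidePlan): Boolean =
    p.sizeInBytes >= 0 && p.sizeInBytes <= broadcastThreshold

  // Prune the left side only if the right side has a pruning filter AND
  // can be broadcast, either by size or because of an explicit hint.
  def shouldPruneLeft(leftIsPartitioned: Boolean, canPruneLeft: Boolean, right: SidePlan): Boolean =
    leftIsPartitioned && canPruneLeft && right.hasPruningFilter &&
      (canBroadcastBySize(right) || right.broadcastHint)
}
```

   Under these assumptions, a large filtering side without a hint no longer triggers pruning, while the same side with a broadcast hint still does.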







[GitHub] [spark] MaxGekk commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction

2021-03-29 Thread GitBox


MaxGekk commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603786193



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2350,6 +2350,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
   The question is more complex because the new interval types do not yet have feature parity with the existing one, so we cannot disable `CalendarIntervalType` completely at the moment. Also, making this a legacy config would mean that we cannot expose it to users or mention it in our public docs, but I guess we will have to mention it in the docs.
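
   As background for the behaviour this flag would gate (per the PR title, date subtraction returning a day-time interval), here is a minimal sketch in plain Scala. Using `java.time.Duration` as a stand-in for a day-time interval is an assumption of this sketch, and `DateSubtraction` is a hypothetical helper, not Spark code.

```scala
import java.time.{Duration, LocalDate}
import java.time.temporal.ChronoUnit

object DateSubtraction {
  // Model "end - start" as a day-time interval: a Duration of whole days.
  // The result is negative when end is before start.
  def subtract(end: LocalDate, start: LocalDate): Duration =
    Duration.ofDays(ChronoUnit.DAYS.between(start, end))
}
```

   Unlike a calendar-style interval with a separate months field, a value of this kind has a fixed length, which makes it directly comparable and orderable.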







[GitHub] [spark] c21 commented on pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec

2021-03-29 Thread GitBox


c21 commented on pull request #32000:
URL: https://github.com/apache/spark/pull/32000#issuecomment-809911684


   cc @HyukjinKwon could you help take a look when you have time, thanks.





[GitHub] [spark] c21 opened a new pull request #32000: [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec

2021-03-29 Thread GitBox


c21 opened a new pull request #32000:
URL: https://github.com/apache/spark/pull/32000


   
   
   ### What changes were proposed in this pull request?
   
   This PR is a follow-up that addresses the comments in https://github.com/apache/spark/pull/31413#discussion_r603280965 and https://github.com/apache/spark/pull/31413#discussion_r603296475. It renames `createNonBucketedReadRDD` and makes a minor change in `FileSourceScanExec`; there is no actual logic change.
   
   ### Why are the changes needed?
   
   Better readability.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing unit tests.





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction

2021-03-29 Thread GitBox


AngersZhuuuu commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603781518



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2350,6 +2350,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
   +1. It looks like only Spark has the type `CalendarIntervalType`.







[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction

2021-03-29 Thread GitBox


AngersZhuuuu commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603781518



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2350,6 +2350,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
   +1. It looks like only Spark has the type `CalendarIntervalType`. Other
engines such as Hive and Presto also use day-time and year-month
intervals.
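
For readers unfamiliar with the distinction, the two interval kinds mentioned above behave roughly like `java.time.Duration` (day-time) and `java.time.Period` (year-month). The sketch below uses only the JDK. Treating these classes as stand-ins is an assumption made for illustration, and `IntervalKindsSketch` with its two helpers is a hypothetical name; this does not describe how Spark, Hive, or Presto implement intervals.

```scala
import java.time.{Duration, LocalDate, Period}

// Hypothetical illustration only; not code from the pull request.
object IntervalKindsSketch {
  // Day-time style: an exact elapsed amount between two dates (taken at midnight).
  def dayTimeBetween(start: LocalDate, end: LocalDate): Duration =
    Duration.between(start.atStartOfDay(), end.atStartOfDay())

  // Year-month style: whole calendar months only; leftover days are dropped.
  def yearMonthBetween(start: LocalDate, end: LocalDate): Period =
    Period.between(start, end).withDays(0)

  def main(args: Array[String]): Unit = {
    val start = LocalDate.of(2021, 1, 1)
    val end = LocalDate.of(2021, 3, 29)
    println(dayTimeBetween(start, end).toDays)          // 87
    println(yearMonthBetween(start, end).toTotalMonths) // 2
  }
}
```

A day-time value has a fixed length, while a year-month value depends on the calendar, which is why the two are kept as separate types.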







[GitHub] [spark] AmplabJenkins removed a comment on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809905612


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41265/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809905627


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136675/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809905612


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41265/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809905627


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136675/
   





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-29 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809905601


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41265/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction

2021-03-29 Thread GitBox


cloud-fan commented on a change in pull request #31996:
URL: https://github.com/apache/spark/pull/31996#discussion_r603779653



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2350,6 +2350,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ANSI_INTERVALS_ENABLED = buildConf("spark.sql.ansi.intervals.enabled")

Review comment:
   I think the question is whether we should use the new interval types by
default. If not, I'd prefer to use the ANSI flag directly. If yes, I'd
prefer to create a new legacy config and disable the legacy behavior by default.
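
The two alternatives weighed in this comment can be sketched in a few lines of Scala. This is a simplified, hypothetical model: `Conf`, `legacyIntervalEnabled`, and both helper functions are invented for illustration and are not Spark's `SQLConf` API or the code in this pull request.

```scala
// Hypothetical model of the two configuration strategies; not Spark code.
object IntervalConfSketch {
  // Both flags default to false, mirroring "off unless the user opts in".
  final case class Conf(ansiEnabled: Boolean = false, legacyIntervalEnabled: Boolean = false)

  // Alternative 1: the new interval types are NOT the default.
  // They are gated on the existing ANSI flag, so no new config is needed.
  def newTypesViaAnsiFlag(conf: Conf): Boolean = conf.ansiEnabled

  // Alternative 2: the new interval types ARE the default.
  // A legacy flag (off by default) restores the old behavior.
  def newTypesViaLegacyFlag(conf: Conf): Boolean = !conf.legacyIntervalEnabled

  def main(args: Array[String]): Unit = {
    val defaults = Conf()
    println(newTypesViaAnsiFlag(defaults))   // false: users must opt in
    println(newTypesViaLegacyFlag(defaults)) // true: users must opt out
  }
}
```

The difference is only in who has to act: under the first alternative users opt in to the new types, under the second they opt out.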







[GitHub] [spark] SparkQA removed a comment on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809818101


   **[Test build #136675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136675/testReport)** for PR 31886 at commit [`2d891e9`](https://github.com/apache/spark/commit/2d891e925878c092eb24611899d6f0f2d13b9260).





[GitHub] [spark] SparkQA commented on pull request #31886: [SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries

2021-03-29 Thread GitBox


SparkQA commented on pull request #31886:
URL: https://github.com/apache/spark/pull/31886#issuecomment-809904904


   **[Test build #136675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136675/testReport)** for PR 31886 at commit [`2d891e9`](https://github.com/apache/spark/commit/2d891e925878c092eb24611899d6f0f2d13b9260).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HeartSaVioR commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-29 Thread GitBox


HeartSaVioR commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-809904742


   UPDATE: I've added a test suite for `MergingSessionsIterator` and updated the
relevant PR.
   





[GitHub] [spark] SparkQA commented on pull request #31986: [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

2021-03-29 Thread GitBox


SparkQA commented on pull request #31986:
URL: https://github.com/apache/spark/pull/31986#issuecomment-809903940


   **[Test build #136688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136688/testReport)** for PR 31986 at commit [`9c36bde`](https://github.com/apache/spark/commit/9c36bdea0ad10b59296ed1b1d002ad0cf8420867).





[GitHub] [spark] SparkQA commented on pull request #31984: [SPARK-34884][SQL] Improve dynamic partition pruning evaluation

2021-03-29 Thread GitBox


SparkQA commented on pull request #31984:
URL: https://github.com/apache/spark/pull/31984#issuecomment-809903677


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41265/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809902845


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136687/
   





[GitHub] [spark] SparkQA removed a comment on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809902096


   **[Test build #136687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870).





[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809902845


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136687/
   





[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809902830


   **[Test build #136687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809901935


   **[Test build #136685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809902693


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136685/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809902693


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136685/
   





[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809902679


   **[Test build #136685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-809902096


   **[Test build #136687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136687/testReport)** for PR 31517 at commit [`3fad6ef`](https://github.com/apache/spark/commit/3fad6efa16ff78bd3c88a3d27164aa9e14e0f870).





[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-03-29 Thread GitBox


SparkQA commented on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-809901965


   **[Test build #136686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136686/testReport)** for PR 31974 at commit [`275524a`](https://github.com/apache/spark/commit/275524a34324b0848cccd3764038d783ba4be901).





[GitHub] [spark] SparkQA commented on pull request #31987: [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session

2021-03-29 Thread GitBox


SparkQA commented on pull request #31987:
URL: https://github.com/apache/spark/pull/31987#issuecomment-809901935


   **[Test build #136685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136685/testReport)** for PR 31987 at commit [`4f13b35`](https://github.com/apache/spark/commit/4f13b35ba740a2438795f877ac98d10b7eb357b2).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809901029


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41266/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31932:
URL: https://github.com/apache/spark/pull/31932#issuecomment-809901028


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136674/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


AmplabJenkins removed a comment on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809901032


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41263/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31871: [SPARK-34779][CORE] ExecutoMetricsPoller should keep stage entry in stageTCMP until a heartbeat occurs

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31871:
URL: https://github.com/apache/spark/pull/31871#issuecomment-809901030


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136679/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31680: [SPARK-34568][SQL] We should respect enableHiveSupport when initialize SparkSession

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31680:
URL: https://github.com/apache/spark/pull/31680#issuecomment-809901032


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41263/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #31932:
URL: https://github.com/apache/spark/pull/31932#issuecomment-809901028


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136674/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


AmplabJenkins commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809901029


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41266/
   





[GitHub] [spark] ben-manes commented on a change in pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


ben-manes commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r603774600



##
File path: core/src/test/scala/org/apache/spark/LocalCacheBenchmark.scala
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.util.Random
+
+import com.github.benmanes.caffeine.cache.{CacheLoader => CaffeineCacheLoader, Caffeine}
+import com.github.benmanes.caffeine.guava.CaffeinatedGuava
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+
+/**
+ * Benchmark for Guava Cache vs Caffeine.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars 
+ *   2. build/sbt "core/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/test:runMain "
+ *  Results will be written to "benchmarks/KryoBenchmark-results.txt".
+ * }}}
+ */
+object LocalCacheBenchmark extends BenchmarkBase {
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    runBenchmark("Loading Cache") {
+      val size = 1
+      val parallelism = 8
+      val guavaCacheConcurrencyLevel = 8
+      val dataset = (1 to parallelism)
+        .map(_ => Random.shuffle(List.range(0, size)))
+        .map(list => list.map(i => TestData(i)))

Review comment:
   I think your code is fine as is. Maybe just document the simplification? 
I mostly wanted to let you know since writing a good benchmark is hard, not 
that you should change it. Your code served its purpose, and you might not get 
much more out of improving it.







[GitHub] [spark] MaxGekk commented on pull request #31996: [SPARK-34896][SQL] Return day-time interval from dates subtraction

2021-03-29 Thread GitBox


MaxGekk commented on pull request #31996:
URL: https://github.com/apache/spark/pull/31996#issuecomment-809900772


   @cloud-fan @yaooqinn @AngersZhuuuu Could you review this PR, please?





[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-03-29 Thread GitBox


LuciferYang commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r603773282



##
File path: core/src/test/scala/org/apache/spark/LocalCacheBenchmark.scala
##
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.util.Random
+
+import com.github.benmanes.caffeine.cache.{CacheLoader => CaffeineCacheLoader, Caffeine}
+import com.github.benmanes.caffeine.guava.CaffeinatedGuava
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+
+/**
+ * Benchmark for Guava Cache vs Caffeine.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars 
+ *   2. build/sbt "core/test:runMain "
+ *   3. generate result:
+ *  SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/test:runMain "
+ *  Results will be written to "benchmarks/KryoBenchmark-results.txt".
+ * }}}
+ */
+object LocalCacheBenchmark extends BenchmarkBase {
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    runBenchmark("Loading Cache") {
+      val size = 1
+      val parallelism = 8
+      val guavaCacheConcurrencyLevel = 8
+      val dataset = (1 to parallelism)
+        .map(_ => Random.shuffle(List.range(0, size)))
+        .map(list => list.map(i => TestData(i)))

Review comment:
   Thank you for your advice. I think we should avoid introducing more 
dependencies, so I'll try to implement this data generator in Spark code.
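
   As a rough illustration of a dependency-free generator, the sketch below 
rebuilds the `dataset` value from the quoted diff using only `scala.util.Random`. 
The `TestData` definition, the `DatasetSketch` object, and the fixed seed are 
assumptions made for this example; the PR's actual generator may differ.

```scala
import scala.util.Random

// Hypothetical stand-in: the diff references TestData, but its definition
// is not shown in this thread.
final case class TestData(key: Int)

object DatasetSketch {
  // Builds one independently shuffled key list per worker thread, mirroring
  // the shape of `dataset` in the quoted diff. The seed parameter is an
  // assumption added here so that repeated runs see the same key order.
  def generate(parallelism: Int, size: Int, seed: Long): IndexedSeq[List[TestData]] = {
    val rng = new Random(seed)
    (1 to parallelism).map { _ =>
      rng.shuffle(List.range(0, size)).map(TestData(_))
    }
  }
}
```

   Seeding the generator keeps the access pattern identical across the Guava 
and Caffeine runs, which may make the two results easier to compare.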







[GitHub] [spark] HyukjinKwon closed pull request #31738: [SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct

2021-03-29 Thread GitBox


HyukjinKwon closed pull request #31738:
URL: https://github.com/apache/spark/pull/31738


   





[GitHub] [spark] HyukjinKwon commented on pull request #31738: [SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct

2021-03-29 Thread GitBox


HyukjinKwon commented on pull request #31738:
URL: https://github.com/apache/spark/pull/31738#issuecomment-809898936


   Merged to master.





[GitHub] [spark] SparkQA commented on pull request #30212: [SPARK-33308][SQL] Refactor current grouping analytics

2021-03-29 Thread GitBox


SparkQA commented on pull request #30212:
URL: https://github.com/apache/spark/pull/30212#issuecomment-809898473


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41266/
   





[GitHub] [spark] SparkQA removed a comment on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling

2021-03-29 Thread GitBox


SparkQA removed a comment on pull request #31932:
URL: https://github.com/apache/spark/pull/31932#issuecomment-809801061


   **[Test build #136674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136674/testReport)** for PR 31932 at commit [`3c0e507`](https://github.com/apache/spark/commit/3c0e5077722d39b25680870ba9d435aafc62466c).





[GitHub] [spark] SparkQA commented on pull request #31932: [WIP] Introduce specialized traits for TreeNode children handling

2021-03-29 Thread GitBox


SparkQA commented on pull request #31932:
URL: https://github.com/apache/spark/pull/31932#issuecomment-809897848


   **[Test build #136674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136674/testReport)** for PR 31932 at commit [`3c0e507`](https://github.com/apache/spark/commit/3c0e5077722d39b25680870ba9d435aafc62466c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class UnaryExpression extends Expression with UnaryLike[Expression]`
  * `abstract class BinaryExpression extends Expression with BinaryLike[Expression]`
  * `abstract class TernaryExpression extends Expression with TernaryLike[Expression]`
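
The classes listed above mix in arity-specific traits. As a simplified sketch of 
the general idea (not Spark's actual code), such a trait can derive `children` 
from named fields so that concrete nodes do not build the sequence themselves. 
`Node`, `Expr`, `Lit`, `Neg`, and `Add` are hypothetical names for this example, 
and the `UnaryLike` and `BinaryLike` bodies shown here are assumptions.

```scala
// Hypothetical minimal tree node; Spark's real TreeNode is far richer.
trait Node[T <: Node[T]] {
  def children: Seq[T]
}

// Assumed shape of a fixed-arity trait: one named child, children derived once.
trait UnaryLike[T <: Node[T]] extends Node[T] {
  def child: T
  final lazy val children: Seq[T] = child :: Nil
}

// Same idea for two named children.
trait BinaryLike[T <: Node[T]] extends Node[T] {
  def left: T
  def right: T
  final lazy val children: Seq[T] = left :: right :: Nil
}

// Hypothetical expression hierarchy using the traits above.
sealed trait Expr extends Node[Expr]
final case class Lit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
final case class Neg(child: Expr) extends Expr with UnaryLike[Expr]
final case class Add(left: Expr, right: Expr) extends Expr with BinaryLike[Expr]
```

With this layout, `Neg(Lit(1)).children` is computed from `child` alone, and 
each concrete class only declares its fields.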




