[GitHub] [spark] itholic opened a new pull request, #38420: [SPARK-40947][PS][INFRA] Upgrade pandas to 1.5.1

2022-10-27 Thread GitBox


itholic opened a new pull request, #38420:
URL: https://github.com/apache/spark/pull/38420

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR proposes upgrading pandas to 1.5.1 for pandas API on Spark.
   
   A new version of pandas (1.5.1) was released last week (Oct 19, 2022).
   
   See [What's new in 1.5.1](https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.5.1.html) for more detail.
   
   ### Why are the changes needed?
   
   
   We should follow the behavior of the latest pandas and support it.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   The existing tests should all pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox


zhengruifeng commented on code in PR #38395:
URL: https://github.com/apache/spark/pull/38395#discussion_r100764


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -2100,3 +2100,53 @@ object AsOfJoin {
 }
   }
 }
+
+
+/**
+ * A logical plan for summary.
+ */
+case class Summary(

Review Comment:
   > But we should probably name it UnresolvedSummary
   
   Maybe we can create an `UnresolvedDataFrameFunction` instead?
   We could put a group of functions (`summary`, `sampleBy`, `freqItem`, `crosstab`) in it.
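A minimal sketch of what such a grouping node might look like (all names here are hypothetical; `LogicalPlan` and `Expression` are stand-ins for the Catalyst traits of the same names, not the actual Spark classes):

```scala
// Hypothetical shape of the suggested node: a single unresolved logical
// plan parameterized by which DataFrame function it represents.
trait LogicalPlan
trait Expression

case class UnresolvedDataFrameFunction(
    functionName: String, // e.g. "summary", "sampleBy", "freqItem", "crosstab"
    child: LogicalPlan,
    arguments: Seq[Expression]) extends LogicalPlan
```

One node type would then cover the whole family of DataFrame-only functions, with resolution dispatching on `functionName`.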






[GitHub] [spark] MaxGekk commented on pull request #38413: [SPARK-40936][SQL][TESTS] Refactor `AnalysisTest#assertAnalysisErrorClass` by reusing the `SparkFunSuite#checkError`

2022-10-27 Thread GitBox


MaxGekk commented on PR #38413:
URL: https://github.com/apache/spark/pull/38413#issuecomment-1294484128

   Waiting for CI.





[GitHub] [spark] itholic commented on pull request #37219: [WIP][SPARK-39794][PYTHON] Introduce parametric singleton for DataType

2022-10-27 Thread GitBox


itholic commented on PR #37219:
URL: https://github.com/apache/spark/pull/37219#issuecomment-1294481459

   Is this still in progress? Just confirming.








[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38397: [SPARK-40918][SQL] Mismatch between FileSourceScanExec and Orc and ParquetFileFormat on producing columnar output

2022-10-27 Thread GitBox


dongjoon-hyun commented on code in PR #38397:
URL: https://github.com/apache/spark/pull/38397#discussion_r1007642185


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:
##
@@ -173,6 +181,22 @@ class ParquetFileFormat
 val datetimeRebaseModeInRead = parquetOptions.datetimeRebaseModeInRead
 val int96RebaseModeInRead = parquetOptions.int96RebaseModeInRead
 
+// Should always be set by FileSourceScanExec creating this.
+// Check conf before checking option, to allow working around an issue by changing conf.
+val returningBatch = sparkSession.sessionState.conf.parquetVectorizedReaderEnabled &&
+  options.get(FileFormat.OPTION_RETURNING_BATCH)
+    .getOrElse {
+      throw new IllegalArgumentException(
+        "OPTION_RETURNING_BATCH should always be set for ParquetFileFormat." +

Review Comment:
   Ditto. nit. Add one more space at the end of the message.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38397: [SPARK-40918][SQL] Mismatch between FileSourceScanExec and Orc and ParquetFileFormat on producing columnar output

2022-10-27 Thread GitBox


dongjoon-hyun commented on code in PR #38397:
URL: https://github.com/apache/spark/pull/38397#discussion_r1007641451


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala:
##
@@ -126,9 +136,24 @@ class OrcFileFormat
 
 val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
 val sqlConf = sparkSession.sessionState.conf
-val enableVectorizedReader = supportBatch(sparkSession, resultSchema)
 val capacity = sqlConf.orcVectorizedReaderBatchSize
 
+// Should always be set by FileSourceScanExec creating this.
+// Check conf before checking option, to allow working around an issue by changing conf.
+val enableVectorizedReader = sqlConf.orcVectorizedReaderEnabled &&
+  options.get(FileFormat.OPTION_RETURNING_BATCH)
+    .getOrElse {
+      throw new IllegalArgumentException(
+        "OPTION_RETURNING_BATCH should always be set for OrcFileFormat." +
+          "To workaround this issue, set spark.sql.orc.enableVectorizedReader=false.")

Review Comment:
   Is this a correct recommendation? Why not recommend setting `OPTION_RETURNING_BATCH`?
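For context, here is a self-contained sketch of the conf-before-option pattern the snippet uses (the names `resolveReturningBatch` and `OPTION_RETURNING_BATCH` below are stand-ins, not the actual Spark API). Because `&&` short-circuits, checking the conf first lets a user disable vectorized reading entirely as a workaround, even when the caller never set the option:

```scala
// Stand-in for FileFormat.OPTION_RETURNING_BATCH; not the real constant.
val OPTION_RETURNING_BATCH = "returning_batch"

// The conf is consulted first: if the vectorized reader is disabled there,
// the (possibly missing) option is never inspected and no error is thrown.
def resolveReturningBatch(
    vectorizedReaderEnabled: Boolean,
    options: Map[String, String]): Boolean = {
  vectorizedReaderEnabled &&
    options.get(OPTION_RETURNING_BATCH).map(_.toBoolean).getOrElse {
      throw new IllegalArgumentException(
        "OPTION_RETURNING_BATCH should always be set when the vectorized reader is enabled.")
    }
}
```

With `vectorizedReaderEnabled = false` the call returns `false` even when the option is absent, which is the conf-based workaround the code comment refers to.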






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38397: [SPARK-40918][SQL] Mismatch between FileSourceScanExec and Orc and ParquetFileFormat on producing columnar output

2022-10-27 Thread GitBox


dongjoon-hyun commented on code in PR #38397:
URL: https://github.com/apache/spark/pull/38397#discussion_r1007640461


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala:
##
@@ -126,9 +136,24 @@ class OrcFileFormat
 
 val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
 val sqlConf = sparkSession.sessionState.conf
-val enableVectorizedReader = supportBatch(sparkSession, resultSchema)
 val capacity = sqlConf.orcVectorizedReaderBatchSize
 
+// Should always be set by FileSourceScanExec creating this.
+// Check conf before checking option, to allow working around an issue by changing conf.
+val enableVectorizedReader = sqlConf.orcVectorizedReaderEnabled &&
+  options.get(FileFormat.OPTION_RETURNING_BATCH)
+    .getOrElse {
+      throw new IllegalArgumentException(
+        "OPTION_RETURNING_BATCH should always be set for OrcFileFormat." +

Review Comment:
   nit. Add one space at the end?
   ```
   "OPTION_RETURNING_BATCH should always be set for OrcFileFormat." +
   "OPTION_RETURNING_BATCH should always be set for OrcFileFormat. " +
   ```
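To illustrate the nit (with hypothetical message strings): Scala's `+` concatenates the two literals with no separator, so without the trailing space the two sentences run together in the error message.

```scala
// Without a trailing space the concatenated message reads "...OrcFileFormat.To workaround..."
val withoutSpace = "OPTION_RETURNING_BATCH should always be set for OrcFileFormat." +
  "To workaround this issue, set the conf."

// With the trailing space the sentences are properly separated.
val withSpace = "OPTION_RETURNING_BATCH should always be set for OrcFileFormat. " +
  "To workaround this issue, set the conf."
```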






[GitHub] [spark] dongjoon-hyun commented on pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #37955:
URL: https://github.com/apache/spark/pull/37955#issuecomment-1294469606

   Thank you all!





[GitHub] [spark] cloud-fan closed pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


cloud-fan closed pull request #38406: [SPARK-40926][CONNECT] Refactor server 
side tests to only use DataFrame API
URL: https://github.com/apache/spark/pull/38406





[GitHub] [spark] cloud-fan commented on pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


cloud-fan commented on PR #38406:
URL: https://github.com/apache/spark/pull/38406#issuecomment-1294467993

   thanks, merging to master!





[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38417:
URL: https://github.com/apache/spark/pull/38417#issuecomment-1294464841

   Thank you, @Yikun .





[GitHub] [spark] MaxGekk closed pull request #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox


MaxGekk closed pull request #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when 
input has odd number of symbols
URL: https://github.com/apache/spark/pull/38416





[GitHub] [spark] MaxGekk closed pull request #38398: [SPARK-39778][SQL] Improve error classes and messages

2022-10-27 Thread GitBox


MaxGekk closed pull request #38398: [SPARK-39778][SQL] Improve error classes 
and messages
URL: https://github.com/apache/spark/pull/38398





[GitHub] [spark] MaxGekk commented on pull request #38398: [SPARK-39778][SQL] Improve error classes and messages

2022-10-27 Thread GitBox


MaxGekk commented on PR #38398:
URL: https://github.com/apache/spark/pull/38398#issuecomment-1294464168

   Merging to master. Thank you, @itholic @cloud-fan for review.





[GitHub] [spark] chenminghua8 closed pull request #38381: [SPARK-40793][SQL] Fix the LogicalRelation computeStats for Row-level Runtime Filtering cannot be applied

2022-10-27 Thread GitBox


chenminghua8 closed pull request #38381: [SPARK-40793][SQL] Fix the 
LogicalRelation computeStats for Row-level Runtime Filtering cannot be applied
URL: https://github.com/apache/spark/pull/38381





[GitHub] [spark] MaxGekk commented on pull request #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox


MaxGekk commented on PR #38416:
URL: https://github.com/apache/spark/pull/38416#issuecomment-1294459349

   All GAs passed (screenshot):
   https://user-images.githubusercontent.com/1580697/198506375-f36f3e73-b0a8-4d2e-a3f8-9fb49ba453c3.png
   Merging to 3.3. Thank you, @vitaliili-db.





[GitHub] [spark] cloud-fan commented on pull request #38336: [SPARK-40862][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

2022-10-27 Thread GitBox


cloud-fan commented on PR #38336:
URL: https://github.com/apache/spark/pull/38336#issuecomment-1294436600

   thanks, merging to master!





[GitHub] [spark] cloud-fan closed pull request #38336: [SPARK-40862][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

2022-10-27 Thread GitBox


cloud-fan closed pull request #38336: [SPARK-40862][SQL] Support non-aggregated 
subqueries in RewriteCorrelatedScalarSubquery
URL: https://github.com/apache/spark/pull/38336





[GitHub] [spark] LuciferYang commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


LuciferYang commented on PR #38413:
URL: https://github.com/apache/spark/pull/38413#issuecomment-1294431154

   @MaxGekk 
   
   Refactor of the `assertAnalysisErrorClass` method:
   
   - Reuse `checkError` in `assertAnalysisErrorClass`
   - Use `queryContext` instead of `line + pos` in the `assertAnalysisErrorClass` method signature
   - Fixed related tests
   
   Please review this if you have time, thanks ~
   
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #38413: [WIP][SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


LuciferYang commented on code in PR #38413:
URL: https://github.com/apache/spark/pull/38413#discussion_r1007582019


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala:
##
@@ -716,9 +714,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
 assertAnalysisErrorClass(parsePlan("WITH t(x) AS (SELECT 1) SELECT * FROM t WHERE y = 1"),
   "UNRESOLVED_COLUMN.WITH_SUGGESTION",
   Map("objectName" -> "`y`", "proposal" -> "`t`.`x`"),
-  caseSensitive = true,
-  line = -1,
-  pos = -1)
+  Array(ExpectedContext("y", 46, 46))

Review Comment:
   
https://github.com/apache/spark/blob/cf086b10de784fc92ae8b4d16065823ace520a7a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala#L319-L332
   
   This change is needed because the `checkError` method performs a forced check when `actualQueryContext` is not empty. If we want to relax some check conditions, we can add a precondition `queryContext.nonEmpty` for the `queryContext` check.
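A hedged sketch of the precondition being suggested (a hypothetical helper, not the actual `SparkFunSuite` code): only enforce the query-context comparison when expected contexts were actually supplied.

```scala
// Hypothetical: relax the otherwise-forced context check by gating it on
// the caller having supplied expected contexts at all.
def checkQueryContext(
    expectedContexts: Seq[String],
    actualContexts: Seq[String]): Unit = {
  if (expectedContexts.nonEmpty) {
    assert(expectedContexts == actualContexts,
      s"expected $expectedContexts but got $actualContexts")
  }
  // expectedContexts.isEmpty => skip the check entirely
}
```

Callers that pass no expected contexts are then unaffected by whatever `actualQueryContext` the analyzer produced.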






[GitHub] [spark] zhengruifeng commented on pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox


zhengruifeng commented on PR #37955:
URL: https://github.com/apache/spark/pull/37955#issuecomment-1294400941

   Merged into master, thank you @itholic for doing this!





[GitHub] [spark] zhengruifeng closed pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox


zhengruifeng closed pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] 
Upgrade pandas to 1.5.0
URL: https://github.com/apache/spark/pull/37955





[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

2022-10-27 Thread GitBox


Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1294391023

   also cc @holdenk @dongjoon-hyun 





[GitHub] [spark] AngersZhuuuu commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-10-27 Thread GitBox


AngersZhuuuu commented on PR #35594:
URL: https://github.com/apache/spark/pull/35594#issuecomment-1294389484

   ping @cloud-fan @yaooqinn @LuciferYang 





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


HeartSaVioR commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007564839


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SessionWindow.scala:
##
@@ -68,11 +68,29 @@ case class SessionWindow(timeColumn: Expression, 
gapDuration: Expression) extend
   with Unevaluable
   with NonSQLExpression {
 
+  private def inputTypeOnTimeColumn: AbstractDataType = {

Review Comment:
   Nope. Just for future visibility.






[GitHub] [spark] zhengruifeng commented on pull request #38411: [SPARK-40933][SQL] Make df.stat.{cov, corr} consistent with sql functions

2022-10-27 Thread GitBox


zhengruifeng commented on PR #38411:
URL: https://github.com/apache/spark/pull/38411#issuecomment-1294384281

   @HyukjinKwon as far as I know, they are the same now.
   
   The original tests didn't cover null handling or empty datasets, so I added a new UT to make sure there is no behavior change.





[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38406:
URL: https://github.com/apache/spark/pull/38406#discussion_r1007558507


##
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##
@@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends PlanTest with 
SparkConnectPlanTest {
 }
 
proto.Relation.newBuilder().setLocalRelation(localRelationBuilder.build()).build()
   }
+
+  private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
+val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
+analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
+EliminateSubqueryAliases(connectAnalyzed)

Review Comment:
   There is no issue after removing it. I pushed a commit to remove it anyway.






[GitHub] [spark] amaliujia commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38415:
URL: https://github.com/apache/spark/pull/38415#discussion_r1007556667


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -47,6 +47,13 @@ message Relation {
 
 Unknown unknown = 999;
   }
+  // Optional. Every relation might have an alias.
+  Alias alias = 200;
+
+  // Relation alias.
+  message Alias {

Review Comment:
   The default is an empty string...
   
   I really don't like the way proto handles this...






[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007479005


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SessionWindow.scala:
##
@@ -68,11 +68,29 @@ case class SessionWindow(timeColumn: Expression, 
gapDuration: Expression) extend
   with Unevaluable
   with NonSQLExpression {
 
+  private def inputTypeOnTimeColumn: AbstractDataType = {

Review Comment:
   Sorry, are you asking for anything actionable?



##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.execution.streaming.state.StateStore
+import org.apache.spark.sql.functions._
+
+// Tests for the multiple stateful operators support.
+class MultiStatefulOperatorsSuite
+  extends StreamTest with StateStoreMetricsTest with BeforeAndAfter {
+
+  import testImplicits._
+
+  before {
+SparkSession.setActiveSession(spark) // set this before force initializing 'joinExec'
+spark.streams.stateStoreCoordinator // initialize the lazy coordinator
+  }
+
+  after {
+StateStore.stop()
+  }
+
+  test("window agg -> window agg, append mode") {
+withSQLConf("spark.sql.streaming.unsupportedOperationCheck" -> "false") {
+  val inputData = MemoryStream[Int]
+
+  val stream = inputData.toDF()
+.withColumn("eventTime", timestamp_seconds($"value"))
+.withWatermark("eventTime", "0 seconds")
+.groupBy(window($"eventTime", "5 seconds") as 'window)
+.agg(count("*") as 'count)
+.groupBy(window($"window", "10 seconds"))
+.agg(count("*") as 'count, sum("count") as 'sum)
+.select($"window".getField("start").cast("long").as[Long],
+  $"count".as[Long], $"sum".as[Long])
+
+  testStream(stream)(
+AddData(inputData, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21),
+// op1 W (0, 0)
+// agg: [10, 15) 5, [15, 20) 5, [20, 25) 2
+// output: None
+// state: [10, 15) 5, [15, 20) 5, [20, 25) 2
+// op2 W (0, 0)
+// agg: None
+// output: None
+// state: None
+
+// no-data batch triggered
+
+// op1 W (0, 21)
+// agg: None
+// output: [10, 15) 5, [15, 20) 5
+// state: [20, 25) 2
+// op2 W (0, 21)
+// agg: [10, 20) (2, 10)
+// output: [10, 20) (2, 10)
+// state: None
+CheckNewAnswer((10, 2, 10)), // W (21, 21) [20, 25) 2 ... W (0, 21) 
None // [10, 20) 2
+assertNumStateRows(Seq(0, 1)),
+assertNumRowsDroppedByWatermark(Seq(0, 0)),
+
+AddData(inputData, 22, 23, 24, 25, 26, 27, 28, 29),
+// op1 W (21, 21)
+// agg: [20, 25) 5, [25, 30) 4
+// output: None
+// state: [20, 25) 5, [25, 30) 4
+// op2 W (21, 21)
+// agg: None
+// output: None
+// state: None
+
+// no-data batch triggered
+
+// op1 W (21, 29)
+// agg: None
+// output: [20, 25) 5
+// state: [25, 30) 4
+// op2 W (20, 25)
+// agg: [20, 30) (1, 5)
+// output: None
+// state: [20, 30) (1, 5)
+CheckNewAnswer(),
+assertNumStateRows(Seq(1, 1)),
+assertNumRowsDroppedByWatermark(Seq(0, 0)),
+
+// Move the watermark.

Review Comment:
   Good point. Added late records
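   The per-batch bookkeeping traced in the window-agg comments above reduces to 
one rule: rows whose event time falls below the current watermark are dropped 
before the stateful operator sees them. A minimal sketch in plain Python (not 
Spark code; all names here are hypothetical):

```python
def filter_late_rows(rows, watermark):
    """Drop rows older than the watermark, mirroring the
    numRowsDroppedByWatermark metric asserted in the test above."""
    kept = [r for r in rows if r["event_time"] >= watermark]
    dropped = len(rows) - len(kept)
    return kept, dropped
```

   With the watermark at 21, a late record at event time 19 is dropped while 
one at 22 passes through to the aggregation.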



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38406:
URL: https://github.com/apache/spark/pull/38406#discussion_r100731


##
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##
@@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends PlanTest with 
SparkConnectPlanTest {
 }
 
proto.Relation.newBuilder().setLocalRelation(localRelationBuilder.build()).build()
   }
+
+  private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
+val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
+analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
+EliminateSubqueryAliases(connectAnalyzed)

Review Comment:
   Did we hit any issues in this test suite without doing it?








[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38415:
URL: https://github.com/apache/spark/pull/38415#discussion_r1007555285


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -47,6 +47,13 @@ message Relation {
 
 Unknown unknown = 999;
   }
+  // Optional. Every relation might have an alias.
+  Alias alias = 200;
+
+  // Relation alias.
+  message Alias {

Review Comment:
   oh, I thought the default value of string is null...






[GitHub] [spark] amaliujia commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38415:
URL: https://github.com/apache/spark/pull/38415#discussion_r1007553459


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -47,6 +47,13 @@ message Relation {
 
 Unknown unknown = 999;
   }
+  // Optional. Every relation might have an alias.
+  Alias alias = 200;
+
+  // Relation alias.
+  message Alias {

Review Comment:
   Do you think we need to care about the client calling `xx.as("")`? Should 
the client side reject this? If so, we can just use a string.
   
   It's always a question of whether we need to know that a field is set, 
unset, or left at its default value.
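   The set-vs-unset question can be made concrete: in proto3 a bare `string` 
field decodes to `""` whether the client never set it or explicitly set it to 
empty, so a wrapper message (or other presence tracking) is what preserves the 
distinction. A hedged Python sketch of the client-side check being discussed 
(names are illustrative, not the actual Connect API):

```python
from typing import Optional

def resolve_alias(alias: Optional[str]) -> Optional[str]:
    # None models "field not set": no alias is applied at all.
    if alias is None:
        return None
    # An explicitly set empty alias, e.g. xx.as(""), is rejected.
    if alias == "":
        raise ValueError("alias must be a non-empty string")
    return alias
```

   If the empty string is simply rejected client-side, presence information is 
no longer needed and a plain string field suffices.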








[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38415:
URL: https://github.com/apache/spark/pull/38415#discussion_r1007552036


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -47,6 +47,13 @@ message Relation {
 
 Unknown unknown = 999;
   }
+  // Optional. Every relation might have an alias.
+  Alias alias = 200;
+
+  // Relation alias.
+  message Alias {

Review Comment:
   why can't we use string directly?






[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38415:
URL: https://github.com/apache/spark/pull/38415#discussion_r1007551962


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -47,6 +47,13 @@ message Relation {
 
 Unknown unknown = 999;
   }
+  // Optional. Every relation might have an alias.

Review Comment:
   I don't have a strong opinion, but not sure which one is better. We can also 
follow catalyst, and add a new plan `SubqueryAlias(child, alias)` 






[GitHub] [spark] cloud-fan commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38395:
URL: https://github.com/apache/spark/pull/38395#discussion_r1007548886


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -2100,3 +2100,53 @@ object AsOfJoin {
 }
   }
 }
+
+
+/**
+ * A logical plan for summary.
+ */
+case class Summary(

Review Comment:
   I see, by using a logical plan, we delay the analysis until the spark 
connect planner constructs the full plan.






[GitHub] [spark] shardulm94 commented on pull request #37479: [SPARK-40045][SQL]Optimize the order of filtering predicates

2022-10-27 Thread GitBox


shardulm94 commented on PR #37479:
URL: https://github.com/apache/spark/pull/37479#issuecomment-1294365611

   @caican00 Do you think this PR is ready for another round of review? In our 
organization, we have seen a number of users impacted by this after migration 
to DSv2, so it would be nice to get this merged.





[GitHub] [spark] HeartSaVioR closed pull request #38361: [SPARK-40892][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls

2022-10-27 Thread GitBox


HeartSaVioR closed pull request #38361: [SPARK-40892][SQL][SS] Loosen the 
requirement of window_time rule - allow multiple window_time calls
URL: https://github.com/apache/spark/pull/38361





[GitHub] [spark] HeartSaVioR commented on pull request #38361: [SPARK-40892][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls

2022-10-27 Thread GitBox


HeartSaVioR commented on PR #38361:
URL: https://github.com/apache/spark/pull/38361#issuecomment-1294361918

   Thanks @cloud-fan ! Given this PR has been open for 4 days with no feedback 
so far, I'm merging it.





[GitHub] [spark] LuciferYang commented on pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 10.0.0

2022-10-27 Thread GitBox


LuciferYang commented on PR #38369:
URL: https://github.com/apache/spark/pull/38369#issuecomment-1294357109

   Thanks @dongjoon-hyun @itholic @bjornjorgensen 





[GitHub] [spark] zhengruifeng commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox


zhengruifeng commented on code in PR #38395:
URL: https://github.com/apache/spark/pull/38395#discussion_r1007535733


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -2100,3 +2100,53 @@ object AsOfJoin {
 }
   }
 }
+
+
+/**
+ * A logical plan for summary.
+ */
+case class Summary(

Review Comment:
   if we use `df.summary` here, an analysis exception is thrown immediately; if 
we use a new unresolved plan, I guess the exception will not be thrown until 
job execution.
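   The eager-vs-deferred distinction can be sketched as follows (plain Python, 
purely illustrative of the idea; not the actual Catalyst classes): constructing 
the unresolved node never fails, and validation only happens when the full plan 
is analyzed.

```python
class UnresolvedSummary:
    def __init__(self, child, statistics):
        # Construction is cheap and never raises: analysis is deferred.
        self.child = child
        self.statistics = list(statistics)

    def resolve(self, supported):
        # Validation happens only when the whole plan is analyzed.
        for stat in self.statistics:
            if stat not in supported:
                raise ValueError(f"unknown statistic: {stat}")
        return ("Summary", self.child, tuple(self.statistics))
```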






[GitHub] [spark] cloud-fan commented on a diff in pull request #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-10-27 Thread GitBox


cloud-fan commented on code in PR #38418:
URL: https://github.com/apache/spark/pull/38418#discussion_r1007527891


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -2813,13 +2813,41 @@ class AstBuilder extends 
SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
   ctx: CreateOrReplaceTableColTypeContext): StructField = withOrigin(ctx) {
 import ctx._
 
+// Check that no duplicates exist among any CREATE TABLE column options 
specified.
+var isNotNull: Option[Boolean] = None

Review Comment:
   seems this can be a boolean, starting with false, and set to true if NOT 
NULL is present.






[GitHub] [spark] itholic commented on pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox


itholic commented on PR #37955:
URL: https://github.com/apache/spark/pull/37955#issuecomment-1294313246

   CI passed!





[GitHub] [spark] Yikun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


Yikun commented on PR #38417:
URL: https://github.com/apache/spark/pull/38417#issuecomment-1294284723

   Thanks, late LGTM.





[GitHub] [spark] HyukjinKwon commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox


HyukjinKwon commented on PR #38419:
URL: https://github.com/apache/spark/pull/38419#issuecomment-1294277396

   cc @wangyum FYI





[GitHub] [spark] dongjoon-hyun closed pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun closed pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s 
Dockerfile by default and remove `Dockerfile.java17`
URL: https://github.com/apache/spark/pull/38417





[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38417:
URL: https://github.com/apache/spark/pull/38417#issuecomment-1294270084

   Thank you so much, @viirya . Merged to master for Apache Spark 3.4.





[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38417:
URL: https://github.com/apache/spark/pull/38417#issuecomment-1294267607

   Could you review this, @viirya ?





[GitHub] [spark] github-actions[bot] closed pull request #37048: [SPARK-39655][CORE] Add a config to limit the number of RDD partitions

2022-10-27 Thread GitBox


github-actions[bot] closed pull request #37048: [SPARK-39655][CORE] Add a 
config to limit the number of RDD partitions
URL: https://github.com/apache/spark/pull/37048





[GitHub] [spark] github-actions[bot] commented on pull request #37219: [WIP][SPARK-39794][PYTHON] Introduce parametric singleton for DataType

2022-10-27 Thread GitBox


github-actions[bot] commented on PR #37219:
URL: https://github.com/apache/spark/pull/37219#issuecomment-1294261496

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] vinodkc commented on a diff in pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'

2022-10-27 Thread GitBox


vinodkc commented on code in PR #38263:
URL: https://github.com/apache/spark/pull/38263#discussion_r1007474647


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3950,6 +3950,14 @@ object SQLConf {
 .checkValues(ErrorMessageFormat.values.map(_.toString))
 .createWithDefault(ErrorMessageFormat.PRETTY.toString)
 
+  val SPARK_MASKING_ALGO = buildConf("spark.sql.masking.algo")

Review Comment:
   I was trying to follow a similar property-value approach as in 
[HiveConf](https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4487),
 where the SHA algorithm type is used to decide whether FIPS mode is enabled.
   Do you still recommend changing it to a boolean property 
`spark.sql.masking.algo.fipsModeEnabled`?






[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-10-27 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1294245157

   @dtenedor , yes, please close yours as a dup. I appreciate your help in 
reviewing this PR. On top of this change, I'm planning to add the additional 
built-in mask functions supported in Hive.





[GitHub] [spark] vinodkc commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox


vinodkc commented on PR #38419:
URL: https://github.com/apache/spark/pull/38419#issuecomment-1294221419

   @HyukjinKwon , @dongjoon-hyun , Can you please review this PR?





[GitHub] [spark] vinodkc opened a new pull request, #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox


vinodkc opened a new pull request, #38419:
URL: https://github.com/apache/spark/pull/38419

   
   
   ### What changes were proposed in this pull request?
   This PR implements the built-in function `TRUNC` to truncate numbers toward 
zero. It optionally accepts a second integer argument indicating the number of 
decimal places to keep; a negative value truncates digits to the left of the 
decimal point. When the second argument is absent, the function truncates to a 
whole number. When it is specified, the function truncates to n decimal places 
of precision.
   
   
   
   
   
   
   ### Why are the changes needed?
   To support truncating numbers, e.g.:
   trunc(123.4382) -> 123
   trunc(-123.4382) -> -123
   trunc(123.4382, 2) -> 123.43
   trunc(123.4382, 1) -> 123.4
   trunc(123.4382, 0) -> 123
   trunc(123.4382, -1) -> 120
   trunc(123.4382, -2) -> 100
   trunc(123.4382, -3) -> 0
   
   The same feature is available in Hive, PostgreSQL, MySQL, Oracle, Redshift, 
   **Some references** :
   PostgreSQL: https://www.postgresql.org/docs/15/functions-math.html
   AWS Redshift: https://docs.aws.amazon.com/redshift/latest/dg/r_TRUNC.html
   Presto : 
https://prestodb.io/docs/current/functions/math.html?highlight=trunc#id3
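   The truncation semantics in the examples above can be sketched in a few 
lines of plain Python with `decimal` (illustrative only; the PR itself 
implements this as a Catalyst expression):

```python
from decimal import Decimal, ROUND_DOWN

def trunc(value, scale=0):
    # Truncate toward zero at the given decimal scale; a negative scale
    # zeroes out digits to the left of the decimal point.
    quantum = Decimal(1).scaleb(-scale)  # 10 ** (-scale)
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))
```

   Note that `ROUND_DOWN` rounds toward zero, which matches 
`trunc(-123.4382) -> -123` above (plain floor would give -124).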
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, a new built-in function 'trunc' is added. The same function has an 
overloaded version to truncate Date values.
   ### How was this patch tested?
   
   Added test cases





[GitHub] [spark] rahulsmahadev commented on a diff in pull request #38404: [WIP] Replace Where

2022-10-27 Thread GitBox


rahulsmahadev commented on code in PR #38404:
URL: https://github.com/apache/spark/pull/38404#discussion_r1007450737


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1276,6 +1276,24 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   AppendData.byPosition(r, query)
 } else if (conf.partitionOverwriteMode == 
PartitionOverwriteMode.DYNAMIC) {
   OverwritePartitionsDynamic.byPosition(r, query)
+} else if (i.replacePredicates.nonEmpty) {
+  def findAttrInRelation(name: String): Attribute = {

Review Comment:
   We should check with OSS reviewers (once this PR is ready) whether there is 
an existing method for this, so that we don't duplicate code.






[GitHub] [spark] dtenedor opened a new pull request, #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-10-27 Thread GitBox


dtenedor opened a new pull request, #38418:
URL: https://github.com/apache/spark/pull/38418

   ### What changes were proposed in this pull request?
   
   Relax ordering constraint for CREATE TABLE column options.
   
   Before this PR, the grammar for each CREATE TABLE column was:
   
   ```
   createOrReplaceTableColType
   : colName=errorCapturingIdentifier dataType (NOT NULL)? 
defaultExpression? commentSpec?
   ;
   ```
   
   ### Why are the changes needed?
   
   There was a constraint on the order of: `(NOT NULL, DEFAULT value, COMMENT 
value)`. This PR updates the grammar to allow these options in any order 
instead, to improve usability.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the SQL syntax updates slightly.
   
   ### How was this patch tested?
   
   Existing parser tests plus a new unit test.
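   Relaxing the grammar moves the duplicate check out of the parser rule and 
into builder code; the shape of that check can be sketched in plain Python 
(illustrative only — the real logic lives in `AstBuilder`):

```python
def parse_col_options(options):
    # Column options (e.g. "NOT NULL", "DEFAULT", "COMMENT") may now appear
    # in any order, but each may appear at most once per column.
    seen = set()
    parsed = {}
    for kind, value in options:
        if kind in seen:
            raise ValueError(f"duplicate column option: {kind}")
        seen.add(kind)
        parsed[kind] = value
    return parsed
```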





[GitHub] [spark] carlfu-db commented on pull request #38404: [WIP] Replace Where

2022-10-27 Thread GitBox


carlfu-db commented on PR #38404:
URL: https://github.com/apache/spark/pull/38404#issuecomment-1294192968

   > Mind adding a test, filing a JIRA, etc? See also 
https://spark.apache.org/contributing.html
   
   Will do. Still in progress :) 





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


HeartSaVioR commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007411825


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.execution.streaming.state.StateStore
+import org.apache.spark.sql.functions._
+
+// Tests for the multiple stateful operators support.
+class MultiStatefulOperatorsSuite
+  extends StreamTest with StateStoreMetricsTest with BeforeAndAfter {
+
+  import testImplicits._
+
+  before {
+SparkSession.setActiveSession(spark) // set this before force initializing 
'joinExec'
+spark.streams.stateStoreCoordinator // initialize the lazy coordinator
+  }
+
+  after {
+StateStore.stop()
+  }
+
+  test("window agg -> window agg, append mode") {
+withSQLConf("spark.sql.streaming.unsupportedOperationCheck" -> "false") {
+  val inputData = MemoryStream[Int]
+
+  val stream = inputData.toDF()
+.withColumn("eventTime", timestamp_seconds($"value"))
+.withWatermark("eventTime", "0 seconds")
+.groupBy(window($"eventTime", "5 seconds") as 'window)
+.agg(count("*") as 'count)
+.groupBy(window($"window", "10 seconds"))
+.agg(count("*") as 'count, sum("count") as 'sum)
+.select($"window".getField("start").cast("long").as[Long],
+  $"count".as[Long], $"sum".as[Long])
+
+  testStream(stream)(
+AddData(inputData, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21),
+// op1 W (0, 0)
+// agg: [10, 15) 5, [15, 20) 5, [20, 25) 2
+// output: None
+// state: [10, 15) 5, [15, 20) 5, [20, 25) 2
+// op2 W (0, 0)
+// agg: None
+// output: None
+// state: None
+
+// no-data batch triggered
+
+// op1 W (0, 21)
+// agg: None
+// output: [10, 15) 5, [15, 20) 5
+// state: [20, 25) 2
+// op2 W (0, 21)
+// agg: [10, 20) (2, 10)
+// output: [10, 20) (2, 10)
+// state: None
+CheckNewAnswer((10, 2, 10)), // W (21, 21) [20, 25) 2 ... W (0, 21) 
None // [10, 20) 2
+assertNumStateRows(Seq(0, 1)),
+assertNumRowsDroppedByWatermark(Seq(0, 0)),
+
+AddData(inputData, 22, 23, 24, 25, 26, 27, 28, 29),
+// op1 W (21, 21)
+// agg: [20, 25) 5, [25, 30) 4
+// output: None
+// state: [20, 25) 5, [25, 30) 4
+// op2 W (21, 21)
+// agg: None
+// output: None
+// state: None
+
+// no-data batch triggered
+
+// op1 W (21, 29)
+// agg: None
+// output: [20, 25) 5
+// state: [25, 30) 4
+// op2 W (20, 25)
+// agg: [20, 30) (1, 5)
+// output: None
+// state: [20, 30) (1, 5)
+CheckNewAnswer(),
+assertNumStateRows(Seq(1, 1)),
+assertNumRowsDroppedByWatermark(Seq(0, 0)),
+
+// Move the watermark.

Review Comment:
   Although not mandatory, since another test case already covers the 
verification. Just something "good to have".
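
   The per-window counts annotated in the quoted test can be reproduced with a
   small, Spark-free sketch of the two-level tumbling-window aggregation (pure
   Python, assuming integer event times in seconds; a conceptual model only,
   not the streaming operator itself):

   ```python
   from collections import Counter

   def tumble(ts, size):
       """Start of the tumbling window of width `size` containing `ts`."""
       return ts - ts % size

   def two_level_agg(events):
       # Level 1: count events per 5-second window.
       level1 = Counter(tumble(t, 5) for t in events)
       # Level 2: re-window the 5s windows into 10s windows, tracking
       # (number of 5s windows, sum of their counts) per 10s window.
       level2 = {}
       for w5, cnt in level1.items():
           w10 = tumble(w5, 10)
           n, s = level2.get(w10, (0, 0))
           level2[w10] = (n + 1, s + cnt)
       return level1, level2

   # Events 10..21, as in the first AddData of the quoted test.
   level1, level2 = two_level_agg(range(10, 22))
   ```

   This yields [10, 15) 5, [15, 20) 5, [20, 25) 2 at level 1 and
   [10, 20) (2, 10), [20, 30) (1, 2) at level 2; with the watermark at 21,
   only windows ending at or before 21 are emitted, matching
   `CheckNewAnswer((10, 2, 10))`.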






[GitHub] [spark] sunchao commented on pull request #38277: [SPARK-40815][SQL] Introduce DelegateSymlinkTextInputFormat to handle empty splits when "spark.hadoopRDD.ignoreEmptySplits" is enabled in ord

2022-10-27 Thread GitBox


sunchao commented on PR #38277:
URL: https://github.com/apache/spark/pull/38277#issuecomment-129460

   @sadikovi sorry for the delay, will take a look.





[GitHub] [spark] sadikovi commented on pull request #38277: [SPARK-40815][SQL] Introduce DelegateSymlinkTextInputFormat to handle empty splits when "spark.hadoopRDD.ignoreEmptySplits" is enabled in or

2022-10-27 Thread GitBox


sadikovi commented on PR #38277:
URL: https://github.com/apache/spark/pull/38277#issuecomment-1294089945

   @sunchao @dongjoon-hyun Could you take another look? Thanks. I have 
addressed your comments.





[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007394123


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
[quoted diff context omitted: identical to the MultiStatefulOperatorsSuite.scala 
hunk quoted in full above]

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


HeartSaVioR commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007386332


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1941,6 +1941,22 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val STATEFUL_OPERATOR_ALLOW_MULTIPLE =
+buildConf("spark.sql.streaming.statefulOperator.allowMultiple")
+  .internal()
+  .doc("When true, multiple stateful operators are allowed to be present 
in a streaming " +
+"pipeline. The support for multiple stateful operators introduces a 
minor (semantically " +
+"correct) change in respect to late record filtering - late records 
are detected and " +
+"filtered in respect to the watermark from the previous microbatch 
instead of the " +
+"current one. This is a behavior change for Spark streaming pipelines 
and we allow " +

Review Comment:
   Yes, we will have a bunch of errors in test suites when we disable the 
no-data batch. Most test cases assume that a no-data batch always happens.
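
   The behavior change described in the config doc above can be sketched in
   isolation (a pure-Python simplification, not Spark's operator): late records
   are filtered against the watermark carried over from the *previous*
   microbatch, and the watermark only advances after the batch is processed.

   ```python
   def run_batches(batches, delay=0):
       """Filter late records per batch using the previous batch's watermark."""
       watermark = 0  # event-time watermark carried across microbatches
       kept, dropped = [], []
       for batch in batches:
           # Filter against the watermark from the PREVIOUS microbatch.
           for t in batch:
               (kept if t >= watermark else dropped).append(t)
           # Advance the watermark AFTER processing: max event time minus delay.
           if batch:
               watermark = max(watermark, max(batch) - delay)
       return kept, dropped

   kept, dropped = run_batches([[10, 20], [22, 25], [5]])
   ```

   The record at 22 in the second batch is kept because 22 >= 20 (the previous
   watermark), even though it is below 25, the watermark that same batch
   produces; filtering against the current batch's watermark would have dropped
   it. That is the minor, semantically correct difference the config doc
   describes.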






[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


HeartSaVioR commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007383463


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
[... quoted diff context identical to the MultiStatefulOperatorsSuite.scala 
hunk quoted in full above ...]
+
+// Move the watermark.

Review Comment:
   I meant adding a new batch (or existing batch) for testing late events.






[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007369668


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
[quoted diff context omitted: identical to the MultiStatefulOperatorsSuite.scala 
hunk quoted in full above]

[GitHub] [spark] dongjoon-hyun closed pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 10.0.0

2022-10-27 Thread GitBox


dongjoon-hyun closed pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 
10.0.0
URL: https://github.com/apache/spark/pull/38369





[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38417:
URL: https://github.com/apache/spark/pull/38417#issuecomment-1294047475

   cc @Yikun 





[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007348475


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
[quoted diff context omitted: identical to the MultiStatefulOperatorsSuite.scala 
hunk quoted in full above]

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox


dongjoon-hyun opened a new pull request, #38417:
URL: https://github.com/apache/spark/pull/38417

   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007325252


##
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##
@@ -0,0 +1,400 @@
[... quoted diff context identical to the MultiStatefulOperatorsSuite.scala 
hunk quoted in full above ...]
+// op1 W (21, 29)
+// agg: None
+// output: [20, 25) 5
+// state: [25, 30) 4
+// op2 W (20, 25)

Review Comment:
   Good catch, thanks






[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38406:
URL: https://github.com/apache/spark/pull/38406#discussion_r1007318294


##
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##
@@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
     }
     proto.Relation.newBuilder().setLocalRelation(localRelationBuilder.build()).build()
   }
+
+  private def analyzePlan(plan: LogicalPlan): LogicalPlan = {
+val connectAnalyzed = analysis.SimpleAnalyzer.execute(plan)
+analysis.SimpleAnalyzer.checkAnalysis(connectAnalyzed)
+EliminateSubqueryAliases(connectAnalyzed)

Review Comment:
   Hmm, this is what I borrowed from 
https://github.com/apache/spark/blob/c50d865fa9eb5207bc8c9992e37843412ec0cbc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L513
   
   We were already using this Catalyst DSL analyze call before this refactoring.






[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


alex-balikov commented on code in PR #38405:
URL: https://github.com/apache/spark/pull/38405#discussion_r1007234930


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1941,6 +1941,22 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val STATEFUL_OPERATOR_ALLOW_MULTIPLE =
+    buildConf("spark.sql.streaming.statefulOperator.allowMultiple")
+      .internal()
+      .doc("When true, multiple stateful operators are allowed to be present in a streaming " +
+        "pipeline. The support for multiple stateful operators introduces a minor (semantically " +
+        "correct) change in respect to late record filtering - late records are detected and " +
+        "filtered in respect to the watermark from the previous microbatch instead of the " +
+        "current one. This is a behavior change for Spark streaming pipelines and we allow " +

Review Comment:
   Currently the same watermark is passed to all operators. The issue is that if 
anyone has unit tests which check exactly which records are filtered, with 
carefully constructed batches and Trigger.Once, such tests can detect the 
change in behavior and fail.
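For context, the behavior the conf text describes can be sketched as follows. Only the conf key comes from the diff above; the pipeline itself is illustrative (a streaming DataFrame `events` with a `ts` column is assumed, not part of the patch):

```scala
// Illustrative sketch only. With the flag enabled, a pipeline may chain two
// stateful operators (here: a window aggregation over a window aggregation),
// and late records are filtered against the watermark of the *previous*
// microbatch rather than the current one.
spark.conf.set("spark.sql.streaming.statefulOperator.allowMultiple", "true")

val result = events                       // assumed streaming DataFrame with a "ts" column
  .withWatermark("ts", "0 seconds")
  .groupBy(window($"ts", "5 seconds"))    // first stateful operator
  .agg(count("*").as("count"))
  .groupBy(window($"window.start", "10 seconds")) // second stateful operator
  .agg(sum("count").as("sum"))
```

This mirrors the shape of the chained windowed aggregation exercised in the test diff earlier in the thread.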



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #38404: Replace Where

2022-10-27 Thread GitBox


AmplabJenkins commented on PR #38404:
URL: https://github.com/apache/spark/pull/38404#issuecomment-1293998786

   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox


AmplabJenkins commented on PR #38405:
URL: https://github.com/apache/spark/pull/38405#issuecomment-1293998716

   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


AmplabJenkins commented on PR #38406:
URL: https://github.com/apache/spark/pull/38406#issuecomment-1293998669

   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-10-27 Thread GitBox


AmplabJenkins commented on PR #38409:
URL: https://github.com/apache/spark/pull/38409#issuecomment-1293998625

   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #38410: [SPARK-40932][CORE] Fix issue messages for allGather are overridden

2022-10-27 Thread GitBox


AmplabJenkins commented on PR #38410:
URL: https://github.com/apache/spark/pull/38410#issuecomment-1293998577

   Can one of the admins verify this patch?





[GitHub] [spark] vitaliili-db commented on pull request #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox


vitaliili-db commented on PR #38416:
URL: https://github.com/apache/spark/pull/38416#issuecomment-1293889541

   @MaxGekk backported, please take a look





[GitHub] [spark] vitaliili-db opened a new pull request, #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox


vitaliili-db opened a new pull request, #38416:
URL: https://github.com/apache/spark/pull/38416

   
   
   ### What changes were proposed in this pull request?
   
   
   Fix for a bug in the Unhex function when there is an odd number of symbols in the input string.
   This is a backport of #38402.
   
   
   ### Why are the changes needed?
   
   The Unhex function and other functions depending on it (e.g. ToBinary) produce incorrect output.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests





[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38406:
URL: https://github.com/apache/spark/pull/38406#discussion_r1007207351


##
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -67,7 +67,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
       case proto.Relation.RelTypeCase.AGGREGATE => transformAggregate(rel.getAggregate)
       case proto.Relation.RelTypeCase.SQL => transformSql(rel.getSql)
       case proto.Relation.RelTypeCase.LOCAL_RELATION =>
-        transformLocalRelation(rel.getLocalRelation)
+        transformLocalRelation(rel.getLocalRelation, common)

Review Comment:
   I sent a PR for this topic (to avoid complicating the current refactoring PR too much): https://github.com/apache/spark/pull/38415






[GitHub] [spark] amaliujia opened a new pull request, #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


amaliujia opened a new pull request, #38415:
URL: https://github.com/apache/spark/pull/38415

   
   
   ### What changes were proposed in this pull request?
   
   Previously, the Connect server could only handle `alias` for `Read` and `Project`. However, in the Spark DataFrame API every DataFrame can be chained with `as(alias: String)`, so every Relation/LogicalPlan can carry an `alias`. This PR refactors the planner to make this work.
   
   ### Why are the changes needed?
   
   Improve API coverage.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   UT





[GitHub] [spark] amaliujia commented on pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox


amaliujia commented on PR #38415:
URL: https://github.com/apache/spark/pull/38415#issuecomment-1293887655

   R: @cloud-fan 





[GitHub] [spark] amaliujia commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox


amaliujia commented on code in PR #38395:
URL: https://github.com/apache/spark/pull/38395#discussion_r1007192022


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -2100,3 +2100,53 @@ object AsOfJoin {
 }
   }
 }
+
+
+/**
+ * A logical plan for summary.
+ */
+case class Summary(

Review Comment:
   > but if there is something wrong in analysis, the exception will not be thrown.
   
   @zhengruifeng can you elaborate on this a bit more? Is this a regression for error cases (e.g. users expect an exception but it is no longer thrown)?
   






[GitHub] [spark] dongjoon-hyun commented on pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38412:
URL: https://github.com/apache/spark/pull/38412#issuecomment-1293828092

   Thank you, @LuciferYang , @HyukjinKwon , @singhpk234 .





[GitHub] [spark] dongjoon-hyun closed pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5

2022-10-27 Thread GitBox


dongjoon-hyun closed pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni 
to 1.5.2-5
URL: https://github.com/apache/spark/pull/38412





[GitHub] [spark] dongjoon-hyun commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox


dongjoon-hyun commented on PR #38352:
URL: https://github.com/apache/spark/pull/38352#issuecomment-1293774559

   > Could you make the same patch for 3.1 branch?
   
   No, Apache Spark 3.1 reached EOL last month because the first release was 
March 2, 2021.





[GitHub] [spark] vitas commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox


vitas commented on PR #38352:
URL: https://github.com/apache/spark/pull/38352#issuecomment-1293772433

   Could you make the same patch for 3.1 branch?





[GitHub] [spark] LuciferYang commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


LuciferYang commented on PR #38413:
URL: https://github.com/apache/spark/pull/38413#issuecomment-1293745767

   > assertAnalysisErrorClass
   
   Sounds good, let me try. I'll set this to draft first and ping you when it's ready for review, @MaxGekk.





[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox


bjornjorgensen commented on PR #38352:
URL: https://github.com/apache/spark/pull/38352#issuecomment-1293732028

   @xinrong-meng There are two tests that don't pass on branch-3.2.
   Both are Python tests; can you have a look at them?





[GitHub] [spark] MaxGekk commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


MaxGekk commented on PR #38413:
URL: https://github.com/apache/spark/pull/38413#issuecomment-129372

   I wonder why we need `assertAnalysisErrorClass()` at all; `checkError` does 
the same job. It seems `assertAnalysisErrorClass()` additionally checks case 
sensitivity (which can be done explicitly in a test if really needed) plus 
line + pos (we should probably check the query context instead).
   
   Let's consider invoking `checkError` in `assertAnalysisErrorClass()`, or 
removing it completely (invoking `checkError()` directly in tests as we do in 
other places).





[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-10-27 Thread GitBox


ljfgem commented on code in PR #35636:
URL: https://github.com/apache/spark/pull/35636#discussion_r1007065247


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala:
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.catalyst.plans.logical.ViewDescription
+import org.apache.spark.sql.connector.catalog.{View, ViewCatalog}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * View description backed by a View in V2 catalog.
+ *
+ * @param view a view in V2 catalog
+ */
+case class V2ViewDescription(
+    override val identifier: String,
+    view: View) extends ViewDescription {
+
+  override val schema: StructType = view.schema
+
+  override val viewText: Option[String] = None

Review Comment:
   Isn't it supposed to be `Option(view.sql)`? Or how does it differ from the 
`val sql` below? I think we just need to keep one for callers?






[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox


bjornjorgensen commented on PR #38352:
URL: https://github.com/apache/spark/pull/38352#issuecomment-1293716740

   @srowen the tests have been re-run, but the result is the same.
   It's been the same for everyone else's PRs for a long time, e.g. this one: https://github.com/LuciferYang/spark/actions/runs/3309640149





[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-10-27 Thread GitBox


SandishKumarHN commented on code in PR #38344:
URL: https://github.com/apache/spark/pull/38344#discussion_r1007058916


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala:
##
@@ -98,13 +92,9 @@ private[protobuf] case class ProtobufDataToCatalyst(
       case PermissiveMode =>
         nullResultRow
       case FailFastMode =>
-        throw new SparkException(
-          "Malformed records are detected in record parsing. " +
-            s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.",
-          e)
+        throw QueryCompilationErrors.malformedRecordsDetectedInRecordParsingError(e)
       case _ =>
-        throw new AnalysisException(unacceptableModeMessage(parseMode.name))
+        throw QueryCompilationErrors.parseModeUnsupportedError("from_protobuf", parseMode)

Review Comment:
   @MaxGekk fixed. 






[GitHub] [spark] wangyeweikuer closed pull request #38414: Review from master

2022-10-27 Thread GitBox


wangyeweikuer closed pull request #38414: Review from master
URL: https://github.com/apache/spark/pull/38414





[GitHub] [spark] wangyeweikuer opened a new pull request, #38414: Review from master

2022-10-27 Thread GitBox


wangyeweikuer opened a new pull request, #38414:
URL: https://github.com/apache/spark/pull/38414

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] awdavidson commented on pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression

2022-10-27 Thread GitBox


awdavidson commented on PR #38312:
URL: https://github.com/apache/spark/pull/38312#issuecomment-1293507848

   @cloud-fan @LuciferYang any update/response regarding this?





[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox


bjornjorgensen commented on PR #38352:
URL: https://github.com/apache/spark/pull/38352#issuecomment-1293504608

   OK, I'm re-running the tests now.





[GitHub] [spark] LuciferYang commented on a diff in pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


LuciferYang commented on code in PR #38413:
URL: https://github.com/apache/spark/pull/38413#discussion_r1006825185


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala:
##
@@ -183,30 +185,28 @@ trait AnalysisTest extends PlanTest {
       analyzer.checkAnalysis(analyzer.execute(inputPlan))
     }
 
-    if (e.getErrorClass != expectedErrorClass ||
-        e.messageParameters != expectedMessageParameters ||
-        (line >= 0 && e.line.getOrElse(-1) != line) ||
-        (pos >= 0) && e.startPosition.getOrElse(-1) != pos) {

Review Comment:
   It seems that the last condition was wrong? I think
   
   ```
   (pos >= 0) && e.startPosition.getOrElse(-1) != pos
   ```
   
   Should be
   
   ```
   (pos >= 0 && e.startPosition.getOrElse(-1) != pos)
   ```
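For reference, `&&` binds more tightly than `||` in Scala, so the two spellings above parse the same way; the suggested parentheses make the grouping explicit rather than change semantics. A standalone check (not part of the patch, names are illustrative):

```scala
object PrecedenceCheck extends App {
  // `a || b && c` parses as `a || (b && c)` because && has higher precedence.
  for {
    a   <- Seq(true, false)
    pos <- Seq(-1, 0, 1)
    c   <- Seq(true, false)
  } {
    val asWritten = a || (pos >= 0) && c  // form currently in AnalysisTest
    val grouped   = a || (pos >= 0 && c)  // explicitly grouped form
    assert(asWritten == grouped)
  }
  println("all 12 combinations agree")
}
```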






[GitHub] [spark] LuciferYang commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox


LuciferYang commented on PR #38413:
URL: https://github.com/apache/spark/pull/38413#issuecomment-1293477815

   cc @MaxGekk @HyukjinKwon @dongjoon-hyun On second thought, does this refactor look simpler?




