[GitHub] [spark] HyukjinKwon closed pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36799: [SPARK-39350][SQL] Add flag to control 
breaking change process for: DESC NAMESPACE EXTENDED should redact properties
URL: https://github.com/apache/spark/pull/36799





[GitHub] [spark] HyukjinKwon commented on pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36799:
URL: https://github.com/apache/spark/pull/36799#issuecomment-1149535232

   Merged to master.





[GitHub] [spark] HyukjinKwon opened a new pull request, #36803: [SPARK-39411][BUILD] Fix release script to address type hint in pyspark/version.py

2022-06-07 Thread GitBox


HyukjinKwon opened a new pull request, #36803:
URL: https://github.com/apache/spark/pull/36803

   ### What changes were proposed in this pull request?
   
   This PR proposes to handle the `__version__: str` type hint correctly in each release. The type hint was added in Spark 3.3.0 at https://github.com/apache/spark/commit/f59e1d548e2e7c97195697910c40c5383a76ca48.
   
   ### Why are the changes needed?
   
   For PySpark to have the correct version in releases.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   Manually tested by setting environment variables and running the changed shell commands locally.





[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


huaxingao commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149525380

   I thought it over; I think it's better to have a separate PR to fix the explain problem.





[GitHub] [spark] MaxGekk closed pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


MaxGekk closed pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error 
messages for try_* function hints
URL: https://github.com/apache/spark/pull/36792





[GitHub] [spark] MaxGekk commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


MaxGekk commented on PR #36792:
URL: https://github.com/apache/spark/pull/36792#issuecomment-1149521109

   +1, LGTM. Merging to 3.3.
   Thank you, @vli-databricks and @HyukjinKwon for review.





[GitHub] [spark] huaxingao commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


huaxingao commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149518253

   The fix looks good but the explain result bothers me. Here is what I got 
from the explain result:
   ```
   spark.read.parquet(dir.getCanonicalPath).filter("isnotnull(f)").explain(true)
   
   == Physical Plan ==
   *(1) Filter isnotnull(f#0)
   +- *(1) ColumnarToRow
      +- FileScan parquet [f#0] Batched: true, DataFilters: [isnotnull(f#0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/pt/_5f4sxy56x70dv9zpz032f0mgn/T/spark-42..., PartitionFilters: [], PushedFilters: [IsNotNull(f)], ReadSchema: struct>
   ```
   
   The explain has `PushedFilters: [IsNotNull(f)]` but the filter actually is 
not pushed down.
   
   The problem is that the pushed down filter information in explain comes from 
[here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L368).
 As long as the data filters don't include any metadata col filters and can be 
translated OK, Spark assumes the filters can be pushed down OK. 
   
   I am wondering whether we should just fix the repeated primitive types for now and fix the explain output in another PR, or whether we should fix the explain problem in this PR too.
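   For anyone trying to reproduce this locally, here is a minimal sketch (assuming a `spark-shell`; the column name `f` and the temp directory are only for illustration):
   
   ```
   // Write a Parquet file with a repeated (array) field, then filter on it.
   // The filter cannot be pushed down, yet PushedFilters still reports it.
   import java.nio.file.Files
   import spark.implicits._
   
   val dir = Files.createTempDirectory("spark-39393-").toFile
   Seq(Tuple1(Seq(1, 2, 3)), Tuple1(Seq.empty[Int]))
     .toDF("f")
     .write.mode("overwrite").parquet(dir.getCanonicalPath)
   
   spark.read.parquet(dir.getCanonicalPath)
     .filter("isnotnull(f)")
     .explain(true)
   ```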
   





[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-07 Thread GitBox


cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891950939


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   > In addition, we can make the test pass with hive-storage-api 2.8.1 too.
   
   I also tried upgrading to 2.8.1 yesterday, ran the tests, and found that some tests in `OrcFilterSuite` were broken.
   
   This is because [HIVE-24458](https://issues.apache.org/jira/browse/HIVE-24458) modifies the output of the `SearchArgument#toString` method, which causes some test comparisons to fail.
   
   There are two ways to solve it:
   1. modify the expected values in the suite;
   2. use `SearchArgumentImpl#toOldString` provided by HIVE-24458.
   
   I have now tried the second:
   
   https://github.com/cxzl25/spark/commit/0adfdf60762faef2450f2f3c54c68f1109c5092e
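   For reference, a rough sketch of option 2 (this assumes hive-storage-api 2.8.1 on the classpath and the `SearchArgumentImpl#toOldString` method added by HIVE-24458; it is only an illustration, not the change in the commit above):
   
   ```
   import org.apache.hadoop.hive.ql.io.sarg.{PredicateLeaf, SearchArgumentFactory, SearchArgumentImpl}
   
   // Build a simple predicate the way OrcFilters does.
   val sarg = SearchArgumentFactory.newBuilder()
     .startAnd()
     .isNull("a", PredicateLeaf.Type.LONG)
     .end()
     .build()
   
   // 2.8.1 changed SearchArgument#toString, so tests that compare against the
   // old textual form could call toOldString instead of toString.
   val oldStyle = sarg.asInstanceOf[SearchArgumentImpl].toOldString
   ```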
   
   









[GitHub] [spark] pan3793 commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


pan3793 commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891948046


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Updated to use `./bin/...`






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891946395


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Does `pyspark` mean the PySpark shell? Let's either explicitly use the script name like `bin/...` or spell out the full name, PySpark shell.



##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Does `pyspark` mean the PySpark shell? Let's either explicitly use the script name like `./bin/...` or spell out the full name, PySpark shell.






[GitHub] [spark] HyukjinKwon commented on pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36683:
URL: https://github.com/apache/spark/pull/36683#issuecomment-1149492294

   cc @mengxr and @WeichenXu123 in case you guys have some comments.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36683:
URL: https://github.com/apache/spark/pull/36683#discussion_r891944792


##
python/pyspark/sql/pandas/conversion.py:
##
@@ -596,7 +596,7 @@ def _create_from_pandas_with_arrow(
 ]
 
 # Slice the DataFrame to be batched
-step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round 
int up
+step = self._jconf.arrowMaxRecordsPerBatch()

Review Comment:
   Yeah, that's true... perf diff seems trivial in any event and seems it works 
around the `spark.rpc.message.maxSize` issue.






[GitHub] [spark] LuciferYang commented on pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


LuciferYang commented on PR #36800:
URL: https://github.com/apache/spark/pull/36800#issuecomment-1149491231

   thanks @HyukjinKwon 





[GitHub] [spark] HyukjinKwon closed pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36800: [SPARK-39409][BUILD] Upgrade 
scala-maven-plugin to 4.6.2
URL: https://github.com/apache/spark/pull/36800





[GitHub] [spark] HyukjinKwon commented on pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36800:
URL: https://github.com/apache/spark/pull/36800#issuecomment-1149490024

   Merged to master.





[GitHub] [spark] HyukjinKwon commented on pull request #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36802:
URL: https://github.com/apache/spark/pull/36802#issuecomment-1149489864

   cc @cloud-fan 





[GitHub] [spark] HyukjinKwon opened a new pull request, #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-07 Thread GitBox


HyukjinKwon opened a new pull request, #36802:
URL: https://github.com/apache/spark/pull/36802

   ### What changes were proposed in this pull request?
   
   This PR fixes the test to make `CastWithAnsiOffSuite` properly respect 
`ansiEnabled` in `cast string to date #2` test by using 
`CastWithAnsiOffSuite.cast` instead of `Cast` expression.
   
   ### Why are the changes needed?
   
   To make the tests pass. Currently it fails when ANSI mode is on:
   
   https://github.com/apache/spark/runs/6786744647
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   Manually tested in my IDE.





[GitHub] [spark] HyukjinKwon commented on pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36797:
URL: https://github.com/apache/spark/pull/36797#issuecomment-1149483125

   Merged to branch-3.3.





[GitHub] [spark] HyukjinKwon closed pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve 
PySpark Structured Streaming page more readable
URL: https://github.com/apache/spark/pull/36797





[GitHub] [spark] LuciferYang commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


LuciferYang commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149455635

   I think this PR should be backported to previous Spark versions, because when running `SPARK-39393: Do not push down predicate filters for repeated primitive fields` without this PR, I found the error logs below:
   
   ```
   Last failure message: There are 1 possibly leaked file streams..
       at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:185)
       at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:192)
       at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:402)
       at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:401)
       at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
       at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:312)
       at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:311)
       at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
       at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach(SharedSparkSession.scala:164)
       at org.apache.spark.sql.test.SharedSparkSessionBase.afterEach$(SharedSparkSession.scala:158)
       at org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.afterEach(ParquetFilterSuite.scala:100)
       at org.scalatest.BeforeAndAfterEach.$anonfun$runTest$1(BeforeAndAfterEach.scala:247)
       at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
       at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
       at org.scalatest.FailedStatus$.whenCompleted(Status.scala:505)
       at org.scalatest.Status.withAfterEffect(Status.scala:373)
       at org.scalatest.Status.withAfterEffect$(Status.scala:371)
       at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:477)
       at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:246)
       at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
       at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
       at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
       at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
       at scala.collection.immutable.List.foreach(List.scala:431)
       at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
       at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
       at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
       at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
       at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
       at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
       at org.scalatest.Suite.run(Suite.scala:1112)
       at org.scalatest.Suite.run$(Suite.scala:1094)
       at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
       at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
       at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
       at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
       at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
       at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:64)
       at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
       at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
       at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
       at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:64)
       at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
       at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
       at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
       at scala.collection.immutable.List.foreach(List.scala:431)
       at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
       at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
       at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
       at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
       at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
       at org.scalatest.tools.Runner$.run(Runner.scala:798)
       at org.scalatest.tools.Runner.run(Runner.scala)
       at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
       at org.jetbrains.
   ```

[GitHub] [spark] Yaohua628 opened a new pull request, #36801: [SPARK-39404][SQL][Streaming] Minor fix for querying `_metadata` in streaming

2022-06-07 Thread GitBox


Yaohua628 opened a new pull request, #36801:
URL: https://github.com/apache/spark/pull/36801

   
   
   ### What changes were proposed in this pull request?
   We added support for querying the `_metadata` column with a file-based streaming source: https://github.com/apache/spark/pull/35676.
   
   In this PR we propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in the `MicroBatchExecution` `runBatch` method. The current code is fine for `FileStreamSource`, because `FileStreamSource` always returns a single `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).
   
   But the proposed change makes the logic more robust: we really should not rely on the upstream source to return a plan of the expected shape. In addition, the proposed change could also make `_metadata` work if someone customizes `FileStreamSource` `getBatch`.
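   A toy sketch of why a bottom-up transform is more robust than matching only the root node (the classes below are invented for illustration and are not the actual Catalyst or streaming types):
   
   ```
   sealed trait Node
   case class Relation(metadataResolved: Boolean) extends Node
   case class Project(child: Node) extends Node
   
   // A minimal bottom-up transform, analogous in spirit to TreeNode.transformUp.
   def transformUp(n: Node)(f: PartialFunction[Node, Node]): Node = {
     val withNewChildren = n match {
       case Project(child) => Project(transformUp(child)(f))
       case other => other
     }
     f.applyOrElse(withNewChildren, identity[Node])
   }
   
   val wrapped: Node = Project(Relation(metadataResolved = false))
   
   // Matching only the root silently misses a Relation wrapped in another node.
   val viaMatch = wrapped match {
     case r: Relation => r.copy(metadataResolved = true)
     case other => other
   }                                         // Project(Relation(false))
   
   // A bottom-up transform reaches it regardless of where it sits in the tree.
   val viaTransform = transformUp(wrapped) {
     case r: Relation => r.copy(metadataResolved = true)
   }                                         // Project(Relation(true))
   ```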
   
   
   ### Why are the changes needed?
   To make the logic more robust.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests
   





[GitHub] [spark] AmplabJenkins commented on pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1149442173

   Can one of the admins verify this patch?





[GitHub] [spark] AngersZhuuuu commented on pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-07 Thread GitBox


AngersZhuuuu commented on PR #36786:
URL: https://github.com/apache/spark/pull/36786#issuecomment-1149434288

   > Could we add a test?
   
   We would need to test this after a build; it seems hard to cover this in a unit test...





[GitHub] [spark] LuciferYang opened a new pull request, #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


LuciferYang opened a new pull request, #36800:
URL: https://github.com/apache/spark/pull/36800

   ### What changes were proposed in this pull request?
   This PR aims to upgrade scala-maven-plugin to 4.6.2.
   
   
   ### Why are the changes needed?
   This version brings some bug fixes related to incremental recompilation:
   
   - [Incremental recompileMode - java.lang.NoClassDefFoundError: 
java/sql/ResultSetMetaData](https://github.com/davidB/scala-maven-plugin/issues/502)
   - [Fix incremental compilation on Java 11+, close 
#600](https://github.com/davidB/scala-maven-plugin/pull/609)
   
   The full list of changes:
   
   - https://github.com/davidB/scala-maven-plugin/compare/4.6.1...4.6.2
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass GitHub Actions.





[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891887309


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +571,103 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog(catalogName).asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+  assert(table.properties().get("external").equals("true"))
+  assert(table.properties().get("location").equals("file:" + 
dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test managed table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map.empty[String, String])
+
+  val tableName2 = "my_table2"
+  val description2 = "this is a test external table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName2).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description2,
+options = Map("path" -> dir.getAbsolutePath))
+
+  val tables = spark.catalog.listTables("testcat.my_db").collect()
+  assert(tables.size == 2)
+
+  val expectedTable1 =
+new Table(tableName, catalogName, Array(dbName), description,
+  CatalogTableType.MANAGED.name, false)
+  assert(tables.exists(t =>
+expectedTable1.name.equals(t.name) && 
expectedTable1.database.equals(t.database) &&
+expectedTable1.description.equals(t.description) &&
+expectedTable1.tableType.equals(t.tableType) &&
+expectedTable1.isTemporary == t.isTemporary))
+
+  val expectedTable2 =
+new Table(tableName2, catalogName, Array(dbName), description2,
+  CatalogTableType.EXTERNAL.name, false)
+  assert(tables.exists(t =>
+expectedTable2.name.equals(t.name) && 
expectedTable2.database.equals(t.database) &&
+expectedTable2.description.equals(t.description) &&
+expectedTable2.tableType.equals(t.tableType) &&
+expectedTable2.isTemporary == t.isTemporary))
+}

Review Comment:
   Sure, I will add this test case.
   
   I wouldn't call this testing `backward compatibility`, though. `listTables` should always accept a DB name: either a `database` or a `catalog_name.database` (both forms are shown in the sketch below).
   
   This is more about testing correctness.
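   For illustration, the two forms being discussed (a usage sketch in `spark-shell`, assuming a v2 catalog registered under the name `testcat`; this is not test code from the PR):
   
   ```
   spark.catalog.listTables("my_db")          // single database name, resolved in the session catalog
   spark.catalog.listTables("testcat.my_db")  // catalog-qualified name, resolved in the v2 catalog
   ```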




[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891881553


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +571,103 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog(catalogName).asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+  assert(table.properties().get("external").equals("true"))
+  assert(table.properties().get("location").equals("file:" + 
dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test managed table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map.empty[String, String])
+
+  val tableName2 = "my_table2"
+  val description2 = "this is a test external table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName2).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description2,
+options = Map("path" -> dir.getAbsolutePath))
+
+  val tables = spark.catalog.listTables("testcat.my_db").collect()
+  assert(tables.size == 2)
+
+  val expectedTable1 =
+new Table(tableName, catalogName, Array(dbName), description,
+  CatalogTableType.MANAGED.name, false)
+  assert(tables.exists(t =>
+expectedTable1.name.equals(t.name) && 
expectedTable1.database.equals(t.database) &&
+expectedTable1.description.equals(t.description) &&
+expectedTable1.tableType.equals(t.tableType) &&
+expectedTable1.isTemporary == t.isTemporary))
+
+  val expectedTable2 =
+new Table(tableName2, catalogName, Array(dbName), description2,
+  CatalogTableType.EXTERNAL.name, false)
+  assert(tables.exists(t =>
+expectedTable2.name.equals(t.name) && 
expectedTable2.database.equals(t.database) &&
+expectedTable2.description.equals(t.description) &&
+expectedTable2.tableType.equals(t.tableType) &&
+expectedTable2.isTemporary == t.isTemporary))
+}

Review Comment:
   Can we test the backward compatibility case? E.g., register a new catalog named "default" in this test case; `listTables("default")` should still list the tables under the "default" schema in HMS.
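   A rough sketch of such a check (the catalog implementation class name is hypothetical, and this is not the test that was actually added):
   
   ```
   // Register a custom catalog that happens to be named "default".
   spark.conf.set("spark.sql.catalog.default", "com.example.MyTableCatalog")
   
   // A single-part name should still resolve to the "default" database in the
   // session catalog (HMS), not to the newly registered catalog.
   val tables = spark.catalog.listTables("default").collect()
   assert(tables.forall(_.catalog == "spark_catalog"))
   ```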




[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891880524


##
sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala:
##
@@ -165,7 +165,8 @@ class GlobalTempViewSuite extends QueryTest with 
SharedSparkSession {
   assert(spark.catalog.tableExists(globalTempDB, "src"))
   assert(spark.catalog.getTable(globalTempDB, "src").toString == new Table(
 name = "src",
-database = globalTempDB,
+catalog = "spark_catalog",

Review Comment:
   ```suggestion
   catalog = CatalogManager.SESSION_CATALOG_NAME,
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879819


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)

Review Comment:
   
`metadata.map(_.identifier.database).getOrElse(tableIdent.database).map(Array(_)).orNull`.
 In abnormal case we should return null, not Array(null)






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879819


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)

Review Comment:
   
`metadata.map(_.identifier.database).getOrElse(tableIdent.database).toArray`. 
In abnormal case we should return null, not Array(null)
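   A quick REPL illustration of the difference (not code from the PR):
   
   ```
   val some: Option[String] = Some("my_db")
   val none: Option[String] = None
   
   some.toArray        // Array("my_db")
   none.toArray        // Array()      -- empty array, no null element
   Array(some.orNull)  // Array("my_db")
   Array(none.orNull)  // Array(null)  -- what the original expression produced
   ```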






[GitHub] [spark] itholic commented on pull request #36782: [SPARK-39394][DOCS][SS] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


itholic commented on PR #36782:
URL: https://github.com/apache/spark/pull/36782#issuecomment-1149410531

   @HyukjinKwon Sure, I created a PR: https://github.com/apache/spark/pull/36797





[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879436


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)
 new Table(
   name = tableIdent.table,
-  database = 
metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull,
+  catalog = "spark_catalog",

Review Comment:
   ```suggestion
 catalog = CatalogManager.SESSION_CATALOG_NAME,
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879053


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -97,8 +98,21 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
*/
   @throws[AnalysisException]("database does not exist")
   override def listTables(dbName: String): Dataset[Table] = {
-val tables = sessionCatalog.listTables(dbName).map(makeTable)
-CatalogImpl.makeDataset(tables, sparkSession)
+// dbName could be either 2-part name or 3-part name. We assume it is 
2-part name

Review Comment:
   ```suggestion
   // `dbName` could be either a single database name (behavior in Spark 
3.3 and prior) or
   // a qualified namespace with catalog name. We assume it's a single 
database name and ...
   ```






[GitHub] [spark] Borjianamin98 commented on a diff in pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


Borjianamin98 commented on code in PR #36781:
URL: https://github.com/apache/spark/pull/36781#discussion_r891878580


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##
@@ -1316,6 +1317,34 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 }
   }
 
+  test("SPARK-39393 Do not push down predicate filters for repeated primitive 
fields") {

Review Comment:
   Thanks. Done.






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891878020


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -965,6 +965,10 @@ class SessionCatalog(
 isTempView(nameParts.asTableIdentifier)
   }
 
+  def isGlobalTempView(dbName: String): Boolean = {

Review Comment:
   ```suggestion
 def isGlobalTempViewDB(dbName: String): Boolean = {
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891876268


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   The error is simply wrapped and the original error is kept as the cause, so this is not a big problem. The bigger problem is how to define internal errors. We picked some common Java exceptions that can be treated as internal errors: when end users see these errors from Spark, it indicates a bug.
   
   For third-party Spark plugins, I think all errors from them are internal errors? End users can do nothing to work around these errors. cc @srielau
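   For reference, a sketch of what the `INTERNAL_ERROR` wrapping described above looks like from the caller's side (the exception here is constructed by hand, mirroring the Kafka case in this test):
   
   ```
   import org.apache.spark.{SparkException, SparkThrowable}
   
   val original = new IllegalStateException(
     "Partition topic2-0's offset was changed from 2 to 1")
   
   // The wrapping keeps the actionable message on the cause.
   val wrapped = new SparkException(
     errorClass = "INTERNAL_ERROR",
     messageParameters = Array(original.getMessage),
     cause = original)
   
   assert(wrapped.asInstanceOf[SparkThrowable].getErrorClass == "INTERNAL_ERROR")
   assert(wrapped.getCause.getMessage.contains("was changed from 2 to 1"))
   ```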






[GitHub] [spark] dtenedor opened a new pull request, #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


dtenedor opened a new pull request, #36799:
URL: https://github.com/apache/spark/pull/36799

   ### What changes were proposed in this pull request?
   
   Add a flag to control breaking change process for: DESC NAMESPACE EXTENDED 
should redact properties.
   
   ### Why are the changes needed?
   
   This lets users control how the new behavior rolls out.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   This PR extends unit test coverage.
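   For illustration, a usage sketch of the behavior this flag gates (the property value is fake, and the flag name itself is omitted here because it is not spelled out in this thread):
   
   ```
   spark.sql("CREATE NAMESPACE IF NOT EXISTS ns " +
     "WITH DBPROPERTIES ('password' = 'not-a-real-secret')")
   
   // With redaction enabled (the new behavior), property keys that look sensitive,
   // such as 'password', should no longer show their raw values in the Properties row.
   spark.sql("DESC NAMESPACE EXTENDED ns").show(truncate = false)
   ```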





[GitHub] [spark] dtenedor commented on pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


dtenedor commented on PR #36799:
URL: https://github.com/apache/spark/pull/36799#issuecomment-1149401878

   @HyukjinKwon would you be able to help with this PR for the breaking change 
process?





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891867452


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   And we may need to make this clear in the interface contract for data sources: we are now applying the error class framework even when the exception comes from the data source. The convention only applies within the Spark project itself.






[GitHub] [spark] AmplabJenkins commented on pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36789:
URL: https://github.com/apache/spark/pull/36789#issuecomment-1149394119

   Can one of the admins verify this patch?





[GitHub] [spark] wangyum commented on pull request #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan

2022-06-07 Thread GitBox


wangyum commented on PR #36798:
URL: https://github.com/apache/spark/pull/36798#issuecomment-1149393100

   cc @maryannxue @cloud-fan





[GitHub] [spark] wangyum opened a new pull request, #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan

2022-06-07 Thread GitBox


wangyum opened a new pull request, #36798:
URL: https://github.com/apache/spark/pull/36798

   ### What changes were proposed in this pull request?
   
   This PR updates the buildKeys for `DynamicPruningSubquery.withNewPlan`.
   
   ### Why are the changes needed?
   
   Fix bug. Otherwise, the structural integrity of the plan is broken after 
this step:
   
https://github.com/apache/spark/blob/d9fd36eb76fcfec95763cc4dc594eb7856b0fad2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala#L129-L134
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891861831


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   I just realized this is included in Spark 3.3.0 instead of 3.4.0... I 
thought we had sufficient time to look into it, but unfortunately it doesn't seem so...
   
   Can we check how the user-facing error message has changed for this test, 
and decide whether it is a breaking change or not? If the error class framework 
hides the details because they are considered an internal error, this is a huge 
breaking change, since we give guidance in that error message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891847061


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   Thank you for reviewing.
   
   For any event timestamp, the start time of the last window (whether tumbling 
or sliding) derived from it should never exceed that timestamp. When 
`(timestamp - window.startTime) % window.slideDuration < 0`, the computed 
lastStart is greater than the timestamp. In that case the window needs to be 
shifted back by one slide, which yields the correct lastStart value.
   
   For example
   
   code
   ```
   val timestamp = -13
   val offset = 0
   val windowSize = 7
   
   // old code
   val lastStartOld = timestamp - (timestamp - offset - windowSize) % 
windowSize
   
  // new code
   val remainder =  (timestamp - offset) % windowSize
   
   val lastStartNew =
 if (remainder < 0) {
   timestamp - remainder - windowSize
 } else {
   timestamp - remainder
 }
   
   println(s"lastStartOld = $lastStartOld   lastStartNew = $lastStartNew")
   ```
   result
   ```
   lastStartOld = -7 lastStartNew = -14
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pan3793 commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


pan3793 commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891852053


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit

Review Comment:
   You are right; because `spark-shell` invokes `spark-submit`, I changed the 
description to "for spark-submit and the command that depends on spark-submit" 
(see the sketch below).
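   
   For illustration only, a hedged sketch of the kind of template entry being 
discussed, following the comment style the template already uses for the other 
`*_OPTS` variables (the exact wording in the final patch is an assumption):
   ```
   # Options for spark-submit and the commands that depend on it (e.g. spark-shell)
   # - SPARK_SUBMIT_OPTS, to set config properties only for spark-submit (e.g. "-Dx=y")
   ```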



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` is an `expressions.Predicate`, so `remainder < 0` 
can't yield a `boolean` value directly - a hedged expression-level sketch follows below.
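   
   For reference, a hedged sketch of how the same branch is expressed at the 
Catalyst expression level rather than with a Scala `if` - the when-branch mirrors 
the diff above, and the explicit else value is my assumption about how the 
non-negative case is kept:
   ```scala
   // Inside the TimeWindowing rule (sketch): build the conditional as an expression
   // tree; it is evaluated per row at runtime, not at rule time.
   val remainder = (timestamp - window.startTime) % window.slideDuration
   val lastStart = CaseWhen(
     // when the remainder is negative, shift the window start back by one slide
     Seq((LessThan(remainder, 0), timestamp - remainder - window.slideDuration)),
     // otherwise keep the usual start (assumed else branch)
     Some(timestamp - remainder))
   ```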
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` is an `expressions.Predicate`, so `remainder < 0` 
can't yield a `boolean` value directly.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891848795


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)

Review Comment:
   Does `before epoch` mean B.C. or just before '1970-01-01 00:00:00'? My 
understanding is that both are negative values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


srowen commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891850083


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   I think the question is how this arises in the first place. The code is 
self-explanatory; I'm not asking you to explain it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891848795


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)

Review Comment:
   Does `before epoch` mean B.C. or just before '1970-01-01 00:00:00'? My 
understanding is that both are negative values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xiuzhu9527 commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-07 Thread GitBox


xiuzhu9527 commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1149366600

   @pan3793 Thank you very much for your reply. This problem is affecting users. 
Can we fix it first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891847061


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   Thank you for reviewing.
   
   For any event timestamp, the start time of the last window (whether tumbling 
or sliding) derived from it should never exceed that timestamp. When 
`(timestamp - window.startTime) % window.slideDuration < 0`, the computed 
lastStart is greater than the timestamp. In that case the window needs to be 
shifted back by one slide, which yields the correct lastStart value.
   
   For example
   
   code
   ```
   val timestamp = -13
   val offset = 0
   val windowSize = 7
   
   // old code
   val lastStartOld = timestamp - (timestamp - offset - windowSize) % 
windowSize
   
  // new code
   val remainder =  (timestamp - offset) % windowSize
   
   val lastStartNew =
 if (remainder < 0) {
   timestamp - remainder - windowSize
 } else {
    timestamp - remainder
 }
   
   println(s"lastStartOld = $lastStartOld   lastStartNew = $lastStartNew")
   ```
   result
   ```
   lastStartOld = -7 lastStartNew = -14
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891846346


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   The question is what makes the time "illegal" or "inappropriate". It is not 
tied only to a bug in the application. The definition is not strict enough - if 
we call readTable while a concurrent operation on a non-atomic drop-and-recreate 
of the table is happening, the time is "conditionally" "inappropriate".
   
   For sure, we can tighten the project's policy to follow the convention you 
mentioned (and probably define new, more specific exception(s) if needed). For 
the Kafka data source, we'll need some time to sort this out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #36703: [SPARK-39321][SQL] Refactor TryCast to use RuntimeReplaceable

2022-06-07 Thread GitBox


cloud-fan closed pull request #36703: [SPARK-39321][SQL] Refactor TryCast to 
use RuntimeReplaceable
URL: https://github.com/apache/spark/pull/36703


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #36703: [SPARK-39321][SQL] Refactor TryCast to use RuntimeReplaceable

2022-06-07 Thread GitBox


cloud-fan commented on PR #36703:
URL: https://github.com/apache/spark/pull/36703#issuecomment-1149360596

   thanks for the review, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dcoliversun commented on a diff in pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


dcoliversun commented on code in PR #36781:
URL: https://github.com/apache/spark/pull/36781#discussion_r891843252


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##
@@ -1316,6 +1317,34 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 }
   }
 
+  test("SPARK-39393 Do not push down predicate filters for repeated primitive 
fields") {

Review Comment:
   Nice work. Little suggestion: 
   ```suggestion
 test("SPARK-39393: Do not push down predicate filters for repeated 
primitive fields") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` is an `expressions.Predicate`, so `remainder < 0` 
can't yield a `boolean` value.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` is an `expressions.Predicate`, so `remainder < 0` 
can't yield a `Boolean` value.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891840233


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   If Spark is a simple Java library, then this is fine. The caller should 
handle these errors properly when building a product.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891839935


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   > Signals that a method has been invoked at an illegal or inappropriate time.
   
   This means a bug in any product, right? We should not expose these low-level 
errors to end users, the same as NPE, index-out-of-bounds errors, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic opened a new pull request, #36797: Spark 39394 3.3

2022-06-07 Thread GitBox


itholic opened a new pull request, #36797:
URL: https://github.com/apache/spark/pull/36797

   ### What changes were proposed in this pull request?
   
   Hotfix https://github.com/apache/spark/pull/36782 for branch-3.3.
   
   
   ### Why are the changes needed?
   
   The improvement of document readability will also improve the usability for 
PySpark Structured Streaming.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, now the documentation is more clearly categorized by class or purpose, 
as shown below:
   
   ![Screen Shot 2022-06-07 at 12 30 01 
PM](https://user-images.githubusercontent.com/44108233/172289737-bd6ebf0e-601c-4a80-a16a-cf885302e7b6.png)
   
   
   ### How was this patch tested?
   
   The existing doc build in CI should cover.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36792:
URL: https://github.com/apache/spark/pull/36792#issuecomment-1149336001

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891827819


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   
https://docs.oracle.com/javase/8/docs/api/java/lang/IllegalStateException.html
   
   > Signals that a method has been invoked at an illegal or inappropriate 
time. In other words, the Java environment or Java application is not in an 
appropriate state for the requested operation.
   
   It's arguable what the "state" of the application is. If the state of the 
application depends on an external system (like this), that statement sounds 
valid to me.
   
   If the Spark project wants to restrict the usage of IllegalStateException to 
only denote possible internal errors, then that is fair, but it would be great 
if we did not rely on "convention" (a magic word with which everyone can claim 
different things) and instead mentioned it explicitly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #36790: [SPARK-39402][SQL] Optimize ReplaceCTERefWithRepartition to support coalesce partitions

2022-06-07 Thread GitBox


wangyum commented on PR #36790:
URL: https://github.com/apache/spark/pull/36790#issuecomment-1149320236

   cc @maryannxue @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #36788: [SPARK-39401][SQL][TESTS] Replace withView with withTempView in CTEInlineSuite

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36788: [SPARK-39401][SQL][TESTS] Replace 
withView with withTempView in CTEInlineSuite
URL: https://github.com/apache/spark/pull/36788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #36788: [SPARK-39401][SQL][TESTS] Replace withView with withTempView in CTEInlineSuite

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36788:
URL: https://github.com/apache/spark/pull/36788#issuecomment-1149311965

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891812728


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit

Review Comment:
   Is this only for spark-submit? It seems like it's also used in the Spark shell.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #36782: [SPARK-39394][DOCS][SS] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36782:
URL: https://github.com/apache/spark/pull/36782#issuecomment-1149309255

   @itholic mind creating a backporting PR to 3.3? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891811193


##
python/pyspark/sql/tests/test_session.py:
##
@@ -379,6 +381,54 @@ def test_use_custom_class_for_extensions(self):
 )
 
 
+class CreateDataFrame(unittest.TestCase):

Review Comment:
   Can we add this test to ArrowTests so it covers both the Arrow-optimized and 
non-optimized paths?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #36795:
URL: https://github.com/apache/spark/pull/36795#issuecomment-1149307716

   Thank you, @HyukjinKwon !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun closed pull request #36795: [SPARK-39407][SQL][TESTS] Fix 
ParquetIOSuite to handle the case where DNS responses on `nonexistent`
URL: https://github.com/apache/spark/pull/36795


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #36795:
URL: https://github.com/apache/spark/pull/36795#issuecomment-1149307127

   Thank you, @huaxingao . Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] JoshRosen closed pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


JoshRosen closed pull request #36794: Add Serializable Trait to 
DirectBinaryAvroDecoder
URL: https://github.com/apache/spark/pull/36794


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast commented on pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


eswardhinakaran-toast commented on PR #36794:
URL: https://github.com/apache/spark/pull/36794#issuecomment-1149281389

   I meant to close this PR; it was an accidental pull request.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36794:
URL: https://github.com/apache/spark/pull/36794#issuecomment-1149277712

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast closed pull request #36796: Add serializable trait to direct binary avro decoder

2022-06-07 Thread GitBox


eswardhinakaran-toast closed pull request #36796: Add serializable trait to 
direct binary avro decoder
URL: https://github.com/apache/spark/pull/36796


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast opened a new pull request, #36796: Add serializable trait to direct binary avro decoder

2022-06-07 Thread GitBox


eswardhinakaran-toast opened a new pull request, #36796:
URL: https://github.com/apache/spark/pull/36796

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request, #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on

2022-06-07 Thread GitBox


dongjoon-hyun opened a new pull request, #36795:
URL: https://github.com/apache/spark/pull/36795

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast opened a new pull request, #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


eswardhinakaran-toast opened a new pull request, #36794:
URL: https://github.com/apache/spark/pull/36794

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891581163


##
python/pyspark/sql/session.py:
##
@@ -952,12 +953,30 @@ def createDataFrame(  # type: ignore[misc]
 schema = [x.encode("utf-8") if not isinstance(x, str) else x for x 
in schema]
 
 try:
-import pandas
+import pandas as pd
 
 has_pandas = True
 except Exception:
 has_pandas = False
-if has_pandas and isinstance(data, pandas.DataFrame):
+
+try:
+import numpy as np

Review Comment:
   All NumPy versions should be fine as long as pandas can convert the ndarray 
to a pandas DataFrame, so we only check the pandas minimum version.
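   
   A minimal usage sketch of what the PR proposes, under the assumption that a 
2-D ndarray is first converted to a pandas DataFrame and then to a Spark 
DataFrame (column names and the exact dtype mapping are whatever that conversion 
produces):
   ```python
   import numpy as np
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   # 2-D ndarray -> (via pandas) -> Spark DataFrame, reusing the existing dtype support
   arr = np.array([[1.0, 2.0], [3.0, 4.0]])
   df = spark.createDataFrame(arr)
   df.printSchema()
   df.show()
   ```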



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1149066369

   Thank YOU for your contribution. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

2022-06-07 Thread GitBox


dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891607243


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   I'd expect something like `BytesColumnVector should not throw 
RuntimeException due to overflow`.
   ```
   Caused by: java.lang.RuntimeException: Overflow of newLength. 
smallBuffer.length=1073741824, nextElemLength=1048576
at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311)
   ```
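   
   Purely as an illustrative, hedged sketch (not the actual patch) of what a 
self-describing regression test could look like - the row count, the ~1 MiB 
per-row payload, and the use of `withTempPath` are assumptions, and the data 
volume makes it an expensive test:
   ```scala
   test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
       // Each row carries roughly 1 MiB of string data; enough rows in one ORC batch
       // push BytesColumnVector's buffer past the size that used to overflow.
       spark.range(1100)
         .selectExpr("repeat(uuid(), 30000) AS value")
         .coalesce(1)
         .write.mode("overwrite").orc(path)
       assert(spark.read.orc(path).count() === 1100)
     }
   }
   ```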



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zr-msft commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


zr-msft commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1149052032

   Thank you @dongjoon-hyun, I assumed the live site was updated right after a 
PR was merged. I see my updates in the rc5 docs; thank you for the quick response.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-07 Thread GitBox


srielau commented on code in PR #36693:
URL: https://github.com/apache/spark/pull/36693#discussion_r891606582


##
sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala:
##
@@ -36,13 +36,31 @@ class AnalysisException protected[sql] (
 @transient val plan: Option[LogicalPlan] = None,
 val cause: Option[Throwable] = None,
 val errorClass: Option[String] = None,
+val errorSubClass: Option[String] = None,
 val messageParameters: Array[String] = Array.empty)
   extends Exception(message, cause.orNull) with SparkThrowable with 
Serializable {
 
+protected[sql] def this(message: String,

Review Comment:
   It's needed for binary compatibility :-(



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

2022-06-07 Thread GitBox


dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891605317


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   Please revise the test case name too. A test case name should be 
self-describing about what the test body does.
   
   For example, we can add the test case as an `ignored` test before upgrading 
`hive-storage-api`. In addition, we can make the test pass with 
`hive-storage-api` 2.8.1 too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Eugene-Mark commented on a diff in pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-07 Thread GitBox


Eugene-Mark commented on code in PR #36499:
URL: https://github.com/apache/spark/pull/36499#discussion_r891588142


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala:
##
@@ -96,4 +97,29 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
 ""
   }
+
+  override def getCatalystType(
+sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): 
Option[DataType] = {
+if (sqlType == Types.NUMERIC) {
+  if (md == null) {
+Option(DecimalType.SYSTEM_DEFAULT)
+  } else {
+val scale = md.build().getLong("scale")
+// In Teradata, Number or Number(*) means precision and scale is 
flexible.
+// However, the scale returned from JDBC is 0, which leads to 
fractional part loss.
+// Handle this special case by adding explicit conversion to system 
default decimal type.
+// Note, even if the NUMBER is defined with explicit precision and 
scale like NUMBER(20, 0),
+// Spark will treat it as DecimalType.SYSTEM_DEFAULT, which is 
NUMBER(38,18)
+if (scale == 0) {
+  Option(DecimalType.SYSTEM_DEFAULT)
+} else {
+  // In Teradata, Number(*, scale) will return size, namely precision, 
as 40,
+  // which conflicts to DecimalType.MAX_PRECISION
+  Option(DecimalType(Math.min(size, DecimalType.MAX_PRECISION), 
scale.toInt))

Review Comment:
   Thanks for this comment! It points to something more reasonable than my 
current approach. In Teradata, only Number(\*)/Number, Number(*, scale) and 
Number(precision, scale) are valid forms, which means that when the scale is 
flexible the reported precision must be 40. So we don't need to convert every 
scale = 0 column to the default decimal type; we only need to do it when 
precision = 40 is detected. That way we respect the user's explicit scale = 0 
settings, e.g. Number(20,0) will be converted to DecimalType(20,0). A hedged 
sketch of that revised mapping is below.
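   
   A hedged sketch of that revised mapping (names follow the diff above; the 
exact guard used in the final patch is an assumption):
   ```scala
   override def getCatalystType(
       sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
     if (sqlType == Types.NUMERIC) {
       if (md == null) {
         Option(DecimalType.SYSTEM_DEFAULT)
       } else {
         val scale = md.build().getLong("scale")
         if (size == 40 && scale == 0) {
           // NUMBER / NUMBER(*): flexible precision and scale, reported by JDBC as (40, 0)
           Option(DecimalType.SYSTEM_DEFAULT)
         } else {
           // NUMBER(p, s), NUMBER(*, s) and explicit NUMBER(p, 0): keep the declared
           // scale and cap the precision at Spark's maximum
           Option(DecimalType(Math.min(size, DecimalType.MAX_PRECISION), scale.toInt))
         }
       }
     } else {
       None
     }
   }
   ```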



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891581163


##
python/pyspark/sql/session.py:
##
@@ -952,12 +953,30 @@ def createDataFrame(  # type: ignore[misc]
             schema = [x.encode("utf-8") if not isinstance(x, str) else x for x in schema]

         try:
-            import pandas
+            import pandas as pd

             has_pandas = True
         except Exception:
             has_pandas = False
-        if has_pandas and isinstance(data, pandas.DataFrame):
+
+        try:
+            import numpy as np

Review Comment:
   All NumPy versions should be fine as long as pandas can convert the ndarray to a pandas DataFrame.
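
   For illustration, a minimal sketch of the conversion path being relied on (assumes an active `spark` session; names are illustrative):

   ```python
   import numpy as np
   import pandas as pd

   arr = np.array([[1, 2], [3, 4]])

   # As long as pandas can wrap the ndarray in a DataFrame, Spark can reuse its
   # existing pandas conversion path inside createDataFrame.
   pdf = pd.DataFrame(arr)
   df = spark.createDataFrame(pdf)
   df.show()
   ```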



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks opened a new pull request, #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks opened a new pull request, #36793:
URL: https://github.com/apache/spark/pull/36793

   ### What changes were proposed in this pull request?
   Accept numpy array in createDataFrame, with existing dtypes support.
   
   
   ### Why are the changes needed?
   As part of [SPARK-39405], 
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, NumPy array is accepted in createDataFrame now:
   ```py
   ```
   
   
   ### How was this patch tested?
   Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891563533


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +128,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
       case NonFatal(_) => None
     }
     val isTemp = sessionCatalog.isTempView(tableIdent)
+    val qualifier =
+      Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)
     new Table(
       name = tableIdent.table,
Review Comment:
   done



##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +128,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
       case NonFatal(_) => None
     }
     val isTemp = sessionCatalog.isTempView(tableIdent)
+    val qualifier =
Review Comment:
   hm updated.



##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -97,8 +98,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    */
   @throws[AnalysisException]("database does not exist")
   override def listTables(dbName: String): Dataset[Table] = {
-    val tables = sessionCatalog.listTables(dbName).map(makeTable)
-    CatalogImpl.makeDataset(tables, sparkSession)
+    if (sessionCatalog.databaseExists(dbName)) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang closed pull request #36694: [MINOR][BUILD] Remove redundant maven `` definition

2022-06-07 Thread GitBox


LuciferYang closed pull request #36694: [MINOR][BUILD] Remove redundant maven 
`` definition
URL: https://github.com/apache/spark/pull/36694


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


vli-databricks commented on PR #36792:
URL: https://github.com/apache/spark/pull/36792#issuecomment-1149003141

   @MaxGekk


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks opened a new pull request, #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


vli-databricks opened a new pull request, #36792:
URL: https://github.com/apache/spark/pull/36792

   
   
   Refine ANSI error messages and remove 'To return NULL instead'
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   Improve error messaging for ANSI mode, since the user may not even be aware that the query was returning NULLs.
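
   For context, a rough illustration of the behavior the hint points at (the exact error wording is what this PR refines, so it is not reproduced here):

   ```scala
   spark.conf.set("spark.sql.ansi.enabled", "true")

   // Fails at runtime under ANSI mode; the refined message should hint at try_divide.
   spark.sql("SELECT 6 / 0").show()

   // The try_* variant returns NULL instead of failing.
   spark.sql("SELECT try_divide(6, 0)").show()
   ```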
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jerrypeng commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


jerrypeng commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891557370


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
     def getWindow(i: Int, dataType: DataType): Expression = {
      val timestamp = PreciseTimestampConversion(window.timeColumn, dataType, LongType)

Review Comment:
   If the timeColumn is before the epoch, what would the timestamp long value be? Negative?
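
   (For context, a tiny sketch of why the sign matters here, assuming pre-epoch timestamps do convert to negative longs and that `%` follows JVM semantics, i.e. the remainder takes the sign of the dividend; the numbers are made up:)

   ```scala
   val timestamp = -17L   // e.g. 17 units before the epoch
   val slide = 10L
   val startTime = 0L

   // JVM remainder keeps the dividend's sign:
   val remainder = (timestamp - startTime) % slide                          // -7, not 3

   // The original formula then places the window start *after* the timestamp:
   val oldLastStart = timestamp - (timestamp - startTime + slide) % slide   // -10
   // whereas the sliding window that actually contains -17 starts at -20.
   ```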



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891537461


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest {
     }.getMessage
     assert(errMsg.contains("my_temp_table is a temp view. 'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+  assert(table.schema().equals(tableSchema))
+  assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   Actually we still need the change in `listTables`:

   Existing users do not use 3-part names; they just use 2-part names. In the case of `listTables`, existing users pass the `dbname` directly (so just `b`, not `a.b`). In that case, a catalog has to be chosen for the name, and the default catalog is the Hive metastore.
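
   To make the two call shapes concrete (a rough sketch of the intended behavior under the 3-layer-namespace support this PR adds; names are made up):

   ```scala
   // 2-part / legacy usage: only the database name is given, so Spark has to pick
   // a catalog for it; by default the session (Hive metastore) catalog.
   spark.catalog.listTables("b")

   // 3-part usage: the catalog is spelled out explicitly.
   spark.catalog.listTables("a.b")   // catalog `a`, database `b`
   ```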



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891534128


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -64,15 +65,34 @@ class Database(
 @Stable
 class Table(
 val name: String,
-@Nullable val database: String,
+@Nullable val catalog: String,
+@Nullable val namespace: Array[String],
 @Nullable val description: String,
 val tableType: String,
 val isTemporary: Boolean)
   extends DefinedByConstructorParams {
 
+  def this(name: String, database: String, description: String, tableType: String,
+      isTemporary: Boolean) = {
+this(name, null, Array(database), description, tableType, isTemporary)
+  }
+
+  def database: String = {
+if (namespace == null) {
+  null
+} else if (namespace.length == 2) {

Review Comment:
   Right, I think we only care about array.length = 1, which is the database.


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -64,15 +65,34 @@ class Database(
 @Stable
 class Table(
 val name: String,
-@Nullable val database: String,
+@Nullable val catalog: String,
+@Nullable val namespace: Array[String],
 @Nullable val description: String,
 val tableType: String,
 val isTemporary: Boolean)
   extends DefinedByConstructorParams {
 
+  def this(name: String, database: String, description: String, tableType: String,
+      isTemporary: Boolean) = {
+this(name, null, Array(database), description, tableType, isTemporary)
+  }
+
+  def database: String = {
+if (namespace == null) {
+  null
+} else if (namespace.length == 2) {

Review Comment:
   Right I think we only care about array.length = 1 which is the database. 
Will remove it.
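
   A rough sketch of what the simplified accessor could look like under that reading (not necessarily the final shape of the patch):

   ```scala
   def database: String = {
     // Only a single-element namespace maps cleanly onto the legacy `database` field.
     if (namespace != null && namespace.length == 1) namespace(0) else null
   }
   ```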



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jerrypeng commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


jerrypeng commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891532480


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
     def getWindow(i: Int, dataType: DataType): Expression = {
      val timestamp = PreciseTimestampConversion(window.timeColumn, dataType, LongType)
-      val lastStart = timestamp - (timestamp - window.startTime
-        + window.slideDuration) % window.slideDuration
+      val remainder = (timestamp - window.startTime) % window.slideDuration
+      val lastStart = CaseWhen(
+        Seq((LessThan(remainder, 0), timestamp - remainder - window.slideDuration))

Review Comment:
   Can you explain why this is correct now?  Please give an example.
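
   (For what it's worth, a small worked example of the two formulas, assuming JVM `%` semantics and a pre-epoch timestamp; the numbers are illustrative:)

   ```scala
   val timestamp = -17L   // pre-epoch, so negative
   val slide = 10L
   val start = 0L

   // Old formula: the computed window start ends up *after* the timestamp.
   val old = timestamp - (timestamp - start + slide) % slide   // -17 - (-7) = -10

   // New formula: when the remainder is negative, step back one extra slide.
   val remainder = (timestamp - start) % slide                 // -7
   val fixed =
     if (remainder < 0) timestamp - remainder - slide else timestamp - remainder   // -20
   // The window containing -17 is [-20, -10), so -20 is the correct start.
   ```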



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Eugene-Mark commented on a diff in pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-07 Thread GitBox


Eugene-Mark commented on code in PR #36499:
URL: https://github.com/apache/spark/pull/36499#discussion_r891528661


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala:
##
@@ -96,4 +97,29 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
     ""
   }
+
+  override def getCatalystType(
+      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
+    if (sqlType == Types.NUMERIC) {

Review Comment:
   Good point! Will modify accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891497676


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest {
     }.getMessage
     assert(errMsg.contains("my_temp_table is a temp view. 'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+  assert(table.schema().equals(tableSchema))
+  assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   Yeah, if you agree I will remove the unnecessary change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks closed pull request #36791: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function…

2022-06-07 Thread GitBox


vli-databricks closed pull request #36791: [SPARK-39392][SQL][3.3] Refine ANSI 
error messages for try_* function…
URL: https://github.com/apache/spark/pull/36791


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


