[PR] [MINOR][DOCS] Change `SPARK_ANSI_SQL_MODE` in PlanStabilitySuite documentation [spark]

2024-04-21 Thread via GitHub


HyukjinKwon opened a new pull request, #46148:
URL: https://github.com/apache/spark/pull/46148

   ### What changes were proposed in this pull request?
   
   This PR proposes to change `SPARK_ANSI_SQL_MODE=true` to 
`SPARK_ANSI_SQL_MODE=false` in the `PlanStabilitySuite` documentation.
   
   ### Why are the changes needed?
   
   To guide developers in generating golden files in both non-ANSI and 
ANSI modes.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   Manually.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2067976202

   @dongjoon-hyun @sunchao @LuciferYang the integration test with Hive RC0 
passed. Do you have other concerns or extra cases that need to be verified?





Re: [PR] [MINOR] Improve SparkPi example [spark]

2024-04-21 Thread via GitHub


EnricoMi closed pull request #45664: [MINOR] Improve SparkPi example
URL: https://github.com/apache/spark/pull/45664





Re: [PR] [MINOR] Improve SparkPi example [spark]

2024-04-21 Thread via GitHub


EnricoMi commented on PR #45664:
URL: https://github.com/apache/spark/pull/45664#issuecomment-2067958222

   Thanks for the input.





Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2067976499

   also cc @wangyum @HyukjinKwon @yaooqinn 





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574144603


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available 
in maven central

Review Comment:
   I searched `DEFAULT_ARTIFACT_REPOSITORY`; it looks like it affects multiple 
places. I addressed the `ADD JAR` case with another approach: 
https://github.com/apache/spark/pull/45372/commits/ec9848f108ce529e5e4f2d55bc66169474a0eb95






Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574116717


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available 
in maven central

Review Comment:
   I remember it did not work on my first try; let me try again.






Re: [PR] [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" [spark]

2024-04-21 Thread via GitHub


pan3793 commented on PR #46150:
URL: https://github.com/apache/spark/pull/46150#issuecomment-2068478468

   cc @dongjoon-hyun @LuciferYang 





Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-21 Thread via GitHub


itholic commented on code in PR #46123:
URL: https://github.com/apache/spark/pull/46123#discussion_r1574132243


##
python/pyspark/sql/tests/test_functions.py:
##
@@ -1315,6 +1315,35 @@ def test_parse_json(self):
 self.assertEqual("""{"a":1}""", actual["var"])
 self.assertEqual("""{"b":[{"c":"str2"}]}""", actual["var_lit"])
 
+def test_variant_expressions(self):
+df = self.spark.createDataFrame([Row(json="""{ "a" : 1 }"""), 
Row(json="""{ "b" : 2 }""")])
+v = F.parse_json(df.json)
+
+def check(resultDf, expected):
+self.assertEqual([r[0] for r in resultDf.collect()], expected)
+
+check(df.select(F.is_variant_null(v)), [False, False])
+check(df.select(F.schema_of_variant(v)), ["STRUCT", 
"STRUCT"])
+check(df.select(F.schema_of_variant_agg(v)), ["STRUCT"])
+
+check(df.select(F.variant_get(v, "$.a", "int")), [1, None])
+check(df.select(F.variant_get(v, "$.b", "int")), [None, 2])
+check(df.select(F.variant_get(v, "$.a", "double")), [1.0, None])
+
+with self.assertRaises(SparkRuntimeException) as ex:
+df.select(F.variant_get(v, "$.a", "binary")).collect()

Review Comment:
   Got it. Looks good to me.
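
   As a hedged aside for readers following this review: the quoted test's 
expectations can be mimicked in plain Python (not PySpark). `variant_get` below 
is a made-up stand-in for the real Spark function, handling only trivial 
`$.field` paths; real variant paths are much richer.

```python
import json

def variant_get(doc, path, cast):
    # Illustrative stand-in: extract a top-level field named by "$.field"
    # and cast it, returning None when the path is absent (as in the test).
    assert path.startswith("$.")
    value = json.loads(doc).get(path[2:])
    return None if value is None else cast(value)

rows = ['{ "a" : 1 }', '{ "b" : 2 }']
print([variant_get(r, "$.a", int) for r in rows])    # [1, None]
print([variant_get(r, "$.b", int) for r in rows])    # [None, 2]
print([variant_get(r, "$.a", float) for r in rows])  # [1.0, None]
```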






Re: [PR] [SPARK-47932][SQL][TEST] Avoid using legacy commons-lang [spark]

2024-04-21 Thread via GitHub


smileyboy2019 commented on PR #46154:
URL: https://github.com/apache/spark/pull/46154#issuecomment-2068511712

   Does Structured Streaming support writing stream processing logic in Spark 
SQL, similar to Flink SQL? It would be great if SQL alone could express the 
stream processing pipeline.





Re: [PR] [SPARK-47596][DSTREAMS] Streaming: Migrate logWarn with variables to structured logging framework [spark]

2024-04-21 Thread via GitHub


smileyboy2019 commented on PR #46079:
URL: https://github.com/apache/spark/pull/46079#issuecomment-2068510767

   Does Structured Streaming support writing stream processing logic in Spark 
SQL, similar to Flink SQL? It would be great if SQL alone could express the 
stream processing pipeline.
   





Re: [PR] [SPARK-47912][SQL] Infer serde class from format classes [spark]

2024-04-21 Thread via GitHub


smileyboy2019 commented on PR #46132:
URL: https://github.com/apache/spark/pull/46132#issuecomment-2068512262

   Does Structured Streaming support writing stream processing logic in Spark 
SQL, similar to Flink SQL? It would be great if SQL alone could express the 
stream processing pipeline.





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574143381


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##
@@ -1627,10 +1627,8 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val testData = TestHive.getHiveFile("data/files/sample.json").toURI
 withTable("t") {
-  // hive-catalog-core has some transitive dependencies which dont exist 
on maven central

Review Comment:
   The "some transitive dependencies which dont exist on maven central" refers 
to `org.pentaho:pentaho-aggdesigner-algorithm`; it is no longer a problem in 
Hive 2.3.10, see https://github.com/apache/hive/pull/4486






Re: [PR] [SPARK-47932][SQL][TEST] Avoid using legacy commons-lang [spark]

2024-04-21 Thread via GitHub


pan3793 commented on PR #46154:
URL: https://github.com/apache/spark/pull/46154#issuecomment-2068515201

   @smileyboy2019 the question is unrelated to this PR; please ask questions 
on the mailing list or in Slack.





Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-21 Thread via GitHub


zhengruifeng commented on code in PR #46123:
URL: https://github.com/apache/spark/pull/46123#discussion_r1574083962


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala:
##
@@ -6975,6 +6975,71 @@ object functions {
*/
   def parse_json(json: Column): Column = Column.fn("parse_json", json)
 
+  /**
+   * Check if a variant value is a variant null. Returns true if and only if 
the input is a
+   * variant null and false otherwise (including in the case of SQL NULL).
+   *
+   * @param v
+   *   a variant column.
+   * @group variant_funcs

Review Comment:
   @chenhao-db I mean the group `variant_funcs` is missing 
[here](https://github.com/apache/spark/blob/9f34b8eca2f3578367352ec28532e29fe3a91864/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala#L58-L81)






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574084085


##
sql/core/src/test/resources/sql-tests/analyzer-results/predicate-functions.sql.out:
##
@@ -450,3 +450,18 @@ Project [NOT between(to_timestamp(2022-12-26 00:00:01, 
None, TimestampType, Some
 select rand(123) not between 0.1 AND 0.2
 -- !query analysis
 [Analyzer test output redacted due to nondeterminism]
+
+
+-- !query
+set spark.sql.legacy.bangEqualsNot=true
+select 1 ! between 0 and 2
+-- !query analysis
+java.lang.IllegalArgumentException
+spark.sql.legacy.bangEqualsNot should be boolean, but was true

Review Comment:
   this looks weird...
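
   One hypothetical reading of the weirdness (purely illustrative; 
`set_boolean_conf` below is invented, and the actual parser behavior in this 
golden file is unverified): if the SET statement captured the following line 
as part of its value, a strict boolean check would reject the value even 
though it begins with `true`.

```python
def set_boolean_conf(key, value):
    # Strict parse: only an exact "true"/"false" (case-insensitive,
    # surrounding whitespace ignored) is accepted as a boolean.
    v = value.strip().lower()
    if v == "true":
        return True
    if v == "false":
        return False
    raise ValueError(f"{key} should be boolean, but was {value}")

set_boolean_conf("spark.sql.legacy.bangEqualsNot", "true")   # accepted
# A value that merely *starts* with "true" is rejected:
# set_boolean_conf("spark.sql.legacy.bangEqualsNot",
#                  "true\nselect 1 ! between 0 and 2")  -> ValueError
```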






Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46126:
URL: https://github.com/apache/spark/pull/46126#discussion_r1574081702


##
connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala:
##
@@ -208,14 +208,13 @@ object SchemaConverters {
   // could be "a" and "A" and we need to distinguish them. In 
this case, we throw
   // an exception.
   // Stable id prefix can be empty so the name of the field 
can be just the type.
-  val tempFieldName =
-
s"${stableIdPrefixForUnionType}${s.getName.toLowerCase(Locale.ROOT)}"
-  if (fieldNameSet.contains(tempFieldName)) {
+  val tempFieldName = 
s"${stableIdPrefixForUnionType}${s.getName}"
+

Review Comment:
   Shall we remove this new empty line?
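
   For context on the quoted diff, a hedged plain-Python sketch of the behavior 
under review: preserve case in the generated field names while still detecting 
names that collide case-insensitively (e.g. "a" and "A"). `stable_field_names` 
and the `member_` prefix are illustrative stand-ins, not the Spark API.

```python
def stable_field_names(member_names, prefix="member_"):
    seen_lower = set()
    fields = []
    for name in member_names:
        # Collision detection stays case-insensitive even though the
        # emitted field name preserves the original case.
        if name.lower() in seen_lower:
            raise ValueError(f"ambiguous union member name: {name!r}")
        seen_lower.add(name.lower())
        fields.append(f"{prefix}{name}")
    return fields

print(stable_field_names(["Int", "String"]))  # ['member_Int', 'member_String']
```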






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


srielau commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574085978


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -365,6 +367,8 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
 val cols = 
Option(ctx.identifierList()).map(visitIdentifierList).getOrElse(Nil)
 val partitionKeys = 
Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)
 
+blockBang(ctx.errorCapturingNot())

Review Comment:
   I tried that. But I need to invoke visitErrorCapturingNot explicitly which 
makes it worse.






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


srielau commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574095617


##
common/utils/src/main/resources/error/error-conditions.json:
##
@@ -3798,6 +3798,21 @@
 ],
 "sqlState" : "22003"
   },
+  "SYNTAX_DISCONTINUED" : {
+"message" : [
+  "Support of the clause or keyword:  has been discontinued in 
this context."
+],
+"subClass" : {
+  "BANG_EQUALS_NOT" : {
+"message" : [
+  "The '!' keyword is supported as a prefix operator in a logical 
operation only.",

Review Comment:
   NOT, AND, OR(?). I can rephrase: "Only supported for the prefix operator 
'NOT'"?






Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]

2024-04-21 Thread via GitHub


sadikovi commented on PR #46126:
URL: https://github.com/apache/spark/pull/46126#issuecomment-2068451840

   Thanks for the review @dongjoon-hyun. I have addressed your comments.





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on PR #45372:
URL: https://github.com/apache/spark/pull/45372#issuecomment-2068475033

   @dongjoon-hyun This one is mostly for verification; let's try any ideas on 
it. I plan to split it into several PRs once Hive 2.3.10 is officially released.





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574116295


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##
@@ -3748,7 +3748,7 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val sc = spark.sparkContext
-val hiveVersion = "2.3.9"
+val hiveVersion = "2.3.9" // TODO update to 2.3.10 after jar available in 
maven central

Review Comment:
   this one is actually unrelated; see SPARK-47928 
(https://github.com/apache/spark/pull/46150)






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574084230


##
common/utils/src/main/resources/error/error-conditions.json:
##
@@ -3798,6 +3798,21 @@
 ],
 "sqlState" : "22003"
   },
+  "SYNTAX_DISCONTINUED" : {
+"message" : [
+  "Support of the clause or keyword:  has been discontinued in 
this context."
+],
+"subClass" : {
+  "BANG_EQUALS_NOT" : {
+"message" : [
+  "The '!' keyword is supported as a prefix operator in a logical 
operation only.",

Review Comment:
   what is logical operation?






[PR] [SPARK-47931][SQL] Remove unused and leaked threadlocal/session sessionHive [spark]

2024-04-21 Thread via GitHub


yaooqinn opened a new pull request, #46153:
URL: https://github.com/apache/spark/pull/46153

   
   
   
   
   ### What changes were proposed in this pull request?
   
   
   This `sessionHive` is never used nor properly closed.
   
   ### Why are the changes needed?
   
   
   A thread-local Hive instance like `sessionHive` shall be closed properly via 
`Hive.closeCurrent`. Here we just remove it, as we never use it for HMS 
connectivity.
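
   A hedged illustration (plain Python, not Hive) of why an unused thread-local 
handle leaks unless its owning thread closes it: a value stashed in a 
thread-local is only reachable from that thread, so nothing else can release it.

```python
import threading

class Handle:
    # Stand-in for a resource-holding client (e.g. an HMS connection).
    _open_count = 0
    def __init__(self):
        Handle._open_count += 1
    def close(self):
        Handle._open_count -= 1

local = threading.local()

def worker(close_properly):
    local.handle = Handle()   # analogous to the thread-local sessionHive
    if close_properly:
        local.handle.close()  # analogous to Hive.closeCurrent()

t = threading.Thread(target=worker, args=(False,))
t.start(); t.join()
print(Handle._open_count)  # 1 -> the handle was never closed
```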
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   no
   ### How was this patch tested?
   
   Passes compilation.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   no





Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


srielau commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574101454


##
common/utils/src/main/resources/error/error-conditions.json:
##
@@ -3798,6 +3798,21 @@
 ],
 "sqlState" : "22003"
   },
+  "SYNTAX_DISCONTINUED" : {
+"message" : [
+  "Support of the clause or keyword:  has been discontinued in 
this context."
+],
+"subClass" : {
+  "BANG_EQUALS_NOT" : {
+"message" : [
+  "The '!' keyword is supported as a prefix operator in a logical 
operation only.",

Review Comment:
   Rephrased






Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-04-21 Thread via GitHub


erenavsarogullari commented on PR #45234:
URL: https://github.com/apache/spark/pull/45234#issuecomment-2068474192

   Thanks @cloud-fan and @ulysses-you for the reviews and approval.
   I have just rebased to get a green build.





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574127899


##
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   opened https://github.com/apache/spark/pull/46154






Re: [PR] [SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeResources to avoid memory leak [spark]

2024-04-21 Thread via GitHub


JacobZheng0927 commented on PR #46131:
URL: https://github.com/apache/spark/pull/46131#issuecomment-2068538332

   > Thank you for making a PR, @JacobZheng0927 .
   > 
   > However, your PR fails to compile. Please make GitHub Action CI green.
   > 
   > > [error] (core / Compile / compileIncremental) Compilation failed
   
   Done.





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574094232


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala:
##
@@ -1626,6 +1625,7 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val testData = TestHive.getHiveFile("data/files/sample.json").toURI
+val hiveVersion = "2.3.9" // TODO remove after Hive 2.3.10 jar available 
in maven central

Review Comment:
   Ditto. Do you know why the `DEFAULT_ARTIFACT_REPOSITORY` change isn't enough 
here?






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574094424


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -365,6 +367,8 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
 val cols = 
Option(ctx.identifierList()).map(visitIdentifierList).getOrElse(Nil)
 val partitionKeys = 
Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)
 
+blockBang(ctx.errorCapturingNot())

Review Comment:
   ah I see






[PR] [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic [spark]

2024-04-21 Thread via GitHub


HyukjinKwon opened a new pull request, #46155:
URL: https://github.com/apache/spark/pull/46155

   ### What changes were proposed in this pull request?
   
   Same as https://github.com/apache/spark/pull/46129 but for `Column` class.
   
   ### Why are the changes needed?
   
   Same as https://github.com/apache/spark/pull/46129
   
   ### Does this PR introduce _any_ user-facing change?
   
   Same as https://github.com/apache/spark/pull/46129
   
   ### How was this patch tested?
   
   Manually tested, and CI should verify them.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   





Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574093194


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##
@@ -3748,7 +3748,7 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val sc = spark.sparkContext
-val hiveVersion = "2.3.9"
+val hiveVersion = "2.3.9" // TODO update to 2.3.10 after jar available in 
maven central

Review Comment:
   I'm not sure if this TODO is really required when we have a staging 
repository, @pan3793 .
   
   To be sure, we need to pass all the tests.






Re: [PR] [SPARK-47633][SQL] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #45763:
URL: https://github.com/apache/spark/pull/45763#discussion_r1574118395


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##
@@ -2056,6 +2056,8 @@ case class LateralJoin(
 joinType: JoinType,
 condition: Option[Expression]) extends UnaryNode {
 
+  override lazy val allAttributes: AttributeSeq = children.flatMap(_.output) 
++ right.plan.output

Review Comment:
   It looks clearer to do `left.output ++ right.plan.output`






[PR] [SPARK-47932][SQL][TEST] Avoid using legacy commons-lang [spark]

2024-04-21 Thread via GitHub


pan3793 opened a new pull request, #46154:
URL: https://github.com/apache/spark/pull/46154

   
   
   ### What changes were proposed in this pull request?
   
Remove usage of `commons-lang` in favor of `commons-lang3`
   
   ### Why are the changes needed?
   
   Migrate `commons-lang` to `commons-lang3`
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Pass GA.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.





Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46126:
URL: https://github.com/apache/spark/pull/46126#discussion_r1574083057


##
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##
@@ -367,6 +367,20 @@ abstract class AvroSuite
 "member_myrecord2: struct>",
   Seq())
 
+// SPARK-47904: Test that field name case is preserved.

Review Comment:
   Please create a new test case with a proper test prefix because this test 
method, `SPARK-4: ...` is already too long.






Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-21 Thread via GitHub


harshmotw-db commented on code in PR #46141:
URL: https://github.com/apache/spark/pull/46141#discussion_r1574119555


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##
@@ -75,6 +75,36 @@ case class ParseJson(child: Expression)
 copy(child = newChild)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr) - Parse a JSON string as an Variant value. Returns 
null when the string is not valid JSON value.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('{"a":1,"b":0.8}');
+   {"a":1,"b":0.8}
+  """,
+  since = "4.0.0",
+  group = "variant_funcs"
+)
+// scalastyle:on line.size.limit
+case class TryParseJson(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+  def this(child: Expression) = this(child, TryEval(ParseJson(child)))

Review Comment:
   I understand. I have reimplemented it and made it similar to the 
implementation of the `try_variant_get` expression.
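Outside Spark, the try-semantics under discussion — return null instead of raising on invalid input — can be sketched with plain Python (`json.loads` stands in for the real variant parser here, which it is not):

```python
import json

def try_parse_json(s):
    # Sketch of the try_* contract described above: swallow the parse error
    # and return None instead of raising. This is NOT Spark's variant
    # encoding, only the null-on-failure behavior.
    try:
        return json.loads(s)
    except (TypeError, ValueError):
        return None

print(try_parse_json('{"a":1,"b":0.8}'))  # {'a': 1, 'b': 0.8}
print(try_parse_json('not json'))         # None
```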






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r157408


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -365,6 +367,8 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
 val cols = 
Option(ctx.identifierList()).map(visitIdentifierList).getOrElse(Nil)
 val partitionKeys = 
Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)
 
+blockBang(ctx.errorCapturingNot())

Review Comment:
   does it work if we override `def visitErrorCapturingNot` and put the check 
there?






Re: [PR] [SPARK-47907] Put bang under a config [spark]

2024-04-21 Thread via GitHub


srielau commented on code in PR #46138:
URL: https://github.com/apache/spark/pull/46138#discussion_r1574088137


##
sql/core/src/test/resources/sql-tests/analyzer-results/predicate-functions.sql.out:
##
@@ -450,3 +450,18 @@ Project [NOT between(to_timestamp(2022-12-26 00:00:01, 
None, TimestampType, Some
 select rand(123) not between 0.1 AND 0.2
 -- !query analysis
 [Analyzer test output redacted due to nondeterminism]
+
+
+-- !query
+set spark.sql.legacy.bangEqualsNot=true
+select 1 ! between 0 and 2
+-- !query analysis
+java.lang.IllegalArgumentException
+spark.sql.legacy.bangEqualsNot should be boolean, but was true

Review Comment:
   Oh, missing semicolon






Re: [PR] [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 [spark]

2024-04-21 Thread via GitHub


panbingkun commented on PR #46152:
URL: https://github.com/apache/spark/pull/46152#issuecomment-2068523328

The benchmark (`org.apache.spark.MapStatusesConvertBenchmark`) results are as follows:
   JDK 17:
   JDK 21:





[PR] [SPARK-47730][K8S] Support APP_ID and EXECUTOR_ID placeholders in labels [spark]

2024-04-21 Thread via GitHub


jshmchenxi opened a new pull request, #46149:
URL: https://github.com/apache/spark/pull/46149

Currently, only pod annotations support the `APP_ID` and `EXECUTOR_ID` placeholders. This commit aims to add the same support to pod labels.
   
The use case is to support availability-zone-based topology pod affinity via customized labels. We want to use the Spark application ID as the customized label value, to allow Spark executor pods to run in the same availability zone as the Spark driver pod.
   
Although we can use the Spark internal label `spark-app-selector` directly, this is not a good practice when combined with YuniKorn Gang Scheduling. When Gang Scheduling is enabled, the YuniKorn placeholder pods should use the same affinity as the real Spark pods, which means we would have to add the internal `spark-app-selector` label to the placeholder pods. That is undesirable because the placeholder pods could then be mistaken for Spark pods by the monitoring system.
   
   Thus we propose supporting the `APP_ID` and `EXECUTOR_ID` placeholders in 
Spark pod labels as well for flexibility.
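The interpolation being proposed can be sketched as simple string substitution (a simplification for illustration; the names below are not Spark's actual helper API):

```python
def substitute_placeholders(value, app_id, exec_id=None):
    # Replace the {{APP_ID}}/{{EXECUTOR_ID}} markers in a label or
    # annotation value. exec_id is only available for executor pods.
    value = value.replace("{{APP_ID}}", app_id)
    if exec_id is not None:
        value = value.replace("{{EXECUTOR_ID}}", exec_id)
    return value

label = substitute_placeholders("zone-affinity-{{APP_ID}}", "spark-1234", "7")
print(label)  # zone-affinity-spark-1234
```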
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





Re: [PR] [SPARK-46632][SQL] EquivalentExpressions addExprTree should allow all type of expressions [spark]

2024-04-21 Thread via GitHub


planga82 closed pull request #45894: [SPARK-46632][SQL] EquivalentExpressions 
addExprTree should allow all type of expressions
URL: https://github.com/apache/spark/pull/45894





Re: [PR] [SPARK-46632][SQL] EquivalentExpressions addExprTree should allow all type of expressions [spark]

2024-04-21 Thread via GitHub


planga82 commented on PR #45894:
URL: https://github.com/apache/spark/pull/45894#issuecomment-2067982861

Thank you @zml1206. I saw that you have proposed another solution. I'm going to close this PR.





Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


GideonPotok commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573733891


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule {
 
 case otherExpr @ (
   _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: 
Greatest | _: Least |
-  _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
+  _: Coalesce | _: BinaryExpression | _: ConcatWs | _: StringLPad | _: 
StringRPad) =>

Review Comment:
   I see. 






Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


GideonPotok commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573741040


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule {
 
 case otherExpr @ (
   _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: 
Greatest | _: Least |
-  _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
+  _: Coalesce | _: BinaryExpression | _: ConcatWs | _: StringLPad | _: 
StringRPad) =>

Review Comment:
   @uros-db done. please re-review






Re: [PR] [SPARK-47730][K8S] Support APP_ID and EXECUTOR_ID placeholders in labels [spark]

2024-04-21 Thread via GitHub


jshmchenxi commented on PR #46149:
URL: https://github.com/apache/spark/pull/46149#issuecomment-2068041487

   cc @dongjoon-hyun
   This is a similar feature as #35704 and #42600. Please take a look, thanks!





Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


uros-db commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573839675


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -52,6 +52,12 @@ object CollationTypeCasts extends TypeCoercionRule {
   overlay.withNewChildren(collateToSingleType(Seq(overlay.input, 
overlay.replace))
 ++ Seq(overlay.pos, overlay.len))
 
+case stringPadExpr@(_: StringLPad | _: StringRPad) =>
+  val newChildren = collateToSingleType(
+Seq(stringPadExpr.children.head, stringPadExpr.children(2)))
+  stringPadExpr.withNewChildren(
+Seq(newChildren.head, stringPadExpr.children(1), newChildren(1)))

Review Comment:
   ```suggestion
 val Seq(str, pad) = collateToSingleType(Seq(stringLPad.str, 
stringLPad.pad))
 stringLPad.withNewChildren(Seq(str, stringLPad.len, pad))
   ```
   
   Spark expressions have parameter names, so we might as well use them






Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


uros-db commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573842165


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -52,6 +52,12 @@ object CollationTypeCasts extends TypeCoercionRule {
   overlay.withNewChildren(collateToSingleType(Seq(overlay.input, 
overlay.replace))
 ++ Seq(overlay.pos, overlay.len))
 
+case stringPadExpr@(_: StringLPad | _: StringRPad) =>
+  val newChildren = collateToSingleType(
+Seq(stringPadExpr.children.head, stringPadExpr.children(2)))
+  stringPadExpr.withNewChildren(
+Seq(newChildren.head, stringPadExpr.children(1), newChildren(1)))

Review Comment:
   ```suggestion
 val Seq(str, len, pad) = stringPadExpr.children
 val Seq(newStr, newPad) = collateToSingleType(Seq(str, pad))
 stringPadExpr.withNewChildren(Seq(newStr, len, newPad))
   ```
   
   note: the previous approach will require having 2 separate cases
   ```
   case stringLPad: StringLPad =>
   case stringRPad: StringRPad =>
   ```
   
   but if you do want to preserve the single case, try something like the 
second approach






Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


GideonPotok commented on PR #46041:
URL: https://github.com/apache/spark/pull/46041#issuecomment-2068126309

   > small changes in casting
   
   @uros-db done





[PR] [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" [spark]

2024-04-21 Thread via GitHub


pan3793 opened a new pull request, #46150:
URL: https://github.com/apache/spark/pull/46150

   
   
   ### What changes were proposed in this pull request?
   
`SQLQuerySuite`/"SPARK-33084: Add jar support Ivy URI in SQL" uses Hive deps to test `ADD JAR`, which pulls in tons of transitive deps. This PR replaces them with lightweight jars that still cover all the semantics, to speed up the UT.
   
   ### Why are the changes needed?
   
   Speed up the test.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Run UT locally.
   
   Before
   ```
   [info] - SPARK-33084: Add jar support Ivy URI in SQL (16 minutes, 55 seconds)
   ```
   After
   ```
   [info] - SPARK-33084: Add jar support Ivy URI in SQL (17 seconds, 783 
milliseconds)
   ```
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.





Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


uros-db commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573905109


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -52,6 +52,14 @@ object CollationTypeCasts extends TypeCoercionRule {
   overlay.withNewChildren(collateToSingleType(Seq(overlay.input, 
overlay.replace))
 ++ Seq(overlay.pos, overlay.len))
 
+case stringRPad: StringRPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringRPad.str, 
stringRPad.pad))
+  stringRPad.withNewChildren(Seq(str, stringRPad.len, pad))
+
+case stringLPad: StringLPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringLPad.str, 
stringLPad.pad))
+  stringLPad.withNewChildren(Seq(str, stringLPad.len, pad))

Review Comment:
   it would be good if we could combine these, like you did before
   
   try if something like this works:
   ```
 val Seq(str, len, pad) = stringPadExpr.children
 val Seq(newStr, newPad) = collateToSingleType(Seq(str, pad))
 stringPadExpr.withNewChildren(Seq(newStr, len, newPad))
   ```






Re: [PR] [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated [spark]

2024-04-21 Thread via GitHub


parthshyara commented on code in PR #39011:
URL: https://github.com/apache/spark/pull/39011#discussion_r1573887387


##
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:
##
@@ -1046,17 +1048,45 @@ private[spark] class TaskSetManager(
 
   /** Called by TaskScheduler when an executor is lost so we can re-enqueue 
our tasks */
   override def executorLost(execId: String, host: String, reason: 
ExecutorLossReason): Unit = {
-// Re-enqueue any tasks that ran on the failed executor if this is a 
shuffle map stage,
-// and we are not using an external shuffle server which could serve the 
shuffle outputs.
-// The reason is the next stage wouldn't be able to fetch the data from 
this dead executor
-// so we would need to rerun these tasks on other executors.
-if (isShuffleMapTasks && !env.blockManager.externalShuffleServiceEnabled 
&& !isZombie) {
+// Re-enqueue any tasks with potential shuffle data loss that ran on the 
failed executor
+// if this is a shuffle map stage, and we are not using an external 
shuffle server which
+// could serve the shuffle outputs or the executor lost is caused by 
decommission (which
+// can destroy the whole host). The reason is the next stage wouldn't be 
able to fetch the
+// data from this dead executor so we would need to rerun these tasks on 
other executors.
+val maybeShuffleMapOutputLoss = isShuffleMapTasks &&
+  (reason.isInstanceOf[ExecutorDecommission] || 
!env.blockManager.externalShuffleServiceEnabled)

Review Comment:
   @Ngone51 @mridulm Is the above issue being tracked elsewhere?






Re: [PR] [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #46150:
URL: https://github.com/apache/spark/pull/46150#discussion_r1573890525


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##
@@ -3748,22 +3748,21 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
 val sc = spark.sparkContext
-val hiveVersion = "2.3.9"
 // transitive=false, only download specified jar
-sql(s"ADD JAR 
ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion?transitive=false")
-assert(sc.listJars()
-  
.exists(_.contains(s"org.apache.hive.hcatalog_hive-hcatalog-core-$hiveVersion.jar")))
+sql(s"ADD JAR 
ivy://org.springframework:spring-core:6.1.6?transitive=false")
+
assert(sc.listJars().exists(_.contains("org.springframework_spring-core-6.1.6.jar")))
+
assert(!sc.listJars().exists(_.contains("org.springframework_spring-jcl-6.1.6.jar")))
 
 // default transitive=true, test download ivy URL jar return multiple jars
-sql("ADD JAR ivy://org.scala-js:scalajs-test-interface_2.12:1.2.0")
-assert(sc.listJars().exists(_.contains("scalajs-library_2.12")))
-assert(sc.listJars().exists(_.contains("scalajs-test-interface_2.12")))
-
-sql(s"ADD JAR ivy://org.apache.hive:hive-contrib:$hiveVersion" +
-  "?exclude=org.pentaho:pentaho-aggdesigner-algorithm=true")
-
assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-contrib-$hiveVersion.jar")))
-
assert(sc.listJars().exists(_.contains(s"org.apache.hive_hive-exec-$hiveVersion.jar")))
-
assert(!sc.listJars().exists(_.contains("org.pentaho.pentaho_aggdesigner-algorithm")))

Review Comment:
   this assertion is invalid. should be 
   ```patch
   - org.pentaho.pentaho_aggdesigner-algorithm
   + org.pentaho_pentaho-aggdesigner-algorithm
   ```
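The naming convention behind those assertions — resolved jars show up in `sc.listJars()` as `<org>_<module>-<version>.jar` — can be sketched as follows (a simplification that ignores classifiers and packaging):

```python
def ivy_jar_name(coordinate):
    # "org:module:version" -> "org_module-version.jar" (simplified; real
    # Ivy resolution also handles classifiers, types, and extra attributes)
    org, module, version = coordinate.split(":")
    return f"{org}_{module}-{version}.jar"

print(ivy_jar_name("org.springframework:spring-core:6.1.6"))
# org.springframework_spring-core-6.1.6.jar
```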






Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


uros-db commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573905109


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -52,6 +52,14 @@ object CollationTypeCasts extends TypeCoercionRule {
   overlay.withNewChildren(collateToSingleType(Seq(overlay.input, 
overlay.replace))
 ++ Seq(overlay.pos, overlay.len))
 
+case stringRPad: StringRPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringRPad.str, 
stringRPad.pad))
+  stringRPad.withNewChildren(Seq(str, stringRPad.len, pad))
+
+case stringLPad: StringLPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringLPad.str, 
stringLPad.pad))
+  stringLPad.withNewChildren(Seq(str, stringLPad.len, pad))

Review Comment:
   it would be good if we could combine these, like you did before
   
   try if something like this works:
   ```
   case stringPadExpr @ (_: StringLPad | _: StringRPad) =>
 val Seq(str, len, pad) = stringPadExpr.children
 val Seq(newStr, newPad) = collateToSingleType(Seq(str, pad))
 stringPadExpr.withNewChildren(Seq(newStr, len, newPad))
   ```
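The shape of that suggestion — unify the type of the string children while threading `len` through untouched — can be sketched outside Spark (the collation tags and helper names below are illustrative, not Spark's API):

```python
def collate_to_single_type(exprs):
    # Stand-in for Spark's collateToSingleType: pick one collation tag and
    # retag every expression with it (illustrative; the real rule computes
    # the least common collation).
    target = exprs[0][1]
    return [(value, target) for value, _ in exprs]

def rewrite_pad_children(children):
    # Mirrors the suggested case: destructure (str, len, pad), collate only
    # str and pad, and reassemble with len unchanged.
    s, length, pad = children
    new_s, new_pad = collate_to_single_type([s, pad])
    return [new_s, length, new_pad]

children = [("abc", "UTF8_BINARY"), 5, ("x", "UNICODE_CI")]
print(rewrite_pad_children(children))
# [('abc', 'UTF8_BINARY'), 5, ('x', 'UTF8_BINARY')]
```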






Re: [PR] [MINOR][DOCS] Change `SPARK_ANSI_SQL_MODE`in PlanStabilitySuite documentation [spark]

2024-04-21 Thread via GitHub


HyukjinKwon commented on PR #46148:
URL: https://github.com/apache/spark/pull/46148#issuecomment-2068330509

   Merged to master.





Re: [PR] [MINOR][DOCS] Change `SPARK_ANSI_SQL_MODE`in PlanStabilitySuite documentation [spark]

2024-04-21 Thread via GitHub


HyukjinKwon closed pull request #46148: [MINOR][DOCS] Change 
`SPARK_ANSI_SQL_MODE`in PlanStabilitySuite documentation
URL: https://github.com/apache/spark/pull/46148





Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-21 Thread via GitHub


zhengruifeng closed pull request #46045: [SPARK-47845][SQL][PYTHON][CONNECT] 
Support Column type in split function for scala and python
URL: https://github.com/apache/spark/pull/46045





Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-21 Thread via GitHub


zhengruifeng commented on PR #46045:
URL: https://github.com/apache/spark/pull/46045#issuecomment-2068333943

   thank you @CTCC1 
   
   merged to master





Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-21 Thread via GitHub


chenhao-db commented on code in PR #46123:
URL: https://github.com/apache/spark/pull/46123#discussion_r1574033907


##
python/pyspark/sql/tests/test_functions.py:
##
@@ -1315,6 +1315,35 @@ def test_parse_json(self):
 self.assertEqual("""{"a":1}""", actual["var"])
 self.assertEqual("""{"b":[{"c":"str2"}]}""", actual["var_lit"])
 
+def test_variant_expressions(self):
+df = self.spark.createDataFrame([Row(json="""{ "a" : 1 }"""), 
Row(json="""{ "b" : 2 }""")])
+v = F.parse_json(df.json)
+
+def check(resultDf, expected):
+self.assertEqual([r[0] for r in resultDf.collect()], expected)
+
+check(df.select(F.is_variant_null(v)), [False, False])
+check(df.select(F.schema_of_variant(v)), ["STRUCT<a: BIGINT>", "STRUCT<b: BIGINT>"])
+check(df.select(F.schema_of_variant_agg(v)), ["STRUCT<a: BIGINT, b: BIGINT>"])
+
+check(df.select(F.variant_get(v, "$.a", "int")), [1, None])
+check(df.select(F.variant_get(v, "$.b", "int")), [None, 2])
+check(df.select(F.variant_get(v, "$.a", "double")), [1.0, None])
+
+with self.assertRaises(SparkRuntimeException) as ex:
+df.select(F.variant_get(v, "$.a", "binary")).collect()

Review Comment:
   > only `variant_get` cast an exception, right?
   
   Correct. We don't need to include more test cases in this PR because they 
have already been extensively tested in 
[VariantExpressionSuite](https://github.com/apache/spark/blob/9f34b8eca2f3578367352ec28532e29fe3a91864/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/variant/VariantExpressionSuite.scala#L251).
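For readers skimming the thread, the semantics being tested — path extraction returning NULL for a missing field, while an invalid cast raises at runtime — can be sketched as a toy emulation in plain Python (illustrative only, not the PySpark `variant_get` implementation):

```python
import json

def variant_get(doc, path, target):
    """Toy emulation of variant_get semantics: extract a JSON path and cast
    to a target type. A missing field yields None (NULL); an invalid cast
    raises, matching what the test above expects from the real function."""
    value = json.loads(doc)
    for key in path.lstrip("$.").split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing path -> NULL
        value = value[key]
    if target == "int":
        return int(value)
    if target == "double":
        return float(value)
    raise RuntimeError(f"unsupported cast target: {target}")

rows = ['{ "a" : 1 }', '{ "b" : 2 }']
assert [variant_get(r, "$.a", "int") for r in rows] == [1, None]
assert [variant_get(r, "$.b", "int") for r in rows] == [None, 2]
assert [variant_get(r, "$.a", "double") for r in rows] == [1.0, None]
```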






Re: [PR] [SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeResources to avoid memory leak [spark]

2024-04-21 Thread via GitHub


JacobZheng0927 commented on PR #46131:
URL: https://github.com/apache/spark/pull/46131#issuecomment-2068359129

   cc @dongjoon-hyun
   This is similar to https://github.com/apache/spark/pull/35613. Please take 
a look, thanks!
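The class of leak being fixed — a wrapping stream left open on the cleanup path — can be illustrated with a small toy example (the class and field names below are illustrative, not the actual `DiskBlockObjectWriter` code):

```python
class TrackingStream:
    """Counts open streams so a leak is observable."""
    open_count = 0

    def __init__(self):
        TrackingStream.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            TrackingStream.open_count -= 1

class Writer:
    """Holds a raw stream plus a wrapping stream, loosely like the layered
    streams in the writer being fixed (names here are hypothetical)."""
    def __init__(self):
        self.raw = TrackingStream()
        self.wrapper = TrackingStream()

    def close_resources(self):
        # Close every layer, and use try/finally so a failure closing one
        # stream cannot leak the other:
        try:
            self.wrapper.close()
        finally:
            self.raw.close()

w = Writer()
w.close_resources()
assert TrackingStream.open_count == 0  # nothing leaked
```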





[PR] [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 [spark]

2024-04-21 Thread via GitHub


panbingkun opened a new pull request, #46152:
URL: https://github.com/apache/spark/pull/46152

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46149:
URL: https://github.com/apache/spark/pull/46149#discussion_r1574055887


##
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala:
##
@@ -35,7 +35,9 @@ import org.apache.spark.util.Utils
 
 class BasicDriverFeatureStepSuite extends SparkFunSuite {
 
-  private val CUSTOM_DRIVER_LABELS = Map("labelkey" -> "labelvalue")
+  private val CUSTOM_DRIVER_LABELS = Map(
+"labelkey" -> "labelvalue",
+"yunikorn.apache.org/app-id" -> "{{APPID}}")

Review Comment:
   Although I understand the intention, let's isolate the `yunikorn`-related 
feature inside `YuniKornSuite.scala`.






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574056705


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 bundle/2.24.6//bundle-2.24.6.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
+checker-qual/3.42.0//checker-qual-3.42.0.jar

Review Comment:
   Is this a cascading dependency brought by the new version of Guava?






Re: [PR] [SPARK-47914][SQL] Do not display the splits parameter in Rang [spark]

2024-04-21 Thread via GitHub


guixiaowen commented on PR #46136:
URL: https://github.com/apache/spark/pull/46136#issuecomment-2068383150

   > Mind checking the test failures?
   
   ok. I will check it.





Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46149:
URL: https://github.com/apache/spark/pull/46149#discussion_r1574056485


##
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala:
##
@@ -35,7 +35,9 @@ import org.apache.spark.util.Utils
 
 class BasicDriverFeatureStepSuite extends SparkFunSuite {
 
-  private val CUSTOM_DRIVER_LABELS = Map("labelkey" -> "labelvalue")
+  private val CUSTOM_DRIVER_LABELS = Map(
+"labelkey" -> "labelvalue",
+"yunikorn.apache.org/app-id" -> "{{APPID}}")

Review Comment:
   Previously, we mixed that, and it turned out that it was not good from a 
long-term perspective.






Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46149:
URL: https://github.com/apache/spark/pull/46149#discussion_r1574056783


##
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/BasicTestsSuite.scala:
##
@@ -102,13 +102,15 @@ private[spark] trait BasicTestsSuite { k8sSuite: 
KubernetesSuite =>
 sparkAppConf
   .set("spark.kubernetes.driver.label.label1", "label1-value")
   .set("spark.kubernetes.driver.label.label2", "label2-value")
+  .set("spark.kubernetes.driver.label.yunikorn.apache.org/app-id", 
"{{APP_ID}}")

Review Comment:
   Could you try to add this test coverage in `YuniKornSuite.scala`?
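For readers following the thread, the feature under test amounts to substituting placeholder tokens in configured label values with the runtime application/executor IDs — roughly like the sketch below (the function name is hypothetical and this is not Spark's actual `KubernetesConf` code; the `{{APP_ID}}`/`{{EXECUTOR_ID}}` tokens follow the PR title):

```python
def resolve_labels(labels, app_id, executor_id=None):
    """Substitute {{APP_ID}}/{{EXECUTOR_ID}} placeholders in label values;
    plain label values pass through unchanged (sketch only)."""
    resolved = {}
    for key, value in labels.items():
        value = value.replace("{{APP_ID}}", app_id)
        if executor_id is not None:
            value = value.replace("{{EXECUTOR_ID}}", executor_id)
        resolved[key] = value
    return resolved

labels = {
    "yunikorn.apache.org/app-id": "{{APP_ID}}",  # scheduler sees the real app id
    "label1": "label1-value",                    # plain labels pass through
}
resolved = resolve_labels(labels, app_id="spark-0123")
assert resolved["yunikorn.apache.org/app-id"] == "spark-0123"
assert resolved["label1"] == "label1-value"
```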






Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on code in PR #46149:
URL: https://github.com/apache/spark/pull/46149#discussion_r1574057309


##
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala:
##
@@ -589,7 +589,8 @@ class KubernetesSuite extends SparkFunSuite
 assert(pod.getMetadata.getLabels.get("label2") === "label2-value")
 assert(pod.getMetadata.getAnnotations.get("annotation1") === 
"annotation1-value")
 assert(pod.getMetadata.getAnnotations.get("annotation2") === 
"annotation2-value")
-val appId = 
pod.getMetadata.getAnnotations.get("yunikorn.apache.org/app-id")
+val appIdLabel = 
pod.getMetadata.getLabels.get("yunikorn.apache.org/app-id")
+val appIdAnnotation = 
pod.getMetadata.getAnnotations.get("yunikorn.apache.org/app-id")

Review Comment:
   It would be better if we could move all yunikorn stuff into `YuniKornSuite`.






Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]

2024-04-21 Thread via GitHub


dongjoon-hyun commented on PR #46126:
URL: https://github.com/apache/spark/pull/46126#issuecomment-2068387729

   Sorry for being late. I'll take a look Today, @sadikovi .





Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574059438


##
project/SparkBuild.scala:
##
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   Will similar configurations still be needed after this pr?






Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on code in PR #46141:
URL: https://github.com/apache/spark/pull/46141#discussion_r1574060791


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##
@@ -75,6 +75,36 @@ case class ParseJson(child: Expression)
 copy(child = newChild)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr) - Parse a JSON string as a Variant value. Returns 
null when the string is not a valid JSON value.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('{"a":1,"b":0.8}');
+   {"a":1,"b":0.8}
+  """,
+  since = "4.0.0",
+  group = "variant_funcs"
+)
+// scalastyle:on line.size.limit
+case class TryParseJson(expr: Expression, replacement: Expression)
+  extends RuntimeReplaceable with InheritAnalysisRules {
+  def this(child: Expression) = this(child, TryEval(ParseJson(child)))

Review Comment:
   I'm not a big fan of the `TryEval` approach, as it's more like a temporary 
workaround. See this TODO: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala#L83
   
   The issue is, `TryEval` swallows all the exceptions, including the ones from 
inputs. e.g. `try_parse_json(a function that produces JSON string but fails)` 
should still fail.
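The distinction raised here can be sketched in plain Python (toy functions, not Catalyst expressions): a catch-all wrapper around the whole evaluation also swallows failures of the *input* expression, while a targeted variant evaluates its input first and only converts its own parse errors to NULL.

```python
import json

def failing_input():
    # Simulates an upstream expression whose evaluation fails.
    raise ValueError("upstream input failed")

def try_eval(fn):
    """Catch-all wrapper (the TryEval approach): swallows *every* error,
    including failures coming from the input expression."""
    try:
        return fn()
    except Exception:
        return None

def try_parse_json(get_input):
    """Targeted alternative: evaluate the input first so its errors
    propagate; only malformed JSON is turned into NULL."""
    s = get_input()  # input errors propagate from here
    try:
        return json.loads(s)
    except ValueError:
        return None

# Catch-all hides the input failure entirely -- the behavior being objected to:
assert try_eval(lambda: try_parse_json(failing_input)) is None

# Targeted version lets the input failure surface, as the review asks:
propagated = False
try:
    try_parse_json(failing_input)
except ValueError:
    propagated = True
assert propagated
```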






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574062976


##
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   I think this can be submitted as an independent pr. Also, do you know why 
check-style didn't detect this issue?
   
   
https://github.com/apache/spark/blob/9f34b8eca2f3578367352ec28532e29fe3a91864/dev/checkstyle.xml#L199-L202






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574065984


##
project/SparkBuild.scala:
##
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   This PR is mainly a PoC, so I just made minimal changes to verify it works. 
I think we can unify the Guava version in all places after upgrading to Hive 
2.3.10, in an independent PR.






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


LuciferYang commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574066564


##
project/SparkBuild.scala:
##
@@ -952,7 +955,7 @@ object Unsafe {
 object DockerIntegrationTests {
   // This serves to override the override specified in DependencyOverrides:
   lazy val settings = Seq(
-dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre"
+dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre"

Review Comment:
   fine to me






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r1574066575


##
sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java:
##
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.sql.hive.test;
 
-import org.apache.commons.lang.builder.HashCodeBuilder;
+import org.apache.commons.lang3.builder.HashCodeBuilder;

Review Comment:
   Not sure why checkstyle does not complain about that; let me fix it 
independently.






Re: [PR] [DO-NOT-MERGE] Test Apache Hive 2.3.10 RC0 [spark]

2024-04-21 Thread via GitHub


pan3793 commented on code in PR #45372:
URL: https://github.com/apache/spark/pull/45372#discussion_r157404


##
dev/deps/spark-deps-hadoop-3-hive-2.3:
##
@@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 bundle/2.24.6//bundle-2.24.6.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
+checker-qual/3.42.0//checker-qual-3.42.0.jar

Review Comment:
   Yes.






Re: [PR] [SPARK-47912][SQL] Infer serde class from format classes [spark]

2024-04-21 Thread via GitHub


wForget commented on PR #46132:
URL: https://github.com/apache/spark/pull/46132#issuecomment-2068404048

   @cloud-fan could you please take a look?





Re: [PR] [SPARK-47902][SQL]Making Compute Current Time* expressions foldable [spark]

2024-04-21 Thread via GitHub


cloud-fan commented on PR #46120:
URL: https://github.com/apache/spark/pull/46120#issuecomment-2068419611

   thanks, merging to master!





Re: [PR] [SPARK-47902][SQL]Making Compute Current Time* expressions foldable [spark]

2024-04-21 Thread via GitHub


cloud-fan closed pull request #46120: [SPARK-47902][SQL]Making Compute Current 
Time* expressions foldable
URL: https://github.com/apache/spark/pull/46120





Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


GideonPotok commented on code in PR #46041:
URL: https://github.com/apache/spark/pull/46041#discussion_r1573929256


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -52,6 +52,14 @@ object CollationTypeCasts extends TypeCoercionRule {
   overlay.withNewChildren(collateToSingleType(Seq(overlay.input, 
overlay.replace))
 ++ Seq(overlay.pos, overlay.len))
 
+case stringRPad: StringRPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringRPad.str, 
stringRPad.pad))
+  stringRPad.withNewChildren(Seq(str, stringRPad.len, pad))
+
+case stringLPad: StringLPad =>
+  val Seq(str, pad) = collateToSingleType(Seq(stringLPad.str, 
stringLPad.pad))
+  stringLPad.withNewChildren(Seq(str, stringLPad.len, pad))

Review Comment:
   @uros-db  done. please re-review.






Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


09306677806 commented on PR #46041:
URL: https://github.com/apache/spark/pull/46041#issuecomment-2068214445

   Lotfan barsihay lam ra shoro konid
   
   ShahrzadMahro
   
   On Mon, Apr 15, 2024, 13:41 Gideon Potok ***@***.***>
   wrote:
   
   > @uros-db  please review this one, too.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you are subscribed to this thread.Message
   > ID: ***@***.***>
   >
   





Re: [PR] [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated [spark]

2024-04-21 Thread via GitHub


09306677806 commented on PR #39011:
URL: https://github.com/apache/spark/pull/39011#issuecomment-2068221273

   #(&£€&445₩)*khakqganHynosql sharsing
   
   ShahrzadMahro
   
   On Sun, Apr 21, 2024, 22:36 Parth Shyara ***@***.***>
   wrote:
   
   > ***@***. commented on this pull request.
   > --
   >
   > In core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
   > :
   >
   > > +val maybeShuffleMapOutputLoss = isShuffleMapTasks &&
   > +  (reason.isInstanceOf[ExecutorDecommission] || 
!env.blockManager.externalShuffleServiceEnabled)
   >
   > @Ngone51  @mridulm
   >  Is the above issue being tracked elsewhere?
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you are subscribed to this thread.Message
   > ID: ***@***.***>
   >
   





[PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]

2024-04-21 Thread via GitHub


zeotuan opened a new pull request, #46151:
URL: https://github.com/apache/spark/pull/46151

   
   This PR aims to migrate `logInfo` calls with variables in the MLlib module 
to the structured logging framework.
   
   ### Why are the changes needed?
   
   To enhance Apache Spark's logging system by implementing structured logging.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   - Pass GA.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
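As background, structured logging attaches log variables as typed keys instead of interpolating them into free text, so downstream tooling can query them. A rough plain-Python analogy (illustrative only — Spark's actual API, as I understand it, is a Scala `log"..."` interpolator with `MDC` keys, not this):

```python
import json
import logging

class StructuredFormatter(logging.Formatter):
    """Render each record as JSON carrying its keyed variables, mirroring
    the idea behind the structured logging migration (sketch only)."""
    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)

logger = logging.getLogger("mllib-demo")
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Variables travel as keyed context rather than being baked into the string:
logger.info("Training finished", extra={"context": {"NUM_ITERATIONS": 10}})
```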





Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-21 Thread via GitHub


HyukjinKwon commented on PR #46129:
URL: https://github.com/apache/spark/pull/46129#issuecomment-2068279142

   Merged to master.
   
   I will followup if there are more comments to address.





Re: [PR] [SPARK-47909][PYTHON][CONNECT] Parent DataFrame class for Spark Connect and Spark Classic [spark]

2024-04-21 Thread via GitHub


HyukjinKwon closed pull request #46129: [SPARK-47909][PYTHON][CONNECT] Parent 
DataFrame class for Spark Connect and Spark Classic
URL: https://github.com/apache/spark/pull/46129





Re: [PR] [WIP][SPARK-46620][PS][CONNECT] Implement `Frame.asfreq` [spark]

2024-04-21 Thread via GitHub


Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [spark]

2024-04-21 Thread via GitHub


GideonPotok commented on PR #46040:
URL: https://github.com/apache/spark/pull/46040#issuecomment-2068213773

   @uros-db 





Re: [PR] [SPARK-47412][SQL] Add Collation Support for LPad/RPad. [spark]

2024-04-21 Thread via GitHub


09306677806 commented on PR #46041:
URL: https://github.com/apache/spark/pull/46041#issuecomment-2068216025

   U;Sh(7700077)SQL HARMAN 04#G)
   
   ShahrzadMahro
   
   On Mon, Apr 22, 2024, 01:24 Gideon Potok ***@***.***>
   wrote:
   
   > ***@***. commented on this pull request.
   > --
   >
   > In
   > 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
   > :
   >
   > > +case stringRPad: StringRPad =>
   > +  val Seq(str, pad) = collateToSingleType(Seq(stringRPad.str, 
stringRPad.pad))
   > +  stringRPad.withNewChildren(Seq(str, stringRPad.len, pad))
   > +
   > +case stringLPad: StringLPad =>
   > +  val Seq(str, pad) = collateToSingleType(Seq(stringLPad.str, 
stringLPad.pad))
   > +  stringLPad.withNewChildren(Seq(str, stringLPad.len, pad))
   >
   > @uros-db  done. please re-review.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you are subscribed to this thread.Message
   > ID: ***@***.***>
   >
   





[PR] [SPARK-47929]Setup Static Analysis for Operator [spark-kubernetes-operator]

2024-04-21 Thread via GitHub


jiangzho opened a new pull request, #6:
URL: https://github.com/apache/spark-kubernetes-operator/pull/6

   
   
   ### What changes were proposed in this pull request?
   
   
   This is a breakdown PR from #2  - setting up common build tasks and required 
plugins.
   
   Checkstyle xml is taken from Apache Spark





Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-21 Thread via GitHub


chenhao-db commented on PR #46123:
URL: https://github.com/apache/spark/pull/46123#issuecomment-2068317353

   @zhengruifeng @LuciferYang Could you help take another look? Thanks!





Re: [PR] [SPARK-47890][CONNECT][PYTHON] Add variant functions to Scala and Python. [spark]

2024-04-21 Thread via GitHub


itholic commented on code in PR #46123:
URL: https://github.com/apache/spark/pull/46123#discussion_r1574020712


##
python/pyspark/sql/tests/test_functions.py:
##
@@ -1315,6 +1315,35 @@ def test_parse_json(self):
 self.assertEqual("""{"a":1}""", actual["var"])
 self.assertEqual("""{"b":[{"c":"str2"}]}""", actual["var_lit"])
 
+def test_variant_expressions(self):
+df = self.spark.createDataFrame([Row(json="""{ "a" : 1 }"""), Row(json="""{ "b" : 2 }""")])
+v = F.parse_json(df.json)
+
+def check(resultDf, expected):
+self.assertEqual([r[0] for r in resultDf.collect()], expected)
+
+check(df.select(F.is_variant_null(v)), [False, False])
+check(df.select(F.schema_of_variant(v)), ["STRUCT<a: BIGINT>", "STRUCT<b: BIGINT>"])
+check(df.select(F.schema_of_variant_agg(v)), ["STRUCT<a: BIGINT, b: BIGINT>"])
+
+check(df.select(F.variant_get(v, "$.a", "int")), [1, None])
+check(df.select(F.variant_get(v, "$.b", "int")), [None, 2])
+check(df.select(F.variant_get(v, "$.a", "double")), [1.0, None])
+
+with self.assertRaises(SparkRuntimeException) as ex:
+df.select(F.variant_get(v, "$.a", "binary")).collect()

Review Comment:
   qq: So, only `variant_get` raises an exception, right?
   
   If we have other negative cases, let's add more failure tests; otherwise it 
looks good as it is.


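The behavior the test above exercises can be sketched in plain Python without Spark. This is only an illustration of the observable semantics of `variant_get` on the two test rows; Spark's real variant type uses a binary encoding and a richer path language, and the helper below is hypothetical:

```python
import json

def variant_get(variant_json, path, target_type):
    # Toy illustration of variant_get semantics: extract a top-level
    # field by a "$.field" path and cast it to the requested type,
    # returning None when the path is absent (as in the test above).
    obj = json.loads(variant_json)
    key = path.removeprefix("$.")
    if not isinstance(obj, dict) or key not in obj:
        return None
    casts = {"int": int, "double": float, "string": str}
    return casts[target_type](obj[key])

rows = ['{ "a" : 1 }', '{ "b" : 2 }']
print([variant_get(r, "$.a", "int") for r in rows])     # [1, None]
print([variant_get(r, "$.b", "int") for r in rows])     # [None, 2]
print([variant_get(r, "$.a", "double") for r in rows])  # [1.0, None]
```

The `None` results for missing paths mirror the nulls the PySpark test checks for, while an unsupported cast (such as the `binary` case in the test) is the kind of request that surfaces as a runtime error in Spark.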

