[GitHub] [spark] EnricoMi commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch
EnricoMi commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1015075283

## python/pyspark/worker.py:

@@ -188,22 +241,7 @@ def wrapped(key_series, value_series):
         elif len(argspec.args) == 2:
             key = tuple(s[0] for s in key_series)
             result = f(key, pd.concat(value_series, axis=1))
-
-            if not isinstance(result, pd.DataFrame):
-                raise TypeError(
-                    "Return type of the user-defined function should be "
-                    "pandas.DataFrame, but is {}".format(type(result))
-                )
-            # the number of columns of result have to match the return type
-            # but it is fine for result to have no columns at all if it is empty
-            if not (
-                len(result.columns) == len(return_type) or len(result.columns) == 0 and result.empty
-            ):
-                raise RuntimeError(
-                    "Number of columns of the returned pandas.DataFrame "
-                    "doesn't match specified schema. "
-                    "Expected: {} Actual: {}".format(len(return_type), len(result.columns))
-                )
+            verify_pandas_result(result, return_type, assign_cols_by_name(runner_conf))

Review Comment: done

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
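The diff replaces the inline checks with a shared helper. A rough, standalone Python sketch of what a `verify_pandas_result`-style helper can look like (the error messages follow the diff above; the field-name matching, message abbreviation, and `assign_cols_by_name` handling in the real `python/pyspark/worker.py` are more involved, so treat this as an illustration only):

```python
import pandas as pd

def verify_pandas_result(result, field_names, assign_cols_by_name):
    # field_names stands in for the names of Spark's return_type (assumption).
    if not isinstance(result, pd.DataFrame):
        raise TypeError(
            "Return type of the user-defined function should be "
            "pandas.DataFrame, but is {}".format(type(result)))
    # An empty, column-less DataFrame is always acceptable.
    if len(result.columns) == 0 and result.empty:
        return
    # Match by name only when the UDF actually labeled its columns with strings.
    if assign_cols_by_name and all(isinstance(c, str) for c in result.columns):
        missing = sorted(set(field_names) - set(result.columns))
        unexpected = sorted(set(result.columns) - set(field_names))
        if missing or unexpected:
            raise RuntimeError(
                "Column names of the returned pandas.DataFrame do not match "
                "specified schema. Missing: {} Unexpected: {} Schema: {}".format(
                    ", ".join(missing), ", ".join(unexpected),
                    ", ".join(field_names)))
    # Otherwise fall back to matching by position, i.e. by column count.
    elif len(result.columns) != len(field_names):
        raise RuntimeError(
            "Number of columns of the returned pandas.DataFrame doesn't match "
            "specified schema. Expected: {} Actual: {}".format(
                len(field_names), len(result.columns)))
```

Pulling the checks into one helper lets both the grouped-map and cogrouped-map paths raise identical, testable error messages, which is what the test changes below rely on.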
[GitHub] [spark] EnricoMi commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch
EnricoMi commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1015075103

## python/pyspark/sql/tests/pandas/test_pandas_cogrouped_map.py:

@@ -165,100 +148,191 @@ def merge_pandas(lft, _):
         )

     def test_apply_in_pandas_not_returning_pandas_dataframe(self):
-        left = self.data1
-        right = self.data2
+        self._test_merge_error(
+            fn=lambda lft, rgt: lft.size + rgt.size,
+            error_class=PythonException,
+            error_message_regex="Return type of the user-defined function "
+            "should be pandas.DataFrame, but is ",
+        )
+
+    def test_apply_in_pandas_returning_column_names(self):
+        self._test_merge(fn=lambda lft, rgt: pd.merge(lft, rgt, on=["id", "k"]))

+    def test_apply_in_pandas_returning_no_column_names(self):
         def merge_pandas(lft, rgt):
-            return lft.size + rgt.size
+            res = pd.merge(lft, rgt, on=["id", "k"])
+            res.columns = range(res.columns.size)
+            return res

-        with QuietTest(self.sc):
-            with self.assertRaisesRegex(
-                PythonException,
-                "Return type of the user-defined function should be pandas.DataFrame, "
-                "but is ",
-            ):
-                (
-                    left.groupby("id")
-                    .cogroup(right.groupby("id"))
-                    .applyInPandas(merge_pandas, "id long, k int, v int, v2 int")
-                    .collect()
-                )
+        self._test_merge(fn=merge_pandas)

-    def test_apply_in_pandas_returning_wrong_number_of_columns(self):
-        left = self.data1
-        right = self.data2
+    def test_apply_in_pandas_returning_column_names_sometimes(self):
+        def merge_pandas(lft, rgt):
+            res = pd.merge(lft, rgt, on=["id", "k"])
+            if 0 in lft["id"] and lft["id"][0] % 2 == 0:
+                return res
+            res.columns = range(res.columns.size)
+            return res
+
+        self._test_merge(fn=merge_pandas)

+    def test_apply_in_pandas_returning_wrong_column_names(self):
         def merge_pandas(lft, rgt):
             if 0 in lft["id"] and lft["id"][0] % 2 == 0:
                 lft["add"] = 0
             if 0 in rgt["id"] and rgt["id"][0] % 3 == 0:
                 rgt["more"] = 1
             return pd.merge(lft, rgt, on=["id", "k"])

-        with QuietTest(self.sc):
-            with self.assertRaisesRegex(
-                PythonException,
-                "Number of columns of the returned pandas.DataFrame "
-                "doesn't match specified schema. Expected: 4 Actual: 6",
-            ):
-                (
-                    # merge_pandas returns two columns for even keys while we set schema to four
-                    left.groupby("id")
-                    .cogroup(right.groupby("id"))
-                    .applyInPandas(merge_pandas, "id long, k int, v int, v2 int")
-                    .collect()
-                )
+        self._test_merge_error(
+            fn=merge_pandas,
+            error_class=PythonException,
+            error_message_regex="Column names of the returned pandas.DataFrame "
+            "do not match specified schema. Unexpected: add, more Schema: id, k, v, v2\n",
+        )

-    def test_apply_in_pandas_returning_empty_dataframe(self):
-        left = self.data1
-        right = self.data2
+        # with very large schema, missing and unexpected is limited to 5
+        # and the schema is abbreviated in the error message
+        schema = "id long, k long, mean double, " + ", ".join(
+            f"column_with_long_column_name_{no} integer" for no in range(35)
+        )
+        self._test_merge_error(
+            fn=lambda lft, rgt: pd.DataFrame(
+                [
+                    (
+                        lft.id,
+                        lft.k,
+                        lft.v.mean(),
+                    )
+                    + tuple(lft.v.mean() for _ in range(7))
+                ],
+                columns=["id", "k", "mean"] + [f"extra_column_{no} integer" for no in range(7)],
+            ),
+            output_schema=schema,
+            error_class=PythonException,
+            error_message_regex="Column names of the returned pandas\\.DataFrame "
+            "do not match specified schema\\. "
+            "Missing \\(first 5 of 35\\): column_with_long_column_name_0,"
+            " column_with_long_column_name_1, column_with_long_column_name_10,"
+            " column_with_long_column_name_11, column_with_long_column_name_12 "
+            "Unexpected \\(first 5 of 7\\): extra_column_0 integer, extra_column_1 integer,"
+            " extra_column_2 integer, extra_column_3 integer, extra_column_4 integer "
+            "Schema: id, k, mean, column_with_long_column_name_0, column_with_long_column_name_1,"
+            " column_with_long_column_name_2,
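The named-versus-positional cases in the tests above hinge on plain pandas behavior: `pd.merge` keeps the column labels, and assigning a `range` to `columns` wipes them out, leaving a positional index. A quick standalone illustration, with no Spark involved:

```python
import pandas as pd

lft = pd.DataFrame({"id": [1, 1], "k": [1, 2], "v": [10, 20]})
rgt = pd.DataFrame({"id": [1, 1], "k": [1, 2], "v2": [30, 40]})

# Merging on the grouping keys yields the four schema columns, by name.
res = pd.merge(lft, rgt, on=["id", "k"])
print(list(res.columns))  # ['id', 'k', 'v', 'v2']

# Replacing the labels with a positional index is what the
# "returning_no_column_names" test does; the worker then has to match
# columns against the schema by position instead of by name.
res.columns = range(res.columns.size)
print(list(res.columns))  # [0, 1, 2, 3]
```

This is why the "sometimes" test is interesting: the same UDF can hand back named columns for some groups and positional columns for others, and both must be accepted.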
[GitHub] [spark] EnricoMi commented on pull request #38509: [SPARK-41014][PYTHON][DOC] Improve documentation and typing of groupby and cogroup applyInPandas
EnricoMi commented on PR #38509: URL: https://github.com/apache/spark/pull/38509#issuecomment-1305160983 @HyukjinKwon @zhengruifeng these are minor improvements to the documentation and typing of the PySpark applyInPandas methods.
[GitHub] [spark] EnricoMi commented on pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression
EnricoMi commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1305155414 @cloud-fan where do we stand with this? Is this a regression? How do we proceed?
[GitHub] [spark] LuciferYang commented on pull request #38519: [MINOR][SQL] Remove unused an error class and query error methods
LuciferYang commented on PR #38519: URL: https://github.com/apache/spark/pull/38519#issuecomment-1305147588 late LGTM
[GitHub] [spark] MaxGekk closed pull request #38519: [MINOR][SQL] Remove unused an error class and query error methods
MaxGekk closed pull request #38519: [MINOR][SQL] Remove unused an error class and query error methods URL: https://github.com/apache/spark/pull/38519
[GitHub] [spark] MaxGekk commented on pull request #38519: [MINOR][SQL] Remove unused an error class and query error methods
MaxGekk commented on PR #38519: URL: https://github.com/apache/spark/pull/38519#issuecomment-1305144565 Merging to master. Thank you, @itholic for review.
[GitHub] [spark] panbingkun opened a new pull request, #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
panbingkun opened a new pull request, #38531: URL: https://github.com/apache/spark/pull/38531

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] zhengruifeng commented on a diff in pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary`
zhengruifeng commented on code in PR #38318: URL: https://github.com/apache/spark/pull/38318#discussion_r1015041619

## python/pyspark/sql/connect/dataframe.py:

@@ -323,6 +323,14 @@ def unionByName(self, other: "DataFrame", allowMissingColumns: bool = False) ->

     def where(self, condition: Expression) -> "DataFrame":
         return self.filter(condition)

+    def summary(self, *statistics: str) -> "DataFrame":
+        _statistics: List[str] = list(statistics)

Review Comment: different from https://github.com/apache/spark/blob/29e45528315c43336ba60ee1717ef5ec0c001c00/python/pyspark/sql/dataframe.py#L2575-L2576 since I think that preprocessing is weird
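For context, the preprocessing being referenced boils down to substituting a default statistics list when the caller passes none. A hedged sketch (the helper name is made up; the defaults mirror those documented for `DataFrame.summary`):

```python
from typing import List

def resolve_statistics(*statistics: str) -> List[str]:
    # DataFrame.summary() documents these defaults when no statistics
    # are requested; otherwise the caller's list is used verbatim.
    if not statistics:
        return ["count", "mean", "stddev", "min", "25%", "50%", "75%", "max"]
    return list(statistics)

print(resolve_statistics("count", "max"))  # ['count', 'max']
```

The debate above is only about where this substitution happens (client-side preprocessing versus letting the server-side plan fill in the defaults), not about the defaults themselves.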
[GitHub] [spark] zhengruifeng commented on pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary`
zhengruifeng commented on PR #38318: URL: https://github.com/apache/spark/pull/38318#issuecomment-1305136349 cc @cloud-fan @HyukjinKwon @amaliujia
[GitHub] [spark] cloud-fan commented on pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary
cloud-fan commented on PR #38513: URL: https://github.com/apache/spark/pull/38513#issuecomment-1305100077 I think canonicalization should not change the data type in the first place. Adding a cast only hides the bug. What's worse, it doesn't help with the goal of canonicalization (matching plans/expressions that are semantically equal) because of the extra cast. Can we be stricter about when we reorder? E.g. add an allowlist and only reorder in certain cases, such as integral add with ANSI off.
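The allowlist idea can be sketched outside Spark: reorder the operands of a commutative chain only when the data type guarantees the result type (and error behavior) cannot change. Everything below, the type names and the predicate alike, is illustrative rather than Spark's actual API:

```python
# Types for which a + b + c keeps the same result type regardless of
# association order (assumed set, for illustration).
INTEGRAL_TYPES = {"byte", "short", "int", "long"}

def can_reorder(data_type: str, ansi_mode: bool) -> bool:
    # Integral add with ANSI off wraps silently and keeps its type, so a
    # canonical reordering is safe. Decimal add derives its precision and
    # scale from the operand types, so reordering can change the result
    # type; ANSI mode can additionally turn overflow into an error.
    return data_type in INTEGRAL_TYPES and not ansi_mode

def canonicalize_add(operands, data_type, ansi_mode):
    # Sort into a canonical operand order only when it is provably safe.
    return sorted(operands) if can_reorder(data_type, ansi_mode) else list(operands)

print(canonicalize_add(["c", "a", "b"], "int", ansi_mode=False))  # ['a', 'b', 'c']
```

Under such a guard, decimal `Add` would simply never be reordered, so no compensating `Cast` would be needed in the canonicalized plan.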
[GitHub] [spark] LuciferYang commented on pull request #38530: [SPARK-41027][SQL] Use `UNEXPECTED_INPUT_TYPE` instead of `MAP_FROM_ENTRIES_WRONG_TYPE`
LuciferYang commented on PR #38530: URL: https://github.com/apache/spark/pull/38530#issuecomment-1305095279 Yes, if the exception test lacks the corresponding UT, I will add one
[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled
cxzl25 commented on PR #38489: URL: https://github.com/apache/spark/pull/38489#issuecomment-1305078019 Could you help review? Thanks. @leanken-zz @cloud-fan @wangyum
[GitHub] [spark] ulysses-you commented on a diff in pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary
ulysses-you commented on code in PR #38513: URL: https://github.com/apache/spark/pull/38513#discussion_r1015003195

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:

@@ -481,12 +481,12 @@ case class Add(
     // TODO: do not reorder consecutive `Add`s with different `evalMode`
     val reorderResult =
       orderCommutative({ case Add(l, r, _) => Seq(l, r) }).reduce(Add(_, _, evalMode))
-    if (resolved && reorderResult.resolved && reorderResult.dataType == dataType) {
-      reorderResult
+    if (resolved && reorderResult.resolved && reorderResult.dataType != dataType) {

Review Comment: I'm not worried about new data types; I just want to avoid unnecessary logic for unrelated data types like integer and float.
[GitHub] [spark] gengliangwang commented on a diff in pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary
gengliangwang commented on code in PR #38513: URL: https://github.com/apache/spark/pull/38513#discussion_r101535

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:

@@ -481,12 +481,12 @@ case class Add(
     // TODO: do not reorder consecutive `Add`s with different `evalMode`
     val reorderResult =
       orderCommutative({ case Add(l, r, _) => Seq(l, r) }).reduce(Add(_, _, evalMode))
-    if (resolved && reorderResult.resolved && reorderResult.dataType == dataType) {
-      reorderResult
+    if (resolved && reorderResult.resolved && reorderResult.dataType != dataType) {

Review Comment: @ulysses-you the current code seems fine. If there is a new data type in the future, we can still avoid the issue.
[GitHub] [spark] gengliangwang commented on a diff in pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary
gengliangwang commented on code in PR #38513: URL: https://github.com/apache/spark/pull/38513#discussion_r1014999188

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:

@@ -481,12 +481,12 @@ case class Add(
     // TODO: do not reorder consecutive `Add`s with different `evalMode`
     val reorderResult =
       orderCommutative({ case Add(l, r, _) => Seq(l, r) }).reduce(Add(_, _, evalMode))
-    if (resolved && reorderResult.resolved && reorderResult.dataType == dataType) {
-      reorderResult
+    if (resolved && reorderResult.resolved && reorderResult.dataType != dataType) {
+      // SPARK-40903: Append cast for the canonicalization of decimal Add if the result data type is
+      // changed. Otherwise, it may cause data checking error within ComplexTypeMergingExpression.
+      Cast(reorderResult, dataType)

Review Comment: Yes, the cast is better on second thought.
[GitHub] [spark] itholic commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`
itholic commented on code in PR #38447: URL: https://github.com/apache/spark/pull/38447#discussion_r1014998869

## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala:

@@ -608,8 +608,12 @@
   def unclosedBracketedCommentError(command: String, position: Origin): Throwable = {
-    new ParseException(Some(command), "Unclosed bracketed comment", position, position,
-      Some("_LEGACY_ERROR_TEMP_0055"))
+    new ParseException(
+      Some(command),
+      "Found an unclosed bracketed comment. Please, append */ at the end of the comment.",
+      position,
+      position,
+      Some("UNCLOSED_BRACKETED_COMMENT"))
   }

Review Comment: Cool! Thanks for the suggestion 👍
[GitHub] [spark] itholic commented on pull request #38422: [SPARK-40948][SQL] Introduce new error class: PATH_NOT_FOUND
itholic commented on PR #38422: URL: https://github.com/apache/spark/pull/38422#issuecomment-1305067447 Thanks, @MaxGekk ! Just adjusted the comments.
[GitHub] [spark] itholic commented on pull request #38530: [SPARK-41027][SQL] Use `UNEXPECTED_INPUT_TYPE` instead of `MAP_FROM_ENTRIES_WRONG_TYPE`
itholic commented on PR #38530: URL: https://github.com/apache/spark/pull/38530#issuecomment-1305066532 It would be great to put an example into the PR description showing how the error message changes, with `Before` and `After`.
[GitHub] [spark] LuciferYang commented on pull request #38530: [SPARK-41027][SQL] Use `UNEXPECTED_INPUT_TYPE ` instead of `MAP_FROM_ENTRIES_WRONG_TYPE`
LuciferYang commented on PR #38530: URL: https://github.com/apache/spark/pull/38530#issuecomment-1305030163 cc @MaxGekk
[GitHub] [spark] LuciferYang commented on a diff in pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes
LuciferYang commented on code in PR #38507: URL: https://github.com/apache/spark/pull/38507#discussion_r1014970870

## core/src/main/resources/error/error-classes.json:

@@ -225,14 +230,14 @@
       "The <functionName> should all be of type map, but it's <dataType>."
     ]
   },
-  "MAP_CONTAINS_KEY_DIFF_TYPES" : {
+  "MAP_FROM_ENTRIES_WRONG_TYPE" : {

Review Comment: https://github.com/apache/spark/pull/38530
[GitHub] [spark] LuciferYang commented on a diff in pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes
LuciferYang commented on code in PR #38507: URL: https://github.com/apache/spark/pull/38507#discussion_r1014970870

## core/src/main/resources/error/error-classes.json:

@@ -225,14 +230,14 @@
       "The <functionName> should all be of type map, but it's <dataType>."
     ]
   },
-  "MAP_CONTAINS_KEY_DIFF_TYPES" : {
+  "MAP_FROM_ENTRIES_WRONG_TYPE" : {

Review Comment: https://github.com/apache/spark/pull/38507
[GitHub] [spark] LuciferYang opened a new pull request, #38530: [SPARK-41027][SQL] Use `UNEXPECTED_INPUT_TYPE ` instead of `MAP_FROM_ENTRIES_WRONG_TYPE`
LuciferYang opened a new pull request, #38530: URL: https://github.com/apache/spark/pull/38530

### What changes were proposed in this pull request?

This PR aims to use `UNEXPECTED_INPUT_TYPE` instead of `MAP_FROM_ENTRIES_WRONG_TYPE` and remove the definition of `MAP_FROM_ENTRIES_WRONG_TYPE`.

### Why are the changes needed?

Re-use `UNEXPECTED_INPUT_TYPE`.

### Does this PR introduce _any_ user-facing change?

Yes. The PR changes user-facing error messages.

### How was this patch tested?

Pass GA
[GitHub] [spark] LuciferYang commented on a diff in pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes
LuciferYang commented on code in PR #38507: URL: https://github.com/apache/spark/pull/38507#discussion_r1014968186

## core/src/main/resources/error/error-classes.json:

@@ -225,14 +230,14 @@
       "The <functionName> should all be of type map, but it's <dataType>."
     ]
   },
-  "MAP_CONTAINS_KEY_DIFF_TYPES" : {
+  "MAP_FROM_ENTRIES_WRONG_TYPE" : {

Review Comment: SPARK-41027. But if we use `UNEXPECTED_INPUT_TYPE`, the `requiredType` should be `Array of pair Struct`:

```
"requiredType" -> s"${toSQLType(ArrayType)} of pair ${toSQLType(StructType)}"
```

Is that OK?
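The suggested `requiredType` string is easy to preview without Spark. `to_sql_type` below is a stand-in for Spark's `toSQLType` helper, which renders a type as a quoted SQL name; only the general shape is claimed here, not the exact formatting Spark uses:

```python
def to_sql_type(name: str) -> str:
    # Rough stand-in for Spark's toSQLType: upper-cased, double-quoted.
    return '"' + name.upper() + '"'

# The string proposed in the review comment above:
required_type = "{} of pair {}".format(to_sql_type("array"), to_sql_type("struct"))
print(required_type)  # "ARRAY" of pair "STRUCT"
```

This is the value that would fill the `<requiredType>` placeholder of the generic `UNEXPECTED_INPUT_TYPE` message, replacing the dedicated `MAP_FROM_ENTRIES_WRONG_TYPE` class.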
[GitHub] [spark] jackylee-ch commented on a diff in pull request #38496: [WIP][SPARK-40708][SQL] Auto update table statistics based on write metrics
jackylee-ch commented on code in PR #38496: URL: https://github.com/apache/spark/pull/38496#discussion_r1014966247

## sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala:

@@ -53,19 +53,41 @@ class PathFilterIgnoreNonData(stagingDir: String) extends PathFilter with Serial
 object CommandUtils extends Logging {

   /** Change statistics after changing data by commands. */
-  def updateTableStats(sparkSession: SparkSession, table: CatalogTable): Unit = {
-    val catalog = sparkSession.sessionState.catalog
-    if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
+  def updateTableStats(
+      sparkSession: SparkSession,
+      table: CatalogTable,
+      wroteStats: Option[WriteStats] = None): Unit = {
+    val sessionState = sparkSession.sessionState
+    val catalog = sessionState.catalog
+    if (sessionState.conf.autoSizeUpdateEnabled) {
       val newTable = catalog.getTableMetadata(table.identifier)
       val (newSize, newPartitions) = CommandUtils.calculateTotalSize(sparkSession, newTable)
-      val isNewStats = newTable.stats.map(newSize != _.sizeInBytes).getOrElse(true)
+      val isNewStats = newTable.stats.forall(newSize != _.sizeInBytes)
       if (isNewStats) {
         val newStats = CatalogStatistics(sizeInBytes = newSize)
         catalog.alterTableStats(table.identifier, Some(newStats))
       }
       if (newPartitions.nonEmpty) {
         catalog.alterPartitions(table.identifier, newPartitions)
       }
+    } else if (sessionState.conf.autoUpdateBasedOnMetricsEnabled && wroteStats.nonEmpty) {
+      val stat = wroteStats.get
+      stat.mode match {
+        case SaveMode.Overwrite | SaveMode.ErrorIfExists =>
+          catalog.alterTableStats(table.identifier,
+            Some(CatalogStatistics(stat.numBytes, stat.numRows)))

Review Comment: Hm, we should consider partition statistics here. If we overwrite only part of the partitions, we would get wrong table statistics.
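The review concern, that overwriting only some partitions replaces the table-level numbers wholesale, can be seen in a stripped-down model of the update rule. The names and modes below are illustrative and not the PR's actual API:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class WriteStats:
    mode: str       # "overwrite", "error_if_exists", or "append" (assumed labels)
    num_bytes: int
    num_rows: int

def next_table_stats(current: Optional[Tuple[int, int]],
                     wrote: Optional[WriteStats]) -> Optional[Tuple[int, int]]:
    """Return the new (size_in_bytes, row_count) after a write, or None."""
    if wrote is None:
        return current
    if wrote.mode in ("overwrite", "error_if_exists"):
        # Replacing the whole table: the write metrics describe the new
        # contents. If only *some* partitions were overwritten, this
        # wholesale replacement is exactly where table-level stats go wrong,
        # which is the point raised in the review comment.
        return (wrote.num_bytes, wrote.num_rows)
    if wrote.mode == "append" and current is not None:
        # Appending: metrics are a delta on top of a known baseline.
        return (current[0] + wrote.num_bytes, current[1] + wrote.num_rows)
    return None  # unknown baseline: drop the stats and force a recompute

print(next_table_stats((100, 10), WriteStats("append", 50, 5)))  # (150, 15)
```

A partition-aware version would have to key the deltas by partition spec rather than fold everything into one table-level pair.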
[GitHub] [spark] ulysses-you commented on a diff in pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary
ulysses-you commented on code in PR #38513: URL: https://github.com/apache/spark/pull/38513#discussion_r1014959493

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:

@@ -481,12 +481,12 @@ case class Add(
     // TODO: do not reorder consecutive `Add`s with different `evalMode`
     val reorderResult =
       orderCommutative({ case Add(l, r, _) => Seq(l, r) }).reduce(Add(_, _, evalMode))
-    if (resolved && reorderResult.resolved && reorderResult.dataType == dataType) {
-      reorderResult
+    if (resolved && reorderResult.resolved && reorderResult.dataType != dataType) {

Review Comment: To be safe, can we do this for decimal type only? e.g.

```scala
left match {
  case _: DecimalType
      if resolved && reorderResult.resolved && reorderResult.dataType != dataType =>
    Cast(reorderResult, dataType)
  case _ => reorderResult
}
```
[GitHub] [spark] LuciferYang commented on a diff in pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes
LuciferYang commented on code in PR #38507: URL: https://github.com/apache/spark/pull/38507#discussion_r1014954543

## core/src/main/resources/error/error-classes.json:

@@ -225,14 +230,14 @@
       "The <functionName> should all be of type map, but it's <dataType>."
     ]
   },
-  "MAP_CONTAINS_KEY_DIFF_TYPES" : {
+  "MAP_FROM_ENTRIES_WRONG_TYPE" : {

Review Comment: @MaxGekk Are you talking about `MAP_FROM_ENTRIES_WRONG_TYPE`? This is not a new subclass introduced by this PR, but one created by https://github.com/apache/spark/pull/38197/files. Let me file a new Jira to replace `MAP_FROM_ENTRIES_WRONG_TYPE` with `UNEXPECTED_INPUT_TYPE`, because it is not related to this PR.
[GitHub] [spark] zhengruifeng commented on pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`
zhengruifeng commented on PR #38395: URL: https://github.com/apache/spark/pull/38395#issuecomment-1304998340 I am going to close this PR and continue the work in https://github.com/apache/spark/pull/38318
[GitHub] [spark] zhengruifeng closed pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`
zhengruifeng closed pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary` URL: https://github.com/apache/spark/pull/38395
[GitHub] [spark] beliefer commented on pull request #37630: [SPARK-40193][SQL] Merge subquery plans with different filters
beliefer commented on PR #37630: URL: https://github.com/apache/spark/pull/37630#issuecomment-1304985806 @peter-toth Could you fix these conflicts? I want to test this PR. Thank you!
[GitHub] [spark] github-actions[bot] commented on pull request #36253: [SPARK-38932][SQL] Datasource v2 support report distinct keys
github-actions[bot] commented on PR #36253: URL: https://github.com/apache/spark/pull/36253#issuecomment-1304942022 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
github-actions[bot] closed pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding URL: https://github.com/apache/spark/pull/37065
[GitHub] [spark] github-actions[bot] closed pull request #37317: [SPARK-39894][SQL] Combine the similar binary comparison in boolean expression.
github-actions[bot] closed pull request #37317: [SPARK-39894][SQL] Combine the similar binary comparison in boolean expression. URL: https://github.com/apache/spark/pull/37317
[GitHub] [spark] github-actions[bot] closed pull request #37235: [SPARK-39824][PYTHON][PS] Introduce index where and putmask func in pyspark
github-actions[bot] closed pull request #37235: [SPARK-39824][PYTHON][PS] Introduce index where and putmask func in pyspark URL: https://github.com/apache/spark/pull/37235
[GitHub] [spark] github-actions[bot] commented on pull request #37316: [SPARK-39893][SQL] Push limit 1 to the aggregate's child plan if grouping expressions and aggregate expressions are foldable
github-actions[bot] commented on PR #37316: URL: https://github.com/apache/spark/pull/37316#issuecomment-1304941982 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] srowen commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar
srowen commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304934181 Merged to master
[GitHub] [spark] srowen closed pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar
srowen closed pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar URL: https://github.com/apache/spark/pull/38510
[GitHub] [spark] LuciferYang commented on pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes
LuciferYang commented on PR #38507: URL: https://github.com/apache/spark/pull/38507#issuecomment-1304929641 will update this one today
[GitHub] [spark] LuciferYang commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
LuciferYang commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304926628 > Shouldn't guava changes be done in a separate PR? Currently, I am not considering upgrading Guava, for the following reasons: 1. Although Spark 3.4.0 will no longer release the hadoop-2 distribution, its build and testing process is still running; we need to keep it, and should not upgrade Guava until it is really removed. 2. As far as I know, Hive 2.3 still depends on Guava 14.0.1; @sunchao has tried to solve this problem before, but it does not appear to be finished, and I don't know whether there are any follow-up plans.
[GitHub] [spark] LuciferYang commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
LuciferYang commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304926354 > @LuciferYang feel free to continue with your PR. I marked [SPARK-41023](https://issues.apache.org/jira/browse/SPARK-41023) as a duplicate of [SPARK-40911](https://issues.apache.org/jira/browse/SPARK-40911). Thanks ~
[GitHub] [spark] LuciferYang commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
LuciferYang commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304922578 @srowen test passed
[GitHub] [spark] amaliujia commented on a diff in pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client
amaliujia commented on code in PR #38488: URL: https://github.com/apache/spark/pull/38488#discussion_r1013499840 ## python/pyspark/sql/connect/dataframe.py: ## @@ -211,14 +212,66 @@ def filter(self, condition: Expression) -> "DataFrame": plan.Filter(child=self._plan, filter=condition), session=self._session ) -def first(self) -> Optional["pandas.DataFrame"]: -return self.head(1) +def first(self) -> Optional[Row]: +"""Returns the first row as a :class:`Row`. + +.. versionadded:: 3.4.0 + +Returns +--- +:class:`Row` + First row if :class:`DataFrame` is not empty, otherwise ``None``. +""" +return self.head() Review Comment: Ah, actually we cannot: `self.head()` returns `Optional[Row]` but `self.head(n)` returns `List[Row]`. Calling `self.head()` with no argument is what makes the mypy check pass.
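The signature asymmetry described above can be expressed with `typing.overload`; the sketch below is illustrative only (the `Row` and `DataFrame` classes are simplified stand-ins, not the actual Connect client code):

```python
from typing import List, Optional, overload


class Row(tuple):
    """Simplified stand-in for pyspark.sql.Row."""


class DataFrame:
    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    @overload
    def head(self) -> Optional[Row]: ...

    @overload
    def head(self, n: int) -> List[Row]: ...

    def head(self, n: Optional[int] = None):
        # head() -> first row or None; head(n) -> list of up to n rows.
        if n is None:
            return self._rows[0] if self._rows else None
        return self._rows[:n]

    def first(self) -> Optional[Row]:
        # Must call head() with no argument: head(1) would be typed as
        # List[Row], which mypy rejects for an Optional[Row] return type.
        return self.head()
```

With this setup, `first()` type-checks cleanly, while a hypothetical `return self.head(1)` would be flagged by mypy.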
[GitHub] [spark] amaliujia commented on pull request #38529: [SPARK-41026][CONNECT] Support Repartition in Connect Proto
amaliujia commented on PR #38529: URL: https://github.com/apache/spark/pull/38529#issuecomment-1304913709 R: @cloud-fan
[GitHub] [spark] amaliujia opened a new pull request, #38529: [SPARK-41026][CONNECT] Support Repartition in Connect Proto
amaliujia opened a new pull request, #38529: URL: https://github.com/apache/spark/pull/38529 ### What changes were proposed in this pull request? Support `Repartition` in the Connect proto, which covers two APIs: `repartition` (shuffle=true) and `coalesce` (shuffle=false). ### Why are the changes needed? Improve API coverage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT
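As a rough illustration of how one relation can cover both APIs through a shuffle flag (the field and function names below are hypothetical, not the actual Connect proto definition):

```python
from dataclasses import dataclass


@dataclass
class Repartition:
    """Illustrative stand-in for a single Connect proto relation."""
    num_partitions: int
    shuffle: bool


def repartition(n: int) -> Repartition:
    # DataFrame.repartition(n) always performs a shuffle.
    return Repartition(num_partitions=n, shuffle=True)


def coalesce(n: int) -> Repartition:
    # DataFrame.coalesce(n) only merges existing partitions, so no shuffle.
    return Repartition(num_partitions=n, shuffle=False)
```

The design choice mirrors the Scala side, where `coalesce` is effectively `repartition` with shuffling disabled.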
[GitHub] [spark] HeartSaVioR commented on pull request #38528: [SPARK-41025][SS] Introduce ComparableOffset to support offset range validation
HeartSaVioR commented on PR #38528: URL: https://github.com/apache/spark/pull/38528#issuecomment-1304913137 cc. @zsxwing @jerrypeng Appreciate your review. Thanks in advance!
[GitHub] [spark] HeartSaVioR opened a new pull request, #38528: [SPARK-41025][SS] Introduce ComparableOffset to support offset range validation
HeartSaVioR opened a new pull request, #38528: URL: https://github.com/apache/spark/pull/38528 ### What changes were proposed in this pull request? This PR proposes to introduce a new interface, ComparableOffset, a mix-in for the streaming Offset interface that enables comparison between two offset instances. MicroBatchExecution will validate the offset range if the offset instance implements ComparableOffset. The new interface can be mixed in with both the DSv1 and the DSv2 streaming Offset. This PR also implements the interface for the streaming offsets of the built-in data sources. ### Why are the changes needed? Currently, Spark does not perform any assertion against offsets, and data source implementations bear full responsibility for validating them. It seems more useful for Spark to provide the offset validation than to merely document the responsibility and leave the duty to each data source implementation. This validation matters even more with Trigger.AvailableNow, which gradually increases the offset and terminates when the offset equals the desired offset. A bug in a data source may stall query progress or even cause data duplication. ### Does this PR introduce _any_ user-facing change? No for end users. ### How was this patch tested? New UTs
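The idea can be sketched as follows; this is an illustrative Python sketch, not the proposed Scala interface, and names like `compare_to` and `validate_offset_range` are assumptions based on the description above:

```python
from dataclasses import dataclass


@dataclass
class LongOffset:
    """Toy totally-ordered offset, mimicking a ComparableOffset mix-in."""
    value: int

    def compare_to(self, other: "LongOffset") -> int:
        # Negative / zero / positive, like Java's Comparable.compareTo.
        return (self.value > other.value) - (self.value < other.value)


def validate_offset_range(start: LongOffset, end: LongOffset) -> None:
    # The engine-side check the micro-batch executor could perform when
    # the source's offsets are comparable: a range must never go backwards.
    if end.compare_to(start) < 0:
        raise AssertionError(
            f"invalid offset range: end {end.value} < start {start.value}"
        )
```

A data source bug that hands back a decreasing offset would then fail fast in the engine instead of silently stalling progress or duplicating data.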
[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
ljfgem commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014902247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: Yeah, I am fine with keeping `V2ViewDescription`; I don't think the change is necessary.
[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014901610 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: Indeed, `V2ViewDescription` is an API adaptor for easy cloning of v1 SQL commands, and adding missing APIs like `comment` or `owner` (stored in properties on disk), etc. So keep using `V2ViewDescription`?
[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
ljfgem commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014899090 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: Thanks for explaining. Do you think we also need to modify `ShowViewPropertiesExec` and `DescribeViewExec`, which use `V2ViewDescription`? It looks like using `V2ViewDescription` would bring some convenience, especially for `DescribeViewExec`. I am good with both approaches.
[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014897627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: > I found output parameter in ShowCreateViewExec(output: Seq[Attribute], desc: V2ViewDescription) is never used? It is needed by the abstract class `QueryPlan`.
[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014897367 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: `ShowCreateV2View` is for v2 views, while `V2ViewDescription` is an adaptor between v1 and v2, so it makes more sense for `ShowCreateV2View` not to use `V2ViewDescription`, instead relying solely on V2 interfaces.
[GitHub] [spark] sunchao commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10
sunchao commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1304899326 I'm going to start working on it this week.
[GitHub] [spark] AmplabJenkins commented on pull request #38527: [SPARK-40875][CONNECT] Improve aggregate in Connect DSL
AmplabJenkins commented on PR #38527: URL: https://github.com/apache/spark/pull/38527#issuecomment-1304898323 Can one of the admins verify this patch?
[GitHub] [spark] amaliujia commented on pull request #38527: [SPARK-40875][CONNECT] Improve aggregate in Connect DSL
amaliujia commented on PR #38527: URL: https://github.com/apache/spark/pull/38527#issuecomment-1304893285 R: @cloud-fan
[GitHub] [spark] amaliujia opened a new pull request, #38527: [SPARK-40875][CONNECT] Improve aggregate in Connect DSL
amaliujia opened a new pull request, #38527: URL: https://github.com/apache/spark/pull/38527

### What changes were proposed in this pull request?
This PR adds the aggregate expressions (or named result expressions) for Aggregate in Connect proto and DSL. On the server side, this PR also differentiates named expressions (e.g. with `alias`) from non-named expressions (the server wraps the latter in `UnresolvedAlias`, and Catalyst generates an alias for such an expression).

### Why are the changes needed?
Improve API coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT
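The named-vs-non-named distinction described above can be sketched as a toy model. All names here (`Expr`, `Alias`, `UnresolvedAlias`, `ensure_named`) are hypothetical illustrations of the idea, not actual Spark Connect server code:

```python
from dataclasses import dataclass

@dataclass
class Expr:
    # A plain, unnamed expression (e.g. a column reference or aggregate call)
    name: str

@dataclass
class Alias:
    # An expression the user explicitly named via alias
    child: Expr
    alias: str

@dataclass
class UnresolvedAlias:
    # Wrapper signalling "the analyzer should generate an alias for this"
    child: Expr

def ensure_named(expr):
    # Explicitly aliased expressions pass through unchanged; everything
    # else is wrapped so an alias can be generated later.
    return expr if isinstance(expr, Alias) else UnresolvedAlias(expr)
```

Under this sketch, `ensure_named(Alias(Expr("v"), "total"))` is returned as-is, while `ensure_named(Expr("v"))` comes back wrapped in `UnresolvedAlias`.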
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator
SandishKumarHN commented on code in PR #38515: URL: https://github.com/apache/spark/pull/38515#discussion_r1014891059 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3344,7 +3344,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase { def failedParsingDescriptorError(descFilePath: String, cause: Throwable): Throwable = { new AnalysisException( errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR", Review Comment: @MaxGekk added
[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar
dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304882120 @srowen the streaming test suite passed now, so it should be fine.
[GitHub] [spark] srowen commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar
srowen commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304879526 Weird, the error is on code that isn't in the repo (now). Can you fully rebase and force-push your changes?
[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar
dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304878701 @srowen is the failure related to the bot tagging of connect? There is some connect file that was added after this tag.
[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
ljfgem commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1014877861 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None Review Comment: What is the benefit of doing such a modification? And by the way, I found `output` parameter in `ShowCreateViewExec(output: Seq[Attribute], desc: V2ViewDescription)` is never used? Maybe we can remove this parameter. -- This is an automated message from the Apache Git Service. 
[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata
ljfgem commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r955031916 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.catalyst.plans.logical.ViewDescription +import org.apache.spark.sql.connector.catalog.{View, ViewCatalog} +import org.apache.spark.sql.types.StructType + +/** + * View description backed by a View in V2 catalog. + * + * @param view a view in V2 catalog + */ +case class V2ViewDescription( +override val identifier: String, +view: View) extends ViewDescription { + + override val schema: StructType = view.schema + + override val viewText: Option[String] = None + + override val viewCatalogAndNamespace: Seq[String] = +view.currentCatalog +: view.currentNamespace.toSeq + + override val viewQueryColumnNames: Seq[String] = view.schema.fieldNames Review Comment: I think we need to check if `view.schema` is null or not first? Maybe change the schema type to `Option[StructType]` -- This is an automated message from the Apache Git Service. 
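The null-safety concern in the review above could be sketched like this — a Python analogue of the suggested `Option[StructType]` change, where a missing schema yields no column names instead of a failure. The function name and the representation of the schema (just a sequence of field names) are invented for illustration:

```python
def view_query_column_names(field_names):
    # A None schema (catalog returned nothing) produces an empty column
    # list rather than raising when field names are accessed.
    if field_names is None:
        return []
    return list(field_names)
```

This mirrors wrapping the nullable value in `Option` and mapping over it on the Scala side.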
[GitHub] [spark] MaxGekk commented on pull request #38519: [MINOR][SQL] Remove unused an error class and query error methods
MaxGekk commented on PR #38519: URL: https://github.com/apache/spark/pull/38519#issuecomment-1304866695 @panbingkun @LuciferYang @itholic @cloud-fan @srielau Could you review this PR, please.
[GitHub] [spark] MaxGekk commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`
MaxGekk commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1014873100 ## core/src/main/resources/error/error-classes.json: ## @@ -668,6 +668,24 @@ } } }, + "LOCATION_ALREADY_EXISTS" : { +"message" : [ + "Cannot create the location because it already exists.", + "Choose a different path or remove the existing location." Review Comment:
> I.e. in the most trivial solution just move

ok. I'll move it.
[GitHub] [spark] MaxGekk commented on a diff in pull request #38521: [SPARK-41020][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1019` to `STAR_GROUP_BY_POS`
MaxGekk commented on code in PR #38521: URL: https://github.com/apache/spark/pull/38521#discussion_r1014871999 ## core/src/main/resources/error/error-classes.json: ## @@ -912,6 +912,11 @@ ], "sqlState" : "22023" }, + "STAR_GROUP_BY_POS" : { Review Comment: It is better to ask the author of the restriction, cc @gatorsmile and the committer @cloud-fan https://github.com/apache/spark/pull/11846/files#diff-ed19f376a63eba52eea59ca71f3355d4495fad4fad4db9a3324aade0d4986a47R389-R392
[GitHub] [spark] MaxGekk closed pull request #38522: [SPARK-41022][SQL][TESTS] Test the error class: DEFAULT_DATABASE_NOT_EXISTS, INDEX_ALREADY_EXISTS, INDEX_NOT_FOUND, ROUTINE_NOT_FOUND
MaxGekk closed pull request #38522: [SPARK-41022][SQL][TESTS] Test the error class: DEFAULT_DATABASE_NOT_EXISTS, INDEX_ALREADY_EXISTS, INDEX_NOT_FOUND, ROUTINE_NOT_FOUND URL: https://github.com/apache/spark/pull/38522
[GitHub] [spark] MaxGekk commented on pull request #38522: [SPARK-41022][SQL][TESTS] Test the error class: DEFAULT_DATABASE_NOT_EXISTS, INDEX_ALREADY_EXISTS, INDEX_NOT_FOUND, ROUTINE_NOT_FOUND
MaxGekk commented on PR #38522: URL: https://github.com/apache/spark/pull/38522#issuecomment-1304859631 +1, LGTM. Merging to master. Thank you, @panbingkun.
[GitHub] [spark] srielau commented on a diff in pull request #38521: [SPARK-41020][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1019` to `STAR_GROUP_BY_POS`
srielau commented on code in PR #38521: URL: https://github.com/apache/spark/pull/38521#discussion_r1014863229 ## core/src/main/resources/error/error-classes.json: ## @@ -912,6 +912,11 @@ ], "sqlState" : "22023" }, + "STAR_GROUP_BY_POS" : { Review Comment: LGTM ... but why do we have this restriction?
[GitHub] [spark] srielau commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`
srielau commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1014862790 ## core/src/main/resources/error/error-classes.json: ## @@ -668,6 +668,24 @@ } } }, + "LOCATION_ALREADY_EXISTS" : { +"message" : [ + "Cannot create the location because it already exists.", + "Choose a different path or remove the existing location." Review Comment: I didn't mean to remove the advice. I meant to collect all the "facts" up front and then collect all the "advice" at the end — i.e., in the most trivial solution, just move "Choose a different path or remove the existing location." into the sub-error message. Anyway, these are all nits...
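The suggestion above — keep the fact in the main message and move the advice into a sub-error message — could take roughly the following shape in `error-classes.json`. This is a hypothetical sketch only: the `WITH_SUGGESTION` sub-class name is invented, and the exact schema should be checked against the file's documented conventions.

```json
{
  "LOCATION_ALREADY_EXISTS" : {
    "message" : [
      "Cannot create the location because it already exists."
    ],
    "subClass" : {
      "WITH_SUGGESTION" : {
        "message" : [
          "Choose a different path or remove the existing location."
        ]
      }
    }
  }
}
```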
[GitHub] [spark] aokolnychyi commented on a diff in pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r1014855632 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.dynamicpruning + +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, DynamicPruningSubquery, Expression, PredicateHelper, V2ExpressionUtils} +import org.apache.spark.sql.catalyst.expressions.Literal.TrueLiteral +import org.apache.spark.sql.catalyst.planning.GroupBasedRowLevelOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering +import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Implicits, DataSourceV2Relation, DataSourceV2ScanRelation} + +/** + * A rule that assigns a subquery to filter groups in row-level operations at runtime. + * + * Data skipping during job planning for row-level operations is limited to expressions that can be + * converted to data source filters. 
Since not all expressions can be pushed down that way and + * rewriting groups is expensive, Spark allows data sources to filter group at runtime. + * If the primary scan in a group-based row-level operation supports runtime filtering, this rule + * will inject a subquery to find all rows that match the condition so that data sources know + * exactly which groups must be rewritten. + * + * Note this rule only applies to group-based row-level operations. + */ +case class RowLevelOperationRuntimeGroupFiltering(optimizeSubqueries: Rule[LogicalPlan]) Review Comment: An alternative idea could be to move `OptimizeSubqueries` into its own file. However, that's tricky too as it calls the optimizer. ``` Optimizer.this.execute(Subquery.fromExpression(s)) ```
[GitHub] [spark] aokolnychyi commented on pull request #38526: [SPARK-38959][SQL][FOLLOW-UP] Address feedback for RowLevelOperationRuntimeGroupFiltering
aokolnychyi commented on PR #38526: URL: https://github.com/apache/spark/pull/38526#issuecomment-1304834781 @cloud-fan, could you take a look to see if that's what you meant in PR #36304? There is also one open question [here](https://github.com/apache/spark/pull/36304/files#r999648945).
[GitHub] [spark] aokolnychyi opened a new pull request, #38526: [SPARK-38959][SQL][FOLLOW-UP] Address feedback for RowLevelOperationRuntimeGroupFiltering
aokolnychyi opened a new pull request, #38526: URL: https://github.com/apache/spark/pull/38526

### What changes were proposed in this pull request?
This PR is to address the feedback on PR #36304 after that change was merged.

### Why are the changes needed?
These changes are needed for better code quality.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r1014850975 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.dynamicpruning + +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, DynamicPruningSubquery, Expression, PredicateHelper, V2ExpressionUtils} +import org.apache.spark.sql.catalyst.expressions.Literal.TrueLiteral +import org.apache.spark.sql.catalyst.planning.GroupBasedRowLevelOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering +import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Implicits, DataSourceV2Relation, DataSourceV2ScanRelation} + +/** + * A rule that assigns a subquery to filter groups in row-level operations at runtime. + * + * Data skipping during job planning for row-level operations is limited to expressions that can be + * converted to data source filters. 
Since not all expressions can be pushed down that way and + * rewriting groups is expensive, Spark allows data sources to filter group at runtime. + * If the primary scan in a group-based row-level operation supports runtime filtering, this rule + * will inject a subquery to find all rows that match the condition so that data sources know + * exactly which groups must be rewritten. + * + * Note this rule only applies to group-based row-level operations. + */ +case class RowLevelOperationRuntimeGroupFiltering(optimizeSubqueries: Rule[LogicalPlan]) Review Comment: Like I said [here](https://github.com/apache/spark/pull/36304#discussion_r1014850793), we could move the new rule into a separate batch and add `OptimizeSubqueries` to it.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r1014850793 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala: ## @@ -50,7 +50,8 @@ class SparkOptimizer( override def defaultBatches: Seq[Batch] = (preOptimizationBatches ++ super.defaultBatches :+ Batch("Optimize Metadata Only Query", Once, OptimizeMetadataOnlyQuery(catalog)) :+ Batch("PartitionPruning", Once, - PartitionPruning) :+ + PartitionPruning, + RowLevelOperationRuntimeGroupFiltering(OptimizeSubqueries)) :+ Review Comment: This would be much cleaner but SPARK-36444 removed `OptimizeSubqueries`. I can add a separate batch for group filtering in row-level operations. How does that sound, @cloud-fan?
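For readers outside the optimizer codebase, the "batch" being discussed is just a named group of rules applied together, either once or to a fixed point. A toy model (illustrative only, not the Catalyst implementation) of that mechanic:

```python
# Toy model of an optimizer batch: apply each rule in order, either
# once or repeatedly until the plan stops changing.

def run_batch(plan, rules, once=True):
    while True:
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if once or new_plan == plan:
            return new_plan
        plan = new_plan

# A toy "plan" as a list of operators, with two toy rewrite rules.
drop_noops = lambda plan: [op for op in plan if op != "noop"]
fuse_maps = lambda plan: ["fused" if op == "map+map" else op for op in plan]

plan = ["scan", "noop", "map+map", "noop"]
optimized = run_batch(plan, [drop_noops, fuse_maps], once=True)
```

Moving a rule into its own batch, as proposed above, changes when it runs relative to other rules, which matters when a rule (like this one) produces subqueries that other rules still need to optimize.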
[GitHub] [spark] aokolnychyi commented on a diff in pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r1014850614 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.dynamicpruning + +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, DynamicPruningSubquery, Expression, PredicateHelper, V2ExpressionUtils} +import org.apache.spark.sql.catalyst.expressions.Literal.TrueLiteral +import org.apache.spark.sql.catalyst.planning.GroupBasedRowLevelOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering +import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Implicits, DataSourceV2Relation, DataSourceV2ScanRelation} + +/** + * A rule that assigns a subquery to filter groups in row-level operations at runtime. + * + * Data skipping during job planning for row-level operations is limited to expressions that can be + * converted to data source filters. 
Since not all expressions can be pushed down that way and + * rewriting groups is expensive, Spark allows data sources to filter group at runtime. + * If the primary scan in a group-based row-level operation supports runtime filtering, this rule + * will inject a subquery to find all rows that match the condition so that data sources know + * exactly which groups must be rewritten. + * + * Note this rule only applies to group-based row-level operations. + */ +case class RowLevelOperationRuntimeGroupFiltering(optimizeSubqueries: Rule[LogicalPlan]) Review Comment: @sunchao is correct. It wasn't easy to call `OptimizeSubqueries` outside `Optimizer`. Hence, I had to come up with this workaround. @cloud-fan, I also considered simply adding `OptimizeSubqueries` to the batch with runtime partition filtering. However, SPARK-36444 specifically removed it from there.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands
aokolnychyi commented on code in PR #36304: URL: https://github.com/apache/spark/pull/36304#discussion_r1014850115 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.dynamicpruning + +import org.apache.spark.sql.catalyst.expressions.{And, Attribute, DynamicPruningSubquery, Expression, PredicateHelper, V2ExpressionUtils} +import org.apache.spark.sql.catalyst.expressions.Literal.TrueLiteral +import org.apache.spark.sql.catalyst.planning.GroupBasedRowLevelOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering +import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2Implicits, DataSourceV2Relation, DataSourceV2ScanRelation} + +/** + * A rule that assigns a subquery to filter groups in row-level operations at runtime. + * + * Data skipping during job planning for row-level operations is limited to expressions that can be + * converted to data source filters. 
Since not all expressions can be pushed down that way and + * rewriting groups is expensive, Spark allows data sources to filter group at runtime. + * If the primary scan in a group-based row-level operation supports runtime filtering, this rule + * will inject a subquery to find all rows that match the condition so that data sources know + * exactly which groups must be rewritten. + * + * Note this rule only applies to group-based row-level operations. + */ +case class RowLevelOperationRuntimeGroupFiltering(optimizeSubqueries: Rule[LogicalPlan]) + extends Rule[LogicalPlan] with PredicateHelper { + + import DataSourceV2Implicits._ + + override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown { +// apply special dynamic filtering only for group-based row-level operations +case GroupBasedRowLevelOperation(replaceData, cond, +DataSourceV2ScanRelation(_, scan: SupportsRuntimeV2Filtering, _, _, _)) +if conf.runtimeRowLevelOperationGroupFilterEnabled && cond != TrueLiteral => + + // use reference equality on scan to find required scan relations + val newQuery = replaceData.query transformUp { +case r: DataSourceV2ScanRelation if r.scan eq scan => + // use the original table instance that was loaded for this row-level operation + // in order to leverage a regular batch scan in the group filter query + val originalTable = r.relation.table.asRowLevelOperationTable.table + val relation = r.relation.copy(table = originalTable) + val matchingRowsPlan = buildMatchingRowsPlan(relation, cond) + + val filterAttrs = scan.filterAttributes + val buildKeys = V2ExpressionUtils.resolveRefs[Attribute](filterAttrs, matchingRowsPlan) + val pruningKeys = V2ExpressionUtils.resolveRefs[Attribute](filterAttrs, r) + val dynamicPruningCond = buildDynamicPruningCond(matchingRowsPlan, buildKeys, pruningKeys) + + Filter(dynamicPruningCond, r) + } + + // optimize subqueries to rewrite them as joins and trigger job planning + replaceData.copy(query = optimizeSubqueries(newQuery)) Review 
Comment: Not really, @cloud-fan. This rule simply attaches a runtime filter to the plan that was created while rewriting the delete. We do replace the query but it is pretty much the same plan, just with an extra runtime filter.
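Mechanically, the injected runtime filter behaves like an IN-subquery: a build-side query finds the rows matching the row-level condition, and their keys prune the probe-side scan. A plain-Python sketch of that semantics (not Spark code; all names are illustrative):

```python
# Sketch of dynamic pruning semantics: the "matching rows" subquery
# produces a set of keys, which then filters the main scan.

def matching_keys(table, condition, key):
    # Build side: keys of rows satisfying the row-level condition.
    return {key(r) for r in table if condition(r)}

def pruned_scan(table, keys, key):
    # Probe side: scan only rows whose key survived the subquery.
    return [r for r in table if key(r) in keys]

table = [{"id": 1, "v": 10}, {"id": 2, "v": 99}, {"id": 3, "v": 7}]
keys = matching_keys(table, lambda r: r["v"] > 50, key=lambda r: r["id"])
matched = pruned_scan(table, keys, key=lambda r: r["id"])
```

This is why the comment says the plan is "pretty much the same": the original rewrite plan is untouched except for the extra `Filter(dynamicPruningCond, r)` attached to the scan relation.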
[GitHub] [spark] srowen commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
srowen commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304827262 Looks OK if tests pass. I think we need to update the two together, so this resolves both JIRAs
[GitHub] [spark] bsikander commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10
bsikander commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1304825343 @sunchao thank you for your efforts. When can we expect the release of 3.2.3?
[GitHub] [spark] srowen commented on pull request #38499: [MINOR][DOC] updated some grammar and a missed period in the tuning doc
srowen commented on PR #38499: URL: https://github.com/apache/spark/pull/38499#issuecomment-1304822113 Merged to master
[GitHub] [spark] srowen closed pull request #38499: [MINOR][DOC] updated some grammar and a missed period in the tuning doc
srowen closed pull request #38499: [MINOR][DOC] updated some grammar and a missed period in the tuning doc URL: https://github.com/apache/spark/pull/38499
[GitHub] [spark] srowen commented on pull request #38525: [SPARK-40950][BUILD][FOLLOWUP] Fix Scala 2.13 Mima check
srowen commented on PR #38525: URL: https://github.com/apache/spark/pull/38525#issuecomment-1304811696 Merged to master
[GitHub] [spark] srowen closed pull request #38525: [SPARK-40950][BUILD][FOLLOWUP] Fix Scala 2.13 Mima check
srowen closed pull request #38525: [SPARK-40950][BUILD][FOLLOWUP] Fix Scala 2.13 Mima check URL: https://github.com/apache/spark/pull/38525
[GitHub] [spark] bjornjorgensen commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
bjornjorgensen commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304801289 @pjfanning Yes, but I just wanna ask @LuciferYang if he has been looking at it.
[GitHub] [spark] pjfanning commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
pjfanning commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304800274 Shouldn't guava changes be done in a separate PR?
[GitHub] [spark] bjornjorgensen commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
bjornjorgensen commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304798904 @LuciferYang Have you tried to upgrade guava to 31.1-jre?
[GitHub] [spark] pjfanning commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
pjfanning commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304798867 @LuciferYang feel free to continue with your PR. I marked SPARK-41023 as a duplicate of SPARK-40911.
[GitHub] [spark] bjornjorgensen commented on pull request #38523: [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0
bjornjorgensen commented on PR #38523: URL: https://github.com/apache/spark/pull/38523#issuecomment-1304797597 https://issues.apache.org/jira/browse/SPARK-40911 I think that @pjfanning will work on this one.
[GitHub] [spark] panbingkun commented on a diff in pull request #38520: [SPARK-41021][SQL][TESTS] Test some subclasses of error class DATATYPE_MISMATCH
panbingkun commented on code in PR #38520: URL: https://github.com/apache/spark/pull/38520#discussion_r1014823662 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala: ## @@ -581,6 +591,19 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with ) } + test("to_json - struct: unable to convert column of ObjectType to JSON") { +val schema = StructType(StructField("a", ObjectType(classOf[java.lang.Integer])) :: Nil) +val structData = Literal.create(create_row(Integer.valueOf(1)), schema) +assert(StructsToJson(Map.empty, structData).checkInputDataTypes() == + DataTypeMismatch( +errorSubClass = "CANNOT_CONVERT_TO_JSON", Review Comment: Another UT [JsonFunctionsSuite.scala](https://github.com/apache/spark/pull/38520/files#diff-f406c940ba1e8f95cc1d87500cc0906fff593fbc336ba66d04daf784121a6050) (screenshot: https://user-images.githubusercontent.com/15246973/200171460-65c1e2fc-0018-438e-84ce-7382b91ec97d.png)
[GitHub] [spark] panbingkun commented on a diff in pull request #38520: [SPARK-41021][SQL][TESTS] Test some subclasses of error class DATATYPE_MISMATCH
panbingkun commented on code in PR #38520: URL: https://github.com/apache/spark/pull/38520#discussion_r1014823446 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala: ## @@ -406,6 +407,15 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with InternalRow(UTF8String.fromString("1"), null, UTF8String.fromString("1"))) } + test("json_tuple - all arguments must be strings") { +assert(JsonTuple(Seq(Literal(888), Literal(999))).checkInputDataTypes() == + DataTypeMismatch( +errorSubClass = "NON_STRING_TYPE", Review Comment: Another UT [JsonFunctionsSuite.scala](https://github.com/apache/spark/pull/38520/files#diff-f406c940ba1e8f95cc1d87500cc0906fff593fbc336ba66d04daf784121a6050) (screenshot: https://user-images.githubusercontent.com/15246973/200171401-b88aa6ab-8941-4f2c-86b8-68fd7e81c2d4.png)
[GitHub] [spark] panbingkun commented on a diff in pull request #38520: [SPARK-41021][SQL][TESTS] Test some subclasses of error class DATATYPE_MISMATCH
panbingkun commented on code in PR #38520: URL: https://github.com/apache/spark/pull/38520#discussion_r1014823040 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -346,6 +358,18 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper assert(errorSubClass === "INVALID_MAP_KEY_TYPE") assert(messageParameters === Map("keyType" -> "\"MAP\"")) } + +// accepts only arrays of pair structs +val mapWrongType = MapFromEntries(Literal(1)) +assert(mapWrongType.checkInputDataTypes() == + DataTypeMismatch( +errorSubClass = "MAP_FROM_ENTRIES_WRONG_TYPE", +messageParameters = Map( Review Comment: Another UT [DataFrameFunctionsSuite.scala](https://github.com/apache/spark/pull/38520/files#diff-03f5f4bb4f15c5640fae904891b75a1d21f7bf8fa92c4abcb9b35d1500c37ae7) (screenshot: https://user-images.githubusercontent.com/15246973/200171295-aebe7755-a31e-4c8d-92b1-146dbe3ef437.png)
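The tests above assert that `checkInputDataTypes()` returns a structured `DataTypeMismatch` instead of an ad-hoc exception. A standalone imitation of that pattern (the function and error shape below mimic, but are not, Spark's implementation) for the `json_tuple` case, where every argument must be a string type:

```python
# Hedged sketch: a type check that returns a structured error object
# (error subclass + message parameters) rather than raising.

def check_json_tuple_args(arg_types):
    """All json_tuple arguments must be strings; otherwise report a
    NON_STRING_TYPE mismatch, mirroring the test expectations above."""
    non_string = [t for t in arg_types if t != "string"]
    if non_string:
        return {"errorSubClass": "NON_STRING_TYPE",
                "messageParameters": {"funcName": "json_tuple"}}
    return {"errorSubClass": None, "messageParameters": {}}
```

Structured errors like this are what make the error-class framework testable: the suite can compare the whole object for equality instead of regex-matching a message string.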
[GitHub] [spark] HyukjinKwon closed pull request #38524: [SPARK-41024][BUILD] Upgrade scala-maven-plugin to 4.7.2
HyukjinKwon closed pull request #38524: [SPARK-41024][BUILD] Upgrade scala-maven-plugin to 4.7.2 URL: https://github.com/apache/spark/pull/38524
[GitHub] [spark] HyukjinKwon commented on pull request #38524: [SPARK-41024][BUILD] Upgrade scala-maven-plugin to 4.7.2
HyukjinKwon commented on PR #38524: URL: https://github.com/apache/spark/pull/38524#issuecomment-1304783719 Merged to master.
[GitHub] [spark] mridulm commented on pull request #38525: [SPARK-40950][BUILD][FOLLOWUP] Fix Scala 2.13 Mima check
mridulm commented on PR #38525: URL: https://github.com/apache/spark/pull/38525#issuecomment-1304775260 Will wait for tests to pass ...
[GitHub] [spark] LuciferYang commented on pull request #38491: [MINOR][CONNECT] Remove unused import in commands.proto
LuciferYang commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1304764271 looks fine
[GitHub] [spark] LuciferYang commented on pull request #38525: [SPARK-40950][BUILD][FOLLOWUP] Fix Scala 2.13 Mima check
LuciferYang commented on PR #38525: URL: https://github.com/apache/spark/pull/38525#issuecomment-1304762343 cc @dongjoon-hyun @srowen @mridulm @HyukjinKwon