[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1023233663


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUM_FORMAT_CONT_THOUSANDS_SEPS" : {

Review Comment:
   > This must be fixed before we cut 12.0.
   
   Definitely, it will be fixed before Spark 12. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1022706881


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUM_FORMAT_CONT_THOUSANDS_SEPS" : {

Review Comment:
   @cloud-fan Are you ok with such approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1022613029


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUM_FORMAT_CONT_THOUSANDS_SEPS" : {

Review Comment:
   > Some are query compile time static checks
   
   I would like to propose to do refactoring later. Let's assign unique error 
classes to the errors that come from `checkInputDataTypes()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1022349552


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUM_FORMAT_CONT_THOUSANDS_SEPS" : {

Review Comment:
   Theoretically, a data type is a set of possible values and a set of allowed 
operations on it. The number format is not an arbitrary string, it must be in 
some particular format. So, the error indicates violation of datatype range of 
values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1021226420


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUM_FORMAT_CONT_THOUSANDS_SEPS" : {
+"message" : [
+  "Thousands separators (, or G) must have digits in between them in 
the number format: ."
+]
+  },
+  "NUM_FORMAT_CUR_MUST_BEFORE_DEC" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ."
+]
+  },
+  "NUM_FORMAT_CUR_MUST_BEFORE_DIGIT" : {
+"message" : [
+  "Currency characters must appear before digits in the number format: 
."
+]
+  },
+  "NUM_FORMAT_EMPTY" : {
+"message" : [
+  "The number format string cannot be empty."
+]
+  },
+  "NUM_FORMAT_THOUSANDS_SEPS_MUST_BEFORE_DEC" : {
+"message" : [
+  "Thousands separators (, or G) may not appear after the decimal 
point in the number format: ."
+]
+  },
+  "NUM_FORMAT_UNEXPECTED_TOKEN" : {
+"message" : [
+  "Unexpected  found in the format string ; the 
structure of the format string must match: [MI|S] [$] [0|9|G|,]* [.|D] [0|9]* 
[$] [PR|MI|S]."
+]
+  },
+  "NUM_FORMAT_WRONG_NUM_DIGIT" : {

Review Comment:
   I wonder is the prefix `NUM_` useful. Maybe just `FORMAT`? @cloud-fan 
@srielau WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-12 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1020757754


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"

Review Comment:
   > Do we really need Level 3 error classes? @MaxGekk
   
   I don't think it makes sense.
   
   > If we keep Level 2 error classes, INVALID_NUMBER_FORMAT
   
   Let's make the prefix shorter, for instance `NUM_FORMAT` like
   `DATATYPE_MISMATCH.NUM_FORMAT_CONT_THOUSANDS_SEPS`
instead of
   
`DATATYPE_MISMATCH.NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_HAVE_DIGIT_IN_BETWEEN`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-11 Thread GitBox


MaxGekk commented on code in PR #38531:
URL: https://github.com/apache/spark/pull/38531#discussion_r1020213977


##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"
+]
+  },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : {
+"message" : [
+  "Currency characters must appear before digits in the number format: 
''"

Review Comment:
   ditto: quoting of `numberFormat`



##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"
+]
+  },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : {
+"message" : [
+  "Currency characters must appear before digits in the number format: 
''"
+]
+  },
+  "NUMBER_FORMAT_EMPTY" : {
+"message" : [
+  "The format string cannot be empty"
+]
+  },
+  "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Thousands separators (, or G) may not appear after the decimal 
point in the number format: ''"

Review Comment:
   ditto: quoting + dot at the end.



##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"
+]
+  },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : {
+"message" : [
+  "Currency characters must appear before digits in the number format: 
''"
+]
+  },
+  "NUMBER_FORMAT_EMPTY" : {
+"message" : [
+  "The format string cannot be empty"

Review Comment:
   nit: but for consistency w/ other error messages. Could you add a dot at the 
end.
   ```suggestion
 "The format string cannot be empty."
   ```



##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"

Review Comment:
   Let's quote the format as we do in other places:
   ```
   toSQLValue(pattern, StringType)
   ```
   see, for instance
   
https://github.com/apache/spark/blob/0205478b9d35d62450fd7c9ade520087fd2979a7/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L1381



##
core/src/main/resources/error/error-classes.json:
##
@@ -290,6 +290,46 @@
   "Null typed values cannot be used as arguments of ."
 ]
   },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Currency characters must appear before any decimal point in the 
number format: ''"
+]
+  },
+  "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : {
+"message" : [
+  "Currency characters must appear before digits in the number format: 
''"
+]
+  },
+  "NUMBER_FORMAT_EMPTY" : {
+"message" : [
+  "The format string cannot be empty"
+]
+  },
+  "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_BEFORE_DECIMAL" : {
+"message" : [
+  "Thousands separators (, or G) may not appear after the decimal 
point in the number format: ''"
+]
+  },
+  "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_HAVE_DIGIT_IN_BETWEEN" : {

Review Comment:
   The name is too long. Is it possible to make it slightly shorter?



##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1134,6 +1181,21 @@ class StringExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("ToNumber: fails analysis if numberFormat is not a constant") {

Review Comment:
   nit: is not foldable



##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##
@@ -1022,76 +1022,123 @@ class StringExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper {
   }
 
   test("ToNumber and ToCharacter: negative tests (the format string is 
invalid)") {
-val unexpectedCharacter = "the structure of the format string must match: 
" +
-  "[MI|S] [$]