[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1023233663 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUM_FORMAT_CONT_THOUSANDS_SEPS" : { Review Comment: > This must be fixed before we cut 12.0. Definitely, it will be fixed before Spark 12. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022706881 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUM_FORMAT_CONT_THOUSANDS_SEPS" : { Review Comment: @cloud-fan Are you ok with such approach? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022613029 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUM_FORMAT_CONT_THOUSANDS_SEPS" : { Review Comment: > Some are query compile time static checks I would like to propose to do refactoring later. Let's assign unique error classes to the errors that come from `checkInputDataTypes()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022349552 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUM_FORMAT_CONT_THOUSANDS_SEPS" : { Review Comment: Theoretically, a data type is a set of possible values and a set of allowed operations on it. The number format is not an arbitrary string, it must be in some particular format. So, the error indicates violation of datatype range of values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1021226420 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUM_FORMAT_CONT_THOUSANDS_SEPS" : { +"message" : [ + "Thousands separators (, or G) must have digits in between them in the number format: ." +] + }, + "NUM_FORMAT_CUR_MUST_BEFORE_DEC" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ." +] + }, + "NUM_FORMAT_CUR_MUST_BEFORE_DIGIT" : { +"message" : [ + "Currency characters must appear before digits in the number format: ." +] + }, + "NUM_FORMAT_EMPTY" : { +"message" : [ + "The number format string cannot be empty." +] + }, + "NUM_FORMAT_THOUSANDS_SEPS_MUST_BEFORE_DEC" : { +"message" : [ + "Thousands separators (, or G) may not appear after the decimal point in the number format: ." +] + }, + "NUM_FORMAT_UNEXPECTED_TOKEN" : { +"message" : [ + "Unexpected found in the format string ; the structure of the format string must match: [MI|S] [$] [0|9|G|,]* [.|D] [0|9]* [$] [PR|MI|S]." +] + }, + "NUM_FORMAT_WRONG_NUM_DIGIT" : { Review Comment: I wonder is the prefix `NUM_` useful. Maybe just `FORMAT`? @cloud-fan @srielau WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1020757754 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" Review Comment: > Do we really need Level 3 error classes? @MaxGekk I don't think it makes sense. > If we keep Level 2 error classes, INVALID_NUMBER_FORMAT Let's make the prefix shorter, for instance `NUM_FORMAT` like `DATATYPE_MISMATCH.NUM_FORMAT_CONT_THOUSANDS_SEPS` instead of `DATATYPE_MISMATCH.NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_HAVE_DIGIT_IN_BETWEEN` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1020213977 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" +] + }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : { +"message" : [ + "Currency characters must appear before digits in the number format: ''" Review Comment: ditto: quoting of `numberFormat` ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" +] + }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : { +"message" : [ + "Currency characters must appear before digits in the number format: ''" +] + }, + "NUMBER_FORMAT_EMPTY" : { +"message" : [ + "The format string cannot be empty" +] + }, + "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Thousands separators (, or G) may not appear after the decimal point in the number format: ''" Review Comment: ditto: quoting + dot at the end. ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" +] + }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : { +"message" : [ + "Currency characters must appear before digits in the number format: ''" +] + }, + "NUMBER_FORMAT_EMPTY" : { +"message" : [ + "The format string cannot be empty" Review Comment: nit: but for consistency w/ other error messages. Could you add a dot at the end. ```suggestion "The format string cannot be empty." ``` ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" Review Comment: Let's quote the format as we do in other places: ``` toSQLValue(pattern, StringType) ``` see, for instance https://github.com/apache/spark/blob/0205478b9d35d62450fd7c9ade520087fd2979a7/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L1381 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Currency characters must appear before any decimal point in the number format: ''" +] + }, + "NUMBER_FORMAT_CURRENCY_MUST_BEFORE_DIGIT" : { +"message" : [ + "Currency characters must appear before digits in the number format: ''" +] + }, + "NUMBER_FORMAT_EMPTY" : { +"message" : [ + "The format string cannot be empty" +] + }, + "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_BEFORE_DECIMAL" : { +"message" : [ + "Thousands separators (, or G) may not appear after the decimal point in the number format: ''" +] + }, + "NUMBER_FORMAT_THOUSANDS_SEPARATOR_MUST_HAVE_DIGIT_IN_BETWEEN" : { Review Comment: The name is too long. Is it possible to make it slightly shorter? ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala: ## @@ -1134,6 +1181,21 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("ToNumber: fails analysis if numberFormat is not a constant") { Review Comment: nit: is not foldable ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala: ## @@ -1022,76 +1022,123 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } test("ToNumber and ToCharacter: negative tests (the format string is invalid)") { -val unexpectedCharacter = "the structure of the format string must match: " + - "[MI|S] [$]