planga82 commented on a change in pull request #33747:
URL: https://github.com/apache/spark/pull/33747#discussion_r690790953
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
case VALUE_STRING if parser.getTextLength >= 1 =>
// Special case handling for NaN and Infinity.
- parser.getText match {
Review comment:
This is a good point, I have done research about it and it seems that it
is not a consensus. In Json specification this special values are not permitted
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
pag. 12).
From this point different libraries follow different criteria. For example
* **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
uses `'INF', '-INF', 'NaN'`. That is the XML standard for this
(https://www.w3.org/TR/xmlschema-2/)
* **Python 3** uses the following `'-Infinity', 'Infinity', 'NaN'`
(https://docs.python.org/3/library/json.html)
* **Spray-json**: It seems that not support it
* **Gson**, using serializeSpecialFloatingPointValues
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
, uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'`
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
pag 119, pag. 63)
From my point of view, since there is no standard, probably using the same
options as cast could make the users' lives easier, but it’s true that, for
example, all lowercase letters is not an option in any library, so it’s a
difficult decision. Now we are using the same option as Python and Javascript
specification.
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
case VALUE_STRING if parser.getTextLength >= 1 =>
// Special case handling for NaN and Infinity.
- parser.getText match {
Review comment:
This is a good point, I have done research about it and it seems that it
is not a consensus. In Json specification this special values are not permitted
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
pag. 12).
From this point different libraries follow different criteria. For example
* **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
uses `'INF', '-INF', 'NaN'`. That is the **XML standard** for this
(https://www.w3.org/TR/xmlschema-2/)
* **Python 3** uses the following `'-Infinity', 'Infinity', 'NaN'`
(https://docs.python.org/3/library/json.html)
* **Spray-json**: It seems that not support it
* **Gson**, using serializeSpecialFloatingPointValues
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
, uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'`
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
pag 119, pag. 63)
From my point of view, since there is no standard, probably using the same
options as cast could make the users' lives easier, but it’s true that, for
example, all lowercase letters is not an option in any library, so it’s a
difficult decision. Now we are using the same option as Python and Javascript
specification.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]