[GitHub] [spark] planga82 commented on a change in pull request #33747: [SPARK-36453][SQL] Improve consistency processing floating point special literals

GitBox Wed, 18 Aug 2021 09:50:02 -0700


planga82 commented on a change in pull request #33747:
URL: https://github.com/apache/spark/pull/33747#discussion_r690790953




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
 
         case VALUE_STRING if parser.getTextLength >= 1 =>
           // Special case handling for NaN and Infinity.
-          parser.getText match {

Review comment:
       This is a good point, I have done research about it and it seems that it 
is not a consensus. In Json specification this special values are not permitted 
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
 pag. 12). 
   
   From this point different libraries follow different criteria. For example
   * **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS 
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
 uses `'INF', '-INF', 'NaN'`.  That is the XML standard for this 
(https://www.w3.org/TR/xmlschema-2/)
   * **Python 3**  uses the following `'-Infinity', 'Infinity', 'NaN'`  
(https://docs.python.org/3/library/json.html)
   * **Spray-json**: It seems that not support it
   * **Gson**, using serializeSpecialFloatingPointValues 
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
 , uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'` 
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
 pag 119, pag. 63)
   
   From my point of view, since there is no standard, probably using the same 
options as cast could make the users' lives easier, but it’s true that, for 
example, all lowercase letters  is not an option in any library, so it’s a 
difficult decision. Now we are using the same option as Python and Javascript 
specification.
   

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##########
@@ -197,12 +197,12 @@ class JacksonParser(
 
         case VALUE_STRING if parser.getTextLength >= 1 =>
           // Special case handling for NaN and Infinity.
-          parser.getText match {

Review comment:
       This is a good point, I have done research about it and it seems that it 
is not a consensus. In Json specification this special values are not permitted 
(https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
 pag. 12). 
   
   From this point different libraries follow different criteria. For example
   * **Jackson**, if you use ALLOW_NON_NUMERIC_NUMBERS 
(https://fasterxml.github.io/jackson-core/javadoc/2.8/com/fasterxml/jackson/core/JsonParser.Feature.html?is-external=true)
 uses `'INF', '-INF', 'NaN'`.  That is the **XML standard** for this 
(https://www.w3.org/TR/xmlschema-2/)
   * **Python 3**  uses the following `'-Infinity', 'Infinity', 'NaN'`  
(https://docs.python.org/3/library/json.html)
   * **Spray-json**: It seems that not support it
   * **Gson**, using serializeSpecialFloatingPointValues 
(https://www.javadoc.io/doc/com.google.code.gson/gson/2.8.0/com/google/gson/GsonBuilder.html#serializeSpecialFloatingPointValues--)
 , uses the **JavaScript specification** `'-Infinity', 'Infinity', 'NaN'` 
(https://www.ecma-international.org/wp-content/uploads/ECMA-262_12th_edition_june_2021.pdf
 pag 119, pag. 63)
   
   From my point of view, since there is no standard, probably using the same 
options as cast could make the users' lives easier, but it’s true that, for 
example, all lowercase letters  is not an option in any library, so it’s a 
difficult decision. Now we are using the same option as Python and Javascript 
specification.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] planga82 commented on a change in pull request #33747: [SPARK-36453][SQL] Improve consistency processing floating point special literals

Reply via email to