andygrove commented on pull request #35573:
URL: https://github.com/apache/spark/pull/35573#issuecomment-1046114075


   > Dumb questions as I just don't know this code path -- would we parse 
`"1.0"` as a number or reject it for being a string? and you're saying that 
Jackson would parse `"NaN"` as NaN and not a string? It feels weird to parse 
strings that aren't even numbers as numbers, even when the option to do so is 
set, but if that is consistent with other behavior I could see it.
   
   Jackson will parse valid JSON numbers as either integers or floats. It will 
parse strings as strings and does not try to interpret them. Jackson has 
special behavior for tokens such as `NaN` and `Infinity` which are not valid 
JSON values (JSON does not support non-numeric numbers at all). Jackson will 
not attempt to interpret `"NaN"` and would just return a string.
   
   For some reason, the code we have today in Spark's `JacksonParser` will look 
at any returned strings and will parse them if they are `NaN`, `Infinity` or 
`-Infinity` (but not the other special values that Jackson supports, such as 
`+INF`). It will not attempt to parse strings that contain actual numeric 
values like `"1.0"`. This secondary parsing seems like a bad idea overall IMO. 
Users could get similar behavior by specifying that the column is `StringType` 
when reading and then performing an explicit `CAST` to a floating-point type 
and it would support reading numbers and not-numbers regardless of whether they 
are JSON strings or JSON numbers.
   
   If we have the option of removing this code (or putting it behind a 
configuration option) then I would be in favor of that. If not, then I would 
like it to support the same set of not-a-number strings as Jackson and I would 
like it to respect the `allowNonNumericNumbers` option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to