Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20877
> Does that fix actual usecases?
I see the following use cases:
1. JSON coming from embedded systems often has non-standard separators
(invisible in some cases). It is very convenient to open a file in a hex editor
and copy the bytes between `}` and `{` into the lineSep option. This is the use
case for the format with the `'x'` selector, for example: `x0d 54 45`
2. In JSON streaming, records can be separated in quite different ways.
I believe we should leave room for improvement. See the reserved selectors
`'r'` (for regexp) and `'/'`.
3. Some UTF-8 chars can trigger errors in style (format) checkers. It is
easier to represent such chars in hexadecimal format than to disable the
checkers.
4. In the near future, the JSON datasource will support input JSON in different
charsets. If the source code is in UTF-8 but the input JSON is in a different
charset, it is somewhat hard to write such chars as values of the lineSep
option. The `x<hexs>` format is more convenient here as well.
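
To make the selector idea concrete, here is a minimal sketch in Python of how a lineSep value could be turned into separator bytes. It is only an illustration, not the code in this PR: the helper name `parse_line_sep`, its `encoding` parameter, and the rule that any value starting with `x` is read as hex are all assumptions.

```python
def parse_line_sep(spec: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: convert a lineSep option value to separator bytes."""
    if spec.startswith("x"):
        # 'x' selector: the rest is a sequence of hex byte values.
        # bytes.fromhex skips the spaces between byte pairs.
        return bytes.fromhex(spec[1:])
    if spec[:1] in ("r", "/"):
        # 'r' and '/' are only reserved in the proposal; no behavior is assumed here.
        raise NotImplementedError("selector %r is reserved" % spec[:1])
    # No selector: treat the value as plain text in the given charset.
    return spec.encode(encoding)


# Bytes copied from a hex editor (use case 1).
sep_from_hex = parse_line_sep("x0d 54 45")

# The same logical separator in another charset (use case 4).
sep_utf16 = parse_line_sep("\n", encoding="utf-16le")
```

With the hex form the separator bytes are written down literally, so they do not depend on the charset of the source file that sets the option.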