[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

MaxGekk Fri, 23 Mar 2018 03:55:43 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20877
  
    > Does that fix actual usecases?
    
    I see the following use cases:
    
    1. JSON coming from embedded systems often uses non-standard separators 
(invisible in some cases). It is very convenient to open a file in a hex editor 
and copy the bytes between `}` and `{` into the lineSep option. This is the use 
case for the format with the `'x'` selector, for example: `x0d 54 45`
    
    2. In JSON streaming, records can be separated in quite different ways. 
I believe we should leave room for improvement. See the reserved `'r'` (for 
regexp) and `'/'` selectors.
    
    3. Some UTF-8 chars can trigger errors from style (format) checkers. It is 
easier to represent such chars in hexadecimal format than to disable the 
checkers.
    
    4. In the near future, the json datasource will support input json in 
different charsets. If the source code is in UTF-8 but the input json is in a 
different charset, it is somewhat hard to pass such chars as the value of the 
lineSep option. The `x<hexs>` format is more convenient here again.
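
    To make the `'x'` selector concrete, here is a minimal Python sketch of 
how such a value could be decoded and applied. It is only an illustration, not 
the code in this PR: `parse_line_sep` and `split_records` are hypothetical 
names, and the rule that a leading `x` always selects hex mode is an assumption.

```python
def parse_line_sep(spec: str, encoding: str = "utf-8") -> bytes:
    """Decode a lineSep value into raw separator bytes.

    Hypothetical helper. Assumption: a leading 'x' selects hex mode;
    any other value is used literally, encoded with `encoding`.
    """
    if spec.startswith("x"):
        # bytes.fromhex skips the spaces between byte pairs.
        return bytes.fromhex(spec[1:])
    return spec.encode(encoding)


def split_records(data: bytes, sep: bytes) -> list:
    """Split a raw byte buffer into records on `sep`, dropping empty chunks."""
    return [chunk for chunk in data.split(sep) if chunk]


# The separator bytes copied from a hex editor, as in use case 1.
sep = parse_line_sep("x0d 54 45")
records = split_records(b'{"a":1}' + sep + b'{"a":2}', sep)
```

    Because the separator is handled as bytes, the same value works whatever 
charset the input uses, which is the point of use case 4.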


