Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22213
  
    This actually makes sense. We always forget this, but the Java properties file format is [more complex than any of us remember](https://docs.oracle.com/javase/10/docs/api/java/util/Properties.html#load(java.io.Reader)).
    
    By the time this trim takes place, all CR/LF chars in the source file will have been stripped through one of
    * being the end of an entry: the property contains all chars up to that line break (or the line is skipped entirely if empty/comment)
    * being preceded by a backslash, in which case the following line has its initial whitespace stripped and is then joined onto the preceding line (see the sketch below)
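    
    To make both cases concrete, here's a minimal standalone sketch (mine, not from the PR) of `Properties.load` doing the stripping:
    
    ```java
    import java.io.StringReader;
    import java.util.Properties;
    
    public class PropsLineBreaks {
        public static void main(String[] args) throws Exception {
            // A raw properties source: a comment, a plain entry, and a
            // backslash-continued entry whose second line starts with spaces.
            String src = "# a comment, skipped entirely\n"
                       + "plain=value\n"
                       + "joined=first \\\n"
                       + "    second\n";
            Properties p = new Properties();
            p.load(new StringReader(src));
            // No line breaks survive loading: the continued entry is joined
            // with the continuation line's leading whitespace stripped.
            System.out.println(p.getProperty("plain"));   // value
            System.out.println(p.getProperty("joined"));  // first second
        }
    }
    ```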
    
    Whoever did the Wikipedia article [included some good examples](https://en.wikipedia.org/wiki/.properties).
    
    What this means is: by the time the Spark trim() code is reached, the only CR and LF characters in a property are those from expanding the two-character `\r` and `\n` escape sequences in the property itself. Those in the middle of a value, e.g. `p1=a\nb`, already get through; this extends that to properties like `p2=\r`.
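    
    A quick demo of that (again mine, not the PR's code), showing the escapes expanding on load and a plain `trim()` eating the lone CR:
    
    ```java
    import java.io.StringReader;
    import java.util.Properties;
    
    public class EscapedCrLf {
        public static void main(String[] args) throws Exception {
            Properties p = new Properties();
            // Literal backslash-n and backslash-r in the source text.
            p.load(new StringReader("p1=a\\nb\np2=\\r\n"));
            // The two-char escapes expand to real control chars on load.
            System.out.println(p.getProperty("p1").contains("\n")); // true
            // A plain trim() silently erases the whole of p2's value ...
            System.out.println(p.getProperty("p2").trim().isEmpty()); // true
            // ... which is the case a CR/LF-preserving trim has to keep.
        }
    }
    ```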
    
    * it should be easy to write some tests for `trimExceptCRLF()` directly, e.g. how it handles odd strings (one char, value == 0), the empty string, ... (a sketch follows this list)
    * There's an XML format for properties too, which should also be tested to see WTF goes on there (quick probe below).
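    
    A sketch of the sort of direct tests meant above. The helper is re-implemented locally for illustration, assuming the semantics under discussion (behave like `String.trim()`, except `\r`/`\n` are never stripped); I haven't lifted the real signature out of the diff:
    
    ```java
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;
    
    public class TrimExceptCRLFSuite {
    
        // Assumed semantics, re-sketched locally for illustration: behave
        // like String.trim() but never strip \r or \n from either end.
        private static String trimExceptCRLF(String s) {
            int start = 0;
            int end = s.length();
            while (start < end) {
                char c = s.charAt(start);
                if (c > ' ' || c == '\r' || c == '\n') break;
                start++;
            }
            while (end > start) {
                char c = s.charAt(end - 1);
                if (c > ' ' || c == '\r' || c == '\n') break;
                end--;
            }
            return s.substring(start, end);
        }
    
        @Test
        public void oddStrings() {
            assertEquals("", trimExceptCRLF(""));        // empty string
            assertEquals("a", trimExceptCRLF("a"));      // one char
            assertEquals("", trimExceptCRLF("\u0000"));  // value == 0
            assertEquals("\r", trimExceptCRLF("\r"));    // lone CR survives
            assertEquals("\n", trimExceptCRLF(" \n "));  // spaces go, LF stays
        }
    }
    ```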
    
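    For the XML side, a quick probe (nothing Spark-specific assumed): the XML format has no backslash escapes, so control chars arrive as character references instead, and a CR reference appears to survive loading intact:
    
    ```java
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    
    public class XmlPropsProbe {
        public static void main(String[] args) throws Exception {
            String xml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
              + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
              + "<properties>\n"
              + "  <entry key=\"p2\">&#13;</entry>\n"
              + "</properties>\n";
            Properties p = new Properties();
            p.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The character reference resolves to a real CR (codepoint 13).
            System.out.println((int) p.getProperty("p2").charAt(0));
        }
    }
    ```
    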
    PS: looking up the properties spec highlights that Java 9 [uses UTF-8 for the properties encoding](https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm#JSINT-GUID-974CF488-23E8-4963-A322-82006A7A14C7). Don't know of any implications here.
    
    
    


