Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/22213
This actually makes sense. We always forget this, but the Java properties file
format is [more complex than any of us
remember](https://docs.oracle.com/javase/10/docs/api/java/util/Properties.html#load(java.io.Reader)).
By the time this trim takes place, all CR/LF chars in the source file
will have been stripped through one of:
* being the end of an entry: the property contains all chars up to that line
break (or the line is skipped entirely if empty/a comment)
* being preceded by a backslash, in which case the following line has its
initial whitespace stripped and is then joined onto the line before it (see the sketch below).
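A quick way to see those rules in action with plain `java.util.Properties` (a standalone sketch, nothing Spark-specific):

```scala
import java.io.StringReader
import java.util.Properties

val text =
  """# comments and blank lines are skipped entirely
    |p1=a\nb
    |p2=first \
    |       second
    |""".stripMargin

val props = new Properties()
props.load(new StringReader(text))

// the two-char sequence \n in the file expands to a real LF in the value
assert(props.getProperty("p1") == "a\nb")
// the trailing backslash joins the lines; the continuation line's
// leading whitespace is discarded
assert(props.getProperty("p2") == "first second")
```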
Whoever wrote the Wikipedia article [included some good
examples](https://en.wikipedia.org/wiki/.properties).
What this means is: by the time the Spark trim() code is reached, the only
CR and LF characters left in a property value are those produced by expanding `\r`
and `\n` escape pairs in the property itself. Values with CR/LF in the middle, e.g.
`p1=a\nb`, already get through; this change extends that to properties like `p2=\r`,
whose value a plain trim() would reduce to the empty string.
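For illustration, a minimal sketch of what such a CR/LF-exempting trim can look like, mirroring `String.trim()`'s whitespace rule (any char `<= ' '`) while keeping `\r` and `\n`; this is a sketch of the idea, not necessarily the PR's exact code:

```scala
// Sketch only: trim leading/trailing chars <= ' ' (the String.trim rule),
// but keep CR and LF so values that are nothing but line endings survive.
def trimExceptCRLF(s: String): String = {
  val keep: Char => Boolean = ch => ch > ' ' || ch == '\r' || ch == '\n'
  val first = s.indexWhere(keep)
  val last = s.lastIndexWhere(keep)
  if (first >= 0) s.substring(first, last + 1) else ""
}

trimExceptCRLF("  a\nb  ")  // "a\nb"
trimExceptCRLF("\r")        // "\r"  (plain trim() would return "")
```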
* should be easy to write some tests for `trimExceptCRLF()` directly,
e.g. how it handles odd strings (one char, value == 0), the empty string,
... (see the sketch after this list)
* There's an XML format for properties too, which should also be tested to
see WTF goes on there.
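Something along these lines, say. The trim sketch from above is repeated so the snippet runs standalone; the XML probe just prints what comes back rather than asserting, since what happens there is exactly the open question:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.Properties

// Repeated from the sketch above so this snippet runs standalone.
def trimExceptCRLF(s: String): String = {
  val keep: Char => Boolean = ch => ch > ' ' || ch == '\r' || ch == '\n'
  val first = s.indexWhere(keep)
  val last = s.lastIndexWhere(keep)
  if (first >= 0) s.substring(first, last + 1) else ""
}

// odd strings: empty, one char, one NUL char (value == 0), lone CR
assert(trimExceptCRLF("") == "")
assert(trimExceptCRLF("a") == "a")
assert(trimExceptCRLF("\u0000") == "")   // NUL <= ' ', so it is trimmed
assert(trimExceptCRLF("\r") == "\r")     // survives, unlike String.trim
assert(trimExceptCRLF(" \n x \n ") == "\n x \n")

// XML round trip: does a bare CR in a value survive storeToXML/loadFromXML,
// or does XML line-ending normalization turn it into LF?
val out = new ByteArrayOutputStream()
val written = new Properties()
written.setProperty("p2", "\r")
written.storeToXML(out, null)
val readBack = new Properties()
readBack.loadFromXML(new ByteArrayInputStream(out.toByteArray))
println(readBack.getProperty("p2").map(_.toInt))  // 13 means CR survived; 10 means it became LF
```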
PS: looking up the properties spec highlights that Java 9 [uses UTF-8
for the properties
encoding](https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm#JSINT-GUID-974CF488-23E8-4963-A322-82006A7A14C7).
Don't know of any implications here.