On 2015-01-15 14:47, Julian Reschke wrote:
Hi there,
is this
"\ud800"
a valid string property?
I'm asking because it is an unpaired high surrogate and will not round-trip through UTF-8.
Will a persistence implementation that stores "as unicode" need to
escape it? Should we reject it? If yes, at what level?
Best regards, Julian
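For illustration, here is a minimal Java sketch of the round-trip problem described above. The class and method names are made up for this example and are not part of Jackrabbit. It relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement bytes for input that cannot be encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class LoneSurrogateRoundTrip {

    // Encode to UTF-8 bytes and decode back again.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "\ud800";        // unpaired high surrogate
        String pair = "\ud83d\ude00";  // well-formed surrogate pair

        // The unpaired surrogate is replaced during encoding, so it is lost.
        System.out.println(lone.equals(roundTrip(lone))); // prints false
        // A well-formed pair encodes to four UTF-8 bytes and survives.
        System.out.println(pair.equals(roundTrip(pair))); // prints true
    }
}
```

So any persistence layer that stores such a string as raw UTF-8 without escaping would silently change the value.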
I did some more tests and found that MongoDB will indeed round-trip it.
I thus modified the JSOP serializer to use \u-escaping for unpaired
surrogates (broken surrogate pairs), which makes these strings round-trip (see
<https://fisheye6.atlassian.com/changelog/jackrabbit?cs=1652158>).
Note that I did *not* change encode() yet, only escape() (because that's
the code path the RDB persistence uses).
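As a rough sketch of the idea (not the actual change in the commit above), escaping unpaired surrogates could look like the following. The class and method names are hypothetical, and other JSON escaping (quotes, backslashes, control characters) is deliberately left out.

```java
// Hypothetical helper; this is NOT the Jackrabbit JSOP serializer code.
final class SurrogateEscaper {

    // Replace every unpaired surrogate with a six-character
    // backslash-u escape; well-formed pairs are copied unchanged.
    static String escapeLoneSurrogates(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Well-formed pair: copy both code units and skip the second.
                sb.append(c).append(s.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                // Unpaired high or low surrogate: emit an ASCII escape.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The escaped output contains no unpaired surrogates, so it can be stored as UTF-8, and a JSON parser turns the escape back into the original code unit when reading.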
Best regards, Julian