On 2015-01-15 14:47, Julian Reschke wrote:
Hi there,

is this

   "\ud800"

a valid string property?

I'm asking because it is an unpaired high surrogate, which has no valid UTF-8 encoding, so it will not round-trip through UTF-8.
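
To illustrate the round-trip problem, here is a minimal Java sketch. The class and method names are hypothetical and not taken from any Jackrabbit code; it only relies on String.getBytes(Charset), which replaces unencodable input with the charset's replacement bytes instead of failing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jackrabbit.
final class LoneSurrogateRoundTrip {

    // Encode to UTF-8 bytes and decode again.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An unpaired high surrogate (code unit 0xD800).
        String lone = String.valueOf((char) 0xD800);
        // A well-formed surrogate pair (U+1F600) for comparison.
        String pair = new String(Character.toChars(0x1F600));

        System.out.println("lone surrogate survives: " + roundTrip(lone).equals(lone));
        System.out.println("valid pair survives:     " + roundTrip(pair).equals(pair));
    }
}
```

The lone surrogate is silently replaced during encoding, so the decoded string differs from the input, while the well-formed pair comes back unchanged.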

Will a persistence implementation that stores "as unicode" need to
escape it? Should we reject it? If yes, at what level?

Best regards, Julian

I did some more tests and found that MongoDB will indeed round-trip it.

I thus modified the JSOP serializer to use \u-escaping for broken surrogate pairs, which makes these strings round-trip (see <https://fisheye6.atlassian.com/changelog/jackrabbit?cs=1652158>).
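
The idea can be sketched roughly as follows. This is not the actual serializer code from the changeset above; the class and method names are made up for illustration, and a real JSON/JSOP escaper would also have to escape backslashes, quotes and control characters so that the output can be decoded unambiguously.

```java
// Hypothetical sketch, not the actual Jackrabbit JSOP code.
final class SurrogateEscaper {

    // Leaves well-formed surrogate pairs alone and replaces every unpaired
    // surrogate with a six-character backslash-u escape of its code unit.
    static String escapeBrokenSurrogates(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid pair: copy both code units unchanged.
                sb.append(c).append(s.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                // Unpaired high or low surrogate: emit an ASCII-only escape.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String broken = "a" + (char) 0xD800 + "b";
        System.out.println(escapeBrokenSurrogates(broken));
    }
}
```

Because the escaped form contains only ASCII characters, it survives UTF-8 encoding, and a JSON parser turns the escape back into the original code unit when reading the string.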

Note that I did *not* change encode() yet, only escape() (because that's the code path the RDB persistence uses).


Best regards, Julian
