The problem is almost certainly in
void serializer::json_emitter::emit_json_string(zstring string),
at or near line 1206 of serializer.cpp, where it escapes invalid
characters into Unicode escape sequences. I don't see how to do that
any differently from the way it is done now, so Paul, please take a
look and see whether there are any obvious logic problems.
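
One observation that may help: the UTF-8 encoding of U+1F44A is the four
bytes F0 9F 91 8A, and the reported output contains exactly those bytes,
each prefixed with ffffff. That pattern is what you get when a byte held
in a signed char is sign-extended before being formatted as hex. The
sketch below reproduces it. Both helpers are hypothetical and are not the
code in serializer.cpp; the sketch also assumes a 32-bit unsigned int.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not Zorba code: formats one byte as \uXXXX
// WITHOUT going through unsigned char first. A negative signed char is
// sign-extended, so 0xF0 becomes 0xfffffff0 (assuming 32-bit unsigned).
std::string escape_byte_sign_extended(signed char c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
    return buf;
}

// Same helper with the byte converted to unsigned char first, which
// keeps the value in the range 0..255.
std::string escape_byte(signed char c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\u%04x",
                  static_cast<unsigned>(static_cast<unsigned char>(c)));
    return buf;
}
```

Fixing the cast alone would still emit one escape per UTF-8 byte, so the
bytes would also need to be decoded into a code point before escaping.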

-- 
You received this bug notification because you are a member of Zorba
Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1025622

Title:
  incorrect JSON serialization of supplementary plane code points

Status in Zorba - The XQuery Processor:
  Confirmed

Bug description:
  This bug is a follow-up to bug #1024448.

  Currently, the result of the following JSONiq query:

    let $message := "👊"
    return { "message": $message }

  is serialized into incorrect JSON:

    { "message" : "\ufffffff0\uffffff9f\uffffff91\uffffff8a" }

  the correct result would be:

    { "message" : "\ud83d\udc4a" }

  Explanation:

  Characters from the supplementary planes are usually represented as
  UTF-16 surrogate pairs within JSON results. The above result is
  incorrect in particular because JSON allows only 4 hex digits after
  '\u'. A UTF-16 code unit always fits into 4 hex digits, and a surrogate
  pair into 2 x 4 hex digits, which is most probably the reason why
  UTF-16 is used.
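
  The surrogate pair arithmetic can be sketched as follows. This is only
  an illustration: json_escape_code_point is a hypothetical name, not a
  Zorba function, and the sketch assumes its input is an already decoded
  code point no greater than U+10FFFF that is not itself a surrogate.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the JSON \u escape(s) for one code point.
std::string json_escape_code_point(std::uint32_t cp) {
    char buf[16];
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 4 hex digit escape.
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
    } else {
        // Supplementary plane: split the 20-bit offset into two halves.
        std::uint32_t v = cp - 0x10000;
        unsigned high = 0xD800 + (v >> 10);    // top 10 bits
        unsigned low  = 0xDC00 + (v & 0x3FF);  // bottom 10 bits
        std::snprintf(buf, sizeof buf, "\\u%04x\\u%04x", high, low);
    }
    return buf;
}
```

  For U+1F44A, the character in the query above, this yields
  \ud83d\udc4a.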

  This has been fixed nicely in the JSON parser by Paul (see the merge
  proposal:
  https://code.launchpad.net/~paul-lucas/zorba/bug-1024448/+merge/115248
  ), but it still needs to be fixed in the serializer.

  @Paul: I'm not sure whether you are the right person to assign this
  bug to.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/zorba/+bug/1025622/+subscriptions

-- 
Mailing list: https://launchpad.net/~zorba-coders
Post to     : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp
