[Zorba-coders] [Bug 1025622] Re: Incorrect JSON serialization of supplementory plane code points
** Summary changed:

- incorrect JSON serialization of supplementory plane code points
+ Incorrect JSON serialization of supplementory plane code points

--
You received this bug notification because you are a member of Zorba Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1025622

Title:
  Incorrect JSON serialization of supplementory plane code points

Status in Zorba - The XQuery Processor:
  In Progress

Bug description:
  This bug is a follow-up of bug #1024448.

  Currently, the result of the following JSONiq query:

    let $message := "👊"
    return { "message": $message }

  is serialized into incorrect JSON:

    { "message" : "\ufff0\uff9f\uff91\uff8a" }

  The correct result would be:

    { "message" : "\ud83d\udc4a" }

  Explanation: Characters from the supplementary planes are usually represented as UTF-16 surrogate pairs within JSON results. JSON allows only 4 hex digits after '\u', so a code point above U+FFFF cannot be written as a single escape. A UTF-16 code unit always fits into 4 hex digits, so every character takes either one or two '\uXXXX' escapes, which is most probably the reason why UTF-16 is used. The result above is incorrect because its four escapes do not form the surrogate pair for this character.

  This has been fixed in the JSON parser by Paul (see mp: https://code.launchpad.net/~paul-lucas/zorba/bug-1024448/+merge/115248 ), but it still needs to be fixed in the serializer.

  @Paul: I'm not sure if you are the right person to assign this bug to?

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/zorba/+bug/1025622/+subscriptions

--
Mailing list: https://launchpad.net/~zorba-coders
Post to     : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp
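The surrogate-pair arithmetic behind the expected result can be sketched as follows. This is a minimal illustration in C++, not Zorba's serializer code: the helper names `escape_unit` and `json_escape_code_point` are hypothetical, and the sketch escapes every code point it is given rather than deciding which characters need escaping.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Zorba): formats one UTF-16 code unit
// as a JSON escape of the form \uXXXX with exactly four hex digits.
static std::string escape_unit(std::uint16_t unit) {
    char buf[7];  // backslash + 'u' + 4 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(unit));
    return buf;
}

// Hypothetical helper: escapes a single Unicode code point for use inside
// a JSON string. Code points above U+FFFF become a UTF-16 surrogate pair.
std::string json_escape_code_point(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && "not a Unicode code point");
    if (cp < 0x10000)
        return escape_unit(static_cast<std::uint16_t>(cp));

    std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // top 10 bits
    std::uint16_t low  = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return escape_unit(high) + escape_unit(low);
}
```

For the character in the query, U+1F44A, subtracting 0x10000 leaves 0xF44A; its top ten bits are 0x3D and its bottom ten bits are 0x4A, which gives the pair `\ud83d\udc4a` shown in the bug description.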
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
** Branch linked: lp:~paul-lucas/zorba/bug-1025622

** Changed in: zorba
       Status: Confirmed => In Progress
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
The problem with that code is that it serializes the string as a sequence of bytes (which is wrong) and not a sequence of either Unicode code-points or UTF-8 characters. I'll fix it myself.
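The difference between escaping bytes and escaping code points can be illustrated with a small C++ sketch. It rests on an assumption that is not confirmed anywhere in this thread: that each UTF-8 byte was sign-extended before being formatted, which happens to reproduce the reported output exactly. Both function names are hypothetical and neither is taken from Zorba's sources; the decoder expects well-formed UTF-8 and does no validation beyond a truncation check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Illustration of the suspected bug (an assumption, not Zorba's code):
// every UTF-8 byte is treated as a signed value and escaped on its own.
std::string escape_bytes_sign_extended(const std::string& utf8) {
    std::string out;
    for (char c : utf8) {
        // Bytes >= 0x80 become negative, so the upper 8 bits fill with ones.
        std::uint16_t unit =
            static_cast<std::uint16_t>(static_cast<signed char>(c));
        char buf[7];
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(unit));
        out += buf;
    }
    return out;
}

// Decodes well-formed UTF-8 into Unicode code points, which is the unit
// the escaping step should operate on.
std::vector<std::uint32_t> decode_utf8(const std::string& utf8) {
    std::vector<std::uint32_t> cps;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // Number of continuation bytes implied by the lead byte.
        std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        assert(i + extra < utf8.size() && "truncated UTF-8 sequence");
        // Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        std::uint32_t cp = extra == 0 ? lead : (lead & (0x3Fu >> extra));
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
        cps.push_back(cp);
        i += extra + 1;
    }
    return cps;
}
```

The character in the query is stored as the four UTF-8 bytes F0 9F 91 8A. Escaping those bytes one at a time with sign extension yields `\ufff0\uff9f\uff91\uff8a`, while decoding them first yields the single code point U+1F44A, which can then be escaped as a surrogate pair.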
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
** Changed in: zorba
       Status: Incomplete => Confirmed
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
The problem is almost certainly in void serializer::json_emitter::emit_json_string(zstring string), serializer.cpp line 1206 or thereabouts, where it escapes invalid characters into Unicode escape sequences. I have no idea how to do that any differently than it is done now, so Paul, please take a look and see if there are obvious logic problems.
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
I put some breakpoints in and it never hits my serialization code, so the problem is probably in the JSONiq serialization code.
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
I think that building Zorba with the option -DZORBA_WITH_JSON=ON is sufficient.
[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
First, how does one execute a JSONiq query? If I put the above query into a file and do:

  bin/zorba -f -i -r --trailing-nl -q /tmp/foo.xq

I get:

  :2,8: static error [err:XPST0003]: invalid expression; raised at .../src/compiler/translator/translator.cpp:11081

** Changed in: zorba
       Status: New => Incomplete