[Zorba-coders] [Bug 1025622] Re: Incorrect JSON serialization of supplementory plane code points

2012-07-18 Thread Paul J. Lucas
** Summary changed:

- incorrect JSON serialization of supplementory plane code points
+ Incorrect JSON serialization of supplementory plane code points

-- 
You received this bug notification because you are a member of Zorba
Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1025622

Title:
  Incorrect JSON serialization of supplementory plane code points

Status in Zorba - The XQuery Processor:
  In Progress

Bug description:
  This bug is a follow-up to bug #1024448.

  Currently, the result of the following JSONiq query:

let $message := "👊"
return { "message": $message }

  is serialized into incorrect JSON:

{ "message" : "\ufff0\uff9f\uff91\uff8a" }

  The correct result would be:

{ "message" : "\ud83d\udc4a" }

  Explanation:

  Characters from the supplementary planes are usually represented as
  UTF-16 surrogate pairs within JSON results. The above result is in
  particular incorrect because JSON allows only 4 hex digits after '\u'.
  A UTF-16 surrogate pair always fits into a 2 x 4 hex digit window (and
  any other UTF-16 code unit into a single 4 hex digit window), which is
  most probably the reason why UTF-16 is used.
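
  As a worked example of the surrogate-pair arithmetic described above,
  here is a minimal C++ sketch. The helper name to_surrogates is
  hypothetical and is not taken from the Zorba sources; it assumes its
  argument lies in the range U+10000..U+10FFFF.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of Zorba): split a supplementary-plane
// code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair {high, low}.
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp) {
  // Subtracting 0x10000 leaves a 20-bit offset into planes 1..16.
  const std::uint32_t v = cp - 0x10000;
  // The top 10 bits select the high (lead) surrogate, starting at 0xD800.
  const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
  // The bottom 10 bits select the low (trail) surrogate, starting at 0xDC00.
  const std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
  return {high, low};
}
```

  For the character in the query above, U+1F44A, this yields 0xD83D and
  0xDC4A, which matches the expected "\ud83d\udc4a".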

  This has already been fixed in the JSON parser by Paul (see merge proposal:
  https://code.launchpad.net/~paul-lucas/zorba/bug-1024448/+merge/115248
  ), but it still needs to be fixed in the serializer.

  @Paul: I'm not sure whether you are the right person to assign this bug to.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/zorba/+bug/1025622/+subscriptions

-- 
Mailing list: https://launchpad.net/~zorba-coders
Post to : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp


[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-18 Thread Paul J. Lucas
** Branch linked: lp:~paul-lucas/zorba/bug-1025622

** Changed in: zorba
   Status: Confirmed => In Progress



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-18 Thread Paul J. Lucas
The problem with that code is that it serializes the string as a
sequence of bytes (which is wrong) and not a sequence of either Unicode
code-points or UTF-8 characters.

I'll fix it myself.
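
To illustrate the difference between walking bytes and walking code
points, here is a minimal C++ sketch of a code-point-based escaper. It is
not the actual fix: the names append_u_escape and escape_non_ascii are
hypothetical, the input is assumed to be well-formed UTF-8, and the other
escapes JSON requires (quotes, backslashes, control characters) are left
out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: append one \uXXXX escape for a UTF-16 code unit.
static void append_u_escape(std::string &out, std::uint16_t unit) {
  char buf[7];  // backslash, 'u', 4 hex digits, NUL
  std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(unit));
  out += buf;
}

// Hypothetical sketch: copy ASCII through unchanged and escape every
// non-ASCII character as JSON \u escapes. Assumes well-formed UTF-8.
std::string escape_non_ascii(const std::string &utf8) {
  std::string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
    if (b0 < 0x80) {  // plain ASCII byte
      out += static_cast<char>(b0);
      ++i;
      continue;
    }
    // The lead byte tells us how many bytes make up this character.
    const int len = b0 >= 0xF0 ? 4 : b0 >= 0xE0 ? 3 : 2;
    // Keep only the payload bits of the lead byte.
    std::uint32_t cp = b0 & (0xFF >> (len + 1));
    // Each continuation byte contributes its low 6 bits.
    for (int k = 1; k < len && i + k < utf8.size(); ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    i += len;
    if (cp >= 0x10000) {
      // Supplementary plane: emit a UTF-16 surrogate pair.
      const std::uint32_t v = cp - 0x10000;
      append_u_escape(out, static_cast<std::uint16_t>(0xD800 + (v >> 10)));
      append_u_escape(out, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    } else {
      append_u_escape(out, static_cast<std::uint16_t>(cp));
    }
  }
  return out;
}
```

Calling escape_non_ascii on the four UTF-8 bytes F0 9F 91 8A from the
query in the bug description produces \ud83d\udc4a.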



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-17 Thread Chris Hillery
** Changed in: zorba
   Status: Incomplete => Confirmed



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-17 Thread Chris Hillery
The problem is almost certainly in void
serializer::json_emitter::emit_json_string(zstring string),
serializer.cpp line 1206 or thereabouts, where it escapes invalid
characters into unicode escape sequences. I have no idea how to do that
any differently than it is, so Paul, please take a look and see if there
are obvious logic problems.
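
For reference, here is one way the observed output could arise. This is
a guess and not the actual code in serializer.cpp: the sketch assumes each
byte is treated as a signed char and widened to 16 bits before being
printed, so every byte at or above 0x80 is sign-extended to 0xFFxx. The
function name escape_bytewise is hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical reproduction (not Zorba's code): escape a UTF-8 string one
// byte at a time, sign-extending each non-ASCII byte to 16 bits.
std::string escape_bytewise(const std::string &utf8) {
  std::string out;
  for (char c : utf8) {
    // Force a signed interpretation so the sketch behaves the same on
    // platforms where plain char is unsigned.
    const signed char sc = static_cast<signed char>(c);
    if (sc >= 0) {  // ASCII byte, copied through unchanged
      out += c;
      continue;
    }
    // A negative value widened to 16 bits becomes 0xFFxx.
    const std::uint16_t widened = static_cast<std::uint16_t>(sc);
    char buf[7];  // backslash, 'u', 4 hex digits, NUL
    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(widened));
    out += buf;
  }
  return out;
}
```

Under that assumption, the UTF-8 bytes F0 9F 91 8A of the character in the
query come out as \ufff0\uff9f\uff91\uff8a, the same text as in the bug
description.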



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-17 Thread Paul J. Lucas
I put some breakpoints in and it never hits my serialization code, so
it's probably in the JSONiq serialization code.



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-17 Thread Dennis Knochenwefel
I think that building zorba with option -DZORBA_WITH_JSON=ON is
sufficient.



[Zorba-coders] [Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points

2012-07-17 Thread Paul J. Lucas
First, how does one execute a JSONiq query? If I put the above query
into a file and do:

  bin/zorba -f -i -r --trailing-nl -q /tmp/foo.xq

I get:

  :2,8: static error [err:XPST0003]: invalid expression;
raised at .../src/compiler/translator/translator.cpp:11081

** Changed in: zorba
   Status: New => Incomplete
