Fwd: Initial Review: JSON contrib module was: Re: [HACKERS] Another swing at JSON

2011-07-19 Thread Joey Adams
Forwarding because the mailing list rejected the original message.

-- Forwarded message --
From: Joey Adams joeyadams3.14...@gmail.com
Date: Tue, Jul 19, 2011 at 11:23 PM
Subject: Re: Initial Review: JSON contrib module was: Re: [HACKERS]
Another swing at JSON
To: Alvaro Herrera alvhe...@commandprompt.com
Cc: Florian Pflug f...@phlo.org, Tom Lane t...@sss.pgh.pa.us, Robert
Haas robertmh...@gmail.com, Bernd Helmle maili...@oopsware.de,
Dimitri Fontaine dimi...@2ndquadrant.fr, David Fetter
da...@fetter.org, Josh Berkus j...@agliodbs.com, Pg Hackers
pgsql-hackers@postgresql.org


On Tue, Jul 19, 2011 at 10:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Would it work to have a separate entry point into mbutils.c that lets
 you cache the conversion proc caller-side?

That sounds like a really good idea.  There's still the overhead of
calling the proc, but I imagine it's a lot less than looking it up.
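
For concreteness, here is a rough sketch of what that caller-side cache
could look like.  FindDefaultConversionProc(), fmgr_info(),
FunctionCall5(), and the five-argument conversion-proc convention are the
existing machinery that pg_do_encoding_conversion() uses today; the
json_lex_context struct and the convert_cached() helper are invented for
this sketch only:

    #include "postgres.h"
    #include "fmgr.h"
    #include "mb/pg_wchar.h"

    /* Hypothetical per-parse state holding the cached conversion proc. */
    typedef struct
    {
        FmgrInfo    conv_finfo;     /* cached fmgr lookup of the proc */
        bool        conv_valid;     /* has the cache been filled yet? */
    } json_lex_context;

    static void
    convert_cached(json_lex_context *lex, char *src, int srclen, char *dest)
    {
        if (!lex->conv_valid)
        {
            /* One catalog lookup, amortized over the whole JSON value. */
            Oid     proc = FindDefaultConversionProc(PG_UTF8,
                                                     GetDatabaseEncoding());

            fmgr_info(proc, &lex->conv_finfo);
            lex->conv_valid = true;
        }

        /*
         * Same call pg_do_encoding_conversion() makes, minus the lookup:
         * (src_encoding, dest_encoding, src, dest, srclen).  The proc
         * writes a null-terminated result into dest, which the caller
         * must size generously.
         */
        FunctionCall5(&lex->conv_finfo,
                      Int32GetDatum(PG_UTF8),
                      Int32GetDatum(GetDatabaseEncoding()),
                      CStringGetDatum(src),
                      CStringGetDatum(dest),
                      Int32GetDatum(srclen));
    }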

 I think the main problem is
 determining the byte length of each source character beforehand.

I'm not sure what you mean.  The idea is to convert the \uXXXX escape
to UTF-8 with unicode_to_utf8 (the length of the resulting UTF-8
sequence is easy to compute), call the conversion proc to get the
null-terminated database-encoded character, then append the result to
whatever StringInfo the string is going into.
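
Continuing the sketch, the per-escape path might look like the following.
unicode_to_utf8(), pg_utf_mblen(), GetDatabaseEncoding(), and the
StringInfo routines are real server functions; convert_cached() is the
hypothetical wrapper above, and the output buffer size is an assumption
(see the question below):

    #include "lib/stringinfo.h"

    static void
    append_unicode_char(json_lex_context *lex, StringInfo buf,
                        pg_wchar codepoint)
    {
        unsigned char   utf8buf[5];     /* up to 4 UTF-8 bytes plus NUL */
        char            dstbuf[16];     /* assumes 1 char in -> 1 char out */
        int             utf8len;

        /* The \uXXXX escape (or surrogate pair) is already a code point. */
        unicode_to_utf8(codepoint, utf8buf);
        utf8len = pg_utf_mblen(utf8buf);
        utf8buf[utf8len] = '\0';

        if (GetDatabaseEncoding() == PG_UTF8)
            appendBinaryStringInfo(buf, (char *) utf8buf, utf8len);
        else
        {
            convert_cached(lex, (char *) utf8buf, utf8len, dstbuf);
            appendStringInfoString(buf, dstbuf);    /* proc null-terminates */
        }
    }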

The only question mark is how big the destination buffer will need to
be.  The maximum number of bytes per char in any supported encoding is
4, but is it possible for one Unicode character to turn into multiple
characters in the database encoding?

While we're at it, should we provide the same capability to the SQL
parser?  Namely, the ability to use \uXXXX escapes above U+007F when
the server encoding is not UTF-8?

- Joey

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Fwd: Initial Review: JSON contrib module was: Re: [HACKERS] Another swing at JSON

2011-07-19 Thread Bruce Momjian
Joey Adams wrote:
 Forwarding because the mailing list rejected the original message.

Yes, I am seeing email failures to the 'core' email list.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: Fwd: Initial Review: JSON contrib module was: Re: [HACKERS] Another swing at JSON

2011-07-19 Thread Bruce Momjian
Bruce Momjian wrote:
 Joey Adams wrote:
  Forwarding because the mailing list rejected the original message.
 
 Yes, I am seeing email failures to the 'core' email list.

Marc says it is now fixed.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
