Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Petr Jelinek

On 6.4.2010 7:57, Joseph Adams wrote:

On Tue, Apr 6, 2010 at 1:00 AM, Petr Jelinek pjmo...@pjmodos.net wrote:
   

Not really sure about this myself, but keep in mind that NULL has special
meaning in SQL.
 

To me, the most logical approach is to do the obvious thing: make
JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
NULLs in it and converting the result set to JSON would yield a
structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
sure what startling results would come of this approach, but I'm
guessing this would be most intuitive and useful.
   


+1


Just a note, but PostgreSQL has some UTF-8 validation code, you might want
to look at it, at least once you start the actual integration into
core, so that you are not reinventing too many wheels. I can see how your
own code is a good thing for the general library which this can (and I am sure
will be) used as, but for the datatype itself, it might be a better idea to
use what's already there, unless it's somehow incompatible of course.
 

Indeed.  My plan is to first get a strong standalone JSON library
written and tested so it can be used as a general-purpose library.  As
the JSON code is merged into PostgreSQL, it can be adapted.  Part of
this adaptation would most likely be removing the UTF-8 validation
function I wrote and using PostgreSQL's Unicode support code instead.

There are probably other bits that could be PostgreSQLified as well.
I wonder if I should consider leveraging PostgreSQL's regex support or
if it would be a bad fit/waste of time/slower/not worth it.
   


Regex ? What for ? You certainly don't need it for parsing, you have 
a good parser IMHO and regex would probably be all of the above.


--
Regards
Petr Jelinek (PJMODOS)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Alvaro Herrera
Joseph Adams wrote:

 http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2
 
 My json.c is now 1161 lines long, so I can't quite call it small anymore.

Just noticed you don't check the return value of malloc and friends.
How do you intend to handle that?  There are various places that would
simply dump core with the 0.0.2 code.  Within Postgres it's easy -- a
failed palloc aborts the transaction and doesn't continue running your
code.  But in a standalone library that's probably not acceptable.

If we were to import this there are some lines that could be ripped out,
like 60 lines in the string buffer stuff and 130 lines for Unicode.
That brings your code just under 1000 lines.

-- 
Alvaro Herrera                 http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Tom Lane
Petr Jelinek pjmo...@pjmodos.net writes:
 On 6.4.2010 7:57, Joseph Adams wrote:
 To me, the most logical approach is to do the obvious thing: make
 JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
 NULLs in it and converting the result set to JSON would yield a
 structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
 sure what startling results would come of this approach, but I'm
 guessing this would be most intuitive and useful.

 +1

I think it's a pretty bad idea for 'null'::JSON to yield NULL.  AFAIR
there is no other standard datatype for which the input converter can
yield NULL from a non-null input string, and I'm not even sure that the
InputFunctionCall protocol allows it.  (In fact a quick look indicates
that it doesn't...)

To me, what this throws into question is not so much whether JSON null
should equate to SQL NULL (it should), but whether it's sane to accept
atomic values.  If I understood the beginning of this discussion, that's
not strictly legal.  I think it would be better for strict input mode
to reject this, and permissive mode to convert it to a non-atomic value.
Thus jsonify('null') wouldn't yield NULL but a structure containing a
null.

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Robert Haas
On Tue, Apr 6, 2010 at 11:05 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Joseph Adams wrote:

 http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2

 My json.c is now 1161 lines long, so I can't quite call it small anymore.

 Just noticed you don't check the return value of malloc and friends.
 How do you intend to handle that?  There are various places that would
 simply dump core with the 0.0.2 code.  Within Postgres it's easy -- a
 failed palloc aborts the transaction and doesn't continue running your
 code.  But in a standalone library that's probably not acceptable.

 If we were to import this there are some lines that could be ripped out,
 like 60 lines in the string buffer stuff and 130 lines for Unicode.
 That brings your code just under 1000 lines.

Let me be the first to suggest putting this code under the PostgreSQL license.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Robert Haas
On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Petr Jelinek pjmo...@pjmodos.net writes:
 On 6.4.2010 7:57, Joseph Adams wrote:
 To me, the most logical approach is to do the obvious thing: make
 JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
 NULLs in it and converting the result set to JSON would yield a
 structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
 sure what startling results would come of this approach, but I'm
 guessing this would be most intuitive and useful.

 +1

 I think it's a pretty bad idea for 'null'::JSON to yield NULL.  AFAIR
 there is no other standard datatype for which the input converter can
 yield NULL from a non-null input string, and I'm not even sure that the
 InputFunctionCall protocol allows it.  (In fact a quick look indicates
 that it doesn't...)

Oh.  I missed this aspect of the proposal.  I agree - that's a bad idea.

 To me, what this throws into question is not so much whether JSON null
 should equate to SQL NULL (it should), but whether it's sane to accept
 atomic values.

With this, I disagree.  I see no reason to suppose that a JSON NULL
and an SQL NULL are the same thing.

 If I understood the beginning of this discussion, that's
 not strictly legal.  I think it would be better for strict input mode
 to reject this, and permissive mode to convert it to a non-atomic value.
 Thus jsonify('null') wouldn't yield NULL but a structure containing a
 null.

There's no obvious structure to convert this into.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 To me, what this throws into question is not so much whether JSON null
 should equate to SQL NULL (it should), but whether it's sane to accept
 atomic values.

 With this, I disagree.  I see no reason to suppose that a JSON NULL
 and an SQL NULL are the same thing.

Oh.  If they're not the same, then the problem is easily dodged, but
then what *is* a JSON null?

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Robert Haas
On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 To me, what this throws into question is not so much whether JSON null
 should equate to SQL NULL (it should), but whether it's sane to accept
 atomic values.

 With this, I disagree.  I see no reason to suppose that a JSON NULL
 and an SQL NULL are the same thing.

 Oh.  If they're not the same, then the problem is easily dodged, but
 then what *is* a JSON null?

I assume we're going to treat JSON much like XML: basically text, but
with some validation (and perhaps canonicalization) under the hood.
So a JSON null will be "null", just as a JSON boolean true value will
be "true".  It would be pretty weird if storing "true" or "false" or
"4" or "[3,1,4,1,5,9]" into a json column and then reading it back
returned the input string; but at the same time storing "null" into
the column returned a SQL NULL.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh.  If they're not the same, then the problem is easily dodged, but
 then what *is* a JSON null?

 I assume we're going to treat JSON much like XML: basically text, but
 with some validation (and perhaps canonicalization) under the hood.
 So a JSON null will be "null", just as a JSON boolean true value will
 be "true".  It would be pretty weird if storing "true" or "false" or
 "4" or "[3,1,4,1,5,9]" into a json column and then reading it back
 returned the input string; but at the same time storing "null" into
 the column returned a SQL NULL.

Hmm.  So the idea is that all JSON atomic values are considered to be
text strings, even when they look like something else (like bools or
numbers)?  That would simplify matters I guess, but I'm not sure about
the usability.  In particular I'd want to have something that dequotes
the value so that I can get foo not "foo" when converting to SQL text.
(I'm assuming that quotes would be there normally, so as not to lose
the distinction between 3 and "3" in the JSON representation.)

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Robert Haas
On Tue, Apr 6, 2010 at 2:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh.  If they're not the same, then the problem is easily dodged, but
 then what *is* a JSON null?

 I assume we're going to treat JSON much like XML: basically text, but
 with some validation (and perhaps canonicalization) under the hood.
 So a JSON null will be "null", just as a JSON boolean true value will
 be "true".  It would be pretty weird if storing "true" or "false" or
 "4" or "[3,1,4,1,5,9]" into a json column and then reading it back
 returned the input string; but at the same time storing "null" into
 the column returned a SQL NULL.

 Hmm.  So the idea is that all JSON atomic values are considered to be
 text strings, even when they look like something else (like bools or
 numbers)?  That would simplify matters I guess, but I'm not sure about
 the usability.

I'm not sure what the other option is.  If you do SELECT col FROM
table, I'm not aware that you can return differently-typed values for
different rows...

 In particular I'd want to have something that dequotes
 the value so that I can get foo not "foo" when converting to SQL text.
 (I'm assuming that quotes would be there normally, so as not to lose
 the distinction between 3 and "3" in the JSON representation.)

Yes, that seems like a useful support function.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Yeb Havinga

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
  

With this, I disagree.  I see no reason to suppose that a JSON NULL
and an SQL NULL are the same thing.



Oh.  If they're not the same, then the problem is easily dodged, but
then what *is* a JSON null?
  

Probably the same as the javascript null.

regards,
Yeb Havinga




Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Joseph Adams
On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Petr Jelinek pjmo...@pjmodos.net writes:
 On 6.4.2010 7:57, Joseph Adams wrote:
 To me, the most logical approach is to do the obvious thing: make
 JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
 NULLs in it and converting the result set to JSON would yield a
 structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
 sure what startling results would come of this approach, but I'm
 guessing this would be most intuitive and useful.

 +1

 I think it's a pretty bad idea for 'null'::JSON to yield NULL.  AFAIR
 there is no other standard datatype for which the input converter can
 yield NULL from a non-null input string, and I'm not even sure that the
 InputFunctionCall protocol allows it.  (In fact a quick look indicates
 that it doesn't...)

 To me, what this throws into question is not so much whether JSON null
 should equate to SQL NULL (it should), but whether it's sane to accept
 atomic values.  If I understood the beginning of this discussion, that's
 not strictly legal.  I think it would be better for strict input mode
 to reject this, and permissive mode to convert it to a non-atomic value.
 Thus jsonify('null') wouldn't yield NULL but a structure containing a
 null.

                        regards, tom lane


Actually, I kind of made a zany mistake here.  If 'null'::JSON yielded
NULL, that would mean some type of automatic conversion was going on.
Likewise, '3.14159'::JSON shouldn't magically turn into a FLOAT.

I think the JSON datatype should behave more like TEXT.  'null'::JSON
would yield a JSON fragment containing 'null'.  'null'::JSON::TEXT
would yield the literal text 'null'.  However, '3.14159'::JSON::FLOAT
should probably not be allowed as a precaution, as
'"hello"'::JSON::TEXT would yield '"hello"', not 'hello'.  In other
words, casting to the target type directly isn't the same as parsing
JSON and extracting a value.

Perhaps there could be conversion functions.  E.g.:

json_to_string('"hello"') yields 'hello'
json_to_number('3.14159') yields '3.14159' as text
(it is up to the user to cast it to the number type s/he wants)
json_to_bool('true') yields TRUE
json_to_null('null') yields NULL, json_to_null('nonsense') fails

string_to_json('hello') yields '"hello"' as JSON
number_to_json(3.14159) yields '3.14159' as JSON
bool_to_json(TRUE) yields 'true' as JSON
null_to_json(NULL) yields 'null' as JSON (kinda useless)

I wonder if these could all be reduced to two generic functions, like
json_to_value and value_to_json.



Re: [HACKERS] Proposal: Add JSON support

2010-04-06 Thread Tom Lane
Joseph Adams joeyadams3.14...@gmail.com writes:
 Perhaps there could be conversion functions.  E.g.:

Yeah, that's what I was thinking about.

 json_to_string('"hello"') yields 'hello'
 json_to_number('3.14159') yields '3.14159' as text
 (it is up to the user to cast it to the number type s/he wants)
 json_to_bool('true') yields TRUE
 json_to_null('null') yields NULL, json_to_null('nonsense') fails

 string_to_json('hello') yields '"hello"' as JSON
 number_to_json(3.14159) yields '3.14159' as JSON
 bool_to_json(TRUE) yields 'true' as JSON
 null_to_json(NULL) yields 'null' as JSON (kinda useless)

The null cases seem a bit useless.  What might be helpful is to
translate JSON 'null' to and from SQL NULL in each of the other
conversions, in addition to their primary capability.

I'd go with using NUMERIC as the source/result type for the numeric
conversions.  Forcing people to insert explicit coercions from text
isn't going to be particularly convenient to use.

 I wonder if these could all be reduced to two generic functions, like
 json_to_value and value_to_json.

value_to_json(any) might work, but the other way could not; and it seems
better to keep some symmetry between the directions.

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-04-05 Thread Joseph Adams
Another JSON strictness issue:  the JSON standard (
http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
be an array or object.  However, my implementation currently accepts
any valid value.  Thus, '3', '"hello"', 'true', 'false', and 'null'
are all accepted by my implementation, but are not strictly JSON text.
 The question is: should the JSON datatype accept atomic values (those
that aren't arrays or objects) as valid JSON?

I tried a few other JSON implementations to see where they stand
regarding atomic types as input:

JSON_checker (C) does not accept them.
JSON.parse() (JavaScript) accepts them.
json_decode() (PHP) accepts them.  However, support is currently buggy
(e.g. '1' is accepted, but '1 ' is not).
cJSON (C) accepts them.
JSON.pm (Perl) accepts them if you specify the allow_nonref option.
Otherwise, it accepts 'true' and 'false', but not 'null', a number, or
a string by itself.

In my opinion, we should accept an atomic value as valid JSON content.
 I suppose we could get away with calling it a "content fragment" as
is done with XML without a doctype.

Accepting atomic values as valid JSON would be more orthogonal, as it
would be possible to have a function like this:

json_values(object_or_array JSON) RETURNS SETOF JSON
-- extracts values from an object or members from an array, returning
them as JSON fragments.

Also, should we go even further and accept "key":"value" pairs by themselves? :

'"key" : "value"'::JSON

I don't think we should because doing so would be rather zany.  It
would mean JSON content could be invalid in value context, as in:

// JavaScript
var content = "key" : "value";

I improved my JSON library.  It now only accepts strict, UTF-8 encoded
JSON values (that is, objects, arrays, strings, numbers, true, false,
and null).  It also has a json_decode_liberal() function that accepts
a string, cleans it up, and passes it through the stricter
json_decode().  json_decode_liberal() filters out comments, allows
single quoted strings, and accepts a lax number format compared to
strict JSON.  I may add Unicode repair to it later on, but
implementing that well really depends on what type of Unicode errors
appear in real life, I think.

http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2

My json.c is now 1161 lines long, so I can't quite call it small anymore.



Re: [HACKERS] Proposal: Add JSON support

2010-04-05 Thread Robert Haas
On Mon, Apr 5, 2010 at 11:50 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 In my opinion, we should accept an atomic value as valid JSON content.

That seems right to me.

 Also, should we go even further and accept "key":"value" pairs by themselves? :

 '"key" : "value"'::JSON

Definitely not.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-04-05 Thread Petr Jelinek

On 6.4.2010 5:50, Joseph Adams wrote:

Another JSON strictness issue:  the JSON standard (
http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
be an array or object.  However, my implementation currently accepts
any valid value.  Thus, '3', '"hello"', 'true', 'false', and 'null'
are all accepted by my implementation, but are not strictly JSON text.
  The question is: should the JSON datatype accept atomic values (those
that aren't arrays or objects) as valid JSON?
   


Not really sure about this myself, but keep in mind that NULL has 
special meaning in SQL.



Also, should we go even further and accept "key":"value" pairs by themselves? :

'"key" : "value"'::JSON

   


No, especially considering that '{"key":"value"}' is a valid JSON value.


I improved my JSON library.  It now only accepts strict, UTF-8 encoded
JSON values (that is, objects, arrays, strings, numbers, true, false,
and null).
   


Just a note, but PostgreSQL has some UTF-8 validation code, you might 
want to look at it, at least once you start the actual integration 
into core, so that you are not reinventing too many wheels. I can see 
how your own code is a good thing for the general library which this 
can (and I am sure will be) used as, but for the datatype itself, it 
might be a better idea to use what's already there, unless it's somehow 
incompatible of course.


--
Regards
Petr Jelinek (PJMODOS)




Re: [HACKERS] Proposal: Add JSON support

2010-04-03 Thread Joseph Adams
I've been wondering whether the JSON datatype should be strict or conservative.

For one, there's strict JSON (following the exact specification).
Then there's more conservative JSON variants.  Some JSON parsers
support comments, some support invalid number formats (e.g. '3.' or
'+5'), etc..

The consensus seems to be that JSON content should be stored verbatim
(it should store the exact string the client sent to it), as is done
with XML.  However, this notion is somewhat incompatible with "Be
conservative in what you do; be liberal in what you accept from
others" because we can't accept loose JSON, then spit out conservative
JSON without messing with the content.

Here's my idea: the datatype should only allow strict JSON, but there
should be a function that accepts a liberal format, cleans it up to
make it strict JSON, and converts it to JSON.  I think making strict
JSON the default makes the most sense because:
 * Inputs to the database will most likely be coming from programs, not humans.
 * Output is expected to be valid JSON and work anywhere JSON should work.
 * Strict JSON is what more people would expect, I'd think.



Re: [HACKERS] Proposal: Add JSON support

2010-04-03 Thread Mike Rylander
On Sat, Apr 3, 2010 at 8:59 AM, Joseph Adams joeyadams3.14...@gmail.com wrote:
 I've been wondering whether the JSON datatype should be strict or 
 conservative.

 For one, there's strict JSON (following the exact specification).
 Then there's more conservative JSON variants.  Some JSON parsers
 support comments, some support invalid number formats (e.g. '3.' or
 '+5'), etc..

 The consensus seems to be that JSON content should be stored verbatim
 (it should store the exact string the client sent to it), as is done
 with XML.  However, this notion is somewhat incompatible with "Be
 conservative in what you do; be liberal in what you accept from
 others" because we can't accept loose JSON, then spit out conservative
 JSON without messing with the content.

 Here's my idea: the datatype should only allow strict JSON, but there
 should be a function that accepts a liberal format, cleans it up to
 make it strict JSON, and converts it to JSON.  I think making strict
 JSON the default makes the most sense because:
  * Inputs to the database will most likely be coming from programs, not 
 humans.
  * Output is expected to be valid JSON and work anywhere JSON should work.
  * Strict JSON is what more people would expect, I'd think.

+1

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-04-03 Thread Andrew Dunstan



Mike Rylander wrote:


Here's my idea: the datatype should only allow strict JSON, but there
should be a function that accepts a liberal format, cleans it up to
make it strict JSON, and converts it to JSON.  I think making strict
JSON the default makes the most sense because:
 * Inputs to the database will most likely be coming from programs, not humans.
 * Output is expected to be valid JSON and work anywhere JSON should work.
 * Strict JSON is what more people would expect, I'd think.



+1

  


Yeah. That's the only thing that makes sense to me. We don't allow badly 
formed XML, for example, although we do allow document fragments (as 
required by the standard, IIRC). But we could sensibly have some 
function like 'cleanup_json(almost_json text) returns json'.


cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-04-02 Thread David E. Wheeler
On Apr 1, 2010, at 9:34 PM, Petr Jelinek wrote:

 I ended up reinventing the wheel and writing another JSON library:
 
 http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2
 
 This is a first release, and it doesn't really have a name besides
 json.  It's very similar to cJSON, except it is (sans unknown bugs)
 more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
  It has a simple test suite.  It is not prone to stack overflows, as
 it doesn't recurse.  It is strict, requires input to be UTF-8 (it
 validates it first) and only outputs UTF-8.  Other than treating
 numbers liberally, my implementation only accepts valid JSON code (it
 doesn't try to correct anything, even Unicode problems).  It is under
 the MIT license.
   
 
 I did some testing on my own, it passed everything I have thrown at it so far.
 I also did tests using MSVC for both 32bit and 64bit targets and it worked 
 fine too (except for missing stdbool.h in msvc which is no big deal).
 
 The coding style compared to cJSON (or other libs I've seen) seems closer to 
 the style of PostgreSQL, it would however still require pgindent run and 
 maybe some minor adjustments.

Someone approve this project for the GSoC quick, before Joseph finishes it!

Best,

David


Re: [HACKERS] Proposal: Add JSON support

2010-04-01 Thread Petr Jelinek

On 1.4.2010 5:39, Joseph Adams wrote:

I ended up reinventing the wheel and writing another JSON library:

http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2

This is a first release, and it doesn't really have a name besides
json.  It's very similar to cJSON, except it is (sans unknown bugs)
more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
  It has a simple test suite.  It is not prone to stack overflows, as
it doesn't recurse.  It is strict, requires input to be UTF-8 (it
validates it first) and only outputs UTF-8.  Other than treating
numbers liberally, my implementation only accepts valid JSON code (it
doesn't try to correct anything, even Unicode problems).  It is under
the MIT license.
   


I did some testing on my own, it passed everything I have thrown at it 
so far.
I also did tests using MSVC for both 32bit and 64bit targets and it 
worked fine too (except for missing stdbool.h in msvc which is no big deal).


The coding style compared to cJSON (or other libs I've seen) seems 
closer to the style of PostgreSQL, it would however still require 
pgindent run and maybe some minor adjustments.


--
Regards
Petr Jelinek (PJMODOS)




Re: [HACKERS] Proposal: Add JSON support

2010-03-31 Thread Chris Browne
robertmh...@gmail.com (Robert Haas) writes:
 On Tue, Mar 30, 2010 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
 I'd think that you could get quite a long ways on this, at least doing
 something like dbslayer without *necessarily* needing to do terribly
 much work inside the DB engine.

 There's actually an HTTP framework tool for Postgres which already does
 something of the sort.  It was introduced at pgCon 2 years ago ... will
 look for.

 While it might be interesting to have/find/write a tool that puts an
 HTTP/JSON layer around the DB connection, it's pretty much entirely
 unrelated to the proposed project of creating a json type with
 PostgreSQL analagous to the xml type we already have, which is what
 the OP is proposing to do.

 Personally, I suspect that a JSON type is both a more interesting
 project to work on and a more useful result for this community.

No disagreement here; I'd expect that a JSON type would significantly
ease building such a framework.  Indeed, that could be a demonstration
of success...  

We then implemented an HTTP/JSON proxy in 27 lines of Python code...
:-)
-- 
Unless  you used  NetInfo.   _Then_ changing  network settings  could
often require torching  of the existing system, salting  of the ground
it had rested on, and termination of anyone who used it.
-- JFW jwi...@biff.com on comp.sys.next.advocacy



Re: [HACKERS] Proposal: Add JSON support

2010-03-31 Thread Joseph Adams
I ended up reinventing the wheel and writing another JSON library:

http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2

This is a first release, and it doesn't really have a name besides
json.  It's very similar to cJSON, except it is (sans unknown bugs)
more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
 It has a simple test suite.  It is not prone to stack overflows, as
it doesn't recurse.  It is strict, requires input to be UTF-8 (it
validates it first) and only outputs UTF-8.  Other than treating
numbers liberally, my implementation only accepts valid JSON code (it
doesn't try to correct anything, even Unicode problems).  It is under
the MIT license.



Re: [HACKERS] Proposal: Add JSON support

2010-03-30 Thread Chris Browne
joeyadams3.14...@gmail.com (Joseph Adams) writes:
 I introduced myself in the thread "Proposal: access control jails (and
 introduction as aspiring GSoC student)", and we discussed jails and
 session-local variables.  But, as Robert Haas suggested, implementing
 variable support in the backend would probably be way too ambitious a
 project for a newbie like me.  I decided instead to pursue the task of
 adding JSON support to PostgreSQL, hence the new thread.

Interesting...

I had a discussion about much this sort of thing with a local LUG
associate; he was interested in this from a "doing CouchDB-ish things
using PostgreSQL" perspective.

There were a couple perspectives there, which may be somewhat orthogonal
to what you're trying to do.  I'll mention them as they may suggest
useful operations.

1.  Buddy Myles pointed out a NYTimes project which does something
pretty analogous...
http://code.nytimes.com/projects/dbslayer

This is a proxy that allows clients to submit requests via HTTP,
returning responses in JSON form.  Note that the HTTP request has the
SQL query embedded into it.

2.  CouchDB's interface is much the same, where clients submit HTTP
requests and receive JSON responses back, but with the difference that
the query is a stylized sorta-JSON form.

I'd think that you could get quite a long ways on this, at least doing
something like dbslayer without *necessarily* needing to do terribly
much work inside the DB engine.

Mapping a tuple, or a list of tuples, into a forest of JSON documents
should be pretty straightforward; whether or not it's really desirable
to operate a JSON-flavoured query inside PostgreSQL may be the
difference between *this year's* GSOC and *next year's* :-).
-- 
...the  Jedi learned  early   on  what  language   the universe   was
programmed in. Then they took advantage of  an accident of language to
obscure this  fact from the  unwashed.  They all affected  an inverted
lisp. so, a Jedi to be, you the Forth must use.



Re: [HACKERS] Proposal: Add JSON support

2010-03-30 Thread Josh Berkus

 I'd think that you could get quite a long ways on this, at least doing
 something like dbslayer without *necessarily* needing to do terribly
 much work inside the DB engine.

There's actually an HTTP framework tool for Postgres which already does
something of the sort.  It was introduced at pgCon 2 years ago ... will
look for.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-30 Thread Robert Haas
On Tue, Mar 30, 2010 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
 I'd think that you could get quite a long ways on this, at least doing
 something like dbslayer without *necessarily* needing to do terribly
 much work inside the DB engine.

 There's actually an HTTP framework tool for Postgres which already does
 something of the sort.  It was introduced at pgCon 2 years ago ... will
 look for.

While it might be interesting to have/find/write a tool that puts an
HTTP/JSON layer around the DB connection, it's pretty much entirely
unrelated to the proposed project of creating a json type with
PostgreSQL analogous to the xml type we already have, which is what
the OP is proposing to do.

Personally, I suspect that a JSON type is both a more interesting
project to work on and a more useful result for this community.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-30 Thread Andrew Dunstan



Robert Haas wrote:

While it might be interesting to have/find/write a tool that puts an
HTTP/JSON layer around the DB connection, it's pretty much entirely
unrelated to the proposed project of creating a json type with
PostgreSQL analogous to the xml type we already have, which is what
the OP is proposing to do.

Personally, I suspect that a JSON type is both a more interesting
project to work on and a more useful result for this community.


  


I agree.

cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Dimitri Fontaine
Hi,

Joseph Adams joeyadams3.14...@gmail.com writes:
 As for reinventing the wheel, I'm in the process of writing yet
 another JSON implementation simply because I didn't find the other
 ones I looked at palatable.

Even this one (ANSI C, MIT Licenced)?

  cJSON -- An ultra-lightweight, portable, single-file, simple-as-can-be
  ANSI-C compliant JSON parser, under MIT license.

  http://sourceforge.net/projects/cjson/
  http://cjson.svn.sourceforge.net/viewvc/cjson/README?revision=7view=markup
  http://cjson.svn.sourceforge.net/viewvc/cjson/cJSON.c?revision=33view=markup

And from the cJSON.h we read that it could be somewhat easy to integrate
into PostgreSQL's memory management:
   56 typedef struct cJSON_Hooks {
   57   void *(*malloc_fn)(size_t sz);
   58   void (*free_fn)(void *ptr);
   59 } cJSON_Hooks;

Just adding some data points, hoping that's not adding only confusion.

Regards,
-- 
dim



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Peter Eisentraut
On sön, 2010-03-28 at 23:24 -0400, Joseph Adams wrote:
 Thus, here's an example of how (in my opinion) character sets and such
 should be handled in the JSON code:
 
 Suppose the client's encoding is UTF-16, and the server's encoding is
 Latin-1.  When JSON is stored to the database:
  1. The client is responsible and sends a valid UTF-16 JSON string.
  2. PostgreSQL checks to make sure it is valid UTF-16, then converts
 it to UTF-8.
  3. The JSON code parses it (to ensure it's valid).
  4. The JSON code unparses it (to get a representation without
 needless whitespace).  It is given a flag indicating it should only
 output ASCII text.
  5. The ASCII is stored in the server, since it is valid Latin-1.
 
 When JSON is retrieved from the database:
  1. ASCII is retrieved from the server
  2. If user needs to extract one or more fields, the JSON is parsed,
 and the fields are extracted.
  3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

The problem I see here is that a data type output function is normally
not aware of the client encoding.  The alternatives that I see is that
you always escape everything you see to plain ASCII, so it's valid in
every server encoding, but that would result in pretty sad behavior for
users of languages that don't use a lot of ASCII characters, or you
decree a nonstandard JSON variant that momentarily uses whatever
encoding you decide.





Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Andrew Dunstan



Robert Haas wrote:

I feel pretty strongly that the data should be stored in the database
in the format in which it will be returned to the user - any
conversion which is necessary should happen on the way in.  I am not
100% sure to what extent we should attempt to canonicalize the input
and to what extend we should simply store it in whichever way the user
chooses to provide it.

  


ISTM that implies that, with a possible exception when the server 
encoding is utf8, you would have to \u escape the data on the way in 
fairly pessimistically.


I'd be inclined to say we should store and validate it exactly as the 
client gives it to us (converted to the server encoding, as it would be, 
of course). In practice that would mean that for non-utf8 databases the 
client would need to \u escape it. I suspect most uses of this would be 
in utf8-encoded databases anyway.


I also think we should provide a function to do the escaping, so users 
could do something like:


   insert into foo (myjson) values (json_escape('some json text here'));

I also thought about a switch to turn on \u escaping on output - that 
might be useful for pg_dump for instance.


cheers

andrew








Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 My reasoning for "It should be built-in" is:
  * It would be nice to have a built-in serialization format that's
 available by default.
  * It might be a little faster because it doesn't have to link to an
 external library.

 I don't think either of these reasons is valid.

FWIW, our track record with relying on external libraries has been less
than great --- upstream will maintain it sounds good but has fallen
over with respect to both the regex engine and the snowball stemmers,
to take two examples.  And libxml2 has been nothing but a source of pain.

If this is going to end up being one fairly small C file implementing
a spec that is not a moving target, I'd vote against depending on an
external library instead, no matter how spiffy and license-compatible
the external library might be.

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Robert Haas
On Mon, Mar 29, 2010 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 My reasoning for "It should be built-in" is:
  * It would be nice to have a built-in serialization format that's
 available by default.
  * It might be a little faster because it doesn't have to link to an
 external library.

 I don't think either of these reasons is valid.

 FWIW, our track record with relying on external libraries has been less
 than great --- "upstream will maintain it" sounds good but has fallen
 over with respect to both the regex engine and the snowball stemmers,
 to take two examples.  And libxml2 has been nothing but a source of pain.

 If this is going to end up being one fairly small C file implementing
 a spec that is not a moving target, I'd vote against depending on an
 external library instead, no matter how spiffy and license-compatible
 the external library might be.

Fair enough.  Note that I did go on to say which reasons I did think
were potentially valid.  ;-)

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Josh Berkus
On 3/28/10 8:52 PM, Hitoshi Harada wrote:
 There's another choice, called BSON.

 http://www.mongodb.org/display/DOCS/BSON

 I've not researched it yet deeply, it seems reasonable to be stored in
 databases as it is invented for MongoDB.

I wouldn't take that for granted.  The MongoDB project involves a lot of
re-inventing the wheel and I'd scrutinize any of their innovations
pretty thoroughly.

Besides, I thought the point of a JSON type was to be compatible with
the *majority* of JSON users?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread David E. Wheeler
On Mar 29, 2010, at 9:02 AM, Tom Lane wrote:

 If this is going to end up being one fairly small C file implementing
 a spec that is not a moving target, I'd vote against depending on an
 external library instead, no matter how spiffy and license-compatible
 the external library might be.

Perhaps you could fork one, in that case.

Best,

David




Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Joseph Adams
On Mon, Mar 29, 2010 at 2:23 PM, David E. Wheeler da...@kineticode.com wrote:
 On Mar 29, 2010, at 9:02 AM, Tom Lane wrote:

 If this is going to end up being one fairly small C file implementing
 a spec that is not a moving target, I'd vote against depending on an
 external library instead, no matter how spiffy and license-compatible
 the external library might be.

 Perhaps you could fork one, in that case.

 Best,

 David



I'm considering using and adapting cJSON instead of continuing with my
redundant implementation.  I could run `indent -kr -i4` on it (will
that match PostgreSQL's coding style?), add support for UTF-16
surrogate pairs (a pair of \u... escapes for each character above U+FFFF,
as required by the JSON spec), and add a switch to turn on/off pure
ASCII output.

P.S.: Sorry for the repeat, David.  I forgot to CC the mailing list.



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 If this is going to end up being one fairly small C file implementing
 a spec that is not a moving target, I'd vote against depending on an
 external library instead, no matter how spiffy and license-compatible
 the external library might be.

My understanding is that it's possible to include (fork) a MIT or BSD
source code into our source tree, right? (Some other licenses certainly
apply too).

Regards,
-- 
dim



Re: [HACKERS] Proposal: Add JSON support

2010-03-29 Thread Tom Lane
Dimitri Fontaine dfonta...@hi-media.com writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 If this is going to end up being one fairly small C file implementing
 a spec that is not a moving target, I'd vote against depending on an
 external library instead, no matter how spiffy and license-compatible
 the external library might be.

 My understanding is that it's possible to include (fork) a MIT or BSD
 source code into our source tree, right? (Some other licenses certainly
 apply too).

MIT or 2-clause BSD would be ok for such a thing, other licenses
probably not.

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I'm wondering whether the internal representation of JSON should be
 plain JSON text, or some binary code that's easier to traverse and
 whatnot.  For the sake of code size, just keeping it in text is
 probably best.

+1 for text.

 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

Why?  I'm not saying you aren't right, but you need to make an
argument rather than an assertion.  This is a community, so no one is
entitled to decide anything unilaterally, and people want to be
convinced - including me.

 As far as character encodings, I'd rather keep that out of the JSON
 parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
 I'll just throw encode/decode/validate operations at it.

I think you need to assume that the encoding will be the server
encoding, not UTF-8.  Although others on this list are better
qualified to speak to that than I am.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Robert Haas wrote:

On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
  

I'm wondering whether the internal representation of JSON should be
plain JSON text, or some binary code that's easier to traverse and
whatnot.  For the sake of code size, just keeping it in text is
probably best.



+1 for text.
  


Agreed.
  

Now my thoughts and opinions on the JSON parsing/unparsing itself:

It should be built-in, rather than relying on an external library
(like XML does).



Why?  I'm not saying you aren't right, but you need to make an
argument rather than an assertion.  This is a community, so no one is
entitled to decide anything unilaterally, and people want to be
convinced - including me.
  



Yeah, why? We should not be in the business of reinventing the wheel 
(and then maintaining the reinvented wheel), unless the code in question 
is *really* small.


  

As far as character encodings, I'd rather keep that out of the JSON
parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
I'll just throw encode/decode/validate operations at it.



I think you need to assume that the encoding will be the server
encoding, not UTF-8.  Although others on this list are better
qualified to speak to that than I am.


  



The trouble is that JSON is defined to be specifically Unicode, and in 
practice for us that means UTF8 on the server side.  It could get a bit 
hairy, and it's definitely not something I think you can wave away with 
a simple "I'll just throw some encoding/decoding function calls at it".


cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 I think you need to assume that the encoding will be the server
 encoding, not UTF-8.  Although others on this list are better
 qualified to speak to that than I am.

 The trouble is that JSON is defined to be specifically Unicode, and in 
 practice for us that means UTF8 on the server side.  It could get a bit 
 hairy, and it's definitely not something I think you can wave away with 
 a simple "I'll just throw some encoding/decoding function calls at it".

It's just text, no?  Are there any operations where this actually makes
a difference?

Like Robert, I'm *very* wary of trying to introduce any text storage
into the backend that is in an encoding different from server_encoding.
Even the best-case scenarios for that will involve multiple new places for
encoding conversion failures to happen.

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Robert Haas wrote:


I think you need to assume that the encoding will be the server
encoding, not UTF-8.  Although others on this list are better
qualified to speak to that than I am.
  


  
The trouble is that JSON is defined to be specifically Unicode, and in 
practice for us that means UTF8 on the server side.  It could get a bit 
hairy, and it's definitely not something I think you can wave away with 
a simple "I'll just throw some encoding/decoding function calls at it".



It's just text, no?  Are there any operations where this actually makes
a difference?
  


If we're going to provide operations on it that might involve some. I 
don't know.

Like Robert, I'm *very* wary of trying to introduce any text storage
into the backend that is in an encoding different from server_encoding.
Even the best-case scenarios for that will involve multiple new places for
encoding conversion failures to happen.

  


I agree entirely. All I'm suggesting is that there could be many 
wrinkles here.


Here's another thought. Given that JSON is actually specified to consist 
of a string of Unicode characters, what will we deliver to the client 
where the client encoding is, say Latin1? Will it actually be a legal 
JSON byte stream?


cheers

andrew





Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist 
 of a string of Unicode characters, what will we deliver to the client 
 where the client encoding is, say Latin1? Will it actually be a legal 
 JSON byte stream?

No, it won't.  We will *not* be sending anything but latin1 in such a
situation, and I really couldn't care less what the JSON spec says about
it.  Delivering wrongly-encoded data to a client is a good recipe for
all sorts of problems, since the client-side code is very unlikely to be
expecting that.  A datatype doesn't get to make up its own mind whether
to obey those rules.  Likewise, data on input had better match
client_encoding, because it's otherwise going to fail the encoding
checks long before a json datatype could have any say in the matter.

While I've not read the spec, I wonder exactly what "consist of a string
of Unicode characters" should actually be taken to mean.  Perhaps it
only means that all the characters must be members of the Unicode set,
not that the string can never be represented in any other encoding.
There's more than one Unicode encoding anyway...

regards, tom lane



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist
 of a string of Unicode characters, what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal
 JSON byte stream?

 No, it won't.  We will *not* be sending anything but latin1 in such a
 situation, and I really couldn't care less what the JSON spec says about
 it.  Delivering wrongly-encoded data to a client is a good recipe for
 all sorts of problems, since the client-side code is very unlikely to be
 expecting that.  A datatype doesn't get to make up its own mind whether
 to obey those rules.  Likewise, data on input had better match
 client_encoding, because it's otherwise going to fail the encoding
 checks long before a json datatype could have any say in the matter.

 While I've not read the spec, I wonder exactly what "consist of a string
 of Unicode characters" should actually be taken to mean.  Perhaps it
 only means that all the characters must be members of the Unicode set,
 not that the string can never be represented in any other encoding.
 There's more than one Unicode encoding anyway...

See sections 2.5 and 3 of:

http://www.ietf.org/rfc/rfc4627.txt?number=4627

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Here's another thought. Given that JSON is actually specified to consist
 of a string of Unicode characters, what will we deliver to the client
 where the client encoding is, say Latin1? Will it actually be a legal
 JSON byte stream?

 No, it won't.  We will *not* be sending anything but latin1 in such a
 situation, and I really couldn't care less what the JSON spec says about
 it.  Delivering wrongly-encoded data to a client is a good recipe for
 all sorts of problems, since the client-side code is very unlikely to be
 expecting that.  A datatype doesn't get to make up its own mind whether
 to obey those rules.  Likewise, data on input had better match
 client_encoding, because it's otherwise going to fail the encoding
 checks long before a json datatype could have any say in the matter.

 While I've not read the spec, I wonder exactly what "consist of a string
 of Unicode characters" should actually be taken to mean.  Perhaps it
 only means that all the characters must be members of the Unicode set,
 not that the string can never be represented in any other encoding.
 There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from
user data stored in tables that are not in a UTF-8 encoded cluster, to
convert to UTF-8, that's something else entirely.  If it /is/ easy and
safe, then it's just a matter of scanning for multi-byte sequences and
replacing those with their \u equivalents.  I have some simple and
fast code I could share, if it's needed, though I suspect it's not.
:)

UPDATE:  Thanks, Robert, for pointing to the RFC.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
 In practice, every parser/serializer I've used (including the one I
 helped write) allows (and, often, forces) any non-ASCII character to
 be encoded as \u followed by a string of four hex digits.

Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Robert Haas wrote:

On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
  

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.



Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.


  


We support exactly one unicode encoding on the server side: utf8.

And the maximum possible size of a validly encoded unicode char in utf8 
is 4 (and that's pretty rare, IIRC).


cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Andrew Dunstan



Andrew Dunstan wrote:



Robert Haas wrote:
On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com 
wrote:
 

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.



Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants?  If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves.  I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.


  


We support exactly one unicode encoding on the server side: utf8.

And the maximum possible size of a validly encoded unicode char in 
utf8 is 4 (and that's pretty rare, IIRC).





Sorry. Disregard this. I see what you mean.

Yeah, I think *requiring* non-ascii characters to be escaped would be evil.

cheers

andrew



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Mike Rylander
On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander mrylan...@gmail.com wrote:
 In practice, every parser/serializer I've used (including the one I
 helped write) allows (and, often, forces) any non-ASCII character to
 be encoded as \u followed by a string of four hex digits.

 Is it correct to say that the only feasible place where non-ASCII
 characters can be used is within string constants?

Yes.  That includes object property strings -- they are quoted string literals.

 If so, it might be
 reasonable to disallow characters with the high-bit set unless the
 server encoding is one of the flavors of Unicode of which the spec
 approves.  I'm tempted to think that when the server encoding is
 Unicode we really ought to allow Unicode characters natively, because
 turning a long string of two-byte wide chars into a long string of
 six-byte wide chars sounds pretty evil from a performance point of
 view.


+1

As an aside, \u-encoded (escaped) characters and native multi-byte
sequences (of any RFC-allowable Unicode encoding) are exactly
equivalent in JSON -- it's a storage and transmission format, and
doesn't prescribe the application-internal representation of the data.

If it's faster (which it almost certainly is) to not mangle the data
when it's all staying server side, that seems like a useful
optimization.  For output to the client, however, it would be useful
to provide a \u-escaping function, which (AIUI) should always be safe
regardless of client encoding.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Joseph Adams
On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

 Why?  I'm not saying you aren't right, but you need to make an
 argument rather than an assertion.  This is a community, so no one is
 entitled to decide anything unilaterally, and people want to be
 convinced - including me.

I apologize; I was just starting the conversation with some of my
ideas to receive feedback.  I didn't want people to have to wade
through too many "I think"s.  I'll be sure to use <opinion> tags in
the future :-)

My reasoning for "It should be built-in" is:
 * It would be nice to have a built-in serialization format that's
available by default.
 * It might be a little faster because it doesn't have to link to an
external library.
 * The code to interface between JSON logic and PostgreSQL will
probably be much larger than the actual JSON encoding/decoding itself.
 * The externally-maintained and packaged libjson implementations I
saw brought in lots of dependencies (e.g. glib).
 * Everyone else (e.g. PHP) uses a statically-linked JSON implementation.

Is the code in question *really* small?  Well, not really, but it's
not enormous either.  By the way, I found a bug in PHP's JSON_parser
(json_decode("true "); /* with a space */ returns null instead of
true).  I'll have to get around to reporting that.
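For comparison, a conforming parser must accept insignificant whitespace around the top-level value, as Python's json module does (shown here only to illustrate the expected behavior, not PHP's):

```python
import json

# Per the JSON grammar, whitespace around the top-level value is
# insignificant: "true " (trailing space) is valid JSON and must
# decode to true, not fail (or return null, as PHP's parser did).
assert json.loads("true ") is True
assert json.loads("  [1, 2]\n") == [1, 2]
```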

Now, assuming JSON support is built-in to PostgreSQL and is enabled by
default, it is my opinion that encoding issues should not be dealt
with in the JSON code itself, but that the JSON code itself should
assume UTF-8.  I think conversions should be done to/from UTF-8 before
passing it through the JSON code because this would likely be the
smallest way to implement it (not necessarily the fastest, though).

Mike Rylander pointed out something wonderful, and that is that JSON
code can be stored in plain old ASCII using \u... .  If a target
encoding supports all of Unicode, the JSON serializer could be told
not to generate \u escapes.  Otherwise, the \u escapes would be
necessary.
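A sketch of that flag, again using Python's json module as a stand-in for the proposed serializer:

```python
import json

# When the target encoding covers all of Unicode, the serializer can
# emit characters natively; otherwise it falls back to \u escapes and
# the output is plain ASCII, representable in any server encoding.
print(json.dumps("naïve", ensure_ascii=False))  # "naïve"
print(json.dumps("naïve", ensure_ascii=True))   # "na\u00efve"
```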

Thus, here's an example of how (in my opinion) character sets and such
should be handled in the JSON code:

Suppose the client's encoding is UTF-16, and the server's encoding is
Latin-1.  When JSON is stored to the database:
 1. The client is responsible and sends a valid UTF-16 JSON string.
 2. PostgreSQL checks to make sure it is valid UTF-16, then converts
it to UTF-8.
 3. The JSON code parses it (to ensure it's valid).
 4. The JSON code unparses it (to get a representation without
needless whitespace).  It is given a flag indicating it should only
output ASCII text.
 5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
 1. ASCII is retrieved from the server
 2. If the user needs to extract one or more fields, the JSON is parsed,
and the fields are extracted.
 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.
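The validate-and-normalize steps of the storage path above can be sketched as follows (a Python illustration under the assumption that the text has already been converted to UTF-8; the real implementation would be server-side C):

```python
import json

# Steps 3-5 of the storage path: parse to validate, then re-serialize
# with no needless whitespace and ASCII-only output, so the result is
# storable under any server encoding (here, Latin-1).
incoming = '{ "name" : "café",\n  "tags" : [ "a", "b" ] }'
value = json.loads(incoming)                       # step 3: validate
stored = json.dumps(value, separators=(",", ":"),  # step 4: compact,
                    ensure_ascii=True)             #   ASCII-only output
stored.encode("latin-1")                           # step 5: pure ASCII
print(stored)
```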

Note that I am being biased toward optimizing code size rather than speed.

Here's a question about semantics: should converting JSON to text
guarantee that Unicode will be \u escaped, or should it render actual
Unicode whenever possible (when the client uses a Unicode-complete
charset)?

As for reinventing the wheel, I'm in the process of writing yet
another JSON implementation simply because I didn't find the other
ones I looked at palatable.  I am aiming for simple code, not fast
code.  I am using malloc for structures and realloc for strings/arrays
rather than resorting to clever buffering tricks.  Of course, I'll
switch it over to palloc/repalloc before migrating it to PostgreSQL.



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Robert Haas
On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I apologize; I was just starting the conversation with some of my
 ideas to receive feedback.  I didn't want people to have to wade
 through too many "I think"s.  I'll be sure to use <opinion> tags in
 the future :-)

FWIW, I don't care at all whether you say "I think" or "I know"; the
point is that you have to provide backup for any position you choose
to take.

 My reasoning for "It should be built-in" is:
  * It would be nice to have a built-in serialization format that's
 available by default.
  * It might be a little faster because it doesn't have to link to an
 external library.

I don't think either of these reasons is valid.

  * The code to interface between JSON logic and PostgreSQL will
 probably be much larger than the actual JSON encoding/decoding itself.

If true, this is a good argument.

  * The externally-maintained and packaged libjson implementations I
 saw brought in lots of dependencies (e.g. glib).

As is this.

  * Everyone else (e.g. PHP) uses a statically-linked JSON implementation.

But this isn't.

 Is the code in question *really* small?  Well, not really, but it's
 not enormous either.  By the way, I found a bug in PHP's JSON_parser
 (json_decode("true "); /* with a space */ returns null instead of
 true).  I'll have to get around to reporting that.

 Now, assuming JSON support is built-in to PostgreSQL and is enabled by
 default, it is my opinion that encoding issues should not be dealt
 with in the JSON code itself, but that the JSON code itself should
 assume UTF-8.  I think conversions should be done to/from UTF-8 before
 passing it through the JSON code because this would likely be the
 smallest way to implement it (not necessarily the fastest, though).

 Mike Rylander pointed out something wonderful, and that is that JSON
 code can be stored in plain old ASCII using \u... .  If a target
 encoding supports all of Unicode, the JSON serializer could be told
 not to generate \u escapes.  Otherwise, the \u escapes would be
 necessary.

 Thus, here's an example of how (in my opinion) character sets and such
 should be handled in the JSON code:

 Suppose the client's encoding is UTF-16, and the server's encoding is
 Latin-1.  When JSON is stored to the database:
  1. The client is responsible and sends a valid UTF-16 JSON string.
  2. PostgreSQL checks to make sure it is valid UTF-16, then converts
 it to UTF-8.
  3. The JSON code parses it (to ensure it's valid).
  4. The JSON code unparses it (to get a representation without
 needless whitespace).  It is given a flag indicating it should only
 output ASCII text.
  5. The ASCII is stored in the server, since it is valid Latin-1.

 When JSON is retrieved from the database:
  1. ASCII is retrieved from the server
  2. If the user needs to extract one or more fields, the JSON is parsed,
 and the fields are extracted.
  3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

 Note that I am being biased toward optimizing code size rather than speed.

Can you comment on my proposal elsewhere on this thread and compare
your proposal to mine?  In what ways are they different, and which is
better, and why?

 Here's a question about semantics: should converting JSON to text
 guarantee that Unicode will be \u escaped, or should it render actual
 Unicode whenever possible (when the client uses a Unicode-complete
 charset)?

I feel pretty strongly that the data should be stored in the database
in the format in which it will be returned to the user - any
conversion which is necessary should happen on the way in.  I am not
100% sure to what extent we should attempt to canonicalize the input
and to what extent we should simply store it in whichever way the user
chooses to provide it.
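One possible reading of "canonicalize the input", sketched in Python (an illustration only, with a hypothetical canon() helper; whether the datatype should do this is exactly the open question):

```python
import json

def canon(s):
    # Parse, then re-serialize with fixed key order and spacing, so
    # that textual equality doubles as value equality.
    return json.dumps(json.loads(s), separators=(",", ":"),
                      sort_keys=True)

# Two differently formatted inputs canonicalize to the same text.
assert canon('{"b":1, "a":2}') == canon('{ "a" : 2, "b" : 1 }')
```

The trade-off Robert raises is that full canonicalization discards the user's original formatting and key order, while storing the input verbatim makes equality tests on the stored text unreliable.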

 As for reinventing the wheel, I'm in the process of writing yet
 another JSON implementation simply because I didn't find the other
 ones I looked at palatable.  I am aiming for simple code, not fast
 code.  I am using malloc for structures and realloc for strings/arrays
 rather than resorting to clever buffering tricks.  Of course, I'll
 switch it over to palloc/repalloc before migrating it to PostgreSQL.

I'm not sure that optimizing for simplicity over speed is a good idea.
I think we can reject implementations as unpalatable because they are
slow or feature-poor or have licensing issues or are not actively
maintained, but rejecting them because they use complex code in order
to be fast doesn't seem like the right trade-off to me.

...Robert



Re: [HACKERS] Proposal: Add JSON support

2010-03-28 Thread Hitoshi Harada
2010/3/29 Andrew Dunstan and...@dunslane.net:
 Robert Haas wrote:
 On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I'm wondering whether the internal representation of JSON should be
 plain JSON text, or some binary code that's easier to traverse and
 whatnot.  For the sake of code size, just keeping it in text is
 probably best.

 +1 for text.

 Agreed.

There's another choice, called BSON.

http://www.mongodb.org/display/DOCS/BSON

I haven't researched it deeply yet, but it seems reasonable for storage
in databases, as it was invented for MongoDB.

 Now my thoughts and opinions on the JSON parsing/unparsing itself:

 It should be built-in, rather than relying on an external library
 (like XML does).

 Why?  I'm not saying you aren't right, but you need to make an
 argument rather than an assertion.  This is a community, so no one is
 entitled to decide anything unilaterally, and people want to be
 convinced - including me.

 Yeah, why? We should not be in the business of reinventing the wheel (and
 then maintaining the reinvented wheel), unless the code in question is
 *really* small.

The many JSON implementations in many languages show that parsing JSON
is not so difficult to code, and that needs vary.  Hence, I wonder if
we could have our very own.

Don't take this wrongly: I'm not against the text format, nor against
using an external library.

Regards,

-- 
Hitoshi Harada
