Re: [HACKERS] JSON for PG 9.2
On Tue, 2012-01-31 at 12:58 -0500, Andrew Dunstan wrote: On 01/30/2012 10:37 AM, Andrew Dunstan wrote: Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. I started with this, again by analogy with query_to_xml(). But I agree it's a bit ugly. If we're not going to do it, then we definitely need to look at caching the output funcs in the function info. A closer approximation is actually: SELECT array_to_json(array_agg(q)) FROM ( your query here ) q; But then I'd want the ability to break that up a bit with line feeds, so we'd need to adjust the interface slightly. (Hint: don't try the above with select * from pg_class.) I'll wait on further comments, but I can probably turn these changes around very quickly once we're agreed. based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. The only comment I have here is that query_to_json could have been replaced with json_agg, so that you don't need to do double-buffering for the results of the array(yourquery) call in SELECT array_to_json(array(yourquery)); Or is there some other way to avoid it except to wrap row_to_json() calls in its own aggregate function which adds enclosing brackets and a comma separator (like this: '[' row1 [, rowN] ']')? cheers andrew
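For concreteness, the wrapper aggregate Hannu is alluding to can be built from the functions already committed. A minimal sketch (illustrative names only; the built-in json_agg did not exist yet, and this does no pretty-printing):

    create or replace function json_agg_append(state text, rec anyelement)
    returns text language sql as $$
        select case when state is null
                    then '[' || row_to_json(rec)::text
                    else state || ',' || row_to_json(rec)::text
               end;
    $$;

    create or replace function json_agg_finish(state text)
    returns json language sql as $$
        -- coalesce makes an empty input produce '[]'
        select (coalesce(state, '[') || ']')::json;
    $$;

    create aggregate json_agg_sketch(anyelement) (
        sfunc     = json_agg_append,
        stype     = text,
        finalfunc = json_agg_finish
    );

    -- SELECT json_agg_sketch(q) FROM (yourquery) q;

This accumulates a growing text value instead of materializing an array of records first, which is the double-buffering Hannu objects to.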
Re: [HACKERS] JSON for PG 9.2
On 04/16/2012 09:34 AM, Hannu Krosing wrote: based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. The only comment I have here is that query_to_json could have been replaced with json_agg, so that you don't need to do double-buffering for the results of the array(yourquery) call in SELECT array_to_json(array(yourquery)); Or is there some other way to avoid it except to wrap row_to_json() calls in its own aggregate function which adds enclosing brackets and a comma separator (like this: '[' row1 [, rowN] ']')? The way I usually write this is: select array_to_json(array_agg(q)) from (yourquery) q; It's a pity you didn't make this comment back in January when we were talking about this. I think it's too late now in this release cycle to be talking about adding the aggregate function. cheers andrew
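A quick illustration of Andrew's pattern (column names arbitrary; the output shape is what the 9.2 functions produce):

    SELECT array_to_json(array_agg(q))
    FROM (SELECT x AS a, x * 2 AS b FROM generate_series(1,3) x) q;
    -- [{"a":1,"b":2},{"a":2,"b":4},{"a":3,"b":6}]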
Re: [HACKERS] JSON for PG 9.2
On Mon, Apr 16, 2012 at 9:10 AM, Andrew Dunstan and...@dunslane.net wrote: On 04/16/2012 09:34 AM, Hannu Krosing wrote: based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. The only comment I have here is that query_to_json could have been replaced with json_agg, so that you don't need to do double-buffering for the results of the array(yourquery) call in SELECT array_to_json(array(yourquery)); Or is there some other way to avoid it except to wrap row_to_json() calls in its own aggregate function which adds enclosing brackets and a comma separator (like this: '[' row1 [, rowN] ']')? The way I usually write this is: select array_to_json(array_agg(q)) from (yourquery) q; It's a pity you didn't make this comment back in January when we were talking about this. I think it's too late now in this release cycle to be talking about adding the aggregate function. I find array_agg to be pretty consistently slower than array()...although not much, say around 5-10%. I use array_agg only when grouping. try timing

select array_to_json(array_agg(v)) from (select v from generate_series(1,100) v) q;

vs

select array_to_json(array(select v from generate_series(1,100) v));

I agree with Hannu but as things stand if I'm trying to avoid the extra buffer I've found myself doing the final aggregation on the client -- it's easy enough. BTW, I'm using the json stuff heavily and it's just absolutely fantastic. Finally I can write web applications without wondering exactly where it was that computer science went off the rails. I've already demoed a prototype app that integrates pg directly with the many high quality js libraries out there and it makes things very easy and quick by making writing data services trivial. Data pushes are still quite a pain but I figure something can be worked out. merlin
Re: [HACKERS] JSON for PG 9.2
On Mon, 2012-04-16 at 10:10 -0400, Andrew Dunstan wrote: On 04/16/2012 09:34 AM, Hannu Krosing wrote: based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. The only comment I have here is that query_to_json could have been replaced with json_agg, so that you don't need to do double-buffering for the results of the array(yourquery) call in SELECT array_to_json(array(yourquery)); Or is there some other way to avoid it except to wrap row_to_json() calls in its own aggregate function which adds enclosing brackets and a comma separator (like this: '[' row1 [, rowN] ']')? The way I usually write this is: select array_to_json(array_agg(q)) from (yourquery) q; It's a pity you didn't make this comment back in January when we were talking about this. I think it's too late now in this release cycle to be talking about adding the aggregate function. My comment is not meant to propose changing anything in 9.2. I think what we have here is absolutely fantastic :) If doing something in 9.3 then what I would like is some way to express multiple queries. Basically a variant of query_to_json(query text[]) where queries would be evaluated in order and then all the results aggregated into one json object. But aggregation on the client as suggested by Merlin may be a better way to do it for larger result(set)s. Especially as it could enable streaming of the resultsets without having to first buffer everything on the server. If we can add something, then perhaps a deeper pretty_print feature. Samples:

hannu=# \d test
                            Table "public.test"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test_id_seq'::regclass)
 data   | text                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test from test),true);
-[ RECORD 1 ]-+
array_to_json | [{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},
              |  {"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"}]

This is OK

hannu=# \d test2
                            Table "public.test2"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test2_id_seq'::regclass)
 data2  | test                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test2 from test2),true);
-[ RECORD 1 ]-+---
array_to_json | [{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}]

This is kind of OK

hannu=# \d test3
                            Table "public.test3"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test3_id_seq'::regclass)
 data3  | test2[]                     |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test3 from test3),true);
-[ RECORD 1 ]-+---
array_to_json | [{"id":1,"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],"tstamp":"2012-04-16 14:40:15.795947"}]

but this would be nicer if printed like python's pprint:

>>> pprint.pprint(d)
[{'data3': [{'data2': {'data': '0.262814193032682',
                       'id': 1,
                       'tstamp': '2012-04-05 13:21:03.235204'},
             'id': 1,
             'tstamp': '2012-04-05 13:25:03.644497'},
            {'data2': {'data': '0.157406373415142',
                       'id': 2,
                       'tstamp': '2012-04-05 13:21:05.2033'},
             'id': 2,
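A rough sketch of the query_to_json(query text[]) variant Hannu floats, as a plpgsql function (hypothetical name and object keys; each query text must return a single column, e.g. a whole-row reference, and the usual caveats about interpolated SQL apply):

    create function queries_to_json(queries text[]) returns text
    language plpgsql as $$
    declare
        q     text;
        part  text;
        parts text[] := '{}';
        i     int := 0;
    begin
        -- evaluate the queries in order, collecting each result set
        -- as one field of a single JSON object
        foreach q in array queries loop
            i := i + 1;
            execute 'select array_to_json(array(' || q || '))::text' into part;
            parts := parts || ('"q' || i || '":' || part);
        end loop;
        return '{' || array_to_string(parts, ',') || '}';
    end;
    $$;

    -- SELECT queries_to_json(ARRAY['select test from test',
    --                              'select test2 from test2']);
    -- {"q1":[...],"q2":[...]}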
Re: [HACKERS] JSON for PG 9.2
On Mon, Apr 16, 2012 at 11:19 AM, Hannu Krosing ha...@2ndquadrant.com wrote: If doing something in 9.3 then what I would like is some way to express multiple queries. Basically a variant of query_to_json(query text[]) where queries would be evaluated in order and then all the results aggregated into one json object. I personally don't like variants of to_json that push the query in as text. They defeat parameterization and have other issues. Another point for client side processing is the new row level processing in libpq, so I'd argue that if the result is big enough to warrant worrying about buffering (and it'd have to be a mighty big json doc), the best bet is to extract it as rows. I'm playing around with node.js for the json serving and the sending code looks like this:

var first = true;
query.on('row', function(row) {
    if (first) {
        first = false;
        response.write('[');
    } else {
        response.write(',');
    }
    response.write(row.jsondata);
});
query.on('end', function() {
    response.write(']');
    response.end();
});

-- not too bad

merlin
Re: [HACKERS] JSON for PG 9.2
At 2012-02-01 11:28:50 -0500, robertmh...@gmail.com wrote: It's also pretty clear that JSON string -> PG text data type is going to admit of a number of error conditions (transcoding errors and perhaps invalid surrogate pairs) so throwing one more on the pile doesn't cost much. Hi Robert. I'm sorry for being slow, but I don't understand what you're proposing to do here (if anything). Could I ask you to explain, please? Are you talking about allowing the six literal bytes \u0000 to be present in the JSON? If so, I agree, there seems to be no reason to disallow it. Are you also saying we should allow any \uXXXX sequence, without checking for errors (e.g. invalid surrogate pairs or parts thereof)? And what transcoding errors are you referring to? -- ams
Re: [HACKERS] JSON for PG 9.2
On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen a...@toroid.org wrote: At 2012-02-01 11:28:50 -0500, robertmh...@gmail.com wrote: It's also pretty clear that JSON string -> PG text data type is going to admit of a number of error conditions (transcoding errors and perhaps invalid surrogate pairs) so throwing one more on the pile doesn't cost much. I'm sorry for being slow, but I don't understand what you're proposing to do here (if anything). Could I ask you to explain, please? Are you talking about allowing the six literal bytes \u0000 to be present in the JSON? If so, I agree, there seems to be no reason to disallow it. Are you also saying we should allow any \uXXXX sequence, without checking for errors (e.g. invalid surrogate pairs or parts thereof)? And what transcoding errors are you referring to? Consider the following JSON object: "abc" This is a JSON string. Someone is eventually going to propose a function with a name like json_to_string() which, when given this JSON object, returns a three-character string with the PostgreSQL text type. That's useful and I support it. But now suppose we pass this JSON object to that same function: "a\u0062c" The user will quite rightly expect that since this JSON object represents the same value as the first JSON object, they're going to get the same answer back from json_to_string(), namely abc. So far, so good. But now suppose we pass this JSON object to that same function: "a\u0000c" This is going to have to be an error condition, because PostgreSQL does not allow values of type text to contain embedded NUL characters. Now consider this: "a\uABCDc" Suppose that \uABCD represents a character that exists in Unicode, but the server-encoding is SHIFT-JIS or EUC-JP or some other system which has no equivalent for the character represented by \uABCD. Again, when json_to_string() is applied to a value of this type, it must fail. In other words, we're knowingly allowing JSON strings to contain characters which might not be representable as PostgreSQL strings, because JSON allows any Unicode character, and the server encoding might not be Unicode, and the server doesn't allow embedded NULs in any encoding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
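Robert's four cases, tabulated against the hypothetical json_to_string() (no such function exists at this point; the results shown are the ones the argument above implies):

    -- JSON value      json_to_string() result
    -- '"abc"'         'abc'
    -- '"a\u0062c"'    'abc'    (same JSON value, different spelling)
    -- '"a\u0000c"'    ERROR    (text cannot contain an embedded NUL)
    -- '"a\uABCDc"'    ERROR    when the server encoding has no
    --                          equivalent for U+ABCD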
Re: [HACKERS] JSON for PG 9.2
At 2012-02-02 08:54:32 -0500, robertmh...@gmail.com wrote: Someone is eventually going to propose a function with a name like json_to_string() which, when given this JSON object, returns a three-character string with the PostgreSQL text type. Ah, that's the bit I was missing. I thought you were talking about an immediate error condition. That's useful and I support it. Agreed. Also, now I understand that you are saying that json_to_string() (json_string_to_text?) would fail if the result couldn't be represented as a text in the current encoding, and that's sensible as well. I had misunderstood "is going to admit of a number of error…" in your mail. As for surrogate pairs, just to be clear, what I was proposing earlier in the thread was to change json.c:json_lex_string() to detect errors (e.g. only one half of a surrogate pair, which is the commonest error I've encountered in the wild) and reject such strings. Thanks for the explanation. -- ams
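The cases Abhijit wants json_lex_string() to catch, as psql probes (the first line is the RFC's G clef example; whether the last two actually raise errors depends on the validation ultimately committed):

    SELECT '"\uD834\uDD1E"'::json;  -- well-formed surrogate pair (U+1D11E): accept
    SELECT '"\uD834"'::json;        -- lone high surrogate: should be rejected
    SELECT '"\uDD1E"'::json;        -- low surrogate with no preceding high: should be rejected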
Re: [HACKERS] JSON for PG 9.2
On Tue, Jan 31, 2012 at 11:46 PM, Andrew Dunstan and...@dunslane.net wrote: The array(select...) locution turns out to have less flexibility than the array_agg(record-ref) locution. Less flexible maybe, but it can be cleaner for exactly the type of queries that will tend to come up in exactly the type of functionality people are looking for with JSON output. libpqtypes does exactly the same stuff but for C clients -- so I've done tons of this kind of programming and am maybe a bit ahead of the curve here. Note: while the following contrived example may seem a bit complex it has a certain elegance and shows how the postgres type system can whip out document style 'nosql' objects to clients who can handle them. Perhaps there is more simplification through syntax possible, but as it stands things are pretty functional. The equivalent production through array_agg I find to be pretty awful looking although it can produce a better plan since it doesn't force everything through flattened subqueries:

create table foo (
    foo_id serial primary key,
    a int
);
create table bar (
    bar_id serial primary key,
    foo_id int references foo,
    b int
);
create table baz (
    baz_id serial primary key,
    bar_id int references bar,
    c int
);

create type baz_t as (c int);
create type bar_t as (bazs baz_t[], b int);
create type foo_t as (bars bar_t[], a int);

INSERT INTO foo(a) VALUES (1);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 100);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 1000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 2000);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 200);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 3000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 4000);
INSERT INTO foo(a) VALUES (2);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 300);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 5000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 6000);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 400);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 7000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 8000);

-- nosql!
select array(
    select row(
        array(
            select row(
                array(
                    select row(c)::baz_t
                    from baz
                    where baz.bar_id = bar.bar_id
                )::baz_t[],
                b
            )::bar_t
            from bar
            where bar.foo_id = foo.foo_id
        )::bar_t[],
        a
    )::foo_t
    from foo
)::foo_t[];

foo_t
---
{(\{\\(\\{(1000),(2000)}\\,100)\\,\\(\\{(3000),(4000)}\\,200)\\}\,1),(\{\\(\\{(5000),(6000)}\\,300)\\,\\(\\{(7000),(8000)}\\,400)\\}\,2)}

as you can see, the postgres default escaping format sucks for sending nested data -- throw even one quote or backslash in there and your data can explode in size 10+ times -- this is why we insisted on binary. json, of course, is much better suited for this type of communication. despite the complicated look of the above, this type of code is in fact very easy to write once you get the knack. This type of coding also leads to much simpler coding on the client since relationships are directly built into the structure and don't have to be inferred or duplicated. merlin
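For comparison, one possible rendering of the array_agg() formulation Merlin calls awful (a sketch; note the inner joins drop parents with no children, so left joins would be needed for full fidelity, which makes it uglier still):

    select array_agg(row(bars, a)::foo_t)
    from (
        select foo.a,
               array_agg(row(bz.bazs, bz.b)::bar_t) as bars
        from foo
        join (
            select bar.foo_id, bar.b,
                   array_agg(row(baz.c)::baz_t) as bazs
            from bar
            join baz on baz.bar_id = bar.bar_id
            group by bar.bar_id, bar.foo_id, bar.b
        ) bz on bz.foo_id = foo.foo_id
        group by foo.foo_id, foo.a
    ) f;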
Re: [HACKERS] JSON for PG 9.2
On Tue, Jan 31, 2012 at 3:47 PM, Joey Adams joeyadams3.14...@gmail.com wrote: I'm mostly in favor of allowing \u0000. Banning \u0000 means users can't use JSON strings to marshal binary blobs, e.g. by escaping non-printable characters and only using U+0000..U+00FF. Instead, they have to use base64 or similar. I agree. I mean, representing data using six bytes per source byte is a bit unattractive from an efficiency point of view, but I'm sure someone is going to want to do it. It's also pretty clear that JSON string -> PG text data type is going to admit of a number of error conditions (transcoding errors and perhaps invalid surrogate pairs) so throwing one more on the pile doesn't cost much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] JSON for PG 9.2
On Mon, Jan 30, 2012 at 9:37 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote: At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote: I've started reviewing this patch, but it'll take me a bit longer to go through json.c properly. OK, I finished reading json.c. I don't have an answer to the detoasting question in the XXX comment, but the code looks fine. Looking at somewhat analogous code in xml.c, it doesn't seem to be done there. So maybe we don't need to worry about it. Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. I started with this, again by analogy with query_to_xml(). But I agree it's a bit ugly. If we're not going to do it, then we definitely need to look at caching the output funcs in the function info. A closer approximation is actually: SELECT array_to_json(array_agg(q)) FROM ( your query here ) q; yup -- although I'd probably write it like this most of the time: select array_to_json(array( query )); if we did have a 'query_to_json', the array() constructor would be a lot more pleasant to deal with than a textual query for obvious reasons even though it's highly irregular syntax. however, since arrays can already handle it, I wouldn't miss it at all. merlin
Re: [HACKERS] JSON for PG 9.2
On Mon, Jan 23, 2012 at 3:20 PM, Peter Eisentraut pete...@gmx.net wrote: On sön, 2012-01-22 at 11:43 -0500, Andrew Dunstan wrote: Actually, given recent discussion I think that test should just be removed from json.c. We don't actually have any test that the code point is valid (e.g. that it doesn't refer to an unallocated code point). We don't do that elsewhere either - the unicode_to_utf8() function the scanner uses to turn \u escapes into utf8 doesn't look for unallocated code points. I'm not sure how much other validation we should do - for example on correct use of surrogate pairs. We do check the correctness of surrogate pairs elsewhere. Search for surrogate in scan.l; should be easy to copy. I've committed a version of this that does NOT do surrogate pair validation. Per discussion elsewhere, I also removed the check for \uXXXX with XXXX > 007F and database encoding != UTF8. This will complicate things somewhat when we get around to doing canonicalization and comparison, but Tom seems confident that those issues are manageable. I did not commit Andrew's further changes, either; I'm assuming he'll do that himself. With respect to the issue of whether we ought to check surrogate pairs, the JSON spec is not a whole lot of help. RFC 4627 says: "To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as \uD834\uDD1E." That fails to answer the question of what we ought to do if we get an invalid sequence there. You could make an argument that we ought to just allow it; it doesn't particularly hinder our ability to canonicalize or compare strings, because our notion of sort-ordering for characters that may span multiple encodings is going to be pretty funky anyway. We can just leave those bits as \uXXXX sequences and call it good. However, it would hinder our ability to convert a JSON string to a string in the database encoding: we could find an invalid surrogate pair that was allowable as JSON but unrepresentable in the database encoding. On the flip side, given our decision to allow all \uXXXX sequences even when not using UTF-8, we could also run across a perfectly valid UTF-8 sequence that's not representable as a character in the server encoding, so it seems we have that problem anyway, so maybe it's not much worse to have two reasons why it can happen rather than one. On the third hand, most people are probably using UTF-8, and those people aren't going to have any transcoding issues, so the invalid surrogate pair case may be the only one they can hit (unless invalid code points are also an issue?), so maybe it's worth avoiding on that basis. Anyway, I defer to the wisdom of the collective on this one: how should we handle this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] JSON for PG 9.2
On 01/30/2012 10:37 AM, Andrew Dunstan wrote: Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. I started with this, again by analogy with query_to_xml(). But I agree it's a bit ugly. If we're not going to do it, then we definitely need to look at caching the output funcs in the function info. A closer approximation is actually: SELECT array_to_json(array_agg(q)) FROM ( your query here ) q; But then I'd want the ability to break that up a bit with line feeds, so we'd need to adjust the interface slightly. (Hint: don't try the above with select * from pg_class.) I'll wait on further comments, but I can probably turn these changes around very quickly once we're agreed. based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. cheers andrew
Re: [HACKERS] JSON for PG 9.2
Andrew, based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. If we do that, what would getting complete query results back from a query look like? It's important to make this as simple for developers as possible. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] JSON for PG 9.2
At 2012-01-31 12:04:31 -0500, robertmh...@gmail.com wrote: That fails to answer the question of what we ought to do if we get an invalid sequence there. I think it's best to categorically reject invalid surrogates as early as possible, considering the number of bugs that are related to them (not in Postgres, just in general). I can't see anything good coming from letting them in and leaving them to surprise someone in future. -- ams
Re: [HACKERS] JSON for PG 9.2
On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus j...@agliodbs.com wrote: Andrew, based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. If we do that, what would getting complete query results back from a query look like? It's important to make this as simple for developers as possible. two options:

1. row_to_json(rowvar)

SELECT row_to_json(foo) from foo;
SELECT row_to_json(row(a,b,c)) from foo;

2. array_to_json(array_agg()/array())

SELECT array_to_json(array(select foo from foo));
SELECT array_to_json(array[1,2,3]);

#1 I expect will be the more used version -- most json handling client side api (for example node.js drivers) are optimized for row by row processing, but via #2 you can stuff a whole query into a single json object if you're so inclined. merlin
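The shapes these produce, for reference (the row_to_json output matches the patch's regression tests):

    SELECT row_to_json(row(1,'foo'));    -- {"f1":1,"f2":"foo"}
    SELECT array_to_json(array[1,2,3]);  -- [1,2,3]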
Re: [HACKERS] JSON for PG 9.2
On 01/31/2012 01:32 PM, Merlin Moncure wrote: On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus j...@agliodbs.com wrote: Andrew, based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. If we do that, what would getting complete query results back from a query look like? It's important to make this as simple for developers as possible. two options:

1. row_to_json(rowvar)

SELECT row_to_json(foo) from foo;
SELECT row_to_json(row(a,b,c)) from foo;

2. array_to_json(array_agg()/array())

SELECT array_to_json(array(select foo from foo));
SELECT array_to_json(array[1,2,3]);

#1 I expect will be the more used version -- most json handling client side api (for example node.js drivers) are optimized for row by row processing, but via #2 you can stuff a whole query into a single json object if you're so inclined. You could also write a wrapper something like this:

create function query_to_json(qtext text) returns json language plpgsql as $$
declare
    result json;
begin
    execute 'select array_to_json(array(' || qtext || '))' into result;
    return result;
end;
$$;

cheers andrew
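Usage of the wrapper (note that qtext is interpolated directly into the executed SQL, so it is only safe for trusted query strings):

    SELECT query_to_json('select x AS a, x * 2 AS b from generate_series(1,3) x');
    -- [{"a":1,"b":2},{"a":2,"b":4},{"a":3,"b":6}]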
Re: [HACKERS] JSON for PG 9.2
On Tue, Jan 31, 2012 at 12:48 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2012 01:32 PM, Merlin Moncure wrote: On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus j...@agliodbs.com wrote: Andrew, based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. If we do that, what would getting complete query results back from a query look like? It's important to make this as simple for developers as possible. two options:

1. row_to_json(rowvar)

SELECT row_to_json(foo) from foo;
SELECT row_to_json(row(a,b,c)) from foo;

2. array_to_json(array_agg()/array())

SELECT array_to_json(array(select foo from foo));
SELECT array_to_json(array[1,2,3]);

#1 I expect will be the more used version -- most json handling client side api (for example node.js drivers) are optimized for row by row processing, but via #2 you can stuff a whole query into a single json object if you're so inclined. You could also write a wrapper something like this:

create function query_to_json(qtext text) returns json language plpgsql as $$
declare
    result json;
begin
    execute 'select array_to_json(array(' || qtext || '))' into result;
    return result;
end;
$$;

right -- then you can leverage execute/using parameterization etc. and/or rig a variadic version. The major hole in functionality I see for heavy json users is the reverse; how do you get json back into the database? With xml, at least you could (ab)use xpath for that...with json you have to rely on add-on support and/or ad hoc string parsing (that is, unless I'm missing something -- I just noted Robert's commit of the JSON type). since we can do:

select array_to_json(array(select foo from foo));

it seems natural to want to do something like:

WITH foos AS (SELECT a_json_var::foo[] AS f)
INSERT INTO foo SELECT (f).* FROM foos;

Of course, you'd have to have non-anonymous (that is, defined with CREATE TYPE AS) types defined to receive all the data, but that's not so bad. Also, could xxx_to_json be hypothetically executed via casts? e.g.

select array(select foo from foo)::json;

merlin
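One way to rig the parameterized variant Merlin mentions (a sketch; the caller's query text refers to the variadic arguments as $1[1], $1[2], and so on, since EXECUTE ... USING binds the whole array as $1):

    create function query_to_json(qtext text, variadic args text[])
    returns json language plpgsql as $$
    declare
        result json;
    begin
        execute 'select array_to_json(array(' || qtext || '))'
            into result using args;
        return result;
    end;
    $$;

    -- SELECT query_to_json('select x from generate_series(1, ($1[1])::int) x', '3');
    -- [1,2,3]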
Re: [HACKERS] JSON for PG 9.2
On 01/31/2012 02:49 PM, Merlin Moncure wrote: The major hole in functionality I see for heavy json users is the reverse; how do you get json back into the database? With xml, at least you could (ab)use xpath for that...with json you have to rely on add-on support and/or ad hoc string parsing (that is, unless I'm missing something -- I just noted Robert's commit of the JSON type). since we can do: select array_to_json(array(select foo from foo)); it seems natural to want to do something like: WITH foos AS (SELECT a_json_var::foo[] AS f) INSERT INTO foo SELECT (f).* FROM foos; Of course, you'd have to have non-anonymous (that is, defined with CREATE TYPE AS) types defined to receive all the data, but that's not so bad. Also, could xxx_to_json be hypothetically executed via casts? e.g. select array(select foo from foo)::json; At some stage there will possibly be some json-processing (as opposed to json-producing) functions, but not in 9.2 - it's too late for that. Until then there is PLV8: see http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html for example. Or someone could write an extension. cheers andrew
Re: [HACKERS] JSON for PG 9.2
On Tue, Jan 31, 2012 at 1:29 PM, Abhijit Menon-Sen a...@toroid.org wrote: At 2012-01-31 12:04:31 -0500, robertmh...@gmail.com wrote: That fails to answer the question of what we ought to do if we get an invalid sequence there. I think it's best to categorically reject invalid surrogates as early as possible, considering the number of bugs that are related to them (not in Postgres, just in general). I can't see anything good coming from letting them in and leaving them to surprise someone in future. -- ams +1 Another sequence to beware of is \u0000. While escaped NUL characters are perfectly valid in JSON, NUL characters aren't allowed in TEXT values. This means not all JSON strings can be converted to TEXT, even in UTF-8. This may also complicate collation, if comparison functions demand null-terminated strings. I'm mostly in favor of allowing \u0000. Banning \u0000 means users can't use JSON strings to marshal binary blobs, e.g. by escaping non-printable characters and only using U+0000..U+00FF. Instead, they have to use base64 or similar. Banning \u0000 doesn't quite violate the RFC: "An implementation may set limits on the length and character contents of strings." -Joey
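The marshalling trick Joey describes, sketched as a helper (hypothetical function; every byte becomes a six-character \u00XX escape, which is the six-bytes-per-source-byte cost Robert concedes above):

    create function bytea_to_json_string(b bytea) returns text
    language sql immutable as $$
        -- returns NULL for empty input; wrap in coalesce if that matters
        select '"' || string_agg('\u00' || lpad(to_hex(get_byte(b, i)), 2, '0'),
                                 '' order by i)
                   || '"'
        from generate_series(0, octet_length(b) - 1) as i;
    $$;

    -- SELECT bytea_to_json_string('\x00ff10');  -- "\u0000\u00ff\u0010"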
Re: [HACKERS] JSON for PG 9.2
On 01/31/2012 01:48 PM, Andrew Dunstan wrote: On 01/31/2012 01:32 PM, Merlin Moncure wrote: On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus j...@agliodbs.com wrote: Andrew, based on Abhijit's feeling and some discussion offline, the consensus seems to be to remove query_to_json. If we do that, what would getting complete query results back from a query look like? It's important to make this as simple for developers as possible. two options:

1. row_to_json(rowvar)

SELECT row_to_json(foo) from foo;
SELECT row_to_json(row(a,b,c)) from foo;

2. array_to_json(array_agg()/array())

SELECT array_to_json(array(select foo from foo));
SELECT array_to_json(array[1,2,3]);

#1 I expect will be the more used version -- most json handling client side api (for example node.js drivers) are optimized for row by row processing, but via #2 you can stuff a whole query into a single json object if you're so inclined. You could also write a wrapper something like this:

create function query_to_json(qtext text) returns json language plpgsql as $$
declare
    result json;
begin
    execute 'select array_to_json(array(' || qtext || '))' into result;
    return result;
end;
$$;

The array(select...) locution turns out to have less flexibility than the array_agg(record-ref) locution. Anyway, for those playing along, I have removed query_to_json, and added optional pretty printing to array_to_json and row_to_json. WIP can be seen at https://bitbucket.org/adunstan/pgdevel. I still have docs and output function caching to do, and should post a revised patch within the next 48 hours. cheers andrew
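The optional pretty printing Andrew mentions is just a second boolean argument on the two remaining functions, e.g.:

    SELECT array_to_json(array(SELECT x FROM generate_series(1,3) x), true);
    -- emits a line feed between array elements instead of packing them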
Re: [HACKERS] JSON for PG 9.2
At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote: I've started reviewing this patch, but it'll take me a bit longer to go through json.c properly. OK, I finished reading json.c. I don't have an answer to the detoasting question in the XXX comment, but the code looks fine. Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. -- ams
Re: [HACKERS] JSON for PG 9.2
On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote: At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote: I've started reviewing this patch, but it'll take me a bit longer to go through json.c properly. OK, I finished reading json.c. I don't have an answer to the detoasting question in the XXX comment, but the code looks fine. Looking at somewhat analogous code in xml.c, it doesn't seem to be done there. So maybe we don't need to worry about it. Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. I started with this, again by analogy with query_to_xml(). But I agree it's a bit ugly. If we're not going to do it, then we definitely need to look at caching the output funcs in the function info. A closer approximation is actually: SELECT array_to_json(array_agg(q)) FROM ( your query here ) q; But then I'd want the ability to break that up a bit with line feeds, so we'd need to adjust the interface slightly. (Hint: don't try the above with select * from pg_class.) I'll wait on further comments, but I can probably turn these changes around very quickly once we're agreed. Cheers andrew
Re: [HACKERS] JSON for PG 9.2
At 2012-01-15 11:08:05 -0500, and...@dunslane.net wrote: Here's an update that adds row_to_json, plus a bit more cleanup. I've started reviewing this patch, but it'll take me a bit longer to go through json.c properly. Here are a few preliminary notes: 1. The patch has a lot of whitespace errors (primarily lines ending in whitespace), but applying with git apply --whitespace=fix isn't safe, because the test results need some (but not all) of those spaces. I applied the patch, backed out the changes to expected/json.out, and created the file from the patch, then removed the superfluous whitespace. 2. I bumped some function OIDs to avoid conflicts. 3. One documentation typo. Everything other than json.c (which I haven't read yet) looks fine (builds, passes tests). I've attached a patch covering the changes I made. More later. -- ams

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4f8b35e..ce4c4f6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9626,7 +9626,7 @@ table2-mapping
 /indexterm
 para
-This section descripbes the functions that are available for creating
+This section describes the functions that are available for creating
 JSON (see xref linkend=datatype-json) data.
 /para

diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cdfa4cc..02c8679 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4027,11 +4027,11 @@
 DATA(insert OID = 323 ( json_recv PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 11
 DESCR(I/O);
 DATA(insert OID = 324 ( json_send PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 17 114 _null_ _null_ _null_ _null_ json_send _null_ _null_ _null_ ));
 DESCR(I/O);
-DATA(insert OID = 3144 ( query_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 2 0 114 25 16 _null_ _null_ _null_ _null_ query_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3153 ( query_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 2 0 114 25 16 _null_ _null_ _null_ _null_ query_to_json _null_ _null_ _null_ ));
 DESCR(I/O);
-DATA(insert OID = 3145 ( array_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 2277 _null_ _null_ _null_ _null_ array_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3154 ( array_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 2277 _null_ _null_ _null_ _null_ array_to_json _null_ _null_ _null_ ));
 DESCR(I/O);
-DATA(insert OID = 3146 ( row_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 2249 _null_ _null_ _null_ _null_ row_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3155 ( row_to_json PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 2249 _null_ _null_ _null_ _null_ row_to_json _null_ _null_ _null_ ));
 DESCR(I/O);
 /* uuid */

diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index b975d72..1984106 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -278,11 +278,11 @@ SELECT query_to_json('select x as b, x * 2 as c from generate_series(1,3) x',tru (1 row) SELECT query_to_json(' - SELECT $$a$$ || x AS b, - y AS c, + SELECT $$a$$ || x AS b, + y AS c, ARRAY[ROW(x.*,ARRAY[1,2,3]), - ROW(y.*,ARRAY[4,5,6])] AS z - FROM generate_series(1,2) x, + ROW(y.*,ARRAY[4,5,6])] AS z + FROM generate_series(1,2) x, generate_series(4,5) y',true); query_to_json --
@@ -299,19 +299,19 @@ SELECT query_to_json('select array_agg(x) as d from generate_series(5,10) x',fal (1 row) -- array_to_json -SELECT array_to_json(array_agg(x)) FROM generate_series(1,10) x; array_to_json [1,2,3,4,5,6,7,8,9,10] (1 row) -SELECT array_to_json(array_agg(q)) -FROM (SELECT $$a$$ || x AS b, - y AS c, + SELECT array_to_json(array_agg(q)) +FROM (SELECT $$a$$ || x AS b, + y AS c, ARRAY[ROW(x.*,ARRAY[1,2,3]), - ROW(y.*,ARRAY[4,5,6])] AS z - FROM generate_series(1,2) x, + ROW(y.*,ARRAY[4,5,6])] AS z + FROM generate_series(1,2) x, generate_series(4,5) y) q; array_to_json ---
@@ -331,12 +331,12 @@ SELECT row_to_json(row(1,'foo')); {f1:1,f2:foo} (1 row) -SELECT row_to_json(q) -FROM (SELECT $$a$$ || x AS b, - y AS c, +SELECT
Re: [HACKERS] JSON for PG 9.2
On sön, 2012-01-22 at 11:43 -0500, Andrew Dunstan wrote: Actually, given recent discussion I think that test should just be removed from json.c. We don't actually have any test that the code point is valid (e.g. that it doesn't refer to an unallocated code point). We don't do that elsewhere either - the unicode_to_utf8() function the scanner uses to turn \u escapes into utf8 doesn't look for unallocated code points. I'm not sure how much other validation we should do - for example on correct use of surrogate pairs. We do check the correctness of surrogate pairs elsewhere. Search for surrogate in scan.l; should be easy to copy.
Re: [HACKERS] JSON for PG 9.2
On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote: Here's an update that adds row_to_json, plus a bit more cleanup. why not call all these functions 'to_json' and overload them? merlin
Re: [HACKERS] JSON for PG 9.2
On 01/23/2012 05:21 PM, Merlin Moncure wrote: On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote: Here's an update that adds row_to_json, plus a bit more cleanup. why not call all these functions 'to_json' and overload them? I don't honestly feel that advances clarity much. And we might want to overload each at some stage with options that are specific to the datum type. We have various foo_to_xml() functions now. cheers andrew
Re: [HACKERS] JSON for PG 9.2
2012/1/23 Merlin Moncure mmonc...@gmail.com: On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan and...@dunslane.net wrote: Here's an update that adds row_to_json, plus a bit more cleanup. why not call all these functions 'to_json' and overload them? -1 older proposal is more consistent with xml functions Pavel merlin
Re: [HACKERS] JSON for PG 9.2
On 01/21/2012 11:40 PM, Jeff Janes wrote: On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/14/2012 03:06 PM, Andrew Dunstan wrote: OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel Here's an update that adds row_to_json, plus a bit more cleanup. This is bit-rotted such that initdb fails creating template1 database in /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3145) is duplicated. I bumped up those oids in the patch, and it passes make check once I figure out how to get the test run under UTF-8. Is it supposed to pass under other encodings? I can't tell from the rest of the thread whether it is supposed to pass in other encodings or not. Yeah, regression tests generally are supposed to run in all encodings. Either we could knock out the offending test, or we could supply an alternative result file. If we do the latter, maybe we should modify the query slightly, so it reads: SELECT getdatabaseencoding() = 'UTF8' AS is_utf8, '"\uaBcD"'::json; cheers andrew
Re: [HACKERS] JSON for PG 9.2
On 01/22/2012 04:28 AM, Andrew Dunstan wrote: On 01/21/2012 11:40 PM, Jeff Janes wrote: On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/14/2012 03:06 PM, Andrew Dunstan wrote: OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel Here's an update that adds row_to_json, plus a bit more cleanup. This is bit-rotted such that initdb fails creating template1 database in /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3145) is duplicated. I bumped up those oids in the patch, and it passes make check once I figure out how to get the test run under UTF-8. Is it supposed to pass under other encodings? I can't tell from the rest of the thread whether it is supposed to pass in other encodings or not. Yeah, regression tests generally are supposed to run in all encodings. Either we could knock out the offending test, or we could supply an alternative result file. If we do the latter, maybe we should modify the query slightly, so it reads: SELECT getdatabaseencoding() = 'UTF8' AS is_utf8, '"\uaBcD"'::json; Actually, given recent discussion I think that test should just be removed from json.c. We don't actually have any test that the code point is valid (e.g. that it doesn't refer to an unallocated code point). We don't do that elsewhere either - the unicode_to_utf8() function the scanner uses to turn \u escapes into utf8 doesn't look for unallocated code points. I'm not sure how much other validation we should do - for example on correct use of surrogate pairs. I'd rather get this as right as possible now - every time we tighten encoding rules to make sure incorrectly encoded data doesn't get into the database it causes someone real pain. cheers andrew
Re: [HACKERS] JSON for PG 9.2
On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/14/2012 03:06 PM, Andrew Dunstan wrote: OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel Here's an update that adds row_to_json, plus a bit more cleanup. This is bit-rotted such that initdb fails creating template1 database in /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3145) is duplicated. I bumped up those oids in the patch, and it passes make check once I figure out how to get the test run under UTF-8. Is it supposed to pass under other encodings? I can't tell from the rest of the thread whether it is supposed to pass in other encodings or not. Cheers, Jeff
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote: The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Yes. I think this argument has been mostly theologizing, along the lines of how many JSON characters can dance on the head of a pin. From a user's perspective, the database encoding is only a constraint on which characters he can store. Bingo. He does not know or care what the bit representation is inside the server. As such, if we store a non-ASCII character in a JSON string, it's valid JSON as far as the user is concerned, so long as that character exists in the Unicode standard. If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? Also agreed. Personally, I think it may not have been a great idea to tie the JSON spec so closely to Unicode, but I understand that it would have been difficult to define an encoding-agnostic equivalent of \uXXXX, since it's hard to know for sure whether an arbitrary encoding even has a (sensible?) definition of code points, and they probably wanted to avoid ambiguity. But, it's bound to cause problems for any system that runs in some other encoding, which, when so requested, we do. Even if we had the ability to support multiple encodings in the same database, I'm not sure I'd be very excited about insisting that JSON data always be stored in UTF-8, because that would introduce a lot of unnecessary transcoding for people using other encodings and basically unnecessarily handicap the functionality provided by the datatype. But at least if we had that, people would have the *option* to use JSON with UTF-8 and get the fully spec-compliant behavior. As it is, they don't; the system we have forces the database encoding on all datatypes whether they like it or not, and that ain't changing for 9.2. So I'm with Robert: we should just plain not care. I would further suggest that maybe what we should do with incoming JSON escape sequences is convert them to Unicode code points and then to the equivalent character in the database encoding (or throw error if there is none). The code I've written so far does no canonicalization of the input value of any kind, just as we do for XML. I'm inclined to leave it that way. Eventually, we might want to make the JSON datatype support equality comparisons and so on, and that will require the system to know that the letter r can be encoded as some \uXXXX sequence and that the escape \r is equivalent to some other escape \uXXXX, but right now all the code does is try to validate that the JSON is legal, NOT second-guess the user's choice about how to spell things or where to insert whitespace.
I think that's a good choice because (1) AFAIK, there's no official canonicalization method for JSON, so whatever we pick will be something we think is best, not an official method sanctioned by the spec, (2) users might prefer the way they chose to represent a given value over the way we choose to represent it, and (3) by simply validating and storing the JSON object, rather than doing any canonicalization, the input function avoids the need to do any data copying, hopefully maximizing speed. Canonicalization can be added on top of what I've done here and people who want or need it can use it; I have some ideas around how to make that leverage the existing code that I intend to pursue for 9.3, but right now I'd rather not go there. So, given that framework, what the patch does is this: if you're using UTF-8, then \uXXXX is accepted, provided that XXXX is something that equates to a legal Unicode code point. It isn't converted to the corresponding character: it's just validated. If you're NOT using UTF-8, then it allows \uXXXX for code points up through 127 (which we assume are the same in all encodings) and anything higher than that is rejected. If someone knows an easy way to check whether a \uXXXX sequence for XXXX > 007F is a legal Unicode code point that has an equivalent in the current server encoding, then we can add logic to allow that case also, but personally I'm not that excited about it. Anyone who is using \uXXXX escapes with a non-Unicode coding
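Two psql probes that make the validate-but-don't-canonicalize behavior concrete (the second compares the stored text of two spellings of the same JSON value):

    SELECT '{ "a" : 1 }'::json;                         -- returned exactly as typed
    SELECT '"\u0041"'::json::text = '"A"'::json::text;  -- false: the escape is not resolved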
Re: [HACKERS] JSON for PG 9.2
Robert Haas robertmh...@gmail.com writes: The code I've written so far does no canonicalization of the input value of any kind, just as we do for XML. Fair enough. So, given that framework, what the patch does is this: if you're using UTF-8, then \uXXXX is accepted, provided that XXXX is something that equates to a legal Unicode code point. It isn't converted to the corresponding character: it's just validated. If you're NOT using UTF-8, then it allows \uXXXX for code points up through 127 (which we assume are the same in all encodings) and anything higher than that is rejected. This seems a bit silly. If you're going to leave the escape sequence as ASCII, then why not just validate that it names a legal Unicode code point and be done? There is no reason whatever that that behavior needs to depend on the database encoding. regards, tom lane
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The code I've written so far does no canonicalization of the input value of any kind, just as we do for XML. Fair enough. So, given that framework, what the patch does is this: if you're using UTF-8, then \uXXXX is accepted, provided that XXXX is something that equates to a legal Unicode code point. It isn't converted to the corresponding character: it's just validated. If you're NOT using UTF-8, then it allows \uXXXX for code points up through 127 (which we assume are the same in all encodings) and anything higher than that is rejected. This seems a bit silly. If you're going to leave the escape sequence as ASCII, then why not just validate that it names a legal Unicode code point and be done? There is no reason whatever that that behavior needs to depend on the database encoding. Mostly because that would prevent us from adding canonicalization in the future, AFAICS, and I don't want to back myself into a corner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] JSON for PG 9.2
On 01/20/2012 09:19 AM, Robert Haas wrote: On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote: The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Yes. I think this argument has been mostly theologizing, along the lines of how many JSON characters can dance on the head of a pin. From a user's perspective, the database encoding is only a constraint on which characters he can store. Bingo. He does not know or care what the bit representation is inside the server. As such, if we store a non-ASCII character in a JSON string, it's valid JSON as far as the user is concerned, so long as that character exists in the Unicode standard. If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? Also agreed. Personally, I think it may not have been a great idea to tie the JSON spec so closely to Unicode, but I understand that it would have been difficult to define an encoding-agnostic equivalent of \uXXXX, since it's hard to know for sure whether an arbitrary encoding even has a (sensible?) definition of code points, and they probably wanted to avoid ambiguity. But, it's bound to cause problems for any system that runs in some other encoding, which, when so requested, we do. Even if we had the ability to support multiple encodings in the same database, I'm not sure I'd be very excited about insisting that JSON data always be stored in UTF-8, because that would introduce a lot of unnecessary transcoding for people using other encodings and basically unnecessarily handicap the functionality provided by the datatype. But at least if we had that, people would have the *option* to use JSON with UTF-8 and get the fully spec-compliant behavior. As it is, they don't; the system we have forces the database encoding on all datatypes whether they like it or not, and that ain't changing for 9.2. So I'm with Robert: we should just plain not care. I would further suggest that maybe what we should do with incoming JSON escape sequences is convert them to Unicode code points and then to the equivalent character in the database encoding (or throw error if there is none). The code I've written so far does no canonicalization of the input value of any kind, just as we do for XML. I'm inclined to leave it that way. Eventually, we might want to make the JSON datatype support equality comparisons and so on, and that will require the system to know that the letter r can be encoded as some \uXXXX sequence and that the escape \r is equivalent to some other escape \uXXXX, but right now all the code does is try to validate that the JSON is legal, NOT second-guess the user's choice about how to spell things or where to insert whitespace.
I think that's a good choice because (1) AFAIK, there's no official canonicalization method for JSON, so whatever we pick will be something we think is best, not an official method sanctioned by the spec, (2) users might prefer the way they chose to represent a given value over the way we choose to represent it, and (3) by simply validating and storing the JSON object, rather than doing any canonicalization, the input function avoids the need to do any data copying, hopefully maximizing speed. Canonicalization can be added on top of what I've done here and people who want or need it can use it; I have some ideas around how to make that leverage the existing code that I intend to pursue for 9.3, but right now I'd rather not go there. So, given that framework, what the patch does is this: if you're using UTF-8, then \uXXXX is accepted, provided that XXXX is something that equates to a legal Unicode code point. It isn't converted to the corresponding character: it's just validated. If you're NOT using UTF-8, then it allows \uXXXX for code points up through 127 (which we assume are the same in all encodings) and anything higher than that is rejected. If someone knows an easy way to check whether a \uXXXX sequence for XXXX > 007F is a legal Unicode code point that has an equivalent in the current server encoding, then we can add logic to allow that case also, but personally I'm not that excited about it. Anyone who is using \uXXXX escapes
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote: XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's \uXXXX. But XML documents can be encoded in a variety of encodings, including non-unicode encodings such as Latin-1. However, no matter what the document encoding, &#nnnn; designates the character with Unicode code point nnnn, whether or not that is part of the document encoding's charset. OK. Given that precedent, I'm wondering if we do need to enforce anything other than that it is a valid unicode code point. Equivalence comparison is going to be difficult anyway if you're not resolving all \uXXXX escapes. Possibly we need some sort of canonicalization function to apply for comparison purposes. But we're not providing any comparison ops today anyway, so I don't think we need to make that decision now. As you say, there doesn't seem to be any defined canonical form - the spec is a bit light on in this respect. Well, we clearly have to resolve all \uXXXX escapes to do either comparison or canonicalization. The current patch does neither, but presumably we want to leave the door open to such things. If we're using UTF-8 and comparing two strings, and we get to a position where one of them has a character and the other has \uXXXX, it's pretty simple to do the comparison: we just turn XXXX into a wchar_t and test for equality. That should be trivial, unless I'm misunderstanding. If, however, we're not using UTF-8, we have to first turn \uXXXX into a Unicode code point, then convert that to a character in the database encoding, and then test for equality with the other character after that. I'm not sure whether that's possible in general, how to do it, or how efficient it is. Can you or anyone shed any light on that topic? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Jan 19, 2012, at 9:07 PM, Tom Lane wrote: If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? Don’t entirely agree with this. Some folks are stuck with other encodings and cannot change them for one reason or another. That said, they can convert JSON from their required encoding into UTF-8 on the client side, so there is a workaround. Not that this changes anything, and I agree with the overall direction of the discussion here. I just want to make sure we keep in mind folks who don’t necessarily have the freedom to switch to UTF-8. (And I say this as someone who *always* uses UTF-8!) David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Jan 20, 2012, at 8:58 AM, Robert Haas wrote: If, however, we're not using UTF-8, we have to first turn \uXXXX into a Unicode code point, then convert that to a character in the database encoding, and then test for equality with the other character after that. I'm not sure whether that's possible in general, how to do it, or how efficient it is. Can you or anyone shed any light on that topic? If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no? At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 09:12:13AM -0800, David E. Wheeler wrote: On Jan 19, 2012, at 9:07 PM, Tom Lane wrote: If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? Don’t entirely agree with this. Some folks are stuck with other encodings and cannot change them for one reason or another. That said, they can convert JSON from their required encoding into UTF-8 on the client side, so there is a workaround. Perhaps in addition to trying to just 'do the right thing by default', it makes sense to have two canonicalization functions? Say: json_utf8() and json_ascii(). They could give the same output no matter what encoding was set? json_utf8 would give nice output where characters were canonicalized to native utf8 characters and json_ascii() would output only non-control ascii characters literally and escape everything else or something like that? Garick -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
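If two such functions existed (they are purely hypothetical here; neither is in the patch), the contrast might look like:

    SELECT json_ascii('{"name":"café"}');      -- {"name":"caf\u00e9"}, 7-bit safe
    SELECT json_utf8('{"name":"caf\u00e9"}');  -- {"name":"café"}, native UTF-8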
Re: [HACKERS] JSON for PG 9.2
On 01/20/2012 11:58 AM, Robert Haas wrote: On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote: XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's \uXXXX. But XML documents can be encoded in a variety of encodings, including non-unicode encodings such as Latin-1. However, no matter what the document encoding, &#nnnn; designates the character with Unicode code point nnnn, whether or not that is part of the document encoding's charset. OK. Given that precedent, I'm wondering if we do need to enforce anything other than that it is a valid unicode code point. Equivalence comparison is going to be difficult anyway if you're not resolving all \uXXXX escapes. Possibly we need some sort of canonicalization function to apply for comparison purposes. But we're not providing any comparison ops today anyway, so I don't think we need to make that decision now. As you say, there doesn't seem to be any defined canonical form - the spec is a bit light on in this respect. Well, we clearly have to resolve all \uXXXX escapes to do either comparison or canonicalization. The current patch does neither, but presumably we want to leave the door open to such things. If we're using UTF-8 and comparing two strings, and we get to a position where one of them has a character and the other has \uXXXX, it's pretty simple to do the comparison: we just turn XXXX into a wchar_t and test for equality. That should be trivial, unless I'm misunderstanding. If, however, we're not using UTF-8, we have to first turn \uXXXX into a Unicode code point, then convert that to a character in the database encoding, and then test for equality with the other character after that. I'm not sure whether that's possible in general, how to do it, or how efficient it is. Can you or anyone shed any light on that topic? We know perfectly well how to turn two strings from encoding x to utf8 (see mb_utils.c::pg_do_encoding_conversion() ). Once we've done that ISTM we have reduced this to the previous problem, as the mathematicians like to say. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
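The reduction can be sketched at the SQL level with the existing conversion functions (just an illustration of the idea, on a UTF-8 database, not how a json comparison would actually be implemented):

    -- the same character arriving as LATIN1 bytes and as UTF-8 bytes
    -- compares equal once both sides are transcoded into one encoding:
    SELECT convert_from('\xe9'::bytea, 'LATIN1')      -- é, from LATIN1
         = convert_from('\xc3a9'::bytea, 'UTF8');     -- é, from UTF-8
    -- true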
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 20, 2012, at 8:58 AM, Robert Haas wrote: If, however, we're not using UTF-8, we have to first turn \uXXXX into a Unicode code point, then convert that to a character in the database encoding, and then test for equality with the other character after that. I'm not sure whether that's possible in general, how to do it, or how efficient it is. Can you or anyone shed any light on that topic? If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no? Well, you can pick which way you want to do the conversion. If the database encoding is SJIS, and there's an SJIS character in a string that gets passed to json_in(), and there's another string which also gets passed to json_in() which contains \uXXXX, then any sort of canonicalization or equality testing is going to need to convert the SJIS character to a Unicode code point, or the Unicode code point to an SJIS character, to see whether they match. Err, actually, now that I think about it, that might be a problem: what happens if we're trying to test two characters for equality and the encoding conversion fails? We really just want to return false - the strings are clearly not equal if either contains even one character that can't be converted to the other encoding - so it's not good if an error gets thrown in there anywhere. At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
Robert Haas robertmh...@gmail.com writes: Err, actually, now that I think about it, that might be a problem: what happens if we're trying to test two characters for equality and the encoding conversion fails? This is surely all entirely doable given the encoding infrastructure we already have. We might need some minor refactoring, eg to have a way of not throwing an error, but it's not going to be that hard to achieve if somebody wants to do it. So I still see little reason for making the JSON type behave visibly differently in non-UTF8 database encodings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Fri, Jan 20, 2012 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Err, actually, now that I think about it, that might be a problem: what happens if we're trying to test two characters for equality and the encoding conversion fails? This is surely all entirely doable given the encoding infrastructure we already have. We might need some minor refactoring, eg to have a way of not throwing an error, but it's not going to be that hard to achieve if somebody wants to do it. So I still see little reason for making the JSON type behave visibly differently in non-UTF8 database encodings. OK. It feels a little grotty to me, but I'll go with the flow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan and...@dunslane.net wrote: Second, what should we do when the database encoding isn't UTF8? I'm inclined to emit a \uXXXX escape for any non-ASCII character (assuming it has a unicode code point - are there any code points in the non-unicode encodings that don't have unicode equivalents?). The alternative would be to fail on non-ASCII characters, which might be ugly. Of course, anyone wanting to deal with JSON should be using UTF8 anyway, but we still have to deal with these things. What about SQL_ASCII? If there's a non-ASCII sequence there we really have no way of telling what it should be. There at least I think we should probably error out. I don't see any reason to escape anything more than the minimum required by the spec, which only requires it for control characters. If somebody's got a non-ASCII character in there, we can simply allow it to be represented by itself. That's almost certainly more compact (and very possibly more readable) than emitting \uXXXX for each such instance, and it also matches what the current EXPLAIN (FORMAT JSON) output does. In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
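Under that rule the output functions pass high-bit characters through untouched, e.g. (a sketch assuming the array_to_json from Andrew's patch, with the minimal escaping described above):

    SELECT array_to_json(ARRAY['é']);
    -- ["é"]   rather than   ["\u00e9"]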
Re: [HACKERS] JSON for PG 9.2
On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. And specifically, the \uXXXX syntax won't work, and there might be some non-Unicode characters in there. If that's not the behavior you want, then use UTF-8. It seems pretty clear that we're going to have to make some trade-off to handle non-UTF8 encodings, and I think what I'm suggesting is a lot less painful than disabling high-bit characters altogether. If we do that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his column label has a non-Unicode character in there? Should we say, oh, sorry, you can't explain that in JSON format? That is mighty unfriendly, and probably mighty complicated and expensive to figure out, too. We *do not support* mixing encodings in the same database, and if we make it the job of this patch to fix that problem, we're going to be in the same place for 9.2 that we have been for the last several releases: nowhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
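The EXPLAIN case in question is simply something like this (illustrative; the "café" column name is a made-up example and the plan output is abbreviated):

    CREATE TABLE t ("café" int);
    EXPLAIN (FORMAT JSON, VERBOSE) SELECT "café" FROM t;
    -- the column name appears in the "Output" list of the JSON plan in
    -- whatever the server encoding is; refusing to run this in a LATIN1
    -- database is the unfriendly alternative being argued against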
Re: [HACKERS] JSON for PG 9.2
On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. And specifically, the \uXXXX syntax won't work, and there might be some non-Unicode characters in there. If that's not the behavior you want, then use UTF-8. It seems pretty clear that we're going to have to make some trade-off to handle non-UTF8 encodings, and I think what I'm suggesting is a lot less painful than disabling high-bit characters altogether. If we do that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his column label has a non-Unicode character in there? Should we say, oh, sorry, you can't explain that in JSON format? That is mighty unfriendly, and probably mighty complicated and expensive to figure out, too. We *do not support* mixing encodings in the same database, and if we make it the job of this patch to fix that problem, we're going to be in the same place for 9.2 that we have been for the last several releases: nowhere. OK, then we need to say that very clearly and up front (including in the EXPLAIN docs.) Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Thu, Jan 19, 2012 at 5:59 PM, Andrew Dunstan and...@dunslane.net wrote: OK, then we need to say that very clearly and up front (including in the EXPLAIN docs.) Can do. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Well, that would be nice, but I don't think it's practical. It will certainly be the case, under the scheme I'm proposing, or probably any other sensible scheme also, that if a client whose encoding is UTF-8 gets a value of type json back from the database, it's strictly valid JSON. But it won't be possible to store every legal JSON value in the database if the database encoding is anything other than UTF-8, even if the client encoding is UTF-8. The backend will get the client's UTF-8 bytes and transcode them to the server encoding before calling the type-input function, so if there are characters in there that can't be represented in the server encoding, then we'll error out before the JSON data type ever gets control. In theory, it would be possible to accept such strings if the client chooses to represent them using a \uXXXX sequence, but I'm unexcited about doing the work required to make that happen, because it will still be pretty half-baked: we'll be able to accept some representations of the same JSON constant but not others. I think the real fix for this problem is to introduce an infrastructure inside the database that allows us to have different columns stored in different encodings. People use bytea for that right now, but that's pretty unfriendly: it would be nice to have a better system. However, I expect that to take a lot of work and break a lot of things, and until we do it I don't feel that compelled to provide buggy and incomplete support for it under the guise of implementing a JSON datatype. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
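A concrete instance of the failure described (a sketch; the exact error text will vary with the encodings involved):

    -- server encoding LATIN1, client encoding UTF8:
    SELECT '"日本語"'::json;
    -- ERROR:  character ... has no equivalent in encoding "LATIN1"
    -- the conversion fails before json_in() ever sees the value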
Re: [HACKERS] JSON for PG 9.2
On Jan 19, 2012, at 4:27 PM, Robert Haas wrote: I think the real fix for this problem is to introduce an infrastructure inside the database that allows us to have different columns stored in different encodings. People use bytea for that right now, but that's pretty unfriendly: it would be nice to have a better system. However, I expect that to take a lot of work and break a lot of things, and until we do it I don't feel that compelled to provide buggy and incomplete support for it under the guise of implementing a JSON datatype. +1 This seems like a reasonable compromise and course of action, especially if someone is interested in taking on column-level encodings at some point in the next year or two. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
Andrew Dunstan and...@dunslane.net writes: On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstanand...@dunslane.net wrote: The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Yes. I think this argument has been mostly theologizing, along the lines of how many JSON characters can dance on the head of a pin. From a user's perspective, the database encoding is only a constraint on which characters he can store. He does not know or care what the bit representation is inside the server. As such, if we store a non-ASCII character in a JSON string, it's valid JSON as far as the user is concerned, so long as that character exists in the Unicode standard. If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? So I'm with Robert: we should just plain not care. I would further suggest that maybe what we should do with incoming JSON escape sequences is convert them to Unicode code points and then to the equivalent character in the database encoding (or throw error if there is none). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/14/2012 03:06 PM, Andrew Dunstan wrote: OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel Here's an update that adds row_to_json, plus a bit more cleanup. Example:

andrew=# SELECT row_to_json(q) FROM (SELECT $$a$$ || x AS b, y AS c, ARRAY[ROW(x.*,ARRAY[1,2,3]), ROW(y.*,ARRAY[4,5,6])] AS z FROM generate_series(1,2) x, generate_series(4,5) y) q;
                            row_to_json
--------------------------------------------------------------------
 {"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
 {"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
(4 rows)

(This might be more to Robert's taste than query_to_json() :-) ) cheers andrew

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 152ef2f..f45b10b 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -269,6 +269,12 @@
        <entry></entry>
        <entry>XML data</entry>
       </row>
+
+      <row>
+       <entry><type>json</type></entry>
+       <entry></entry>
+       <entry>JSON data</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -4169,6 +4175,21 @@ SET xmloption TO { DOCUMENT | CONTENT };
  </sect2>
  </sect1>

+ <sect1 id="datatype-json">
+  <title><acronym>JSON</> Type</title>
+
+  <indexterm zone="datatype-json">
+   <primary>JSON</primary>
+  </indexterm>
+
+  <para>
+   The <type>json</type> data type can be used to store JSON data. Such
+   data can also be stored as <type>text</type>, but the
+   <type>json</type> data type has the advantage of checking that each
+   stored value is a valid JSON value.
+  </para>
+ </sect1>
+
 &array;
 &rowtypes;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2e06346..9368739 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9615,6 +9615,77 @@ table2-mapping
  </sect2>
  </sect1>

+ <sect1 id="functions-json">
+  <title>JSON functions</title>
+
+  <indexterm zone="datatype-json">
+   <primary>JSON</primary>
+   <secondary>Functions and operators</secondary>
+  </indexterm>
+
+  <para>
+   This section describes the functions that are available for creating
+   JSON (see <xref linkend="datatype-json">) data.
+  </para>
+
+  <table id="functions-json-table">
+   <title>JSON Support Functions</title>
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Description</entry>
+      <entry>Example</entry>
+      <entry>Example Result</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry>
+       <indexterm>
+        <primary>query_to_json</primary>
+       </indexterm>
+       <literal>query_to_json(text, boolean)</literal>
+      </entry>
+      <entry>
+       Returns the result of running the query as JSON. If the
+       second parameter is true, there will be a line feed between records.
+      </entry>
+      <entry><literal>query_to_json('select 1 as a, $$foo$$ as b', false)</literal></entry>
+      <entry><literal>[{"a":1,"b":"foo"}]</literal></entry>
+     </row>
+     <row>
+      <entry>
+       <indexterm>
+        <primary>array_to_json</primary>
+       </indexterm>
+       <literal>array_to_json(anyarray)</literal>
+      </entry>
+      <entry>
+       Returns the array as JSON. A Postgres multi-dimensional array becomes a JSON
+       array of arrays.
+      </entry>
+      <entry><literal>array_to_json('{{1,5},{99,100}}'::int[])</literal></entry>
+      <entry><literal>[[1,5],[99,100]]</literal></entry>
+     </row>
+     <row>
+      <entry>
+       <indexterm>
+        <primary>row_to_json</primary>
+       </indexterm>
+       <literal>row_to_json(record)</literal>
+      </entry>
+      <entry>
+       Returns the row as JSON.
+      </entry>
+      <entry><literal>row_to_json(row(1,'foo'))</literal></entry>
+      <entry><literal>{"f1":1,"f2":"foo"}</literal></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect1>

 <sect1 id="functions-sequence">
  <title>Sequence Manipulation Functions</title>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8b48105..ddb2784 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -99,7 +100,6 @@ static void ExplainDummyGroup(const char *objtype, const char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
-static void escape_json(StringInfo buf, const char *str);
 static void escape_yaml(StringInfo buf, const char
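For reference, the simplest way to exercise the new function is against an ad-hoc subselect (a sketch assuming the patch above; the output format follows the examples it gives):

    SELECT row_to_json(t) FROM (SELECT 42 AS id, 'foo' AS name) t;
    -- {"id":42,"name":"foo"}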
Re: [HACKERS] JSON for PG 9.2
2012/1/14 Andrew Dunstan and...@dunslane.net: On 01/12/2012 10:51 AM, Andrew Dunstan wrote: On 01/12/2012 10:44 AM, Pavel Stehule wrote: 2012/1/12 Andrew Dunstan and...@dunslane.net: On 01/12/2012 09:00 AM, Joey Adams wrote: I wrote an array_to_json function during GSoC 2010: http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289 It's not exposed as a procedure called array_to_json: it's part of the to_json function, which decides what to do based on the argument type. Excellent, this is just the point at which I stopped work last night, so with your permission I'll steal this and it will save me a good chunk of time. this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html I will be covering composites. OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel A couple of things to note. First, the problem about us losing column names that I noted a couple of months ago and Tom did a bit of work on is exercised by this. We really need to fix it. Example: support for SELECT ROW (x AS "real name", y AS "real name") is a good idea and should be used more times than only here. Regards Pavel andrew=# select array_to_json(array_agg(row(z.*))) from (select $$a$$ || x as b, y as c, array[row(x.*,array[1,2,3]), row(y.*,array[4,5,6])] as z from generate_series(1,1) x, generate_series(4,4) y) z; array_to_json - [{"f1":"a1","f2":4,"f3":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}] (1 row) Here we've lost b, c and z as column names. Second, what should we do when the database encoding isn't UTF8? I'm inclined to emit a \uXXXX escape for any non-ASCII character (assuming it has a unicode code point - are there any code points in the non-unicode encodings that don't have unicode equivalents?). The alternative would be to fail on non-ASCII characters, which might be ugly. Of course, anyone wanting to deal with JSON should be using UTF8 anyway, but we still have to deal with these things. What about SQL_ASCII? If there's a non-ASCII sequence there we really have no way of telling what it should be. There at least I think we should probably error out. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan and...@dunslane.net wrote: Second, what should we do when the database encoding isn't UTF8? I'm inclined to emit a \uXXXX escape for any non-ASCII character (assuming it has a unicode code point - are there any code points in the non-unicode encodings that don't have unicode equivalents?). The alternative would be to fail on non-ASCII characters, which might be ugly. Of course, anyone wanting to deal with JSON should be using UTF8 anyway, but we still have to deal with these things. What about SQL_ASCII? If there's a non-ASCII sequence there we really have no way of telling what it should be. There at least I think we should probably error out. I don't think there is a satisfying solution to this problem. Things working against us: * Some server encodings support characters that don't map to Unicode characters (e.g. unused slots in Windows-1252). Thus, converting to UTF-8 and back is lossy in general. * We want a normalized representation for comparison. This will involve a mixture of server and Unicode characters, unless the encoding is UTF-8. * We can't efficiently convert individual characters to and from Unicode with the current API. * What do we do about \u0000 ? TEXT datums cannot contain NUL characters. I'd say just ban Unicode escapes and non-ASCII characters unless the server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and whatever we support later will be a superset of this. Strategies for handling this situation have been discussed in prior emails. This is where things got stuck last time. - Joey -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
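The \u0000 problem in particular is easy to see (sketch):

    SELECT '"\u0000"'::json;
    -- legal JSON by the RFC, but the de-escaped value contains a NUL byte,
    -- which a text datum can never hold, so any canonicalizing or
    -- de-escaping implementation has to reject it or treat it specially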
Re: [HACKERS] JSON for PG 9.2
On 01/14/2012 06:11 PM, Joey Adams wrote: On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan and...@dunslane.net wrote: Second, what should we do when the database encoding isn't UTF8? I'm inclined to emit a \uXXXX escape for any non-ASCII character (assuming it has a unicode code point - are there any code points in the non-unicode encodings that don't have unicode equivalents?). The alternative would be to fail on non-ASCII characters, which might be ugly. Of course, anyone wanting to deal with JSON should be using UTF8 anyway, but we still have to deal with these things. What about SQL_ASCII? If there's a non-ASCII sequence there we really have no way of telling what it should be. There at least I think we should probably error out. I don't think there is a satisfying solution to this problem. Things working against us: * Some server encodings support characters that don't map to Unicode characters (e.g. unused slots in Windows-1252). Thus, converting to UTF-8 and back is lossy in general. * We want a normalized representation for comparison. This will involve a mixture of server and Unicode characters, unless the encoding is UTF-8. * We can't efficiently convert individual characters to and from Unicode with the current API. * What do we do about \u0000 ? TEXT datums cannot contain NUL characters. I'd say just ban Unicode escapes and non-ASCII characters unless the server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and whatever we support later will be a superset of this. Strategies for handling this situation have been discussed in prior emails. This is where things got stuck last time. Well, from where I'm coming from, nuls are not a problem. But escape_json() is currently totally encoding-unaware. It produces \uXXXX escapes for low ascii characters, and just passes through characters with the high bit set. That's possibly OK for EXPLAIN output - we really don't want EXPLAIN failing. But maybe we should ban JSON output for EXPLAIN if the encoding isn't UTF8. Another question in my mind is what to do when the client encoding isn't UTF8. None of these is an insurmountable problem, ISTM - we just need to make some decisions. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
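escape_json()'s behavior is visible through any of the JSON-producing paths, e.g. (a sketch using the array_to_json from the patch under discussion):

    SELECT array_to_json(ARRAY[chr(1), E'tab\there', 'é']);
    -- ["\u0001","tab\there","é"]
    -- low control characters become \uXXXX (or short forms like \t),
    -- while high-bit characters pass through in the server encoding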
Re: [HACKERS] JSON for PG 9.2
I am very interested in experimenting with functional indexes into JSON structures. I think this could be very powerful combined with full text search as well as constraints. It would allow for using postgres as an unstructured data store without sacrificing the powerful indexing features, durability, and transactional semantics that comes with using postgres or RDBMSes in general. One use case in particular I have been trying to solve for lately is persisting and synchronizing client-side state (in a mobile application) with a server. It would be nice to have a flat, schemaless table (maybe a table that's like (id, type, owner, data) where data would be a JSON blob). I could do this now without JSON support, but I think indexing inside that JSON blob and having validation database side is valuable as well. And as I mentioned before, i'd rather not throw out the baby with the bathwater by using a different type of database because ACID, replication, and constraints are also very important to this. As is being consistent with the rest of our technology stack. (I'd essentially be using a relational database to persist an object database) I'm also not too concerned about storage consumption with this (even though columnar compression would help a lot in the future) since it's easily partitionable by user ID. For my case the equivalent of postgres's XPath would work. Also having it as a maintained contrib module would be sufficient, although it being part of core as XPath is would be even better. Just my $0.02... even if I'm a bit late to the conversation. Thanks! Mike
Re: [HACKERS] JSON for PG 9.2
I wrote an array_to_json function during GSoC 2010: http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289 It's not exposed as a procedure called array_to_json: it's part of the to_json function, which decides what to do based on the argument type. - Joey -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/12/2012 09:00 AM, Joey Adams wrote: I wrote an array_to_json function during GSoC 2010: http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289 It's not exposed as a procedure called array_to_json: it's part of the to_json function, which decides what to do based on the argument type. Excellent, this is just the point at which I stopped work last night, so with your permission I'll steal this and it will save me a good chunk of time. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
2012/1/12 Andrew Dunstan and...@dunslane.net: On 01/12/2012 09:00 AM, Joey Adams wrote: I wrote an array_to_json function during GSoC 2010: http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289 It's not exposed as a procedure called array_to_json: it's part of the to_json function, which decides what to do based on the argument type. Excellent, this is just the point at which I stopped work last night, so with your permission I'll steal this and it will save me a good chunk of time. this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html Regards Pavel cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/12/2012 10:44 AM, Pavel Stehule wrote: 2012/1/12 Andrew Dunstanand...@dunslane.net: On 01/12/2012 09:00 AM, Joey Adams wrote: I wrote an array_to_json function during GSoC 2010: http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289 It's not exposed as a procedure called array_to_json: it's part of the to_json function, which decides what to do based on the argument type. Excellent, this is just the point at which I stopped work last night, so with your permission I'll steal this and it will save me a good chunk of time. this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html I will be covering composites. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 10:44 AM, Pavel Stehule wrote: this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html I will be covering composites. curious, will your function work on unregistered composites? What would this do? select array_to_json(array[row('a', 1), row('b', 2)]); merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
2012/1/12 Merlin Moncure mmonc...@gmail.com: On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 10:44 AM, Pavel Stehule wrote: this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html I will be covering composites. curious, will your function work on unregistered composites? What would this do? select array_to_json(array[row('a', 1), row('b', 2)]); it can do it - but it has to use some defaults for names. Pavel merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/12/2012 11:21 AM, Merlin Moncure wrote: On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 10:44 AM, Pavel Stehule wrote: this should be little bit more enhanced to support a row arrays - it can be merged with some routines from pst tool http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html I will be covering composites. curious, will your function work on unregistered composites? What would this do? select array_to_json(array[row('a', 1), row('b', 2)]); Expected behaviour is something like this: andrew=# select q2json(' select $$a$$ || x as b, y as c, array[row(x.*,array[1,2,3]), row(y.*,array[4,5,6])] as z from generate_series(1,2) x, generate_series(4,5) y'); [{"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}, {"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}, {"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}, {"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}] cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/11/2012 01:18 AM, Pavel Stehule wrote: I like this patch and this feature. I'm about to read the patch in detail - I certainly like the feature. I see only one issue - there is not functionality that helps generate JSON in pg. What do you think about functions: array_to_json(anyarray), row_to_json(any) and format_json(text, text, ...) Actually, more than these, I (and at least one very interested client) want query_to_json, which would do something like: # select q2json('select $$a$$ || x as b, y as c from generate_series(1,3) x, generate_series(4,5) y'); q2json - [{"b":"a1","c":4},{"b":"a1","c":5},{"b":"a2","c":4},{"b":"a2","c":5},{"b":"a3","c":4},{"b":"a3","c":5}] No doubt several variants are possible such as returning a setof json, one per row, instead of a single json, and allowing query parameters as separate arguments (maybe just using variadic functions), but probably for a first go just something as simple as this would meet the case. Given the short time span available before patches must be in, I am prepared to work on this ASAP. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
2012/1/11 Andrew Dunstan and...@dunslane.net: On 01/11/2012 01:18 AM, Pavel Stehule wrote: I like this patch and this feature. I'm about to read the patch in detail - I certainly like the feature. I see only one issue - there is not functionality that helps generate JSON in pg. What do you think about functions: array_to_json(anyarray), row_to_json(any) and format_json(text, text, ...) Actually, more than these, I (and at least one very interested client) want query_to_json, which would do something like: # select q2json('select $$a$$ || x as b, y as c from generate_series(1,3) x, generate_series(4,5) y'); q2json - [{"b":"a1","c":4},{"b":"a1","c":5},{"b":"a2","c":4},{"b":"a2","c":5},{"b":"a3","c":4},{"b":"a3","c":5}] we have a query_to_xml - so there should be a similar query_to_json. But this is not enough for usage from SP. What about two rich functions * query_to_json - by your proposal * array_to_json - with possibility to serialize array of records This can be a basic set Regards Pavel No doubt several variants are possible such as returning a setof json, one per row, instead of a single json, and allowing query parameters as separate arguments (maybe just using variadic functions), but probably for a first go just something as simple as this would meet the case. Given the short time span available before patches must be in, I am prepared to work on this ASAP. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Wed, Jan 11, 2012 at 1:18 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2012/1/11 Robert Haas robertmh...@gmail.com: On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler da...@kineticode.com wrote: On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote: Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures? Yes. if b) then this might get a lot more interesting JSON is the most popular/likely way to represent that, I think. On that note, here's an updated version of the patch I posted upthread, with some regression tests and minimal documentation. I like this patch and this feature. I see only one issue - there is not functionality that helps generate JSON in pg. What do you think about functions: array_to_json(anyarray), row_to_json(any) and format_json(text, text, ...) I think we might want all of that stuff, but I doubt there is time to do it for 9.2. Actually, I think the next logical step would be to define equality (is there an official definition of that for JSON?) and build a btree opclass. I believe the code I've already written could be extended to construct an abstract syntax tree for those operations that need it. But we need to make some decisions first. A btree opclass requires a total ordering, so we have to arbitrarily define whether 1 < true, 1 < [1], 1 < "1", etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
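In other words, a btree opclass needs a support function imposing a total order on every pair of JSON values, along these lines (hypothetical; nothing like this exists in the patch):

    -- hypothetical comparison support function an opclass would require:
    --   json_cmp(json, json) returns integer   -- -1 / 0 / +1, total order
    -- which forces arbitrary cross-type decisions, e.g.:
    SELECT '1'::json < 'true'::json;   -- number vs boolean: which sorts first?
    SELECT '1'::json < '[1]'::json;    -- scalar vs array?
    -- today both fail, since the patch defines no comparison operators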
Re: [HACKERS] JSON for PG 9.2
2012/1/11 Robert Haas robertmh...@gmail.com: On Wed, Jan 11, 2012 at 1:18 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2012/1/11 Robert Haas robertmh...@gmail.com: On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler da...@kineticode.com wrote: On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote: Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures? Yes. if b) then this might get a lot more interesting JSON is the most popular/likely way to represent that, I think. On that note, here's an updated version of the patch I posted upthread, with some regression tests and minimal documentation. I like this patch and this feature. I see only one issue - there is not functionality that helps generate JSON in pg. What do you think about functions: array_to_json(anyarray), row_to_json(any) and format_json(text, text, ...) I think we might want all of that stuff, but I doubt there is time to do it for 9.2. Actually, I think the next logical step would be to define equality (is there an official definition of that for JSON?) and build a btree opclass. I believe the code I've already written could be extended to construct an abstract syntax tree for those operations that need it. But we need to make some decisions first. A btree opclass requires a total ordering, so we have to arbitrarily define whether 1 < true, 1 < [1], 1 < "1", etc. I don't understand why we have to do it? We don't support similar functionality for XML, so why for JSON? Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Wed, Jan 11, 2012 at 8:38 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I don't understand why we have to do it? We don't support similar functionality for XML, so why for JSON? Hrm. Well, that's an interesting point. Maybe we don't. I assumed that people would eventually want to optimize queries of the form SELECT whatever FROM tab WHERE jsoncol = 'constant'. If that's a sufficiently marginal use case that we don't care, then fine. One difference between JSON and XML is that XML really has no well-defined comparison semantics. For example, consider: <foo><bar>1.0</bar></foo> <foo><bar>1.0</bar> </foo> If the XML is being used as a transport mechanism, then the extra space is semantically insignificant, but if this is markup, then it might matter a lot. Also, consider: <foo><bar>1.00</bar></foo> That one might be equal if we think 1.0 is intended to be a number, but if it's intended as a string then it's not. We could perhaps do comparisons in XML relative to some DTD or schema if those provide details about what the values mean, but in a vacuum it's not well-defined. On the other hand, in JSON, it's pretty clear that [ 1, 2, 3 ] is the same value as [1,2,3] but "1,2,3" is different from "1, 2, 3". There are some borderline cases that might need some sweat, like whether 1 = 1.0 = 1.00 = 1e0, but in general the level of ambiguity seems to me to be much less, making it more feasible here than it would be for XML. That having been said, uncertainties about whether we want this at all (and if so in what form) are exactly why I didn't include this kind of stuff in the patch to begin with, and I think that if we get this much committed for 9.2 we'll be doing pretty well. If we can agree on and do more, great; if not, we'll at least have this much, which IMHO would be an improvement over what we have now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
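The JSON half of that comparison is easy to check against the patch's validate-and-store behavior (sketch):

    SELECT '[1, 2, 3]'::json, '[1,2,3]'::json;
    -- both valid, both stored verbatim today; a future equality operator
    -- would have to call them equal, while still distinguishing
    -- '"1,2,3"'::json from '"1, 2, 3"'::json, since whitespace inside a
    -- string is part of the value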
Re: [HACKERS] JSON for PG 9.2
2012/1/11 Robert Haas robertmh...@gmail.com: On Wed, Jan 11, 2012 at 8:38 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I don't understand why we have to do it? We don't support similar functionality for XML, so why for JSON? Hrm. Well, that's an interesting point. Maybe we don't. I assumed that people would eventually want to optimize queries of the form SELECT whatever FROM tab WHERE jsoncol = 'constant'. If that's a sufficiently marginal use case that we don't care, then fine. One difference between JSON and XML is that XML really has no well-defined comparison semantics. For example, consider: <foo><bar>1.0</bar></foo> <foo><bar>1.0</bar> </foo> If the XML is being used as a transport mechanism, then the extra space is semantically insignificant, but if this is markup, then it might matter a lot. Also, consider: <foo><bar>1.00</bar></foo> That one might be equal if we think 1.0 is intended to be a number, but if it's intended as a string then it's not. We could perhaps do comparisons in XML relative to some DTD or schema if those provide details about what the values mean, but in a vacuum it's not well-defined. On the other hand, in JSON, it's pretty clear that [ 1, 2, 3 ] is the same value as [1,2,3] but "1,2,3" is different from "1, 2, 3". There are some borderline cases that might need some sweat, like whether 1 = 1.0 = 1.00 = 1e0, but in general the level of ambiguity seems to me to be much less, making it more feasible here than it would be for XML. That having been said, uncertainties about whether we want this at all (and if so in what form) are exactly why I didn't include this kind of stuff in the patch to begin with, and I think that if we get this much committed for 9.2 we'll be doing pretty well. If we can agree on and do more, great; if not, we'll at least have this much, which IMHO would be an improvement over what we have now. I understand it now. My opinion is so some operators and index search can be in 9.2 - so use a JSON just as communication format now. * we need to build JSON * we need to check if some is valid JSON * we need to store JSON other steps should be (9.2) * basic operators eq, neeq * some tool like XQuery - simple query on JSON document available from SQL that can be used for functional indexes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I understand it now. My opinion is so some operators and index search can be in 9.2 - so use a JSON just as communication format now. * we need to build JSON * we need to check if some is valid JSON * we need to store JSON other steps should be (9.2) * basic operators eq, neeq * some tool like XQuery - simple query on JSON document available from SQL that can be used for functional indexes. That would be nice, but let's not let the perfect be the enemy of the good. We don't have a lot of time here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
2012/1/11 Robert Haas robertmh...@gmail.com: On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I understand it now. My opinion is so some operators and index search can be in 9.2 - so use a JSON just as communication format now. * we need to build JSON * we need to check if some is valid JSON * we need to store JSON other steps should be (9.2) * basic operators eq, neeq * some tool like XQuery - simple query on JSON document available from SQL that can be used for functional indexes. That would be nice, but let's not let the perfect be the enemy of the good. We don't have a lot of time here. sorry - replace 9.2 by 9.3 - I am sorry. I am able to write the array_to_json fce and Andrew can write query_to_json Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
I am able to write array_to_json fce and Andrew can write query_to_json +1 Thanks guys... We are using a lot of JSON as communication protocol... having core support for JSON, And those functions, will be a real life saver... Many thanks, Misa Sent from my Windows Phone From: Pavel Stehule Sent: 11/01/2012 16:22 To: Robert Haas Cc: David E. Wheeler; Claes Jakobsson; Dimitri Fontaine; Merlin Moncure; Andrew Dunstan; Magnus Hagander; Jan Urbański; Simon Riggs; Joey Adams; Bruce Momjian; PostgreSQL-development Hackers; Jan Wieck Subject: Re: [HACKERS] JSON for PG 9.2 2012/1/11 Robert Haas robertmh...@gmail.com: On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I understand it now. My opinion is so some operators and index search can be in 9.2 - so use a JSON just as communication format now. * we need to build JSON * we need to check if some is valid JSON * we need to store JSON other steps should be (9.2) * basic operators eq, neeq * some tool like XQuery - simple query on JSON document available from SQL that can be used for functional indexes. That would be nice, but let's not let the perfect be the enemy of the good. We don't have a lot of time here. sorry - replace 9.2 by 9.3 - I am sorry I am able to write array_to_json fce and Andrew can write query_to_json Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/11/2012 10:21 AM, Pavel Stehule wrote: 2012/1/11 Robert Haas robertmh...@gmail.com: On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: I understand it now. My opinion is so some operators and index search can be in 9.2 - so use a JSON just as communication format now. * we need to build JSON * we need to check if some is valid JSON * we need to store JSON other steps should be (9.2) * basic operators eq, neeq * some tool like XQuery - simple query on JSON document available from SQL that can be used for functional indexes. That would be nice, but let's not let the perfect be the enemy of the good. We don't have a lot of time here. sorry - replace 9.2 by 9.3 - I am sorry I am able to write array_to_json fce and Andrew can write query_to_json For those who want to play along, see https://bitbucket.org/adunstan/pgdevel which has Robert's patch and my additions to it. I'm actually half way through writing an array_to_json function, since it is necessary anyway for query_to_json. I hope to have a fairly complete working function in about 24 hours. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler da...@kineticode.com wrote:
> On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
>> Are people explicitly asking for a) a *JSON* datatype, or b) a type that lets you store arbitrarily complex, semi-untyped data structures?
>
> Yes.
>
>> if b) then this might get a lot more interesting
>
> JSON is the most popular/likely way to represent that, I think.

On that note, here's an updated version of the patch I posted upthread, with some regression tests and minimal documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[Attachment: json-rmh-v2.patch (binary data)]
Re: [HACKERS] JSON for PG 9.2
2012/1/11 Robert Haas robertmh...@gmail.com:
> [...]
> On that note, here's an updated version of the patch I posted upthread, with some regression tests and minimal documentation.

I like this patch and this feature. I see only one issue: there is no functionality that helps generate JSON in PG. What do you think about functions array_to_json(anyarray), row_to_json(any), and format_json(text, text, ...)?

Regards

Pavel
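For illustration, this is roughly how the first two proposed functions would be used; the outputs shown assume the semantics Pavel describes (format_json never went beyond a proposal here):

    SELECT array_to_json(ARRAY[1, 2, 3]);
    -- [1,2,3]

    SELECT row_to_json(q) FROM (SELECT 1 AS id, 'red'::text AS color) q;
    -- {"id":1,"color":"red"}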
Re: [HACKERS] JSON for PG 9.2
Let me mention another lightweight data-interchange format. At http://www.janestreet.com we have developed a small C module to deal with S-expressions (sexps) as a way to store arbitrary data. As we write most of our code in OCaml, sexps are a natural way for us to store data (http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert any OCaml value into a sexp).

The extension is still pretty new, but we use it successfully on a daily basis. After we have upgraded to 9.x we will package it as an extension and release it open source.

API-wise, the module at the moment offers the following:

sexp_validate(text) returns boolean
  Validate that the passed-in text is a valid s-expression.

create domain sexp as text check (sexp_validate(value));

BTW: it is a PITA that arrays of domains are not valid types.

And several functions to take apart or modify sexps using a path into the sexp (similar to what xpath does for xml), such as:

sexp_get(sexp, text) returns sexp
  Get the sub-sexp of sexp identified by the path. Returns NULL if path is not a valid path in sexp.
  Example: path = '.a space.b.[1].x'
    ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))  ->  The Value

sexp_get_atom(sexp, text) returns text
  Get the sub-atom of sexp identified by the path. Returns NULL if path is not a valid path in sexp or does not identify an atom.
  Example: path = '.a space.b.[1].x'
    ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))  ->  The Value

Cheers,
Bene

On 20/12/11 19:39, Claes Jakobsson wrote:
> Are people explicitly asking for a) a *JSON* datatype, or b) a type that lets you store arbitrarily complex, semi-untyped data structures? if b) then this might get a lot more interesting [...]
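A usage sketch of the API described above. The function names and path syntax are taken from the post, not from anything shipped with PostgreSQL, so treat this as illustrative only:

    -- Constrain a column to well-formed s-expressions via the domain:
    CREATE DOMAIN sexp AS text CHECK (sexp_validate(VALUE));
    CREATE TABLE configs (body sexp);

    INSERT INTO configs VALUES ('((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))');

    -- Walk into the structure with a path, xpath-style:
    SELECT sexp_get(body, '.a space.b.[1].x') FROM configs;       -- sub-sexp
    SELECT sexp_get_atom(body, '.a space.b.[1].x') FROM configs;  -- its text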
Re: [HACKERS] JSON for PG 9.2
David E. Wheeler da...@justatheory.com writes:
> ... holds: it is not a programming language, and one does not need a PL to have a JSON data type.

Exactly. That does not contradict the fact that if you have pl/ecmascript you already have JSON. And we might as well have had the ecmascript PL for some time now; we just need to check on that.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] JSON for PG 9.2
On Tuesday, December 20, 2011 07:23:43 PM Dimitri Fontaine wrote:
> David E. Wheeler da...@justatheory.com writes:
>> ... holds: it is not a programming language, and one does not need a PL to have a JSON data type.
>
> Exactly. That does not contradict the fact that if you have pl/ecmascript you already have JSON. And we might as well have had the ecmascript PL for some time now; we just need to check on that.

Not really. You need to be able to evaluate JSON without it possibly executing code. Many JS implementations are likely to have such a feature, though.

Andres
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
> On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
>> My understanding is that JSON is a subset of ECMAscript
>
> Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript. http://json.org/

Are people explicitly asking for a) a *JSON* datatype, or b) a type that lets you store arbitrarily complex, semi-untyped data structures?

If b), then this might get a lot more interesting.

Cheers,
Claes
Re: [HACKERS] JSON for PG 9.2
On Mon, Dec 19, 2011 at 5:49 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> My understanding is that JSON is a subset of ECMAscript, so if you get the latter you already have the former. Now, someone would have to check whether plscheme still builds with guile-2.0, and given that, how exactly you get pl/ecmascript (or pl/js) out of that.

I don't think so. I checked it out (still on pgfoundry, still on CVS, and the code hasn't been touched since 2008), and ran into some issues.

- It looks for libguile.h:

    #include <libguile.h>

  which, on 2.0, has shifted from /usr/include/libguile.h (1.8) to /usr/include/guile/2.0/libguile.h.

- It's not doing enough indirection internally; there is a guile-config that is analogous to pg_config:

    postgres@cbbrowne [03:48:43] [~/PostgreSQL/plscheme] % guile-config compile
    -pthread -I/usr/include/guile/2.0
    postgres@cbbrowne [03:48:45] [~/PostgreSQL/plscheme] % guile-config link
    -lguile-2.0 -lgc

- It looks like there's something PG-related as a next issue:

    % ./install.sh
    pg_config : /var/lib/postgresql/dbs/postgresql-HEAD/bin/pg_config
    module-dir: /var/lib/postgresql/dbs/postgresql-HEAD/lib
    max-cache-size: 64
    dbname: postgres
    safe-r5rs : NO
    dbacreate : NO
    PSQL : /var/lib/postgresql/dbs/postgresql-HEAD/bin/psql postgres
    CPPFLAGS : -g -Wall -fpic -c -I/var/lib/postgresql/dbs/postgresql-HEAD/include/server -I/usr/include/guile/2.0
    LDFLAGS : -shared -lguile
    Compiling... failed!
    plscheme.c: In function '_PG_init':
    plscheme.c:647:2: warning: implicit declaration of function 'DefineCustomStringVariable' [-Wimplicit-function-declaration]
    plscheme.c:650:30: error: 'PGC_BACKEND' undeclared (first use in this function)
    plscheme.c:650:30: note: each undeclared identifier is reported only once for each function it appears in
    plscheme.c:652:2: warning: implicit declaration of function 'DefineCustomIntVariable' [-Wimplicit-function-declaration]
    plscheme.c: In function 'plscheme_func_handler':
    plscheme.c:742:2: warning: implicit declaration of function 'GetTopTransactionId' [-Wimplicit-function-declaration]
    plscheme.c: In function 'parse_trig_args':
    plscheme.c:1623:44: error: dereferencing pointer to incomplete type
    plscheme.c:1628:38: error: dereferencing pointer to incomplete type
    ... (See error.log for details.)

I'm not sure to what degree this is bitrot relating to a) Postgres changes or b) Guile changes, but there's doubtless a bit of both. I'd think it interesting to get this back to working order, whether it's useful for JavaScript or not.

--
When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 9:00 PM, Robert Haas wrote:
>> +1, though I think the core type will at least need some basic operators and indexing support.
>
> And I'm willing to do that, but I thought it best to submit a bare-bones patch first, in the hopes of minimizing the number of objectionable things therein. For example, if you want to be able to index a JSON column, you have to decide on some collation order that is consistent with JSON's notion of equality, and it's not obvious what is most logical. Heck, equality itself isn't 100% obvious. If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks.

+1. Sounds good to me.

David
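The equality problem Robert describes is easy to demonstrate. Assuming the type stores the input text verbatim (as the json type that ultimately shipped in 9.2 does), the only obvious comparison is textual, and that disagrees with JSON semantics; a sketch:

    -- The same JSON object, but unequal as text because key order differs:
    SELECT '{"a": 1, "b": 2}'::json::text = '{"b": 2, "a": 1}'::json::text;  -- false

    -- Whitespace is significant to a textual comparison too:
    SELECT '[1,2,3]'::json::text = '[1, 2, 3]'::json::text;  -- false

Any btree opclass built on that textual ordering would treat semantically equal documents as distinct keys, which is exactly why the collation question needs deciding before indexing support.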
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 2:13 AM, Magnus Hagander wrote:
> Yes, that's why I said "few", not "none". Though in my experience, most companies are a lot more restrictive about addons to their database than addons to their development environments.

Yeah, we’re getting off-topic here, so I’ll just say something we can agree on: we’ll see. I do still want to see some processes for getting PGXN distributions into RPM/.deb/StackBuilder, though.

Best,

David
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
> Are people explicitly asking for a) a *JSON* datatype, or b) a type that lets you store arbitrarily complex, semi-untyped data structures?

Yes.

> if b) then this might get a lot more interesting

JSON is the most popular/likely way to represent that, I think.

David
Re: [HACKERS] JSON for PG 9.2
On Tue, Dec 20, 2011 at 00:26, David E. Wheeler da...@kineticode.com wrote:
> On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:
>> We can hopefully get around this for the extensions in contrib [...]
>
> I support having a JSON type in core, but question the assertions here. *Some* organizations won’t use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN.

Yes, that's why I said "few", not "none". Though in my experience, most companies are a lot more restrictive about addons to their database than addons to their development environments.

And note that it's not PGXN that's the problem I'm pointing at; neither is it CPAN or easy_install or gem. The problem is the vulnerability of the addon, and the maintenance. Meaning, if it has a single maintainer, that's a whole different thing from being maintained by the PGDG.

> Oh, and at PGX, we’ll happily provide support for random modules, so long as you pay for our time. We’re not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from.

I think we're talking about different things here. While we can certainly provide support on specific modules after that is entered into the agreement with the customer, we won't support a customer who just calls up and says "hey, I'm using module xyz which you've never heard of, and it crashes my database, please come fix it now". Are you saying you do that - providing SLAs, 24/7 and similar things, on modules you didn't even know the customer was using?

And FWIW, I'm talking about the quality, and not the ecosystem, as well. I'm just saying it takes a lot more work to verify the quality and maintenance of an external module; if it's part of PostgreSQL, you have *already* got a quality stamp and a maintenance promise from that.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] JSON for PG 9.2
On Tue, Dec 20, 2011 at 06:00, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com wrote:
>> +1, though I think the core type will at least need some basic operators and indexing support.
>
> And I'm willing to do that, but I thought it best to submit a bare-bones patch first, in the hopes of minimizing the number of objectionable things therein. [...] If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks.

+1 for getting the basics in first, and then adding more to it later. There's still a fair amount of time to do that for 9.2, but not if we get stuck bikeshedding again...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] JSON for PG 9.2
Merlin Moncure mmonc...@gmail.com writes:
> Getting back on point, I'm curious about your statement: "without writing a single line of C". I took a look at the pl/scheme docs and was pretty impressed. What exactly would be involved in getting a guile-based ECMAscript working over the pl/scheme implementation? How would that interact with the stated topic, JSON support? Do you even need a json type if you have strong library-based parsing and composition features?

My understanding is that JSON is a subset of ECMAscript, so if you get the latter you already have the former. Now, someone would have to check whether plscheme still builds with guile-2.0, and given that, how exactly you get pl/ecmascript (or pl/js) out of that.

I won't be in a position to spend time on that this year…

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] JSON for PG 9.2
On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:
> We can hopefully get around this for the extensions in contrib (and reasonably well have already), but few large companies are going to be happy to go to pgxn and download an extension that has a single maintainer (not the team, and in most cases not even a team), usually no defined lifecycle, no support, etc. (I'm pretty sure you won't get support included for random pgxn modules when you buy a contract from EDB, or CMD, or us, or PGX, or anybody really, whereas if the datatype is in core, you *will* get this.)

I support having a JSON type in core, but question the assertions here. *Some* organizations won’t use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN.

Oh, and at PGX, we’ll happily provide support for random modules, so long as you pay for our time. We’re not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from.

> If we can find a way to have a stable part in core and then have addons that can provide these tons of interesting features (which I agree there are) until such time that they can be considered stable enough for core, I think that's the best compromise.

+1, though I think the core type will at least need some basic operators and indexing support.

Best,

David
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 3:39 PM, David E. Wheeler wrote:
> Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.

Bah, it says “It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.” But my point still holds: it is not a programming language, and one does not need a PL to have a JSON data type.

Best,

David
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
> My understanding is that JSON is a subset of ECMAscript

Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript. http://json.org/

IOW, one does not need a new PL to get this type.

Best,

David
Re: [HACKERS] JSON for PG 9.2
On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com wrote:
> +1, though I think the core type will at least need some basic operators and indexing support.

And I'm willing to do that, but I thought it best to submit a bare-bones patch first, in the hopes of minimizing the number of objectionable things therein. For example, if you want to be able to index a JSON column, you have to decide on some collation order that is consistent with JSON's notion of equality, and it's not obvious what is most logical. Heck, equality itself isn't 100% obvious. If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] JSON for PG 9.2
On 18/12/11 04:21, Robert Haas wrote:
> On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler da...@kineticode.com wrote:
>> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:
>>
>>   Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions? There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.
>
> That's fair enough, but we've had *many* requests for this functionality in core; I don't see what we lose by having at least some basic functionality built in.

I think having a JSON data type in core would drastically limit the exposure third-party JSON extensions would get, and that's bad. There are tons of interesting features a JSON type could have, and tying its development to a one-year release cycle might be a disservice both to people who are willing to provide these features earlier and to users faced with a choice between a fast-moving third-party addon and a blessed core type; it would cause overall confusion.

How about we try the tsearch way and let JSON extensions live outside core for some time, and perhaps if one emerges dominant and would benefit from inclusion, then consider it?

If we keep treating extensions as second-class citizens, they'll never get the mindshare and importance we seem to want for them (or otherwise why go through all the trouble to provide an infrastructure for them).

Cheers,
Jan
Re: [HACKERS] JSON for PG 9.2
2011/12/18 Jan Urbański wulc...@wulczer.org:
> How about we try the tsearch way and let JSON extensions live outside core for some time, and perhaps if one emerges dominant and would benefit from inclusion, then consider it? [...]

It should be contrib modules.

Pavel
Re: [HACKERS] JSON for PG 9.2
Robert Haas robertmh...@gmail.com writes:
> On Sat, Dec 17, 2011 at 5:02 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
>> I'd like to add some confusion on the implementation choice, because it looks damn too easy now… Guile 2.0 offers an implementation of the ECMAscript language and plscheme already exists as a PostgreSQL PL extension for integrating with Guile.
>
> It seems like the licensing there could potentially be problematic. It's GPL with a linking exception. Not sure we want to go there.

It's LGPL, so it's compatible (only the readline part is subject to the GPL; we're familiar enough with that, though).

http://www.gnu.org/software/guile/docs/docs-2.0/guile-ref/Guile-License.html

  The Guile library (libguile) and supporting files are published under the terms of the GNU Lesser General Public License version 3 or later. See the files COPYING.LESSER and COPYING.

  C code linking to the Guile library is subject to terms of that library. Basically such code may be published on any terms, provided users can re-link against a new or modified version of Guile.

  Scheme level code written to be run by Guile (but not derived from Guile itself) is not restricted in any way, and may be published on any terms. We encourage authors to publish on Free terms.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] JSON for PG 9.2
On Sun, Dec 18, 2011 at 10:49, Jan Urbański wulc...@wulczer.org wrote:
> On 18/12/11 04:21, Robert Haas wrote:
>> That's fair enough, but we've had *many* requests for this functionality in core; I don't see what we lose by having at least some basic functionality built in.
>
> I think having a JSON data type in core would drastically limit the exposure third-party JSON extensions would get, and that's bad.

The same way that having replication in core is bad for the rest of the replication engines? While it has certainly decreased the usage of, for example, Slony, I don't think anybody can say it's a bad thing that we have this in core... And of course, *not* having it in core, we didn't have people claiming for many years that postgres has no replication or anything like that...

The fact is that a *lot* of our users, particularly in large companies, will never install an extension that's not part of core. Just look at other discussions about it even being a problem with it being in *contrib*, which is still maintained and distributed by the same developers.

We can hopefully get around this for the extensions in contrib (and reasonably well have already), but few large companies are going to be happy to go to pgxn and download an extension that has a single maintainer (not the team, and in most cases not even a team), usually no defined lifecycle, no support, etc. (I'm pretty sure you won't get support included for random pgxn modules when you buy a contract from EDB, or CMD, or us, or PGX, or anybody really, whereas if the datatype is in core, you *will* get this.)

So I'm not sure it would really lessen the exposure much at all; those that are willing to install such extensions already are surely capable of finding it themselves (using pgxn for example, or even google).

> There are tons of interesting features a JSON type could have, and tying its development to a one-year release cycle might be a disservice both to people who are willing to provide these features earlier and to users faced with a choice between a fast-moving third-party addon and a blessed core type; it would cause overall confusion.

And the other option would be to *only* have a fast-moving third-party addon, which simply disqualifies it completely in many environments. Keeping it as a third-party addon is better for the developer. Keeping it in core is better for the user (if the user is a large company, not a hacker).

If we can find a way to have a stable part in core and then have addons that can provide these tons of interesting features (which I agree there are) until such time that they can be considered stable enough for core, I think that's the best compromise.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] JSON for PG 9.2
On Sat, Dec 17, 2011 at 4:02 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> ... to add some confusion on the implementation choice, because it looks damn too easy now… Guile 2.0 offers an implementation of the ECMAscript language and plscheme already exists as a PostgreSQL PL extension for integrating with Guile.

TBH, I think that's PFC (pretty cool).

On Sun, Dec 18, 2011 at 6:41 AM, Magnus Hagander mag...@hagander.net wrote:
> We can hopefully get around this for the extensions in contrib (and reasonably well have already), but few large companies are going to be happy to go to pgxn and download an extension that has a single maintainer (not the team, and in most cases not even a team), usually no defined lifecycle, no support, etc. (I'm pretty sure you won't get support included for random pgxn modules when you buy a contract from EDB, or CMD, or us, or PGX, or anybody really, whereas if the datatype is in core, you *will* get this.)

100% agree on all points. With the new extension system, contrib modules that are packaged with the core system can be considered to be "in core" because they are:

*) documented in the standard docs
*) supported and bugfixed with PostgreSQL releases
*) ready to be used without compiler support, or even shell access, through most binary distributions

One small note about the json type being an extension: this probably means the json type oid won't be fixed. Not a huge deal, but it could affect some corner cases with binary-format consumers.

merlin
Re: [HACKERS] JSON for PG 9.2
On 12/18/2011 12:17 PM, Merlin Moncure wrote:
> One small note about the json type being an extension: this probably means the json type oid won't be fixed. Not a huge deal, but it could affect some corner cases with binary-format consumers.

Why would that matter more for JSON than for any other non-core type?

cheers

andrew
Re: [HACKERS] JSON for PG 9.2
On Sun, Dec 18, 2011 at 11:21 AM, Andrew Dunstan and...@dunslane.net wrote:
> Why would that matter more for JSON than for any other non-core type?

Well, it's a minor headache for all the oid-isn't-in-pg_type.h types, and only then for high-traffic types (which presumably json will be).

A while back we coded up a reworked dblink that was variadic and could optionally transfer data between databases with the binary wire format. Any container of a user-defined (by oid) type had to be sent strictly as text, which is a big performance hit for certain types. Recent PostgreSQL has an undocumented facility to force type oids to a particular value, but the type definition being inside the CREATE EXTENSION script makes this problematic.

This is a pretty far-out objection though, and I could certainly work around the problem if necessary, but there is some dependency on pg_type.h in the wild.

merlin
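The usual workaround for a non-fixed type OID is to resolve it at run time instead of hard-coding a pg_type.h constant; a binary-format consumer can do the lookup once per connection. A sketch (not dblink's actual code):

    -- Resolve the json type's OID on the fly:
    SELECT 'json'::regtype::oid;

    -- Equivalent explicit catalog query:
    SELECT oid FROM pg_type WHERE typname = 'json';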
Re: [HACKERS] JSON for PG 9.2
Merlin Moncure mmonc...@gmail.com writes:
>> Why would that matter more for JSON than for any other non-core type?
>
> Well, it's a minor headache for all the oid-isn't-in-pg_type.h types, and only then for high-traffic types (which presumably json will be).

Extensions are going to be more and more used and “pervasive” in the coming years, and binary wire transfers are a good goal. What about creating something like an IANA for PostgreSQL types? New type authors would register their OID and, as a benefit, would get listed on some public reference sheet, and we could add some mechanism so that default CREATE TYPE calls will not use reserved OID numbers.

Then it would be all cooperative only; not a security thing, just a way to ease binary and extension co-existence.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] JSON for PG 9.2
On Sun, Dec 18, 2011 at 12:26 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> What about creating something like an IANA for PostgreSQL types? New type authors would register their OID and, as a benefit, would get listed on some public reference sheet, and we could add some mechanism so that default CREATE TYPE calls will not use reserved OID numbers. [...]

I think that's a fabulous idea, although we're drifting off the stated topic here.

Getting back on point, I'm curious about your statement: "without writing a single line of C". I took a look at the pl/scheme docs and was pretty impressed. What exactly would be involved in getting a guile-based ECMAscript working over the pl/scheme implementation? How would that interact with the stated topic, JSON support? Do you even need a json type if you have strong library-based parsing and composition features?

merlin
Re: [HACKERS] JSON for PG 9.2
On Sat, Dec 17, 2011 at 2:26 AM, Robert Haas robertmh...@gmail.com wrote:
> In the spirit of Simon's suggestion that we JFDI, I cooked up a patch today that JFDI. See attached.

Which looks very good. Comments:

* The comment for IDENTIFICATION of json.c says contrib/json/json.c.
* json.c contains a duplicate of a line from the header file: extern Datum json_in(PG_FUNCTION_ARGS);

And additionally, a quote from our fine manual:

  Caution: Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xpath() in particular.

so I think this approach works for JSON too.

Adding tests and docs is a must; nothing else is, right now. Once we have this, others can add the bells and whistles, possibly in 9.2.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] JSON for PG 9.2
Hi,

Peter Eisentraut pete...@gmx.net writes:
> The way forward here is to maintain this as an extension, provide debs and rpms, and show that that is maintainable.

I can see numerous advantages in maintaining a PL outside the core, especially if you are still starting up and want to iterate quickly.

I'd like to add some confusion on the implementation choice, because it looks damn too easy now… Guile 2.0 offers an implementation of the ECMAscript language and plscheme already exists as a PostgreSQL PL extension for integrating with Guile.

http://plscheme.projects.postgresql.org/
http://wingolog.org/archives/2009/02/22/ecmascript-for-guile
http://packages.debian.org/sid/guile-2.0
http://www.gnu.org/software/guile/

  Guile is an extension language platform. Guile is an efficient virtual machine that executes a portable instruction set generated by its optimizing compiler, and integrates very easily with C and C++ application code. In addition to Scheme, Guile includes compiler front-ends for ECMAScript and Emacs Lisp (support for Lua is underway), which means your application can be extended in the language (or languages) most appropriate for your user base. And Guile's tools for parsing and compiling are exposed as part of its standard module set, so support for additional languages can be added without writing a single line of C.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support