Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Robert Haas
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net wrote:
 So my order of preference for the options would be:

 1. Have the JSON type collapse objects so the last instance of a key wins
 and is actually stored

 2. Throw an error when a JSON type has duplicate keys

 3. Have the accessors find the last instance of a key and return that
 value

 4. Let things remain as they are now

 On second thought, I don't like 4 at all. It means that the JSON type
 thinks a value is valid while the accessor does not. They contradict one
 another.

 You can forget 1. We are not going to have the parser collapse anything.
 Either the JSON it gets is valid or it's not. But the parser isn't going to
 try to MAKE it valid.

Why not?  Because it's the wrong thing to do, or because it would be slower?

What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Craig Ringer
On 03/13/2013 08:17 PM, Robert Haas wrote:
 I think Andrew and I had envisioned this as basically a text data type
 that enforces some syntax checking on its input, hence the current
 design.  But I'm not sure that's the ONLY sensible design.
We're probably stuck with it at this point, but it may well be worth
considering the later introduction of a compatible `jsonobj` that stores
parsed and normalized json objects in some internal format the client
doesn't have to care about, like serialized V8 JS VM objects.

I suspect that such a type is better offered as a contrib module until/unless
PL/V8 or something similar becomes a core language. It'd be nuts to try to
re-implement all of the JSON and javascript object functionality in a
javascript engine when we can just plug an existing one in and use its
JSON and javascript object manipulation. The minimalist approach makes
sense for the json type precisely because it's just validated text, but
I don't think it makes sense to continually extend it and slowly
reinvent a whole javascript engine in Pg.

If we're going to do things like normalizing json I think that's a job
for a real JavaScript engine that understands Javascript objects.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 08:17 AM, Robert Haas wrote:

On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key wins
and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.

You can forget 1. We are not going to have the parser collapse anything.
Either the JSON it gets is valid or it's not. But the parser isn't going to
try to MAKE it valid.

Why not?  Because it's the wrong thing to do, or because it would be slower?

What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other implementations 
allow duplicate keys, b) it's trivially easy to make Postgres generate 
such json, and c) there is some dispute about exactly what the spec 
mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.
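
For illustration, the practical effect on the accessor from David's original
report would presumably be something like this (sketched by hand, not output
from the revised patch; the exact function names and return types may differ):

    david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
     json_get
    ----------
     2
    (1 row)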


cheers

andrew






Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andres Freund
On 2013-03-13 20:48:57 +0800, Craig Ringer wrote:
 On 03/13/2013 08:17 PM, Robert Haas wrote:
  I think Andrew and I had envisioned this as basically a text data type
  that enforces some syntax checking on its input, hence the current
  design.  But I'm not sure that's the ONLY sensible design.
 We're probably stuck with it at this point, but it may well be worth
 considering the later introduction of a compatible `jsonobj` that stores
 parsed and normalized json objects in some internal format the client
 doesn't have to care about, like serialized V8 JS VM objects.
 
 I suspect that such a type is better offered as a contrib module until/unless
 PL/V8 or something similar becomes a core language. It'd be nuts to try to
 re-implement all of the JSON and javascript object functionality in a
 javascript engine when we can just plug an existing one in and use its
 JSON and javascript object manipulation. The minimalist approach makes
 sense for the json type precisely because it's just validated text, but
 I don't think it makes sense to continually extend it and slowly
 reinvent a whole javascript engine in Pg.

While I am not convinced - but not the contrary either - that using
something like V8 is a good idea, I wish the patch adding json had
reserved the first byte in the varlena for the 'json encoding' or
something similar. That would have left the road open for easily adding
different encodings in the future. Now you're left with marking it with a
null byte at the beginning or similar atrocities...

Just wanted to say that we might want to think about such stuff the next
time we add a type, now that we preserve cross-version compatibility of
on-disk data.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:

 What I think is tricky here is that there's more than one way to
 conceptualize what the JSON data type really is.  Is it a key-value
 store of sorts, or just a way to store text values that meet certain
 minimalist syntactic criteria?  I had imagined it as the latter, in
 which case normalization isn't sensible.  But if you think of it the
 first way, then normalization is not only sensible, but almost
 obligatory.

That makes a lot of sense. Given the restrictions I tend to prefer in my 
database data types, I had imagined it as the former. And since I'm using it 
now to store key/value pairs (killing off some awful EAV implementations in the 
process, BTW), I certainly think of it more formally as an object.


But I can live with the other interpretation, as long as the differences are 
clearly understood and documented. Perhaps a note could be added to the docs 
explaining this difference, and what one can do to adapt for it. A normalizing 
function would certainly help.

Best,

David





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Gavin Flower

On 14/03/13 02:02, Andrew Dunstan wrote:


On 03/13/2013 08:17 AM, Robert Haas wrote:
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net 
wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a 
key wins

and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.
You can forget 1. We are not going to have the parser collapse 
anything.
Either the JSON it gets is valid or it's not. But the parser isn't 
going to

try to MAKE it valid.
Why not?  Because it's the wrong thing to do, or because it would be 
slower?


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other 
implementations allow duplicate keys, b) it's trivially easy to make 
Postgres generate such json, and c) there is some dispute about 
exactly what the spec mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.


cheers

andrew




How about adding a new function with '_strict' added to the existing 
name, with an extra parameter 'coalesce' - or using other names, if 
considered more appropriate!


That way, slower but more stringent functionality can be added where 
required.  This way, the existing function need not be changed.


If coalesce = true,
then: the last duplicate is used
else: an error is returned when the new key is a duplicate.
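
A rough sketch of how such a function might look at the top level, built only
on the json_each() accessor from the pending patch (the name, the
coalesce_dups parameter, and the reliance on document order are all
illustrative assumptions, not part of any patch):

    create or replace function json_get_strict(t json, fld text, coalesce_dups boolean)
    returns json language plpgsql strict immutable as $$
    declare
        n      bigint;
        result json;
    begin
        select count(*) into n from json_each(t) where key = fld;
        if n > 1 and not coalesce_dups then
            raise exception 'field name is not unique in json object';
        end if;
        -- take the lexically last occurrence, relying on json_each()
        -- returning the pairs in document order (not guaranteed)
        select value into result
          from (select value, row_number() over () as rnum
                  from json_each(t) where key = fld) s
         order by rnum desc limit 1;
        return result;
    end;
    $$;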


Cheers,
Gavin





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 12:51 PM, Gavin Flower wrote:

On 14/03/13 02:02, Andrew Dunstan wrote:


On 03/13/2013 08:17 AM, Robert Haas wrote:
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan and...@dunslane.net 
wrote:

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a 
key wins

and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.
You can forget 1. We are not going to have the parser collapse 
anything.
Either the JSON it gets is valid or it's not. But the parser isn't 
going to

try to MAKE it valid.
Why not?  Because it's the wrong thing to do, or because it would be 
slower?


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
 numeric
---------
      10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.




I think we've moved on from this point, because a) other 
implementations allow duplicate keys, b) it's trivially easy to make 
Postgres generate such json, and c) there is some dispute about 
exactly what the spec mandates.


I'll be posting a revised patch shortly that doesn't error out but 
simply uses the value for the later key lexically.


cheers

andrew




How about adding a new function with '_strict' added to the existing 
name, with an extra parameter 'coalesce' - or using other names, if 
considered more appropriate!


That way, slower but more stringent functionality can be added where 
required.  This way, the existing function need not be changed.


If coalesce = true,
then: the last duplicate is used
else: an error is returned when the new key is a duplicate.






For good or ill, we now already have a json type that will accept 
strings with duplicate keys, and generator functions which can now 
generate such strings. If someone wants functions to enforce a stricter 
validity check (e.g. via a check constraint on a domain), or to convert 
json to a canonical version which strips out prior keys of the same name 
and their associated values, then these should be relatively simple to 
implement given the parser API in the current patch. But they aren't 
part of the current patch, and I think it's way too late to be adding 
such things. I have been persuaded by arguments made upthread that the 
best thing to do is exactly what other well known json-accepting 
implementations do (e.g. V8), which is to accept json with duplicate 
keys and to treat the later key/value as overriding the former 
key/value. If I'd done that from the start nobody would now be talking 
about this at all.
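
For instance, a minimal sketch of the domain idea, using nothing but the
json_each() accessor from the patch (top-level keys only, no recursion into
sub-objects; the names are illustrative):

    create function json_keys_unique(j json) returns boolean
    language sql immutable strict as $$
        select count(*) = count(distinct key) from json_each($1);
    $$;

    create domain json_unique_keys as json
        check (json_keys_unique(value));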


cheers

andrew




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:

 If someone wants functions to enforce a stricter validity check (e.g. via a 
 check constraint on a domain), or to convert json to a canonical version 
 which strips out prior keys of the same name and their associated values, 
 then these should be relatively simple to implement given the parser API in 
 the current patch. But they aren't part of the current patch, and I think 
 it's way too late to be adding such things.

I think it might be good to get something like this into core eventually, 
otherwise I suspect that there will be a different version of it for every 
JSON-using project out there. And my first cut at it won’t descend into 
sub-objects.

 I have been persuaded by arguments made upthread that the best thing to do is 
 exactly what other well known json-accepting implementations do (e.g. V8), 
 which is to accept json with duplicate keys and to treat the later key/value 
 as overriding the former key/value. If I'd done that from the start nobody 
 would now be talking about this at all.

That’s true, though I might have started thinking about a canonicalizing 
function before long. :-)

Best,

David





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Andrew Dunstan


On 03/13/2013 01:50 PM, David E. Wheeler wrote:

On Mar 13, 2013, at 10:45 AM, Andrew Dunstan and...@dunslane.net wrote:


If someone wants functions to enforce a stricter validity check (e.g. via a 
check constraint on a domain), or to convert json to a canonical version which 
strips out prior keys of the same name and their associated values, then these 
should be relatively simple to implement given the parser API in the current 
patch. But they aren't part of the current patch, and I think it's way too late 
to be adding such things.

I think it might be good to get something like this into core eventually, 
otherwise I suspect that there will be a different version of it for every 
JSON-using project out there. And my first cut at it won’t descend into 
sub-objects.



Then you wouldn't be doing it right. The whole thing about a recursive 
descent parser is that it's, well, recursive.


cheers

andrew




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread David E. Wheeler
On Mar 13, 2013, at 10:59 AM, Andrew Dunstan and...@dunslane.net wrote:

 And my first cut at it won’t descend into sub-objects.
 
 
 The you wouldn't be doing it right. The whole thing about a recursive descent 
 parser is that it's, well, recursive.

Right, but it would serve my immediate needs. I have a column that just stores 
key/value pairs with no nesting. So I know I can write something like this and 
have it be good enough:

create or replace function json_smash(
    json
) RETURNS JSON language SQL STRICT IMMUTABLE AS $$
    SELECT format('{%s}', array_to_string(ARRAY(
        SELECT format('%s: %s', to_json(key), value)
          FROM (
            SELECT key, value, row_number() OVER (
                       partition by key order by rnum desc
                   ) AS rnum
              FROM (
                SELECT key, value, row_number() OVER (
                           partition by key
                       ) AS rnum
                  FROM json_each($1)
              ) a
          ) b
         WHERE rnum = 1
    ), ','))::json;
$$;

And do you really want to see that unloosed on the world? :-P (Yes, I know 
there is no guarantee on the order of rows returned by json_each()).
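
For what it's worth, the intended effect would be something like the following
(written out by hand rather than run against the patch, and again assuming
json_each() happens to return the pairs in document order):

    select json_smash('{"foo": 1, "foo": 2, "bar": 3}');
    -- expected: json equivalent to {"bar": 3, "foo": 2}
    -- (the key order of the output is not significant)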

Best,

David



Re: [HACKERS] Duplicate JSON Object Keys

2013-03-13 Thread Hannu Krosing

On 03/13/2013 12:40 PM, David E. Wheeler wrote:

On Mar 13, 2013, at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:


What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.

That makes a lot of sense. Given the restrictions I tend to prefer in my 
database data types, I had imagined it as the former. And since I'm using it 
now to store key/value pairs (killing off some awful EAV implementations in the 
process, BTW), I certainly think of it more formally as an object.


But I can live with the other interpretation, as long as the differences are 
clearly understood and documented. Perhaps a note could be added to the docs 
explaining this difference, and what one can do to adapt for it. A normalizing 
function would certainly help.
I guess the easiest and most generic way to normalize is to actually 
convert to some internal representation and back.


in pl/python this would look like this:

hannu=# create function normalize(IN ij json, OUT oj json) language 
plpythonu as $$

import json
return json.dumps(json.loads(ij))
$$;
CREATE FUNCTION
hannu=# select normalize('{"a":1, "a":"b", "a":true}');
  normalize
-------------
 {"a": true}
(1 row)

If we wanted to be really fancy we could start storing our json in some
format which is faster to parse, like tnetstrings, but it is probably too
late in the release cycle to change this now.


Hannu


Best,

David







Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Robert Haas
On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com wrote:
 In the spirit of being liberal about what we accept but strict about what we 
 store, it seems to me that JSON object key uniqueness should be enforced 
 either by throwing an error on duplicate keys, or by flattening so that the 
 latest key wins (as happens in JavaScript). I realize that tracking keys will 
 slow parsing down, and potentially make it more memory-intensive, but such is 
 the price for correctness.

I'm with Andrew.  That's a rathole I emphatically don't want to go
down.  I wrote this code originally, and I had the thought clearly in
mind that I wanted to accept JSON that was syntactically well-formed,
not JSON that met certain semantic constraints.  We could add
functions like json_is_non_stupid(json) so that people can easily add
a CHECK constraint that enforces this if they so desire.  But
enforcing it categorically seems like a bad plan, especially since at
this point it would require a compatibility break with previous
releases.
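
Purely as an illustration of where such a check would hang (json_is_non_stupid
is the hypothetical name from the paragraph above, not an existing function):

    create table docs (
        id   serial primary key,
        body json check (json_is_non_stupid(body))
    );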

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Hannu Krosing

On 03/08/2013 09:39 PM, Robert Haas wrote:

On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com wrote:

In the spirit of being liberal about what we accept but strict about what we 
store, it seems to me that JSON object key uniqueness should be enforced either 
by throwing an error on duplicate keys, or by flattening so that the latest key 
wins (as happens in JavaScript). I realize that tracking keys will slow parsing 
down, and potentially make it more memory-intensive, but such is the price for 
correctness.

I'm with Andrew.  That's a rathole I emphatically don't want to go
down.  I wrote this code originally, and I had the thought clearly in
mind that I wanted to accept JSON that was syntactically well-formed,
not JSON that met certain semantic constraints.


If it does not meet these semantic constraints, then it is not
really JSON - it is merely JSON-like.

this sounds very much like MySQL's decision to support the timestamp
0000-00-00 00:00 - syntactically correct, but semantically wrong.


We could add
functions like json_is_non_stupid(json) so that people can easily add
a CHECK constraint that enforces this if they so desire.  But
enforcing it categorically seems like a bad plan, especially since at
this point it would require a compatibility break with previous
releases

If we ever support real spec-compliant JSON (maybe based
on recursive hstore?) then there will be a compatibility break
anyway, so why not do it now.

Or do you seriously believe that somebody is using PostgreSQL JSON
to store this kind of json document?

Cheers
Hannu Krosing





Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Alvaro Herrera
Hannu Krosing wrote:
 On 03/08/2013 09:39 PM, Robert Haas wrote:
 On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com 
 wrote:
 In the spirit of being liberal about what we accept but strict about what 
 we store, it seems to me that JSON object key uniqueness should be enforced 
 either by throwing an error on duplicate keys, or by flattening so that the 
 latest key wins (as happens in JavaScript). I realize that tracking keys 
 will slow parsing down, and potentially make it more memory-intensive, but 
 such is the price for correctness.
 I'm with Andrew.  That's a rathole I emphatically don't want to go
 down.  I wrote this code originally, and I had the thought clearly in
 mind that I wanted to accept JSON that was syntactically well-formed,
 not JSON that met certain semantic constraints.
 
 If it does not meet these semantic constraints, then it is not
 really JSON - it is merely JSON-like.
 
 this sounds very much like MySQL's decision to support the timestamp
 0000-00-00 00:00 - syntactically correct, but semantically wrong.

Is it wrong?  The standard cited says SHOULD, not MUST.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread David E. Wheeler
On Mar 8, 2013, at 1:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 If it does not meet these semantic constraints, then it is not
 really JSON - it is merely JSON-like.
 
 this sounds very much like MySQL's decision to support the timestamp
 0000-00-00 00:00 - syntactically correct, but semantically wrong.
 
 Is it wrong?  The standard cited says SHOULD, not MUST.

Yes, it is wrong, because multiple keys are specifically disallowed for 
accessing values. Hence this new error:

   david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
   ERROR:  field name is not unique in json object

I really don’t think that should be possible.

Best,

David



Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Gavin Flower
Well I would much prefer to find out sooner rather than later that there 
is a problem, so I would much prefer to know I've created a duplicate as 
soon as the system can detect it.  In general, PostgreSQL appears much 
better at this than MySQL.



On 09/03/13 10:01, Alvaro Herrera wrote:

Hannu Krosing wrote:

On 03/08/2013 09:39 PM, Robert Haas wrote:

On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com wrote:

In the spirit of being liberal about what we accept but strict about what we 
store, it seems to me that JSON object key uniqueness should be enforced either 
by throwing an error on duplicate keys, or by flattening so that the latest key 
wins (as happens in JavaScript). I realize that tracking keys will slow parsing 
down, and potentially make it more memory-intensive, but such is the price for 
correctness.

I'm with Andrew.  That's a rathole I emphatically don't want to go
down.  I wrote this code originally, and I had the thought clearly in
mind that I wanted to accept JSON that was syntactically well-formed,
not JSON that met certain semantic constraints.

If it does not meet these semantic constraints, then it is not
really JSON - it is merely JSON-like.

this sounds very much like MySQL's decision to support the timestamp
0000-00-00 00:00 - syntactically correct, but semantically wrong.

Is it wrong?  The standard cited says SHOULD, not MUST.







Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Andrew Dunstan


On 03/08/2013 04:01 PM, Alvaro Herrera wrote:

Hannu Krosing wrote:

On 03/08/2013 09:39 PM, Robert Haas wrote:

On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com wrote:

In the spirit of being liberal about what we accept but strict about what we 
store, it seems to me that JSON object key uniqueness should be enforced either 
by throwing an error on duplicate keys, or by flattening so that the latest key 
wins (as happens in JavaScript). I realize that tracking keys will slow parsing 
down, and potentially make it more memory-intensive, but such is the price for 
correctness.

I'm with Andrew.  That's a rathole I emphatically don't want to go
down.  I wrote this code originally, and I had the thought clearly in
mind that I wanted to accept JSON that was syntactically well-formed,
not JSON that met certain semantic constraints.

If it does not meet these semantic constraints, then it is not
really JSON - it is merely JSON-like.

this sounds very much like MySQL's decision to support the timestamp
0000-00-00 00:00 - syntactically correct, but semantically wrong.

Is it wrong?  The standard cited says SHOULD, not MUST.



Here's what rfc2119 says about that wording:

   4. SHOULD NOT This phrase, or the phrase NOT RECOMMENDED mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.


So we're allowed to do as Robert chose, and I think there are good 
reasons for doing so (apart from anything else, checking it would slow 
down the parser enormously).


Now you could argue that in that case the extractor functions should 
allow it too, and it's probably fairly easy to change them to allow it. 
In that case we need to decide who wins. We could treat a later field 
lexically as overriding an earlier field of the same name, which I think 
is what David expected. That's what plv8 does (i.e. it's how v8 
interprets JSON):


   andrew=# create or replace function jget(t json, fld text) returns
   text language plv8 as ' return t[fld]; ';
   CREATE FUNCTION
   andrew=# select jget('{"f1":"x","f1":"y"}','f1');
    jget
   ------
    y
   (1 row)


Or you could take the view I originally took that in view of the RFC 
wording we should raise an error if this was found.


I can live with either view.

cheers

andrew




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread David E. Wheeler
On Mar 8, 2013, at 1:21 PM, Andrew Dunstan and...@dunslane.net wrote:

 Here's what rfc2119 says about that wording:
 
   4. SHOULD NOT This phrase, or the phrase NOT RECOMMENDED mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.

I suspect this was allowed for the JavaScript behavior where multiple keys are 
allowed, but the last key in the list wins.

 So we're allowed to do as Robert chose, and I think there are good reasons 
 for doing so (apart from anything else, checking it would slow down the 
 parser enormously).

Yes, but the implications are going to start biting us on the ass now.

 Now you could argue that in that case the extractor functions should allow it 
 too, and it's probably fairly easy to change them to allow it. In that case 
 we need to decide who wins. We could treat a later field lexically as 
 overriding an earlier field of the same name, which I think is what David 
 expected. That's what plv8 does (i.e. it's how v8 interprets JSON):
 
   andrew=# create or replace function jget(t json, fld text) returns
   text language plv8 as ' return t[fld]; ';
   CREATE FUNCTION
   andrew=# select jget('{"f1":"x","f1":"y"}','f1');
    jget
   ------
    y
   (1 row)
 
 
 Or you could take the view I originally took that in view of the RFC wording 
 we should raise an error if this was found.
 
 I can live with either view.

I’m on the fence. On the one hand, I like the plv8 behavior, which is nice for 
a dynamic language. On the other hand, I don't much care for it in my database, 
where I want data storage requirements to be quite strict. I hate the idea of 
0000-00-00 being allowed as a date, and am uncomfortable with allowing 
duplicate keys to be stored in the JSON data type.

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key wins and 
is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type thinks a 
value is valid while the accessor does not. They contradict one another.

Best,

David



Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Hannu Krosing

On 03/08/2013 10:01 PM, Alvaro Herrera wrote:

Hannu Krosing wrote:

On 03/08/2013 09:39 PM, Robert Haas wrote:

On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler da...@justatheory.com wrote:

In the spirit of being liberal about what we accept but strict about what we 
store, it seems to me that JSON object key uniqueness should be enforced either 
by throwing an error on duplicate keys, or by flattening so that the latest key 
wins (as happens in JavaScript). I realize that tracking keys will slow parsing 
down, and potentially make it more memory-intensive, but such is the price for 
correctness.

I'm with Andrew.  That's a rathole I emphatically don't want to go
down.  I wrote this code originally, and I had the thought clearly in
mind that I wanted to accept JSON that was syntactically well-formed,
not JSON that met certain semantic constraints.

If it does not meet these semantic constraints, then it is not
really JSON - it is merely JSON-like.

this sounds very much like MySQL's decision to support the timestamp
0000-00-00 00:00 - syntactically correct, but semantically wrong.

Is it wrong?  The standard cited says SHOULD, not MUST.



I think one MAY start implementation with loose interpretation of
SHOULD, but if at all possible we SHOULD implement the
SHOULD-qualified features :)

http://www.ietf.org/rfc/rfc2119.txt:

SHOULD   This word, or the adjective RECOMMENDED, mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

We might start with just throwing a warning for duplicate keys, but I
can see no good reason to do so, except ease of implementation and
performance with the current JSON-as-text implementation.

And providing a boolean function is_really_json_object(json) to be used in check
constraints seems plain weird.

Otoh, as the spec defines JSON as being designed to be a subset of javascript,
it SHOULD accept select '{"foo": 1, "foo": 2}'::json; but turn it into
'{"foo": 2}'::json; for storage.

I do not think it would be a good idea to leave it to data extraction
functions to always get the last value for "foo" (in this case 2).

--
Hannu








Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Andrew Dunstan


On 03/08/2013 04:28 PM, David E. Wheeler wrote:

On Mar 8, 2013, at 1:21 PM, Andrew Dunstan and...@dunslane.net wrote:


Here's what rfc2119 says about that wording:

   4. SHOULD NOT This phrase, or the phrase NOT RECOMMENDED mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.

I suspect this was allowed for the JavaScript behavior where multiple keys are 
allowed, but the last key in the list wins.


So we're allowed to do as Robert chose, and I think there are good reasons for 
doing so (apart from anything else, checking it would slow down the parser 
enormously).

Yes, but the implications are going to start biting us on the ass now.


Now you could argue that in that case the extractor functions should allow it 
too, and it's probably fairly easy to change them to allow it. In that case we 
need to decide who wins. We could treat a later field lexically as overriding 
an earlier field of the same name, which I think is what David expected. That's 
what plv8 does (i.e. it's how v8 interprets JSON):

   andrew=# create or replace function jget(t json, fld text) returns
   text language plv8 as ' return t[fld]; ';
   CREATE FUNCTION
   andrew=# select jget('{"f1":"x","f1":"y"}','f1');
    jget
   ------
    y
   (1 row)


Or you could take the view I originally took that in view of the RFC wording we 
should raise an error if this was found.

I can live with either view.

I’m on the fence. On the one hand, I like the plv8 behavior, which is nice for a dynamic 
language. On the other hand, I don't much care for it in my database, where I want data 
storage requirements to be quite strict. I hate the idea of 0000-00-00 being 
allowed as a date, and am uncomfortable with allowing duplicate keys to be stored in the 
JSON data type.

So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key wins and 
is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type thinks a 
value is valid while the accessor does not. They contradict one another.





You can forget 1. We are not going to have the parser collapse anything. 
Either the JSON it gets is valid or it's not. But the parser isn't going 
to try to MAKE it valid.


cheers

andrew




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Hannu Krosing

On 03/08/2013 10:42 PM, Andrew Dunstan wrote:


On 03/08/2013 04:28 PM, David E. Wheeler wrote:

On Mar 8, 2013, at 1:21 PM, Andrew Dunstan and...@dunslane.net wrote:


Here's what rfc2119 says about that wording:

   4. SHOULD NOT This phrase, or the phrase NOT RECOMMENDED mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.
I suspect this was allowed for the JavaScript behavior where multiple 
keys are allowed, but the last key in the list wins.


So we're allowed to do as Robert chose, and I think there are good 
reasons for doing so (apart from anything else, checking it would 
slow down the parser enormously).

Yes, but the implications are going to start biting us on the ass now.

Now you could argue that in that case the extractor functions should 
allow it too, and it's probably fairly easy to change them to allow 
it. In that case we need to decide who wins. We could treat a later 
field lexically as overriding an earlier field of the same name, 
which I think is what David expected. That's what plv8 does (i.e. 
it's how v8 interprets JSON):


   andrew=# create or replace function jget(t json, fld text) returns
   text language plv8 as ' return t[fld]; ';
   CREATE FUNCTION
   andrew=# select jget('{"f1":"x","f1":"y"}','f1');
    jget
   ------
    y
   (1 row)


Or you could take the view I originally took that in view of the RFC 
wording we should raise an error if this was found.


I can live with either view.
I’m on the fence. On the one hand, I like the plv8 behavior, which is 
nice for a dynamic language. On the other hand, I don't much care for 
it in my database, where I want data storage requirements to be quite 
strict. I hate the idea of 0000-00-00 being allowed as a date, and 
am uncomfortable with allowing duplicate keys to be stored in the 
JSON data type.


So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key 
wins and is actually stored


2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that 
value


4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type 
thinks a value is valid while the accessor does not. They contradict 
one another.






You can forget 1. We are not going to have the parser collapse anything.
Either the JSON it gets is valid or it's not. But the parser isn't 
going to try to MAKE it valid.
Ok, so the make valid part will have to wait for 
http://www.pgcon.org/2013/schedule/events/518.en.html if this will ever 
happen ;)


Which means that all extractor functions will need to do much more work 
in case of complex json, think of


json_get('{"a":{"b":1}, "a":{"1":"x"}, "a":[1,{"b":7}]}'::json,
         ["a","1","b"])


the true value in javascript is 7, but here the json_get function has several
options to error out early, and real confusion as to where to report the
error if in the end it is not found. Essentially all extractor functions have
to do what we omitted doing at input time.

Cheers
Hannu




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Andrew Dunstan


On 03/08/2013 04:42 PM, Andrew Dunstan wrote:




So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key 
wins and is actually stored


2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that 
value


4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type 
thinks a value is valid while the accessor does not. They contradict 
one another.






You can forget 1. We are not going to have the parser collapse 
anything. Either the JSON it gets is valid or it's not. But the parser 
isn't going to try to MAKE it valid.



Actually, now that I think more about it, 3 is the best answer. Here's why: 
even the JSON generators can produce JSON with non-unique field names:


   andrew=# select row_to_json(q) from (select x as a, y as a from
   generate_series(1,2) x, generate_series(3,4) y) q;
    row_to_json
   ---------------
    {"a":1,"a":3}
    {"a":1,"a":4}
    {"a":2,"a":3}
    {"a":2,"a":4}


So I think we have no option but to say, in terms of rfc 2119, that we 
have carefully considered and decided not to comply with the RFC's 
recommendation (and we should note that in the docs).


cheers

andrew






Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Josh Berkus

 Actually, now I think more about it 3 is the best answer. Here's why:
 even the JSON generators can produce JSON with non-unique field names:

+1

Also, I think we should add a json_normalize() function to the TODO list.
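
The expected contract for such a TODO item would presumably be along these
lines (illustrative only; no such function exists at this point):

    select json_normalize('{"a": 1, "a": 2}');
    -- expected: {"a": 2}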

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Hannu Krosing

On 03/08/2013 11:03 PM, Andrew Dunstan wrote:


On 03/08/2013 04:42 PM, Andrew Dunstan wrote:




So my order of preference for the options would be:

1. Have the JSON type collapse objects so the last instance of a key 
wins and is actually stored


2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return 
that value


4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type 
thinks a value is valid while the accessor does not. They contradict 
one another.






You can forget 1. We are not going to have the parser collapse 
anything. Either the JSON it gets is valid or it's not. But the 
parser isn't going to try to MAKE it valid.



Actually, now that I think more about it, 3 is the best answer.
Here's why: even the JSON generators can produce JSON with non-unique 
field names:

Yes, especially if you consider popular json generators vim and strcat() :)

It is not a serialisation of some existing object, but it is something
that JavaScript could interpret as a valid subset of JavaScript which
produces a JavaScript Object when interpreted.
In this sense it is way better than the MySQL timestamp 0000-00-00 00:00.

So the loose (without implementing the SHOULD part) meaning of the
JSON spec is anything that can be read into JavaScript producing
a JS Object, and not a serialisation of a JavaScript Object as I wanted
to read it initially.



   andrew=# select row_to_json(q) from (select x as a, y as a from
   generate_series(1,2) x, generate_series(3,4) y) q;
    row_to_json
   ---------------
    {"a":1,"a":3}
    {"a":1,"a":4}
    {"a":2,"a":3}
    {"a":2,"a":4}


So I think we have no option but to say, in terms of rfc 2119, that we 
have carefully considered and decided not to comply with the RFC's 
recommendation
The downside is that we have just shifted the burden of JS Object 
generation to the getter functions.


I suspect that 99.98% of the time we will get valid and unique JS Object 
serializations or equivalent as input to json_in()


If we want the getter functions to handle the loose JSON-to-Object
conversion side, assuming our stored JSON can contain non-unique keys,
then these are bound to be slower, as they have to do these checks. They
can't just grab the first matching one and return or recurse on that.


(and we should note that in the docs).

definitely +1


cheers

andrew










Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Andrew Dunstan


On 03/08/2013 06:37 PM, Hannu Krosing wrote:



I suspect that 99.98% of the time we will get valid and unique JS 
Object serializations or equivalent as input to json_in()


If we want the getter functions to handle the loose JSON to Object 
conversion

side  assuming our stored JSON can contain non-unique keys then these are
bound to be slower, as they have to do these checks. Thay can't just 
grab the first

matching one and return or recurse on that.



No, there will be no slowdown. The parser doesn't short circuit.

Read the code.

cheers

andrew




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-08 Thread Noah Misch
On Fri, Mar 08, 2013 at 10:34:20PM +0100, Hannu Krosing wrote:
 On 03/08/2013 10:01 PM, Alvaro Herrera wrote:
 Hannu Krosing wrote:
 If it does not meet these semantic constraints, then it is not
 really JSON - it is merely JSON-like.

 Is it wrong?  The standard cited says SHOULD, not MUST.


 I think one MAY start implementation with loose interpretation of
 SHOULD, but if at all possible we SHOULD implement the
 SHOULD-qualified features :)

 http://www.ietf.org/rfc/rfc2119.txt:

 SHOULD   This word, or the adjective RECOMMENDED, mean that there
 may exist valid reasons in particular circumstances to ignore a
 particular item, but the full implications must be understood and
 carefully weighed before choosing a different course.

That SHOULD in section 2.2 of RFC 4627 constrains JSON data, not JSON
parsers.  Section 4 addresses parsers, saying "A JSON parser MUST accept all
texts that conform to the JSON grammar."

 We might start with just throwing a warning for duplicate keys, but I
 can see no good reason to do so. Except ease of implementation and with
 current JSON-AS-TEXT implenetation performance.

Since its departure from a SHOULD item does not impugn the conformance of an
input text, it follows that json_in(), to be a conforming JSON parser, MUST
not reject objects with duplicate keys.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-07 Thread Andrew Dunstan


On 03/07/2013 02:48 PM, David E. Wheeler wrote:

This behavior surprised me a bit:

 david=# select '{"foo": 1, "foo": 2}'::json;
          json
 ----------------------
  {"foo": 1, "foo": 2}

I had expected something more like this:

 david=# select '{"foo": 1, "foo": 2}'::json;
      json
 ------------
  {"foo": 2}

This hasn’t been much of an issue before, but with Andrew’s JSON enhancements 
going in, it will start to cause problems:

 david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
 ERROR:  field name is not unique in json object

Andrew tells me that the spec requires this. I think that’s fine, but I would 
rather that it never got to there.


Specifically, rfc4627 says (note last sentence):

   2.2.  Objects

   An object structure is represented as a pair of curly brackets
   surrounding zero or more name/value pairs (or members).  A name is a
   string.  A single colon comes after each name, separating the name
   from the value.  A single comma separates a value from a following
   name.  The names within an object SHOULD be unique.



cheers

andrew

