Re: [HACKERS] additional json functionality

Merlin Moncure Fri, 15 Nov 2013 12:52:50 -0800

On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>
> On 11/15/2013 03:25 PM, Merlin Moncure wrote:
>>
>> On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <da...@justatheory.com>
>> wrote:
>>>
>>> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>>
>>>> Here are the options on the table:
>>>> 1) convert existing json type to binary flavor (notwithstanding
>>>> objections)
>>>> 2) maintain side by side types, one representing binary, one text.
>>>> unfortunately, i think the text one must get the name 'json' due to
>>>> unfortunate previous decision.
>>>> 3) merge the behaviors into a single type and get the best of both
>>>> worlds (as suggested upthread).
>>>>
>>>> I think we need to take a *very* hard look at #3 before exploring #1
>>>> or #2: Haven't through it through yet but it may be possible to handle
>>>> this in such a way that will be mostly transparent to the end user and
>>>> may have other benefits such as a faster path for serialization.
>>>
>>> If it’s possible to preserve order and still get the advantages of binary
>>> representation --- which are substantial (see
>>> http://theory.so/pg/2013/10/23/testing-nested-hstore/ and
>>> http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of
>>> examples) --- without undue maintenance overhead, then great.
>>>
>>> I am completely opposed to duplicate key preservation in JSON, though. It
>>> has caused us a fair number of headaches at $work.
>>
>> Kinda yes, kinda no.  Here's a rough sketch of what I'm thinking:
>>
>> *) 'json' type internally has a binary as well a text representation.
>> The text representation is basically the current type behavior
>> (duduplicated unordered).  The binary representation is the hstore-ish
>> variant.  The text mode is discarded when it's deemed no longer
>> appropriate to be needed, and, once gone, can never be rebuilt as it
>> was.
>>
>> *) only the binary internal representation ever gets stored to disk
>> (or anything else).
>>
>> *) the text mode is preferred for output if it is there.  otherwise, a
>> deduplicated, reordered text representation is generated
>>
>> *) When literal text is casted to json, the binary structure is built
>> up and kept alongside binary mode.   So, if you went: 'select '{"a":
>> 1,     "a": 2}'::json', you'd get the same thing back.  (This is how
>> it works now.).  but, if you went: 'insert into foo select '{"a": 1,
>>    "a": 2}'::json returning *', you'd get {"a": 2} back essentially
>> (although technically that would be a kind of race).
>>
>> *) When the json is stored to table, the text representation gets
>> immediately discarded on the basis that it's no longer the true
>> representation of the data.
>>
>> *) Ditto when making any equality operation (not as sure on this point).
>>
>> *) Ditto when doing any operation that mutates the structure in any
>> way. the text representation is immutable except during serialization
>> and if it gets invalidated it gets destroyed.
>>
>> *) New API function: json_simplify(); or some such.  It reorders and
>> dedups from user's point of view (but really just kills off the text
>> representation)
>>
>> *) once the text mode is gone, you get basically the proposed 'hstore'
>> behavior.
>>
>> *) serialization functions are generally used in contexts that do not
>> store anything but get output as query data.  They create *only* the
>> text mode.  However, if the resultant json is stored anywhere, the
>> text mode is destroyed and replaced with binary variant.  This is both
>> a concession to the current behavior and an optimization of
>> 'serialization-in-query' for which I think the binary mode is pessimal
>> performance wise.  so, xxx_to_json serialization functions work
>> exactly as they do now which fixes my problem essentially.
>>
>> *) if you are unhappy with duplicates in the above, just get use to
>> calling  json_simpify() on the serialized json (or deal with in on the
>> client side).
>>
>> This is all pretty glossy, but maybe there is a way forward...
>>
>
>
> It's making my head hurt, to be honest, and it sounds like a recipe for
> years and years of inconsistencies and bugs.
>
> I don't want to have two types, but I think I'd probably rather have two
> clean types than this. I can't imagine it being remotely acceptable to have
> behaviour depend in whether or not something was ever stored, which is what
> this looks like.


Well, maybe so.  My main gripe with the 'two types' solutions is that:
1) current type is already in core (that is, not an extension). In
hindsight, I think this was a huge mistake.
2) current type has grabbed the 'json' type name and the 'json_xxx' API.
3) current type is getting used all over the place

'Two types' means that (AIUI) you can't mess around with the existing
API too much. And the new type (due out in 2016?) will be something of
a second citizen.  The ramifications of dealing with the bifurcation
is what makes *my* head hurt.  Every day the json stuff is getting
more and more widely adopted.  9.4 isn't going to drop until 2014 best
case and it won't be widely deployed in the enterprise until 2015 and
beyond.  So you're going to have a huge code base operating on the
'legacy' json type.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] additional json functionality

Reply via email to