On 03/16/2014 04:10 AM, Peter Geoghegan wrote:
On Thu, Mar 13, 2014 at 2:00 PM, Andrew Dunstan <and...@dunslane.net> wrote:
I'll be travelling a good bit of tomorrow (Friday), but I hope Peter has
finished by the time I am back on deck late tomorrow and that I am able to
commit this on Saturday.
I asked Andrew to hold off on committing this today. It was agreed
that we weren't quite ready, because there were one or two remaining
bugs (since fixed), but also because I felt that it would be useful to
first hear the opinions of more people before proceeding. I think that
we're not that far from having something committed. Obviously I hope
to get this into 9.4, and attach a lot of strategic importance to
having the feature, which is why I made a large effort to help land
Attached patch has a number of notable revisions. Throughout, it has
been possible for anyone to follow our progress here:
* In general, the file jsonb_support.c (renamed to jsonb_util.c) is
vastly better commented, and has a much clearer structure. This was
not something I did much with in the previous revision, and so it has
been a definite focus of this one.
* Hashing is refactored to not use CRC32 anymore. I felt this was a
questionable method of hashing, both within jsonb_hash(), as well as
the jsonb_hash_ops GIN operator class.
* Dead code elimination.
* I got around to fixing the memory leaks in B-Tree support function one.
* Andrew added hstore_to_jsonb, hstore_to_jsonb_loose functions and a
cast. One goal of this effort is to preserve a parallel set of
facilities for the json and jsonb types, and that includes
* A fix from Alexander for the jsonb_hash_ops @> operator issue I
complained about during the last submission was merged.
* There is no longer any GiST opclass. That just leaves B-Tree, hash,
GIN (default) and GIN jsonb_hash_ops opclasses.
My outstanding concerns are:
* Have we got things right with GIN indexing, containment semantics,
etc? See my remarks in the patch, by grepping for "contain" within
jsonb_util.c. Is the GIN text storage serialization format appropriate
* General design concerns. By far the largest source of these is the
* Is the on-disk format that we propose to tie Postgres to as good as
it could be?
I've been working through all the changes and fixes that Peter and
others have made, and they look pretty good to me. There are a few
mostly cosmetic changes I want to make, but nothing that would be worth
holding up committing this for. I'm fairly keen to get this committed,
get some buildfarm coverage and get more people playing with it and
Like Peter, I would like to see more comments from people on the GIN
The one outstanding significant question of substance I have is this:
given the commit 5 days ago of provision for triConsistent functions for
GIN opclasses, should we be adding these to the two GIN opclasses we are
providing, and what should they look like? Again, this isn't an issue
that I think needs to hold up committing what we have now.
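
For anyone not familiar with the idea: a triConsistent function is given a
three-valued answer per query key (true, false, or maybe) rather than a plain
boolean, and returns a three-valued verdict. Below is a rough Python sketch of
what such a function might look like for a containment-style query in which
every extracted key must be present. The names Tri and
tri_consistent_all_keys, and the lossy flag, are made up for illustration;
this is not the code in the patch or in GIN, only an assumption about the
general shape.

```python
from enum import Enum


class Tri(Enum):
    """Three-valued logic result (hypothetical stand-in for GIN's ternary values)."""
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def tri_consistent_all_keys(check, lossy=True):
    """Decide whether an indexed item can match a query needing ALL keys.

    check -- list of Tri, one entry per query key, saying whether that key
             is known present, known absent, or unknown for the item.
    lossy -- assumption: when True the index match alone is never proof,
             so the best possible answer is MAYBE (a recheck is needed).
    """
    # One key known to be absent rules the item out, whatever the others say.
    if any(c is Tri.FALSE for c in check):
        return Tri.FALSE
    # Any unknown key, or a lossy representation, means we cannot be sure.
    if lossy or any(c is Tri.MAYBE for c in check):
        return Tri.MAYBE
    return Tri.TRUE
```

The point of the third value is that a definite FALSE can be returned as soon
as one key is known to be missing, even while other keys are still unknown.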
Regarding Peter's last question, if we're not satisfied with the on-disk
format proposed it would mean throwing the whole effort out and starting
again. The only thing I have thought of as an alternative would be to
store the structure and values separately rather than with values inline
with the structure. That way you could have a hash of values more or
less, which would eliminate redundancy of storage of things like object
field names. But such a structure might well involve at least as much
computational overhead as the current structure. And nobody's been
saying all along "hold on, we can do better than this." So I'm pretty
inclined to go with what we have.
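
To make the alternative concrete, here is a minimal Python sketch of one
possible reading of "structure and values stored separately": object field
names are kept once in a side table and the structure refers to them by
integer id. Everything here (intern_keys, lookup, the tuple layout) is
hypothetical and has nothing to do with the proposed on-disk format; it only
illustrates the trade-off between less redundancy and an extra indirection on
lookup.

```python
def intern_keys(doc):
    """Split a JSON-like value into (structure, table).

    table     -- list of distinct object field names, each stored once
    structure -- mirrors doc, but objects hold (key_id, child) pairs
    """
    table = []
    index = {}  # field name -> position in table

    def walk(node):
        if isinstance(node, dict):
            pairs = []
            for key, value in node.items():
                if key not in index:
                    index[key] = len(table)
                    table.append(key)
                pairs.append((index[key], walk(value)))
            return ("obj", pairs)
        if isinstance(node, list):
            return ("arr", [walk(v) for v in node])
        return ("val", node)

    return walk(doc), table


def lookup(structure, table, key):
    """Fetch a field of a top-level object; note the extra name -> id step."""
    kind, pairs = structure
    if kind != "obj" or key not in table:
        return None
    key_id = table.index(key)
    for kid, child in pairs:
        if kid == key_id:
            return child
    return None


doc = {"items": [{"name": "a", "price": 1}, {"name": "b", "price": 2}]}
structure, table = intern_keys(doc)
# "name" and "price" occur twice in doc but only once in table.
```

Every lookup by name has to go through the side table first, which is the
kind of additional computational overhead mentioned above.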
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)