On Tue, May 6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
> "David E. Wheeler" <da...@justatheory.com> writes:
> > On May 6, 2014, at 2:20 PM, Bruce Momjian <br...@momjian.us> wrote:
> >> Well, then, we only have a few days to come up with a name.
>
> > What are the options?
>
> We have no proposals as yet.
>
> I've been looking at the source code to try to understand the difference
> between the two opclasses (and BTW I concur with the opinions expressed
> recently about the poor state of the internal documentation for jsonb).
> If I've got it straight:
>
> jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
> would give rise to GIN entries that are effectively the strings "Kxyz"
> and "V2".  If you're looking for tuples containing "{xyz: 2}" then you
> would be looking for the AND of those independent index entries, which
> fortunately GIN is pretty good at computing.  But you could also look
> for just keys or just values.
>
> jsonb_hash_ops creates an index entry only for values, but what it
> stores is a hash of both the value and the key it's stored under.
> So in this example you'd get a hash combining "xyz" and "2".  This
> means the only type of query you can perform is like "find JSON tuples
> containing {xyz: 2}".
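
To make the difference concrete, here is a rough Python model of the two
entry schemes as described in the quoted text. It is only a sketch under
several assumptions: it handles flat objects only, all function names are
made up for illustration, zlib.crc32 stands in for whatever hash the real
opclass uses, and a plain dict of posting sets stands in for GIN.

```python
import zlib


def jsonb_ops_entries(doc):
    """Model of jsonb_ops: one entry per key ("K...") and one per value ("V...")."""
    entries = set()
    for key, value in doc.items():
        entries.add("K" + key)
        entries.add("V" + str(value))
    return entries


def jsonb_hash_ops_entries(doc):
    """Model of jsonb_hash_ops: one entry per value, hashing key and value together.

    crc32 is a stand-in hash chosen for this sketch, not the real one.
    """
    return {zlib.crc32((key + "\0" + str(value)).encode()) for key, value in doc.items()}


def build_index(docs, extract):
    """Toy inverted index: entry -> set of ids of the documents that produced it."""
    index = {}
    for doc_id, doc in docs.items():
        for entry in extract(doc):
            index.setdefault(entry, set()).add(doc_id)
    return index


def lookup(index, entries):
    """AND together the posting sets of all requested entries."""
    postings = [index.get(entry, set()) for entry in entries]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: {"xyz": 2}, 2: {"xyz": 3}, 3: {"abc": 2}}
ops_index = build_index(docs, jsonb_ops_entries)
hash_index = build_index(docs, jsonb_hash_ops_entries)

# Containment query {xyz: 2}: both schemes can answer it.
print(lookup(ops_index, jsonb_ops_entries({"xyz": 2})))        # [1]
print(lookup(hash_index, jsonb_hash_ops_entries({"xyz": 2})))  # [1]

# Key-only and value-only queries: only the jsonb_ops entries support these.
print(lookup(ops_index, {"Kxyz"}))  # [1, 2]
print(lookup(ops_index, {"V2"}))    # [1, 3]
```

In this model the hashed index holds nothing that corresponds to "xyz" on
its own, which is why a key-only or value-only search has no entry to look
up there.
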
Good summary, thanks.  This is the information I was hoping we had in
our docs.  How does hstore deal with these issues?

> Because jsonb_ops stores the *whole* value, you can do lossless index
> searches (no recheck needed on the heap tuple), but you also run the
> risk of long strings failing to fit into an index entry.  Since
> jsonb_hash_ops reduces everything to a hash, there's no possibility of
> index failure, but all queries are lossy and require recheck.
>
> TBH, at this point I'm sort of agreeing with the thought expressed
> upthread that maybe neither of these should be the default as-is.
> They seem like rather arbitrary combinations of choices.  In particular
> I wonder why there's not an option to store keys and values separately,
> but as hashes not as the original strings, so that indexability of
> everything could be guaranteed.  Or a variant of that might be to hash
> only strings that are too large to fit in an index entry, and force
> recheck only when searching for a string that needed hashing.
>
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to.  If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

Can we hash just the values, not the keys, in jsonb_ops, and hash the
combo in jsonb_hash_ops?  That would give us key-only lookups without a
recheck.

How do we index long strings now?  Is it the combination of GIN and long
strings that is the problem?

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
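
For reference, the variant floated in the quoted text (store keys and
values separately, hash only the strings that are too long, and recheck
only when the search itself needed hashing) could be modelled roughly as
below. This is a sketch under stated assumptions, not a proposal for the
actual patch: MAX_ENTRY_LEN, the "H" prefix, the function names, and the
use of SHA-1 are all hypothetical, and the recheck flag models only the
lossiness introduced by hashing.

```python
import hashlib

MAX_ENTRY_LEN = 16  # hypothetical size limit for one index entry


def make_entry(kind, text):
    """Return (entry, was_hashed) for a key (kind "K") or a value (kind "V").

    Short strings are stored verbatim. Long ones are replaced by a short
    digest with an "H" prefix so they cannot be confused with verbatim entries.
    """
    raw = kind + text
    if len(raw) <= MAX_ENTRY_LEN:
        return raw, False
    digest = hashlib.sha1(raw.encode()).hexdigest()[:8]
    return "H" + kind + digest, True


def index_entries(doc):
    """Entries stored in the index for one flat document."""
    entries = set()
    for key, value in doc.items():
        entries.add(make_entry("K", key)[0])
        entries.add(make_entry("V", str(value))[0])
    return entries


def plan_query(doc):
    """Return (entries to search for, whether a heap recheck is needed).

    A recheck is requested only if at least one searched string was hashed.
    """
    entries, recheck = set(), False
    for key, value in doc.items():
        for kind, text in (("K", key), ("V", str(value))):
            entry, hashed = make_entry(kind, text)
            entries.add(entry)
            recheck = recheck or hashed
    return entries, recheck


print(plan_query({"xyz": 2}))            # short strings: no recheck requested
print(plan_query({"note": "x" * 100}))   # long value: hashed, recheck requested
```

Every entry produced this way fits within MAX_ENTRY_LEN, so a long string
can no longer make the insert fail, and a search on short strings never
touches a hashed entry.
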