SP-GIST opclass documentation)

Bruce Momjian Thu, 08 May 2014 06:48:11 -0700

On Tue, May  6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
> "David E. Wheeler" <[email protected]> writes:
> > On May 6, 2014, at 2:20 PM, Bruce Momjian <[email protected]> wrote:
> >> Well, then, we only have a few days to come up with a name.
> 
> > What are the options?
> 
> We have no proposals as yet.
> 
> I've been looking at the source code to try to understand the difference
> between the two opclasses (and BTW I concur with the opinions expressed
> recently about the poor state of the internal documentation for jsonb).
> If I've got it straight:
> 
> jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
> would give rise to GIN entries that are effectively the strings "Kxyz"
> and "V2".  If you're looking for tuples containing "{xyz: 2}" then you
> would be looking for the AND of those independent index entries, which
> fortunately GIN is pretty good at computing.  But you could also look
> for just keys or just values.
> 
> jsonb_hash_ops creates an index entry only for values, but what it
> stores is a hash of both the value and the key it's stored under.
> So in this example you'd get a hash combining "xyz" and "2".  This
> means the only type of query you can perform is like "find JSON tuples
> containing {xyz: 2}".


Good summary, thanks.  This is the information I was hoping we had in
our docs.  How does hstore deal with these issues?

> Because jsonb_ops stores the *whole* value, you can do lossless index
> searches (no recheck needed on the heap tuple), but you also run the
> risk of long strings failing to fit into an index entry.  Since jsonb_ops
> reduces everything to a hash, there's no possibility of index failure,
> but all queries are lossy and require recheck.
> 
> TBH, at this point I'm sort of agreeing with the thought expressed
> upthread that maybe neither of these should be the default as-is.
> They seem like rather arbitrary combinations of choices.  In particular
> I wonder why there's not an option to store keys and values separately,
> but as hashes not as the original strings, so that indexability of
> everything could be guaranteed.  Or a variant of that might be to hash
> only strings that are too large to fit in an index entry, and force
> recheck only when searching for a string that needed hashing.
> 
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to.  If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

Can we hash just the values, not the keys, in jsonb_ops, and hash the
combo in jsonb_hash_ops.  That would give us key-only lookups without a
recheck.

How do we index long strings now?  Is it he combination of GIN and long
strings that is the problem?

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Reply via email to