Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Peter Geoghegan
On Wed, Apr 9, 2014 at 8:24 AM, Tom Lane  wrote:
>> Maybe we should make *neither* of these the default opclass, and give
>> *neither* the name json_ops.
>
> There's definitely something to be said for that.  Default opclasses are
> sensible when there's basically only one behavior that's interesting for
> most people.  We can already see that that's not going to be the case
> for jsonb indexes, at least not with the currently available alternatives.

I've heard worse ideas.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> One other point here is that non-default opclasses can't be used in
>> UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
>> specify an opclass name in those syntaxes.  UNIQUE/PRIMARY KEY don't
>> matter here since these aren't btree opclasses, but is there a
>> use-case for EXCLUDE with any of the supported jsonb operators?

> That sounds like an oversight that could better be fixed in EXCLUDE, no?

Well, there hasn't been a use-case up to now.  I'm not sure there's
one yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Alvaro Herrera
Tom Lane wrote:

> > On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
> >  wrote:
> >> Both of the operator classes are actually much less flexible than I'd like.
> 
> > Maybe we should make *neither* of these the default opclass, and give
> > *neither* the name json_ops.

+1.  I was thinking the same thing after reading Heikki's rant.

> One other point here is that non-default opclasses can't be used in
> UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
> specify an opclass name in those syntaxes.  UNIQUE/PRIMARY KEY don't
> matter here since these aren't btree opclasses, but is there a
> use-case for EXCLUDE with any of the supported jsonb operators?

That sounds like an oversight that could better be fixed in EXCLUDE, no?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Tom Lane
Robert Haas  writes:
> On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
>  wrote:
>> Both of the operator classes are actually much less flexible than I'd like.

> Maybe we should make *neither* of these the default opclass, and give
> *neither* the name json_ops.

There's definitely something to be said for that.  Default opclasses are
sensible when there's basically only one behavior that's interesting for
most people.  We can already see that that's not going to be the case
for jsonb indexes, at least not with the currently available alternatives.

Not having a default would force users to make decisions explicitly.
Is that what we want?

One other point here is that non-default opclasses can't be used in
UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to
specify an opclass name in those syntaxes.  UNIQUE/PRIMARY KEY don't
matter here since these aren't btree opclasses, but is there a
use-case for EXCLUDE with any of the supported jsonb operators?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Robert Haas
On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas
 wrote:
> Both of the operator classes are actually much less flexible than I'd like.
> Firstly, they index everything. In many cases, that's not what you want, so
> you end up with much larger indexes than necessary. Secondly, jsonb_ops
> indexes all values separately from the keys. That makes the index pretty
> much useless for a query on, say, WHERE json @> '{"needs_processing":true}',
> if all the rows also contain a key-value pair "active":true. Thirdly,
> inequality operators are not supported; you can't search for rows with (the
> json-syntax equivalent of) "price < 12.3". Fourthly, sometimes you would
> want to include the "path" to an entry in the key, sometimes not.

Maybe we should make *neither* of these the default opclass, and give
*neither* the name json_ops.

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

Yeah, that would be great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Alexander Korotkov
On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan  wrote:

> On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas
>  wrote:
> > I didn't say that. On the contrary, I think the shotgun approach
> jsonb_ops
> > and jsonb_hash_ops take is too broad. It should be possible to specify
> what
> > to index in a more detailed fashion.
>
> It is - use an expression index. That's by far the most important way
> to specify what to index in a more detailed fashion. There are others,
> but that's the major one. Beyond that, yes, it's necessary to
> carefully write your query predicate a certain way. However, a similar
> situation exists in MongoDB, where there is a distinction between
> "Indexes on embedded fields" (which must be accessed using special
> "dot notation") and "indexes on subdocuments" (which cannot be
> accessed using "dot notation"). It's late here, but I'm pretty sure
> that's a feature and not a limitation.
>

I believe that serious limitation we now have is that we actually specify
kind of index to be used in the SQL query.
For example you need to find objects with active = true. You can write:

js @> {"active": true}

then GIN index on js can be used. Also you can write:

js->'active' = true

then btree expression index on (js->'active') can be used. For sure, one
can do

js @> {"active": true} AND js->'active' = true

This query can use any of indexes, but it is:
1) Cluge
2) Excess recheck
3) If both indexes present, excess "bitmap and".

Having to choose index in SQL-query we make our SQL more imperative and
less declarative. Similar things can happen without json/hstore (user have
to rewrite SQL in order to use expression index), but now it could become
very common. My opinion is that we have to do something in planner to make
it understand at least this two kinds of queries to be equivalent.

--
With best regards,
Alexander Korotkov.


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Alexander Korotkov
On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas  wrote:

> The ship has cleatly sailed to add parameterized opclasses to 9.4, but
> let's keep it in mind when we decide on the defaults.
>
> In the absence of parameterizable opclasses, it would be much more
> flexible to have opclasses that index, keys, values, key-value pairs and
> paths separately, instead of the current json_ops and json_hash_ops
> opclasses which index all of those in the same index. That way, if you only
> e.g. ever query on the existence of a key, you'd only need to index the
> keys.
>
> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...


+1 for parameterizable opclasses

--
With best regards,
Alexander Korotkov.


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Peter Geoghegan
On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas
 wrote:
> I didn't say that. On the contrary, I think the shotgun approach jsonb_ops
> and jsonb_hash_ops take is too broad. It should be possible to specify what
> to index in a more detailed fashion.

It is - use an expression index. That's by far the most important way
to specify what to index in a more detailed fashion. There are others,
but that's the major one. Beyond that, yes, it's necessary to
carefully write your query predicate a certain way. However, a similar
situation exists in MongoDB, where there is a distinction between
"Indexes on embedded fields" (which must be accessed using special
"dot notation") and "indexes on subdocuments" (which cannot be
accessed using "dot notation"). It's late here, but I'm pretty sure
that's a feature and not a limitation.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Heikki Linnakangas

On 04/09/2014 10:40 AM, Peter Geoghegan wrote:

On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
 wrote:

As the code stands, you don't have a choice on any of those things. The
decisions have been made by us, PostgreSQL developers. The only choice you
have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
tradeoffs in both. Sure, they're still useful, if not optimal, for a
wide-range of applications. For more complicated cases, you will have to
resort to expression indexes. It bugs me greatly that the underlying indexam
could do all those things, we're just not exposing the capability.


Why would you ever not have to use expression indexes? Idiomatic usage
of jsonb involves expression indexes because it's desirable to index
only a expression. People will want to do things like only index the
nested "tags" array far more frequently then they'll only want to
index keys (that is, Object pair keys) in the entire document. I don't
get why you'd say that they'd "resort" to expression indexes, like
they're a kludge.


Expression indexes are definitely nice, but you have to be careful to 
formulate the query in exactly the same way to match the index.



Have you ever tried out one of the new document
databases? I suggest you do. Expression indexes on jsonb map pretty
closely onto how you're frequently expected to index data in those
systems. That's something that they make heavy use of. Why would you
ever not really have to consider ahead of time what is important
enough to be indexed, and what is not?


I didn't say that. On the contrary, I think the shotgun approach 
jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to 
specify what to index in a more detailed fashion.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-09 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
 wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
> tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide-range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying indexam
> could do all those things, we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage
of jsonb involves expression indexes because it's desirable to index
only a expression. People will want to do things like only index the
nested "tags" array far more frequently then they'll only want to
index keys (that is, Object pair keys) in the entire document. I don't
get why you'd say that they'd "resort" to expression indexes, like
they're a kludge. Have you ever tried out one of the new document
databases? I suggest you do. Expression indexes on jsonb map pretty
closely onto how you're frequently expected to index data in those
systems. That's something that they make heavy use of. Why would you
ever not really have to consider ahead of time what is important
enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more flexible
> to have opclasses that index, keys, values, key-value pairs and paths
> separately, instead of the current json_ops and json_hash_ops opclasses
> which index all of those in the same index. That way, if you only e.g. ever
> query on the existence of a key, you'd only need to index the keys.

I think only ever needing to index the keys is not a common use-case.
It pretty much makes exactly as much sense to do so as it would with
hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler
hstore-style indexing, while jsonb_hash_ops is best suited to testing
containment of JSON documents, potentially with lots of nesting. These
documents are typically homogeneous in structure. Idiomatic usage of
systems like MongoDB involves "collections" of fairly homogeneous
documents. If there is a lot of variability in their structure within
a collection, the collection more or less becomes impossible to
usefully query. They aim to be flexible, but still implicitly require
you to insert data with a half-way sensible/consistent structure. This
makes separately indexing the keys less than compelling as a default,
because there is so much duplication of keys in practice.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Heikki Linnakangas

On 04/09/2014 01:18 AM, Andrew Dunstan wrote:


On 04/08/2014 05:57 PM, Peter Geoghegan wrote:

On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane  wrote:

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.


I tend to agree with Tom that POLA will be more violated by the default
ops class not being able to index some values.


Yeah.



Both of the operator classes are actually much less flexible than I'd 
like. Firstly, they index everything. In many cases, that's not what you 
want, so you end up with much larger indexes than necessary. Secondly, 
jsonb_ops indexes all values separately from the keys. That makes the 
index pretty much useless for a query on, say, WHERE json @> 
'{"needs_processing":true}', if all the rows also contain a key-value 
pair "active":true. Thirdly, inequality operators are not supported; you 
can't search for rows with (the json-syntax equivalent of) "price < 
12.3". Fourthly, sometimes you would want to include the "path" to an 
entry in the key, sometimes not.


If I understood correctly the way jsonb_hash_ops works, the limitation 
compared to jsonb_ops is that it cannot be used for foo ? 'bar' type 
queries. And the reason for that limitation is that it hashes the whole 
path to the key; the naked values are not indexes separately. But why 
not? jsonb_ops does - why is that decision related to whether you hash 
or not? Or it could index both. Sure, it would be wasteful when you 
don't need to support foo ? 'bar', but the point is that it should be up 
to the DBA to decide, based on his needs.


As the code stands, you don't have a choice on any of those things. The 
decisions have been made by us, PostgreSQL developers. The only choice 
you have is between jsonb_ops and jsonb_hash_ops, with a strange 
combination of tradeoffs in both. Sure, they're still useful, if not 
optimal, for a wide-range of applications. For more complicated cases, 
you will have to resort to expression indexes. It bugs me greatly that 
the underlying indexam could do all those things, we're just not 
exposing the capability.


ISTM we need a way to parameterize opclasses, so that when you create 
the index, you specify the above things.




The ship has cleatly sailed to add parameterized opclasses to 9.4, but 
let's keep it in mind when we decide on the defaults.


In the absence of parameterizable opclasses, it would be much more 
flexible to have opclasses that index, keys, values, key-value pairs and 
paths separately, instead of the current json_ops and json_hash_ops 
opclasses which index all of those in the same index. That way, if you 
only e.g. ever query on the existence of a key, you'd only need to index 
the keys.


I don't understand how we ended up with the current dichotomy of 
json_ops and json_hash_ops...


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Tom Lane
Andrew Dunstan  writes:
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> ... I didn't propose changing the default due to
>> concerns about the POLA, but I'm happy to be told that those concerns
>> were out of proportion to the practical benefits of a different
>> default.

> I tend to agree with Tom that POLA will be more violated by the default 
> ops class not being able to index some values.

We should wait a bit longer to see if anyone objects, but assuming that
this represents the consensus opinion ...

ISTM that the name "jsonb_ops" should have pride of place as the default
jsonb opclass.  Therefore, if we make this change, jsonb_hash_ops needs to
be renamed to jsonb_ops, and we need a new name for what is now jsonb_ops.
I haven't paid attention to the technical details of the differences so
I have no idea what to suggest for the new name.  Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Andrew Dunstan


On 04/08/2014 05:57 PM, Peter Geoghegan wrote:

On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane  wrote:

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.




I tend to agree with Tom that POLA will be more violated by the default 
ops class not being able to index some values.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Peter Geoghegan
On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane  wrote:
> Well, let me see if I understand the situation correctly:
>
> * jsonb_ops supports more operators
>
> * jsonb_hash_ops produces smaller, better-performing indexes
>
> * jsonb_ops falls over on inputs with wide field values, but
> jsonb_hash_ops does not

There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane  wrote:
>> (BTW, wasn't there some discussion of changing our minds about which
>> one is the default?  We already have one bug report complaining about
>> jsonb_ops' size restriction, so that seems to be evidence in favor
>> of changing ...)

> Yes, there was. I very nearly came down on the side of making
> jsonb_hash_ops the default, but given that it doesn't make all
> operators indexable, I ultimately decided against supporting that
> course of action. I thought that that would be an odd limitation for
> the default GIN opclass to have. It was a very close call in my mind,
> and if you favor changing the default now, in light of the few
> complaints we've heard, I think that's a reasonable decision. That
> said, as I noted in the main -bugs thread, the case presented is
> fairly atypical.

Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not

If that's an accurate summary then I would say that we've got
the default backwards.  I would much rather tell people "you
can have more operators supported, but here are the tradeoffs"
than have a default that fails under evidently-not-so-improbable
cases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers