Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 8:24 AM, Tom Lane wrote: >> Maybe we should make *neither* of these the default opclass, and give >> *neither* the name json_ops. > > There's definitely something to be said for that. Default opclasses are > sensible when there's basically only one behavior that's interesting for > most people. We can already see that that's not going to be the case > for jsonb indexes, at least not with the currently available alternatives. I've heard worse ideas. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Alvaro Herrera writes: > Tom Lane wrote: >> One other point here is that non-default opclasses can't be used in >> UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to >> specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't >> matter here since these aren't btree opclasses, but is there a >> use-case for EXCLUDE with any of the supported jsonb operators? > That sounds like an oversight that could better be fixed in EXCLUDE, no? Well, there hasn't been a use-case up to now. I'm not sure there's one yet. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Tom Lane wrote: > > On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas > > wrote: > >> Both of the operator classes are actually much less flexible than I'd like. > > > Maybe we should make *neither* of these the default opclass, and give > > *neither* the name json_ops. +1. I was thinking the same thing after reading Heikki's rant. > One other point here is that non-default opclasses can't be used in > UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to > specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't > matter here since these aren't btree opclasses, but is there a > use-case for EXCLUDE with any of the supported jsonb operators? That sounds like an oversight that could better be fixed in EXCLUDE, no? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Robert Haas writes: > On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas > wrote: >> Both of the operator classes are actually much less flexible than I'd like. > Maybe we should make *neither* of these the default opclass, and give > *neither* the name json_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. Not having a default would force users to make decisions explicitly. Is that what we want? One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas wrote: > Both of the operator classes are actually much less flexible than I'd like. > Firstly, they index everything. In many cases, that's not what you want, so > you end up with much larger indexes than necessary. Secondly, jsonb_ops > indexes all values separately from the keys. That makes the index pretty > much useless for a query on, say, WHERE json @> '{"needs_processing":true}', > if all the rows also contain a key-value pair "active":true. Thirdly, > inequality operators are not supported; you can't search for rows with (the > json-syntax equivalent of) "price < 12.3". Fourthly, sometimes you would > want to include the "path" to an entry in the key, sometimes not. Maybe we should make *neither* of these the default opclass, and give *neither* the name json_ops. > ISTM we need a way to parameterize opclasses, so that when you create the > index, you specify the above things. Yeah, that would be great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan wrote: > On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas > wrote: > > I didn't say that. On the contrary, I think the shotgun approach > jsonb_ops > > and jsonb_hash_ops take is too broad. It should be possible to specify > what > > to index in a more detailed fashion. > > It is - use an expression index. That's by far the most important way > to specify what to index in a more detailed fashion. There are others, > but that's the major one. Beyond that, yes, it's necessary to > carefully write your query predicate a certain way. However, a similar > situation exists in MongoDB, where there is a distinction between > "Indexes on embedded fields" (which must be accessed using special > "dot notation") and "indexes on subdocuments" (which cannot be > accessed using "dot notation"). It's late here, but I'm pretty sure > that's a feature and not a limitation. > I believe that serious limitation we now have is that we actually specify kind of index to be used in the SQL query. For example you need to find objects with active = true. You can write: js @> {"active": true} then GIN index on js can be used. Also you can write: js->'active' = true then btree expression index on (js->'active') can be used. For sure, one can do js @> {"active": true} AND js->'active' = true This query can use any of indexes, but it is: 1) Cluge 2) Excess recheck 3) If both indexes present, excess "bitmap and". Having to choose index in SQL-query we make our SQL more imperative and less declarative. Similar things can happen without json/hstore (user have to rewrite SQL in order to use expression index), but now it could become very common. My opinion is that we have to do something in planner to make it understand at least this two kinds of queries to be equivalent. -- With best regards, Alexander Korotkov.
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas wrote: > The ship has cleatly sailed to add parameterized opclasses to 9.4, but > let's keep it in mind when we decide on the defaults. > > In the absence of parameterizable opclasses, it would be much more > flexible to have opclasses that index, keys, values, key-value pairs and > paths separately, instead of the current json_ops and json_hash_ops > opclasses which index all of those in the same index. That way, if you only > e.g. ever query on the existence of a key, you'd only need to index the > keys. > > I don't understand how we ended up with the current dichotomy of json_ops > and json_hash_ops... +1 for parameterizable opclasses -- With best regards, Alexander Korotkov.
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas wrote: > I didn't say that. On the contrary, I think the shotgun approach jsonb_ops > and jsonb_hash_ops take is too broad. It should be possible to specify what > to index in a more detailed fashion. It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between "Indexes on embedded fields" (which must be accessed using special "dot notation") and "indexes on subdocuments" (which cannot be accessed using "dot notation"). It's late here, but I'm pretty sure that's a feature and not a limitation. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 10:40 AM, Peter Geoghegan wrote: On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas wrote: As the code stands, you don't have a choice on any of those things. The decisions have been made by us, PostgreSQL developers. The only choice you have is between jsonb_ops and jsonb_hash_ops, with a strange combination of tradeoffs in both. Sure, they're still useful, if not optimal, for a wide-range of applications. For more complicated cases, you will have to resort to expression indexes. It bugs me greatly that the underlying indexam could do all those things, we're just not exposing the capability. Why would you ever not have to use expression indexes? Idiomatic usage of jsonb involves expression indexes because it's desirable to index only a expression. People will want to do things like only index the nested "tags" array far more frequently then they'll only want to index keys (that is, Object pair keys) in the entire document. I don't get why you'd say that they'd "resort" to expression indexes, like they're a kludge. Expression indexes are definitely nice, but you have to be careful to formulate the query in exactly the same way to match the index. Have you ever tried out one of the new document databases? I suggest you do. Expression indexes on jsonb map pretty closely onto how you're frequently expected to index data in those systems. That's something that they make heavy use of. Why would you ever not really have to consider ahead of time what is important enough to be indexed, and what is not? I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas wrote: > As the code stands, you don't have a choice on any of those things. The > decisions have been made by us, PostgreSQL developers. The only choice you > have is between jsonb_ops and jsonb_hash_ops, with a strange combination of > tradeoffs in both. Sure, they're still useful, if not optimal, for a > wide-range of applications. For more complicated cases, you will have to > resort to expression indexes. It bugs me greatly that the underlying indexam > could do all those things, we're just not exposing the capability. Why would you ever not have to use expression indexes? Idiomatic usage of jsonb involves expression indexes because it's desirable to index only a expression. People will want to do things like only index the nested "tags" array far more frequently then they'll only want to index keys (that is, Object pair keys) in the entire document. I don't get why you'd say that they'd "resort" to expression indexes, like they're a kludge. Have you ever tried out one of the new document databases? I suggest you do. Expression indexes on jsonb map pretty closely onto how you're frequently expected to index data in those systems. That's something that they make heavy use of. Why would you ever not really have to consider ahead of time what is important enough to be indexed, and what is not? > ISTM we need a way to parameterize opclasses, so that when you create the > index, you specify the above things. That would be nice. > In the absence of parameterizable opclasses, it would be much more flexible > to have opclasses that index, keys, values, key-value pairs and paths > separately, instead of the current json_ops and json_hash_ops opclasses > which index all of those in the same index. That way, if you only e.g. ever > query on the existence of a key, you'd only need to index the keys. I think only ever needing to index the keys is not a common use-case. It pretty much makes exactly as much sense to do so as it would with hstore, and yet hstore doesn't support that after all these years. > I don't understand how we ended up with the current dichotomy of json_ops > and json_hash_ops... It makes sense if you consider jsonb_ops best suited to simpler hstore-style indexing, while jsonb_hash_ops is best suited to testing containment of JSON documents, potentially with lots of nesting. These documents are typically homogeneous in structure. Idiomatic usage of systems like MongoDB involves "collections" of fairly homogeneous documents. If there is a lot of variability in their structure within a collection, the collection more or less becomes impossible to usefully query. They aim to be flexible, but still implicitly require you to insert data with a half-way sensible/consistent structure. This makes separately indexing the keys less than compelling as a default, because there is so much duplication of keys in practice. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 01:18 AM, Andrew Dunstan wrote: On 04/08/2014 05:57 PM, Peter Geoghegan wrote: On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane wrote: Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not There might be some compelling cases for indexing existence rather than containment, since the recheck flag isn't set there, but in general this summary seems sound. I would say that broadly, existence is a less useful operator than containment, and so jsonb_hash_ops is broadly more compelling. I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. I tend to agree with Tom that POLA will be more violated by the default ops class not being able to index some values. Yeah. Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing":true}', if all the rows also contain a key-value pair "active":true. Thirdly, inequality operators are not supported; you can't search for rows with (the json-syntax equivalent of) "price < 12.3". Fourthly, sometimes you would want to include the "path" to an entry in the key, sometimes not. If I understood correctly the way jsonb_hash_ops works, the limitation compared to jsonb_ops is that it cannot be used for foo ? 'bar' type queries. And the reason for that limitation is that it hashes the whole path to the key; the naked values are not indexes separately. But why not? jsonb_ops does - why is that decision related to whether you hash or not? Or it could index both. Sure, it would be wasteful when you don't need to support foo ? 'bar', but the point is that it should be up to the DBA to decide, based on his needs. As the code stands, you don't have a choice on any of those things. The decisions have been made by us, PostgreSQL developers. The only choice you have is between jsonb_ops and jsonb_hash_ops, with a strange combination of tradeoffs in both. Sure, they're still useful, if not optimal, for a wide-range of applications. For more complicated cases, you will have to resort to expression indexes. It bugs me greatly that the underlying indexam could do all those things, we're just not exposing the capability. ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things. The ship has cleatly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults. In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index, keys, values, key-value pairs and paths separately, instead of the current json_ops and json_hash_ops opclasses which index all of those in the same index. That way, if you only e.g. ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of json_ops and json_hash_ops... - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Andrew Dunstan writes: > On 04/08/2014 05:57 PM, Peter Geoghegan wrote: >> ... I didn't propose changing the default due to >> concerns about the POLA, but I'm happy to be told that those concerns >> were out of proportion to the practical benefits of a different >> default. > I tend to agree with Tom that POLA will be more violated by the default > ops class not being able to index some values. We should wait a bit longer to see if anyone objects, but assuming that this represents the consensus opinion ... ISTM that the name "jsonb_ops" should have pride of place as the default jsonb opclass. Therefore, if we make this change, jsonb_hash_ops needs to be renamed to jsonb_ops, and we need a new name for what is now jsonb_ops. I haven't paid attention to the technical details of the differences so I have no idea what to suggest for the new name. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/08/2014 05:57 PM, Peter Geoghegan wrote: On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane wrote: Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not There might be some compelling cases for indexing existence rather than containment, since the recheck flag isn't set there, but in general this summary seems sound. I would say that broadly, existence is a less useful operator than containment, and so jsonb_hash_ops is broadly more compelling. I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. I tend to agree with Tom that POLA will be more violated by the default ops class not being able to index some values. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane wrote: > Well, let me see if I understand the situation correctly: > > * jsonb_ops supports more operators > > * jsonb_hash_ops produces smaller, better-performing indexes > > * jsonb_ops falls over on inputs with wide field values, but > jsonb_hash_ops does not There might be some compelling cases for indexing existence rather than containment, since the recheck flag isn't set there, but in general this summary seems sound. I would say that broadly, existence is a less useful operator than containment, and so jsonb_hash_ops is broadly more compelling. I didn't propose changing the default due to concerns about the POLA, but I'm happy to be told that those concerns were out of proportion to the practical benefits of a different default. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Peter Geoghegan writes: > On Tue, Apr 8, 2014 at 2:34 PM, Tom Lane wrote: >> (BTW, wasn't there some discussion of changing our minds about which >> one is the default? We already have one bug report complaining about >> jsonb_ops' size restriction, so that seems to be evidence in favor >> of changing ...) > Yes, there was. I very nearly came down on the side of making > jsonb_hash_ops the default, but given that it doesn't make all > operators indexable, I ultimately decided against supporting that > course of action. I thought that that would be an odd limitation for > the default GIN opclass to have. It was a very close call in my mind, > and if you favor changing the default now, in light of the few > complaints we've heard, I think that's a reasonable decision. That > said, as I noted in the main -bugs thread, the case presented is > fairly atypical. Well, let me see if I understand the situation correctly: * jsonb_ops supports more operators * jsonb_hash_ops produces smaller, better-performing indexes * jsonb_ops falls over on inputs with wide field values, but jsonb_hash_ops does not If that's an accurate summary then I would say that we've got the default backwards. I would much rather tell people "you can have more operators supported, but here are the tradeoffs" than have a default that fails under evidently-not-so-improbable cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers