subject:"\\\[HACKERS\\\] jsonb and nested hstore"

On 13 Březen 2014, 23:39, Peter Geoghegan wrote:
 On Thu, Mar 13, 2014 at 2:21 AM, Greg Stark st...@mit.edu wrote:
 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?

 Personally, I don't think it's a good idea to change the default. I
 have yet to be convinced that if you hit the GIN limitation it's an
 indication of anything other than that you need to reconsider your
 indexing choices (how often have we heard that complaint of GIN before
 in practice?). Even if you don't hit the limitation directly, with

I've never used GIN with anything else than values that built-in full-text
(tsvector), pg_trgm or points, and I suspect that's the case with most
other users. All those types have naturally limited size (e.g. words
tend to have very limited length, unless you're Maori, but even there the
longest name is just 85 characters [1]).

The only place in (core|contrib) where I'd expect this kind of issues is
probably intarray, but it's arguably less frequently used than
tsvector/pg_trgm for example.

So ISTM this is the main reason why we don't see more complaints about the
GIN size limit. I expect that to change with json + index all approach.

 something like jsonb_hash_ops you're still hashing a large nested
 structure, very probably uselessly. Are you really going to look for
 an exact match to an elaborate nested structure? I would think,
 probably not.

What I find (very) useful is queries that look like this:

   SELECT if FROM json_table WHERE json_value @ '{a : {b : {c : 3}}}';

or (without the @ operator) like this:

   SELECT if FROM json_table WHERE json_value # ARRAY['a', 'b', 'c'] = '3';

or something like that ...

 Now, as Alexander says, there might be a role for another
 (jsonb_hash_ops) opclass that separately indexes values only. I still
 think that by far the simplest solution is to use expressional
 indexes, because we index key values and array element values
 indifferently. Of course, nothing we have here precludes the
 development of such an opclass.

Maybe. I don't have much insight into ho GIN works / what is possible. But
I think we should avoid having large number of opclasses, each supporting
a small fraction of use cases. If we could keep the two we have right now,
that'd be nice.

regards
Tomas

[1] http://en.wikipedia.org/wiki/List_of_long_place_names




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Oleg Bartunov

VODKA index will have no lenght limitation.

On Fri, Mar 14, 2014 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 13 Březen 2014, 23:39, Peter Geoghegan wrote:
 On Thu, Mar 13, 2014 at 2:21 AM, Greg Stark st...@mit.edu wrote:
 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?

 Personally, I don't think it's a good idea to change the default. I
 have yet to be convinced that if you hit the GIN limitation it's an
 indication of anything other than that you need to reconsider your
 indexing choices (how often have we heard that complaint of GIN before
 in practice?). Even if you don't hit the limitation directly, with

 I've never used GIN with anything else than values that built-in full-text
 (tsvector), pg_trgm or points, and I suspect that's the case with most
 other users. All those types have naturally limited size (e.g. words
 tend to have very limited length, unless you're Maori, but even there the
 longest name is just 85 characters [1]).

 The only place in (core|contrib) where I'd expect this kind of issues is
 probably intarray, but it's arguably less frequently used than
 tsvector/pg_trgm for example.

 So ISTM this is the main reason why we don't see more complaints about the
 GIN size limit. I expect that to change with json + index all approach.

 something like jsonb_hash_ops you're still hashing a large nested
 structure, very probably uselessly. Are you really going to look for
 an exact match to an elaborate nested structure? I would think,
 probably not.

 What I find (very) useful is queries that look like this:

SELECT if FROM json_table WHERE json_value @ '{a : {b : {c : 3}}}';

 or (without the @ operator) like this:

SELECT if FROM json_table WHERE json_value # ARRAY['a', 'b', 'c'] = '3';

 or something like that ...

 Now, as Alexander says, there might be a role for another
 (jsonb_hash_ops) opclass that separately indexes values only. I still
 think that by far the simplest solution is to use expressional
 indexes, because we index key values and array element values
 indifferently. Of course, nothing we have here precludes the
 development of such an opclass.

 Maybe. I don't have much insight into ho GIN works / what is possible. But
 I think we should avoid having large number of opclasses, each supporting
 a small fraction of use cases. If we could keep the two we have right now,
 that'd be nice.

 regards
 Tomas

 [1] http://en.wikipedia.org/wiki/List_of_long_place_names




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Josh Berkus

On 03/14/2014 04:52 AM, Oleg Bartunov wrote:
 VODKA index will have no lenght limitation.

Yeah, so I think we go with what we have, and tell people if you're
hitting these length issues, wait for 9.5, where they will be fixed.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Oleg Bartunov

9.5 may too optimistic :)

On Fri, Mar 14, 2014 at 11:18 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/14/2014 04:52 AM, Oleg Bartunov wrote:
 VODKA index will have no lenght limitation.

 Yeah, so I think we go with what we have, and tell people if you're
 hitting these length issues, wait for 9.5, where they will be fixed.


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Gavin Flower


On 15/03/14 08:45, Oleg Bartunov wrote:

9.5 may too optimistic :)

On Fri, Mar 14, 2014 at 11:18 PM, Josh Berkus j...@agliodbs.com wrote:

On 03/14/2014 04:52 AM, Oleg Bartunov wrote:

VODKA index will have no lenght limitation.

Yeah, so I think we go with what we have, and tell people if you're
hitting these length issues, wait for 9.5, where they will be fixed.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


No tell them to wait for Postgres 12.3.42 - the version that is totally 
bug free  implements parallel processing of individual queries!  :-)


(With apologies to Douglas Adams)


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Josh Berkus

On 03/14/2014 12:45 PM, Oleg Bartunov wrote:
 9.5 may too optimistic :)

Nonsense, you, Teodor and Alexander are geniuses.  It can't possibly
take you more than a year.  ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 14.3.2014 20:18, Josh Berkus wrote:
 On 03/14/2014 04:52 AM, Oleg Bartunov wrote:
 VODKA index will have no lenght limitation.
 
 Yeah, so I think we go with what we have, and tell people if you're 
 hitting these length issues, wait for 9.5, where they will be
 fixed.

VODKA may be great, but I haven't seen a single line of code for that
yet. And given the response from Oleg, 9.5 seems ambitious.

I'm not awfully familiar with the GIN code, but based on Alexander's
feedback I presume fixing the GIN length limit (or rather removing it,
as it's a feature, not a bug) is quite straightforward. Why not to at
least consider that for 9.4, unless it turns more complex than expected?

Don't get me wrong - I'm aware it's quite late in the last commitfest,
and if it's deemed unacceptable / endandering 9.4 release, I'm not going
to say a word. But if it's a simple patch ...

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Fri, Mar 14, 2014 at 2:21 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm not awfully familiar with the GIN code, but based on Alexander's
 feedback I presume fixing the GIN length limit (or rather removing it,
 as it's a feature, not a bug) is quite straightforward. Why not to at
 least consider that for 9.4, unless it turns more complex than expected?

Alexander said nothing about removing that limitation, or if he did I
missed it. Which, as I said, I don't consider to be much of a
limitation, because indexing the whole nested value doesn't mean it
can satisfy a query on some more nested subset of an indexed value
datum (i.e. a value in the sense of a value in a key/value pair).

Alexander mentioned just indexing keys (object keys, or equivalently
array elements at the jsonb level), which is a reasonable thing, but
can be worked on later. I don't have much interest in working on
making it possible to index elaborate nested values in key/value
pairs, which is what you're suggesting if I've understood correctly.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

For the benefit of anyone that would like to try the patch out, I make
available a custom format dump of some delicious sample data. I can
query the sample data as follows on my local installation:

[local]/jsondata=# select count(*) from delicious ;
  count
-
 1079399
(1 row)

[local]/jsondata=# \dt+ delicious
 List of relations
 Schema |   Name| Type  | Owner |  Size   | Description
+---+---+---+-+-
 public | delicious | table | pg| 1174 MB |
(1 row)

It's available from:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/jsondata.dump

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 14.3.2014 22:54, Peter Geoghegan wrote:
On Fri, Mar 14, 2014 at 2:21 PM, Tomas Vondra t...@fuzzy.cz wrote:
I'm not awfully familiar with the GIN code, but based on Alexander's
feedback I presume fixing the GIN length limit (or rather removing it,
as it's a feature, not a bug) is quite straightforward. Why not to at
least consider that for 9.4, unless it turns more complex than expected?

Alexander said nothing about removing that limitation, or if he did I
missed it. Which, as I said, I don't consider to be much of a

Sure he did, see this:

http://www.postgresql.org/message-id/capphfds4xmg5zop+1ctrrqnm6wxhh2a7j11nnjeosa76uow...@mail.gmail.com

Although it doesn't mention how complex change it would be.

limitation, because indexing the whole nested value doesn't mean it
can satisfy a query on some more nested subset of an indexed value
datum (i.e. a value in the sense of a value in a key/value pair).

OK, I'm getting lost in the nested stuff. The trouble I'm running into
are rather unlerated to nesting. For example indexing this fails if the
string is sufficiently long (~1350B if random, more if compressible).

{key : ... string ...}

How's that related to nesting?

Anyway, I'm not talking about exact matches on subtrees. I'm talking
about queries like this:

SELECT doc FROM delicious
WHERE doc @ '{title_detail : {value : TheaterMania}}';

which does exactly the same thing like this query:

SELECT doc FROM delicious
WHERE doc-'title_detail'-'value' = 'TheaterMania';

Except that the first query can use a GIN index created like this:

CREATE INDEX delicious_idx ON delicious USING GIN (doc);

while the latter does sequential scan. It can use a GiST index too, but
it takes 140ms with GiST and only ~0.3ms with GIN. Big difference.

Alexander mentioned just indexing keys (object keys, or equivalently
array elements at the jsonb level), which is a reasonable thing, but
can be worked on later. I don't have much interest in working on
making it possible to index elaborate nested values in key/value
pairs, which is what you're suggesting if I've understood correctly.

I never asked for indexing elaborate nested values in key/value pairs.
All I'm asking for is indexing of json values containing long strings.

regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 14.3.2014 23:06, Peter Geoghegan wrote:
 For the benefit of anyone that would like to try the patch out, I make
 available a custom format dump of some delicious sample data. I can
 query the sample data as follows on my local installation:
 
 [local]/jsondata=# select count(*) from delicious ;
   count
 -
  1079399
 (1 row)
 
 [local]/jsondata=# \dt+ delicious
  List of relations
  Schema |   Name| Type  | Owner |  Size   | Description
 +---+---+---+-+-
  public | delicious | table | pg| 1174 MB |
 (1 row)
 
 It's available from:
 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/jsondata.dump

Thanks.

I've been doing some simple queries on this dataset and ISTM there's a
memory leak somewhere in the json code (i.e. something is probably using
a wrong memory context), because this query:

  SELECT doc-'title_detail'-'value', COUNT(*)
FROM delicious GROUP BY 1;

results in this:

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND


 8231 tomas 20   0 5987520 4,645g   6136 R  95,4 60,4   0:37.54
postgres: tomas delicious [local]

I have shared_buffers=1GB and work_mem=64MB, so 4.6GB seems a bit too
much 4.6GB. Actually it grows even further, and then OOM jumps in and
kills the backend like this:

[ 9227.318998] Out of memory: Kill process 8159 (postgres) score 595
   or sacrifice child
[ 9227.319000] Killed process 8159 (postgres) total-vm:5920272kB,
   anon-rss:4791568kB, file-rss:6192kB

I'm on commit a3115f0d, which is just 2 days old, so I suppose this was
not fixed yet.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Andres Freund

On 2014-03-14 22:21:18 +0100, Tomas Vondra wrote:
 Don't get me wrong - I'm aware it's quite late in the last commitfest,
 and if it's deemed unacceptable / endandering 9.4 release, I'm not going
 to say a word. But if it's a simple patch ...

IMNSHO there's no bloody chance for such an addition at this point of
the cycle.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Greg Stark

On Fri, Mar 14, 2014 at 9:21 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm not awfully familiar with the GIN code, but based on Alexander's
 feedback I presume fixing the GIN length limit (or rather removing it,
 as it's a feature, not a bug) is quite straightforward. Why not to at
 least consider that for 9.4, unless it turns more complex than expected?

 Don't get me wrong - I'm aware it's quite late in the last commitfest,
 and if it's deemed unacceptable / endandering 9.4 release, I'm not going
 to say a word. But if it's a simple patch ...

Well I think the bigger picture is that the cases were we're getting
this error it's because we're expecting too much from the GIN opclass.
It's trying to index entire json objects as individual values which
isn't really very useful. We're unlikely to go querying for rows where
the value of a given key is a specific json object.

As I understand it Peter's right that in its current form the GIN
opclass is only useful if you use it on an expression index on
specific pieces of your json which are traditional non-nested hash
tables. Or I suppose if you're really only concerned with the ?
operator which looks for keys, which is pretty common too.

I had in mind that the GIN opclass would do something clever like
decompose the json into all the path-value tuples so I could do
arbitrary path lookups for values. That might be possible in the
future but it's not what we have today and what we have today is
already better than hstore. I think we're better off committing this
and moving forward with the contrib hstore2 wrapper which uses this
infrastructure so people have a migration path.

I don't think Josh is right to say it'll be fixed in 9.5. It'll be
better in 9.5 because we have ambitious plans to continue improving
in this direction. But it'll be even better in 9.6 and better again in
9.7. It'll never be fixed.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Fri, Mar 14, 2014 at 5:10 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm on commit a3115f0d, which is just 2 days old, so I suppose this was
 not fixed yet.

Try merging the feature branch now, which will get you commit 16923d,
which you're missing. That was an open item for a while, which I only
got around to fixing a few days ago.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 15.3.2014 02:03, Greg Stark wrote:
 On Fri, Mar 14, 2014 at 9:21 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm not awfully familiar with the GIN code, but based on Alexander's
 feedback I presume fixing the GIN length limit (or rather removing it,
 as it's a feature, not a bug) is quite straightforward. Why not to at
 least consider that for 9.4, unless it turns more complex than expected?

 Don't get me wrong - I'm aware it's quite late in the last commitfest,
 and if it's deemed unacceptable / endandering 9.4 release, I'm not going
 to say a word. But if it's a simple patch ...
 
 Well I think the bigger picture is that the cases were we're getting 
 this error it's because we're expecting too much from the GIN
 opclass. It's trying to index entire json objects as individual
 values which isn't really very useful. We're unlikely to go querying
 for rows where the value of a given key is a specific json object.

Stupid question - so if I have a json like this:

  { a : { b : c}}

the GIN code indexes {b : c} as a single value? And then takes c
and indexes it as a single value too?

Because otherwise I don't understand how the index could be used for
queries with @ '{a : {b : c}}' conditions (i.e. path [a,b] with
value c).

H, if that's how it works, removing the size limit would be
certainly more difficult than I thought.


 As I understand it Peter's right that in its current form the GIN
 opclass is only useful if you use it on an expression index on
 specific pieces of your json which are traditional non-nested hash
 tables. Or I suppose if you're really only concerned with the ?
 operator which looks for keys, which is pretty common too.

Well, depends on how you define useful. With the sample dataset
'delicious' (see Peter's post) I can do this:

   SELECT doc FROM delicious
  WHERE doc @ '{title_detail : {value : TheaterMania}}';

with arbitrary paths, and I may create a GIN index to support such
queries. And yes, it's much faster than GiST for example (by a factor of
1000).

Yes, the GIN index is quite large (~560MB for a ~1.2GB table).


 I had in mind that the GIN opclass would do something clever like
 decompose the json into all the path-value tuples so I could do
 arbitrary path lookups for values. That might be possible in the
 future but it's not what we have today and what we have today is
 already better than hstore. I think we're better off committing this
 and moving forward with the contrib hstore2 wrapper which uses this
 infrastructure so people have a migration path.

Yes, it's better than hstore - no doubt about that. The hierarchy and
data types are great, and hstore has the same size limitation.

 I don't think Josh is right to say it'll be fixed in 9.5. It'll be 
 better in 9.5 because we have ambitious plans to continue
 improving in this direction. But it'll be even better in 9.6 and
 better again in 9.7. It'll never be fixed.

I don't dare to say what will be in 9.5 (not even thinking about the
following versions).

Assuming the GIN will remain for 9.4 as it is now (both opclasses), it
would be nice if we could improve this in 9.5. I can live with custom
opclasses in an extension, if there are some ...

regards
Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-14 Thread Josh Berkus

On 03/14/2014 06:44 PM, Tomas Vondra wrote:
 Stupid question - so if I have a json like this:

Not a stupid question, actually.   In fact, I expect to answer it 400 or
500 times over the lifespan of 9.4.

   { a : { b : c}}
 
 the GIN code indexes {b : c} as a single value? And then takes c
 and indexes it as a single value too?

I don't know that c is indexed separately.

 Because otherwise I don't understand how the index could be used for
 queries with @ '{a : {b : c}}' conditions (i.e. path [a,b] with
 value c).
 
 H, if that's how it works, removing the size limit would be
 certainly more difficult than I thought.

Precisely.  Hence, the Russian plans for VODKA.

 Well, depends on how you define useful. With the sample dataset
 'delicious' (see Peter's post) I can do this:
 
SELECT doc FROM delicious
   WHERE doc @ '{title_detail : {value : TheaterMania}}';
 
 with arbitrary paths, and I may create a GIN index to support such
 queries. And yes, it's much faster than GiST for example (by a factor of
 1000).
 
 Yes, the GIN index is quite large (~560MB for a ~1.2GB table).

State of the art, actually.  In MongoDB, the indexes are frequently
several times larger than the raw data.  So if ours are 50% the size,
we're doing pretty good.

On 15.3.2014 02:03, Greg Stark wrote:
 I don't think Josh is right to say it'll be fixed in 9.5. It'll be 
 better in 9.5 because we have ambitious plans to continue
 improving in this direction. But it'll be even better in 9.6 and
 better again in 9.7. It'll never be fixed.

Oh, no doubt.  The important thing is that 9.4 will significantly
broaden the class of applications for which our JSON support is useful,
and allow us to remain relevant to an increasingly NoSQLish developer
base.  We're both showing progress and delivering features which are
actually useful, even if they still have major limitations.

Plus, you know, those features are useful to *me*, so I'm keen on them
personally.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Fri, Mar 14, 2014 at 6:44 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Well, depends on how you define useful. With the sample dataset
 'delicious' (see Peter's post) I can do this:

SELECT doc FROM delicious
   WHERE doc @ '{title_detail : {value : TheaterMania}}';

 with arbitrary paths, and I may create a GIN index to support such
 queries. And yes, it's much faster than GiST for example (by a factor of
 1000).

If you know ahead of time the entire nested value you can. So, if you
attach some other data to the TheaterMania document, you had better
know that too if you hope to write a query like this. You also have to
index the entire table, where presumably with a little thought you
could get away with a much smaller index. That strikes me as not very
useful.

 Yes, the GIN index is quite large (~560MB for a ~1.2GB table).

With the default opclass, without an expressional index, 100% of the
data from the table appears in the index. Why do you think that's
quite large?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Bruce Momjian

On Wed, Mar 12, 2014 at 01:58:14PM -0700, Peter Geoghegan wrote:
 The use case you describe here doesn't sound like something similar to
 full text search. It sounds like something identical.
 
 In any case, let's focus on what we have right now. I think that the
 indexing facilities proposed here are solid. In any case they do not
 preclude working on better indexing strategies as the need emerges.

Keep in mind that if we ship an index format, we are going to have
trouble changing the layout because of pg_upgrade.  pg_upgrade can mark
the indexes as invalid and force users to reindex, but that is less than
idea.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Thu, Mar 13, 2014 at 6:15 AM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Mar 12, 2014 at 01:58:14PM -0700, Peter Geoghegan wrote:
 The use case you describe here doesn't sound like something similar to
 full text search. It sounds like something identical.

 In any case, let's focus on what we have right now. I think that the
 indexing facilities proposed here are solid. In any case they do not
 preclude working on better indexing strategies as the need emerges.

 Keep in mind that if we ship an index format, we are going to have
 trouble changing the layout because of pg_upgrade.  pg_upgrade can mark
 the indexes as invalid and force users to reindex, but that is less than
 idea.

Well these are just normal gin and gist indexes. If we want to come up
with new index operator classess we can still do that and keep the old
ones if necessary. Even that seems pretty unlikely from past experience.

I'm actually pretty sanguine even about keeping the GIST opclass. If
it has bugs then the bugs only affect people who use this non-default
opclass and we can fix them. It doesn't risk questioning any basic
design choices in the patch.

It does sound like the main question here is which opclass should be
the default. From the discussion there's a jsonb_hash_ops which works
on all input values but supports fewer operators and a jsonb_ops which
supports more operators but can't handle json with larger individual
elements. Perhaps it's better to make jsonb_hash_ops the default so at
least it's always safe to create a default gin index?
-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

Fwiw I have a few questions -- but beware, I'm a complete neophyte
when it comes to jsonb style document databases so these are more
likely to represent misconceptions on my part than problems with
jsonb.

I naively though a gin index on a jsonb would help with queries like
WHERE col-'prop' = 'val'. In fact it only seems to help with WHERE
col ? 'prop'. To help with the former it looks like I need an
expression index on col-'prop'  is that right? There doesn't seem to
be an operator that combines both a dereference and value test into a
single operator so I don't think our index machinery can deal with
this. Or am I supposed to use contains and construct a json object for
the test?

I also find it awkward that col-'prop' returns the json
representation of the property. If it's text that means it's
double-quoted. I would think that a user storing text in a json
property would want a way to pull out the text that json property
represents so he doesn't have to write col-'prop' = 'foo' and
doesn't need to strip the quotes (and de-escape the string?) before
displaying the value or passing it through other apis.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Alexander Korotkov

On Thu, Mar 13, 2014 at 1:21 PM, Greg Stark st...@mit.edu wrote:

 Well these are just normal gin and gist indexes. If we want to come up
 with new index operator classess we can still do that and keep the old
 ones if necessary. Even that seems pretty unlikely from past experience.

 I'm actually pretty sanguine even about keeping the GIST opclass. If
 it has bugs then the bugs only affect people who use this non-default
 opclass and we can fix them. It doesn't risk questioning any basic
 design choices in the patch.

 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?


A couple of thoughts from me:
1) We can evade length limitation if GIN index by truncating long values
and setting recheck flag. We can introduce some indicator of truncated
value like zero byte at the end.
2) jsonb_hash_ops can be extended to handle keys queries too. We can
preserve one bit in hash as flag indicating whether it's a hash of key or
hash of path to value. For sure, such index would be a bit larger. Also,
jsonb_hash_ops can be split into two: with and without keys.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Oleg Bartunov

On Thu, Mar 13, 2014 at 4:21 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Thu, Mar 13, 2014 at 1:21 PM, Greg Stark st...@mit.edu wrote:

 Well these are just normal gin and gist indexes. If we want to come up
 with new index operator classess we can still do that and keep the old
 ones if necessary. Even that seems pretty unlikely from past experience.

 I'm actually pretty sanguine even about keeping the GIST opclass. If
 it has bugs then the bugs only affect people who use this non-default
 opclass and we can fix them. It doesn't risk questioning any basic
 design choices in the patch.

 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?


 A couple of thoughts from me:
 1) We can evade length limitation if GIN index by truncating long values and
 setting recheck flag. We can introduce some indicator of truncated value
 like zero byte at the end.
 2) jsonb_hash_ops can be extended to handle keys queries too. We can
 preserve one bit in hash as flag indicating whether it's a hash of key or
 hash of path to value. For sure, such index would be a bit larger. Also,
 jsonb_hash_ops can be split into two: with and without keys.

That's right ! Should we do these now, that's the question.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

Fwiw the jsonb data doesn't actually seem to be any smaller than text
json on this data set (this is avg(pg_column_size(col)) and I checked,
they're both using the same amount of toast space)

 jsonb | json
---+---
 813.5 | 716.3
(1 row)

It's still more than 7x faster in cpu costs though:

stark=# select count(attrs-'properties'-'STREET') from citylots;
 count

 196507
(1 row)

Time: 1026.678 ms

stark=# select count(attrs-'properties'-'STREET') from citylots_json;
 count

 196507
(1 row)

Time: 7418.010 ms


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Andrew Dunstan



On 03/13/2014 06:53 AM, Greg Stark wrote:


I also find it awkward that col-'prop' returns the json
representation of the property. If it's text that means it's
double-quoted. I would think that a user storing text in a json
property would want a way to pull out the text that json property
represents so he doesn't have to write col-'prop' = 'foo' and
doesn't need to strip the quotes (and de-escape the string?) before
displaying the value or passing it through other apis.




- returns dequoted text if the value it points to is a plain string. 
If it's not doing that then that's a bug.


   andrew=# select jsonb '{a:the string}' - 'a';
   ?column?
   --
 the string
   (1 row)

   andrew=# select jsonb '{a:the string}' - 'a'
   ;
  ?column?
   
 the string
   (1 row)



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Andrew Dunstan



On 03/13/2014 08:42 AM, Greg Stark wrote:

Fwiw the jsonb data doesn't actually seem to be any smaller than text
json on this data set (this is avg(pg_column_size(col)) and I checked,
they're both using the same amount of toast space)

  jsonb | json
---+---
  813.5 | 716.3
(1 row)



That's expected, you save on whitespace, quotes and punctuation and 
spend on structural overhead (e.g. string lengths). The actual strings 
stored are the virtally the same. Numbers are stored as numerics, which 
might or might not be longer. Nulls and booleans are about a wash.





It's still more than 7x faster in cpu costs though:

stark=# select count(attrs-'properties'-'STREET') from citylots;
  count

  196507
(1 row)

Time: 1026.678 ms

stark=# select count(attrs-'properties'-'STREET') from citylots_json;
  count

  196507
(1 row)

Time: 7418.010 ms





That's also expected, it's one of the major benefits. With jsonb you're 
avoiding reparsing the json.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Thu, Mar 13, 2014 at 1:08 PM, Andrew Dunstan and...@dunslane.net wrote:
 - returns dequoted text if the value it points to is a plain string. If
 it's not doing that then that's a bug.

Sorry, I must have gotten confused between various tests. It does seem
to be doing that.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

Another question. Is Peter's branch up to date with
jsonb_populate_record() ? From discussions on list it sounds like the
plan was to get rid of the use_json_as_text argument but his patch
still has it.

(Tangentially, I wonder if it wouldn't be possible to make this a
plain cast. I'm not sure but I think it's possible to have a cast to a
polymorphic type and peek at runtime at the record definition to
determine what to do).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Andrew Dunstan



On 03/13/2014 10:49 AM, Greg Stark wrote:

Another question. Is Peter's branch up to date with
jsonb_populate_record() ? From discussions on list it sounds like the
plan was to get rid of the use_json_as_text argument but his patch
still has it.


Yes, we're not changing that, and some people like it anyway. The API is 
intentionally the same as the legacy json_populate_record API.





(Tangentially, I wonder if it wouldn't be possible to make this a
plain cast. I'm not sure but I think it's possible to have a cast to a
polymorphic type and peek at runtime at the record definition to
determine what to do).




If you can simplify it be my guest.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Tomas Vondra

On 13.3.2014 13:28, Oleg Bartunov wrote:
 On Thu, Mar 13, 2014 at 4:21 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
 On Thu, Mar 13, 2014 at 1:21 PM, Greg Stark st...@mit.edu wrote:

 Well these are just normal gin and gist indexes. If we want to come up
 with new index operator classess we can still do that and keep the old
 ones if necessary. Even that seems pretty unlikely from past experience.

 I'm actually pretty sanguine even about keeping the GIST opclass. If
 it has bugs then the bugs only affect people who use this non-default
 opclass and we can fix them. It doesn't risk questioning any basic
 design choices in the patch.

 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?


 A couple of thoughts from me:
 1) We can evade length limitation if GIN index by truncating long values and
 setting recheck flag. We can introduce some indicator of truncated value
 like zero byte at the end.
 2) jsonb_hash_ops can be extended to handle keys queries too. We can
 preserve one bit in hash as flag indicating whether it's a hash of key or
 hash of path to value. For sure, such index would be a bit larger. Also,
 jsonb_hash_ops can be split into two: with and without keys.
 
 That's right ! Should we do these now, that's the question.

Yeah, those are basically the two solutions I proposed a few messages
back in this thread. I'm pleased I haven't proposed a complete nonsense.

The question whether do that now or wait for 9.5 is a tough one. Doing
both for 9.4 is certainly stretching the commitfest to it's limits :-(

My impression is that while (2) means rather significant implementation
changes in jsonb_hash_ops, (1) is rather straightforward. Is that
correct (e.g. how's the truncation going to work with arrays?).

If that's true, I'd like propose doing (1) for 9.4 and leaving (2) to
9.5. I'm ready to spend non-trivial amount of time testing the changes
required in (1).

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Merlin Moncure

On Mon, Mar 10, 2014 at 4:18 AM, Peter Geoghegan p...@heroku.com wrote:
 * Extensive additional documentation. References to the very new JSON
 RFC. I think that this revision is in general a lot more coherent, and
 I found that reflecting on what idiomatic usage should look like while
 writing the documentation brought clarity to my thoughts on how the
 code should be structured. The documentation is worth a read if you
 want to get a better sense of what the patch is about relatively
 quickly.

The attached documentation is excellent -- wow.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-13 Thread Peter Geoghegan

On Thu, Mar 13, 2014 at 2:21 AM, Greg Stark st...@mit.edu wrote:
 It does sound like the main question here is which opclass should be
 the default. From the discussion there's a jsonb_hash_ops which works
 on all input values but supports fewer operators and a jsonb_ops which
 supports more operators but can't handle json with larger individual
 elements. Perhaps it's better to make jsonb_hash_ops the default so at
 least it's always safe to create a default gin index?

Personally, I don't think it's a good idea to change the default. I
have yet to be convinced that if you hit the GIN limitation it's an
indication of anything other than that you need to reconsider your
indexing choices (how often have we heard that complaint of GIN before
in practice?). Even if you don't hit the limitation directly, with
something like jsonb_hash_ops you're still hashing a large nested
structure, very probably uselessly. Are you really going to look for
an exact match to an elaborate nested structure? I would think,
probably not.

Now, as Alexander says, there might be a role for another
(jsonb_hash_ops) opclass that separately indexes values only. I still
think that by far the simplest solution is to use expressional
indexes, because we index key values and array element values
indifferently. Of course, nothing we have here precludes the
development of such an opclass.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 12 Březen 2014, 0:41, Peter Geoghegan wrote:
 On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra t...@fuzzy.cz wrote:
   ERROR:  index row size 1416 exceeds maximum 1352 for index gin_idx

 All index AMs have similar restrictions.

Yes, I know and I have no problem with restrictions in general. You may
run into similar issues with btree indexes on text columns with long text,
for example. The thing is that people don't generally index text directly,
because it usually does not make much sense, but using tsvector etc.

But with jsonb it's more likely because indexing is one of the goodies (at
least for me). And the discussions with several people interested in
storing json data I had recently went often like this:

me: It seems we'll have a better json datatype in 9.4.
them: Nice!
me: And it will be possible to do searches on arbitrary keys.
them: Yay!
me: And we actually got pretty significant improvements in GIN indexes.
them: Awesome!
me: But the values you may index need to be less than ~1500B.
them: Bummer :-(
me: Well, you can use GIST then.

 A good example of such header is dkim-signature which basically
 contains the whole message digitally signed with DKIM. The signature
 tends to be long and non-compressible, thanks to the signature.

 I'm wondering what's the best way around this, because I suspect many
 new users (especially those attracted by jsonb and GIN improvements)
 will run into this. Maybe not immediately, but eventully they'll try to
 insert a jsonb with long value, and it will fail ...

 The jsonb_hash_ops operator class just stores a 32-bit integer hash
 value (it always sets the recheck flag, which only some of the other
 default GIN opclass' strategies do). It only supports containment, and
 not the full variety of operators that the default opclass supports,
 which is why it isn't the default. I think that in practice the
 general recommendation will be that when indexing at the top level,
 use jsonb_hash_ops. When indexing nested items, use the more flexible
 default GIN opclass. That seems like a pretty smart trade-off to me.

OK, I'll look into the jsonb_hash_ops - that sounds more or less like what
I was thinking about (and sure, storing hashes makes some operations
impossible to support).

The other thing I was thinking about is introducing some kind of upper
limit for the value length - e.g. index just the first 1kB, or something
like that. My experience is most values are way shorter, or actually
differ in the first 1kB, so this should allow most decisions to be made.
But I'm not really that familiar with how GIN works, so maybe this is
nonsense.

 The more I think about it, the more inclined I am to lose GiST support
 entirely for the time being. It lets us throw out about 700 lines of C
 code, which is a very significant fraction of the total, removes the
 one open bug, and removes the least understood part of the code. The
 GiST opclass is not particularly compelling for this.

I disagree with that. I see GiST as a simple fallback option for the cases
I described. I wasn't able to create a GIN index because of exceeding the
max item length, but GiST created just fine. It was considerably slower,
but it worked.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 12 Březen 2014, 0:51, Peter Geoghegan wrote:
 On Tue, Mar 11, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that in practice the
 general recommendation will be that when indexing at the top level,
 use jsonb_hash_ops. When indexing nested items, use the more flexible
 default GIN opclass. That seems like a pretty smart trade-off to me.

 By which I mean: index nested items using an expressional GIN index.

I'm still not sure how would that look. Does that mean I'd have to create
multiple GIN indexes - one for each possible key or something like that?
Can you give an example?

regards
Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Oleg Bartunov

Also, GiST index is faster for create/update operations. I really hope we will
improve jsonb indexing in the next one-two releases. For now I'd suggest people
index expressional indexes to index just interesting keys or use GiST.

On Wed, Mar 12, 2014 at 5:15 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 12 Březen 2014, 0:41, Peter Geoghegan wrote:
 On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra t...@fuzzy.cz wrote:
   ERROR:  index row size 1416 exceeds maximum 1352 for index gin_idx

 All index AMs have similar restrictions.

 Yes, I know and I have no problem with restrictions in general. You may
 run into similar issues with btree indexes on text columns with long text,
 for example. The thing is that people don't generally index text directly,
 because it usually does not make much sense, but using tsvector etc.

 But with jsonb it's more likely because indexing is one of the goodies (at
 least for me). And the discussions with several people interested in
 storing json data I had recently went often like this:

 me: It seems we'll have a better json datatype in 9.4.
 them: Nice!
 me: And it will be possible to do searches on arbitrary keys.
 them: Yay!
 me: And we actually got pretty significant improvements in GIN indexes.
 them: Awesome!
 me: But the values you may index need to be less than ~1500B.
 them: Bummer :-(
 me: Well, you can use GIST then.

 A good example of such header is dkim-signature which basically
 contains the whole message digitally signed with DKIM. The signature
 tends to be long and non-compressible, thanks to the signature.

 I'm wondering what's the best way around this, because I suspect many
 new users (especially those attracted by jsonb and GIN improvements)
 will run into this. Maybe not immediately, but eventully they'll try to
 insert a jsonb with long value, and it will fail ...

 The jsonb_hash_ops operator class just stores a 32-bit integer hash
 value (it always sets the recheck flag, which only some of the other
 default GIN opclass' strategies do). It only supports containment, and
 not the full variety of operators that the default opclass supports,
 which is why it isn't the default. I think that in practice the
 general recommendation will be that when indexing at the top level,
 use jsonb_hash_ops. When indexing nested items, use the more flexible
 default GIN opclass. That seems like a pretty smart trade-off to me.

 OK, I'll look into the jsonb_hash_ops - that sounds more or less like what
 I was thinking about (and sure, storing hashes makes some operations
 impossible to support).

 The other thing I was thinking about is introducing some kind of upper
 limit for the value length - e.g. index just the first 1kB, or something
 like that. My experience is most values are way shorter, or actually
 differ in the first 1kB, so this should allow most decisions to be made.
 But I'm not really that familiar with how GIN works, so maybe this is
 nonsense.

 The more I think about it, the more inclined I am to lose GiST support
 entirely for the time being. It lets us throw out about 700 lines of C
 code, which is a very significant fraction of the total, removes the
 one open bug, and removes the least understood part of the code. The
 GiST opclass is not particularly compelling for this.

 I disagree with that. I see GiST as a simple fallback option for the cases
 I described. I wasn't able to create a GIN index because of exceeding the
 max item length, but GiST created just fine. It was considerably slower,
 but it worked.

 Tomas



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Wed, Mar 12, 2014 at 6:20 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm still not sure how would that look. Does that mean I'd have to create
 multiple GIN indexes - one for each possible key or something like that?
 Can you give an example?

It could mean that you're obliged to create multiple indexes, yes. For
an example, and to get a better sense of what I mean, look at the
documentation in the patch.

The idea that you're going to create one index on a jsonb, and it's
going to be able to usefully index a lot of different queries doesn't
seem practical for most use-cases. Mostly, people will have fairly
homogeneous json documents, and they'll want to index certain nested
fields common to all or at least a large majority of those documents.

By indexing entire jsonb datums, do you hope to get much benefit out
of the indexed values (as opposed to keys) being stored (in serialized
form) in the GIN index? Because you *are* indexing a large nested
structure as a value. Is that large nested structure going to appear
in your query predicate, or are you just going to subscript the jsonb
to get to the level that's of interest to query that? I'm pretty sure
that people want the latter. Are you sure that your complaint isn't
just that the default GIN opclass indexes values (as distinct from
keys) that are large and unwieldy, and not terribly useful?

I don't think expressional indexes are some kind of unfortunate work
around for a jsonb limitation. I think that they're the natural way to
approach indexing a nested structure in Postgres. MongoDB, for
example, does not magically index everything. You're still required to
make choices about indexing that consider the access patterns.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote:
 Also, GiST index is faster for create/update operations. I really hope we will
 improve jsonb indexing in the next one-two releases. For now I'd suggest 
 people
 index expressional indexes to index just interesting keys or use GiST.

When do you ever want to index non-interesting keys?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Oleg Bartunov

On Thu, Mar 13, 2014 at 12:10 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote:
 Also, GiST index is faster for create/update operations. I really hope we 
 will
 improve jsonb indexing in the next one-two releases. For now I'd suggest 
 people
 index expressional indexes to index just interesting keys or use GiST.

 When do you ever want to index non-interesting keys?

Regular user may just index all keys.

I mean, that json can contains keys, which are not searched, so it's
not needed to index them and save index size. We probably could
provide option in CREATE INDEX to specify what to index and what not
index, but it require planner to know that information.



 --
 Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Andrew Dunstan



On 03/12/2014 04:10 PM, Peter Geoghegan wrote:

On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote:

Also, GiST index is faster for create/update operations. I really hope we will
improve jsonb indexing in the next one-two releases. For now I'd suggest people
index expressional indexes to index just interesting keys or use GiST.

When do you ever want to index non-interesting keys?




The problem is when do you know they are interesting?

One major use case for using treeish data types in the first place is 
that you don't know when you're designing the database exactly what 
shape the data will be. If you don't know that, then how are you 
supposed to know what in it will be interesting? It's somewhat analogous 
to full text indexing, where we don't know in advance what phrases or 
words will be interesting. Here, a key is the equivalent of a word and a 
key path or subpath is the equivalent of a phrase.


Maybe I'm dreaming, since I have no idea how to go about this sort of 
indexing, but it's where I'd like to see lots of effort.


I agree with Oleg that we need to be very creative about jsonb indexing. 
One of my hopes is that by going down the road we are on, we'll get much 
wider interest in this, and that both ideas and money might flow towards 
addressing it in a way that we probably wouldn't have seen otherwise.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Josh Berkus

Andrew, Peter:

Just so I'm clear on the limits here, lemme make sure I understand this:

a) GIN indexing is limited to ~~1500chars

b) The value, which includes everything other than the top level set
of keys, is one item as far as GIN is concerned.

Therefore: we are limited to indexing JSON where nothing below a
top-level key is more than 1500bytes?

I'm asking for documentation purposes.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Wed, Mar 12, 2014 at 1:37 PM, Andrew Dunstan and...@dunslane.net wrote:
 One major use case for using treeish data types in the first place is that
 you don't know when you're designing the database exactly what shape the
 data will be. If you don't know that, then how are you supposed to know what
 in it will be interesting? It's somewhat analogous to full text indexing,
 where we don't know in advance what phrases or words will be interesting.
 Here, a key is the equivalent of a word and a key path or subpath is the
 equivalent of a phrase.

You don't know exactly how, but you have some idea. The major benefit
is that you can add new things to new documents as the need arises,
and that's not a big deal, nor does it require a migration with DDL.
If we continue to take MongoDB as representative of how people will
use jsonb, they pretty strongly encourage the idea that you have to
have some structure or design. Google mongodb schema design to see
what I mean - you'll find plenty. It has more to do with making
querying the data possible than anything else. There is a limited
amount you can do with a bunch of documents that share little in
common in terms of their structure - what does a query (that can use
an index just in principle) even look like there?

The use case you describe here doesn't sound like something similar to
full text search. It sounds like something identical.

In any case, let's focus on what we have right now. I think that the
indexing facilities proposed here are solid. In any case they do not
preclude working on better indexing strategies as the need emerges.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Andrew Dunstan



On 03/12/2014 04:58 PM, Peter Geoghegan wrote:

In any case, let's focus on what we have right now. I think that the
indexing facilities proposed here are solid. In any case they do not
preclude working on better indexing strategies as the need emerges.




I quite agree, didn't mean to suggest otherwise.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 12.3.2014 20:40, Peter Geoghegan wrote:
 On Wed, Mar 12, 2014 at 6:20 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I'm still not sure how would that look. Does that mean I'd have to
 create multiple GIN indexes - one for each possible key or
 something like that? Can you give an example?
 
 It could mean that you're obliged to create multiple indexes, yes.
 For an example, and to get a better sense of what I mean, look at
 the documentation in the patch.

OK, will do.

 The idea that you're going to create one index on a jsonb, and it's
 going to be able to usefully index a lot of different queries doesn't
 seem practical for most use-cases. Mostly, people will have fairly
 homogeneous json documents, and they'll want to index certain nested
 fields common to all or at least a large majority of those documents.

I think that's unfounded assumption. Many users actually have very
little control over the documents or queries - a nice example may be the
mail archive, with headers stored in a hstore/jsonb. I have absolutely
no control over the headers or queries.

But I think this is a feedback loop too - what if many users actually
want that functionality, but realize that expression indexes are not
sufficient for their needs and thus don't even try (and so we don't hear
about them)?

And my experience is that this is actualy one of the very cool hstore
features - being able to index the whole structure and then do arbitrary
queries over that.

The only reason why I'm looking at jsonb is that it the improved support
for data types (especially arrays).

So I have my doubts about the claims that users have homogenous
documents and only want to index some fields with expression indexes.

 By indexing entire jsonb datums, do you hope to get much benefit out
 of the indexed values (as opposed to keys) being stored (in serialized
 form) in the GIN index? Because you *are* indexing a large nested
 structure as a value. Is that large nested structure going to appear
 in your query predicate, or are you just going to subscript the jsonb
 to get to the level that's of interest to query that? I'm pretty sure
 that people want the latter. Are you sure that your complaint isn't
 just that the default GIN opclass indexes values (as distinct from
 keys) that are large and unwieldy, and not terribly useful?

No, I don't expect a large nested structure to appear in the query. And
I expect most people won't need that, although I can imagine queries  @
doing that (not sure if that checks for equality or 'subset').

But I'm not sure I understand how's this related to my original post?

All I was asking whether it wouldn't be enough to store a hash instead
of the original value, i.e. instead of this:

  {from : j...@example.com,
   to : j...@example.com,
   content-type : text/plain; charset=us-ascii,
   dkim-signature :  vry long value }

this

  {129812 : 29382,
   459821 : 1029381,
21083 : 102941,
   111390 : 129010292}

which would solve issues with the long values and might still support
the queries (with recheck, of course). I don't know if that's what
jsonb_hash_ops do or if it's even possible / compatible with GIN.

 I don't think expressional indexes are some kind of unfortunate work
 around for a jsonb limitation. I think that they're the natural way to
 approach indexing a nested structure in Postgres. MongoDB, for
 example, does not magically index everything. You're still required to
 make choices about indexing that consider the access patterns.

For many usecases, expressional indexes are the right tool. But not for
all and I see no reason to just throw some tools away.

regards
Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 12.3.2014 21:58, Peter Geoghegan wrote:
 
 The use case you describe here doesn't sound like something similar to
 full text search. It sounds like something identical.

I think this very depends on the definition of full text search.

 In any case, let's focus on what we have right now. I think that the
 indexing facilities proposed here are solid. In any case they do not
 preclude working on better indexing strategies as the need emerges.

+1

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Wed, Mar 12, 2014 at 2:30 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I think that's unfounded assumption. Many users actually have very
 little control over the documents or queries - a nice example may be the
 mail archive, with headers stored in a hstore/jsonb. I have absolutely
 no control over the headers or queries.

Maybe, but what do you want me to do to help them? Indexing a typical
jsonb field is a bad idea, unless you really do want something
essentially equivalent to full text search (which could be justified),
or unless you know ahead of time that your documents are not going to
be heavily nested. The whole basis of your complaints seems to be that
people won't know that at all.

 For many usecases, expressional indexes are the right tool. But not for
 all and I see no reason to just throw some tools away.

If the tool you're talking about throwing away is the GiST opclass, I
do not propose to throw that away. I don't think it's important enough
to justify inclusion in our first cut at this, especially given the
fact that the code has bugs, and is quite a bit more complex than GIN.
What's wrong with those reasons?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On 12.3.2014 21:55, Josh Berkus wrote:
 Andrew, Peter:
 
 Just so I'm clear on the limits here, lemme make sure I understand this:
 
 a) GIN indexing is limited to ~~1500chars

The exact message I get is this:

ERROR: index row size 1944 exceeds maximum 1352 for index tmp_idx

so it's 1352B. But IIRC this is closely related to block size, so with
larger block sizes you'll get different limits. Also, this is a limit on
compressed value, which makes it less user-friendly as it's difficult to
predict whether the row is OK or not :-(

And I just discovered this:

  create table tmp (val jsonb);
  create index tmp_gin_idx on tmp using gin (val);
  insert into tmp
 select ('{z : ' || repeat('z', 100) || '}')::jsonb;

which tries to insert a well-compressible string ('z' repeated
1e6-times), and fails with this:

ERROR: index row requires 11472 bytes, maximum size is 8191

So I think it's quite difficult to give simple and exact explanation in
the docs, other than there are limits, but it's difficult to say when
you hit them.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-12 Thread Stephen Frost

* Tomas Vondra (t...@fuzzy.cz) wrote:
 So I think it's quite difficult to give simple and exact explanation in
 the docs, other than there are limits, but it's difficult to say when
 you hit them.

Arrays have more-or-less the same issue...

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] jsonb and nested hstore

On 12.3.2014 22:43, Peter Geoghegan wrote:
 On Wed, Mar 12, 2014 at 2:30 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I think that's unfounded assumption. Many users actually have very
 little control over the documents or queries - a nice example may be the
 mail archive, with headers stored in a hstore/jsonb. I have absolutely
 no control over the headers or queries.
 
 Maybe, but what do you want me to do to help them? Indexing a
 typical jsonb field is a bad idea, unless you really do want
 something essentially equivalent to full text search (which could be
 justified), or unless you know ahead of time that your documents are
 not going to be heavily nested. The whole basis of your complaints
 seems to be that people won't know that at all.

Well, I would be quite happy with the GIN indexing without the limit I
ran into. I don't think we need to invent something entirely new.

You're right that the index is pretty futile with a condition matching
field/value combination. But what if I'm doing a query with multiple
such conditions, and the combination matches just a small fraction of
rows? GIN index works with that (and the patches from Alexander improve
this case tremendously, IIRC).

I still don't understand how's this similar to fulltext - that seems
pretty unsuitable for a treeish structure, assuming you can't flatten
it. Which you can't, if the queries use paths to access just parts of
the json value.

 For many usecases, expressional indexes are the right tool. But not for
 all and I see no reason to just throw some tools away.
 
 If the tool you're talking about throwing away is the GiST opclass, I
 do not propose to throw that away. I don't think it's important enough
 to justify inclusion in our first cut at this, especially given the
 fact that the code has bugs, and is quite a bit more complex than GIN.
 What's wrong with those reasons?

Meh, I accidentally mixed two responses :-/

I have no problem with expression indexes, but it's not a good solution
to all problems. I certainly can't use them to achieve what I'd like and
I disagree with your assumptions that it doesn't make sense to index
everything / non-interesting keys, or that the documents have
well-defined structure. I can live with larger / less efficient indexes
on all fields.

Regarding GiST - I understand your concerns about complexity, and you
may be right that not shipping it now is prefferable to shipping it with
bugs. The thing is it doesn't have issues with the value lengths, which
prevents me from using GIN, and although GiST is slower, it's at least
some indexing. But maybe jsonb_hash_ops will work, I haven't tried yet.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-11 Thread Alexander Korotkov

On Tue, Mar 11, 2014 at 5:19 AM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Mar 10, 2014 at 4:19 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  Here it is.

 So it looks like what you have here is analogous to the other problems
 that I fixed with both GiST and GIN. That isn't surprising, and this
 does fix my test-case. I'm not terribly happy about the lack of
 explanation for the hashing in that loop, though. Why use COMP_CRC32()
 at all, for one thing?

 Why do this for non-primitive jsonb hashing?

 COMP_CRC32(stack-hash_state, PATH_SEPARATOR, 1);

 Where PATH_SEPARATOR is:

 #define PATH_SEPARATOR (\0)

 Actually, come to think of it, why not just use one hashing function
 everywhere? i.e., jsonb_hash(PG_FUNCTION_ARGS)? It's already very
 similar. Pretty much every hash operator support function 1 (i.e. a
 particular type's hash function) is implemented with hash_any(). Can't
 we just do the same here? In any case it isn't obvious why the
 requirements for those two things (the hashing mechanism used by the
 jsonb_hash_ops GIN opclass, and the hash operator class support
 function 1 hash function) cannot be the same thing.


It's because CRC32 interface allows incremental calculation while hash_any
requires single chunk of memory. I don't think that unfolding everything is
good idea. But we could implement incremental interface for hash_any.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] jsonb and nested hstore

2014-03-11 Thread Tomas Vondra

Hi,

I've spent a few hours stress-testing this a bit - loading a mail
archive with ~1M of messages (with headers stored in a jsonb column) and
then doing queries on that. Good news - no crashes or any such issues so
far. The queries that I ran manually seem to return sane results.

The only problem I ran into is with limited index row size with GIN
indexes. I understand it's not a bug, but I admit I haven't realized I
might run into it in this case ...

The data I used for testing is just a bunch of e-mail messages, with
headers stored as jsonb, so each row has something like this in
headers column:

{
 from : John Doe j...@example.com,
 to : [Jane Doe j...@example.com, Jack Doe j...@example.com],
 cc : ...,
 bcc : ...,
 ... various other headers ...
}

The snag is that some of the header values may be very long, exceeding
the limit of 1352 bytes and causing errors like this:

  ERROR:  index row size 1416 exceeds maximum 1352 for index gin_idx

A good example of such header is dkim-signature which basically
contains the whole message digitally signed with DKIM. The signature
tends to be long and non-compressible, thanks to the signature.

I'm wondering what's the best way around this, because I suspect many
new users (especially those attracted by jsonb and GIN improvements)
will run into this. Maybe not immediately, but eventully they'll try to
insert a jsonb with long value, and it will fail ...

With btree indexes on text I would probably create an index on
substr(column,0,1000) or something like that, but doing that with JSON
seems a bit strange.

I assume we need to store the actual values in the GIN index (so a hash
is not sufficient), right?

GIST indexes work, but with that I have to give up the significant
performance gains that we got thanks to Alexander's GIN patches.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-11 Thread Peter Geoghegan

On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra t...@fuzzy.cz wrote:
   ERROR:  index row size 1416 exceeds maximum 1352 for index gin_idx

All index AMs have similar restrictions.

 A good example of such header is dkim-signature which basically
 contains the whole message digitally signed with DKIM. The signature
 tends to be long and non-compressible, thanks to the signature.

 I'm wondering what's the best way around this, because I suspect many
 new users (especially those attracted by jsonb and GIN improvements)
 will run into this. Maybe not immediately, but eventully they'll try to
 insert a jsonb with long value, and it will fail ...

The jsonb_hash_ops operator class just stores a 32-bit integer hash
value (it always sets the recheck flag, which only some of the other
default GIN opclass' strategies do). It only supports containment, and
not the full variety of operators that the default opclass supports,
which is why it isn't the default. I think that in practice the
general recommendation will be that when indexing at the top level,
use jsonb_hash_ops. When indexing nested items, use the more flexible
default GIN opclass. That seems like a pretty smart trade-off to me.

The more I think about it, the more inclined I am to lose GiST support
entirely for the time being. It lets us throw out about 700 lines of C
code, which is a very significant fraction of the total, removes the
one open bug, and removes the least understood part of the code. The
GiST opclass is not particularly compelling for this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-11 Thread Peter Geoghegan

On Tue, Mar 11, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that in practice the
 general recommendation will be that when indexing at the top level,
 use jsonb_hash_ops. When indexing nested items, use the more flexible
 default GIN opclass. That seems like a pretty smart trade-off to me.

By which I mean: index nested items using an expressional GIN index.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-10 Thread Alexander Korotkov

On Mon, Mar 10, 2014 at 1:18 PM, Peter Geoghegan p...@heroku.com wrote:

 * The jsonb_hash_ops non-default GIN opclass is broken. It has its own
 idiosyncratic notion of what constitutes containment, that sees it
 only return, say, jsonb arrays that have a matching string as their
 leftmost element (if we ask it if it contains within it another
 array with the same string). Because of the limited number of
 indexable operators (only @), I'd put this opclass in the same
 category as GiST in terms of my willingness to forgo it for a release,
 even if it did receive a loud applause at pgConf.EU. Again, it might
 be some disparity between the opertors as they existed in hstore2 at
 one time, and as they exist in the core code now, but I doubt it, not
 least since the regression tests didn't pick this up, and it's such a
 basic thing. Perhaps Oleg and Teodor just need to explain this to me.


I din't get comment about leftmost element. There is absolutely no
distinguish between array elements. All elements are extracted into same
keys independent of their indexes. It seems to have no change since I wrote
hstore_hash_ops.  Could you share test case to illustrate what you mean?

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] jsonb and nested hstore

On Mon, Mar 10, 2014 at 3:00 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 I din't get comment about leftmost element. There is absolutely no
 distinguish between array elements. All elements are extracted into same
 keys independent of their indexes. It seems to have no change since I wrote
 hstore_hash_ops.  Could you share test case to illustrate what you mean?

I don't have time to post that at the moment, but offhand I *think*
your confusion may be due to the fact that the json_hash_ops opclass
(as I call it) was previously consistent with the behavior of the
other GIN opclass (the default). The problem is that they (well, at
least the default GIN and GiST opclasses) were inconsistent with how
the containment operator behaved in respect of jsonb array elements
generally.

Here is the commit on our feature branch where I fixed the problem for
the default GIN opclass:

https://github.com/feodor/postgres/commit/6f5e4fe9fc34f9512919b1c8b6a54952ab288640s

If it doesn't explain the problem, you may still wish to comment on
the correctness of this fix. I am still waiting on feedback from Oleg
and Teodor.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Mon, Mar 10, 2014 at 3:19 AM, Peter Geoghegan p...@heroku.com wrote:
 I don't have time to post that at the moment, but offhand I *think*
 your confusion may be due to the fact that the json_hash_ops opclass
 (as I call it) was previously consistent with the behavior of the
 other GIN opclass (the default). The problem is that they (well, at
 least the default GIN and GiST opclasses) were inconsistent with how
 the containment operator behaved in respect of jsonb array elements
 generally.

Sorry, I realize now that that must be incorrect. Still, please take a
look at the commit linked to.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Mon, Mar 10, 2014 at 3:21 AM, Peter Geoghegan p...@heroku.com wrote:
 Sorry, I realize now that that must be incorrect. Still, please take a
 look at the commit linked to.

To be clear, I mean that my explanation of why this was missed before
was incorrect, not my contention that it's a problem right now (for
whatever reason).

I fat-fingered the URL that linked to the GIN opclass bugfix commit:
https://github.com/feodor/postgres/commit/6f5e4fe9fc34f9512919b1c8b6a54952ab288640



-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-10 Thread Alexander Korotkov

On Mon, Mar 10, 2014 at 2:19 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Mar 10, 2014 at 3:00 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  I din't get comment about leftmost element. There is absolutely no
  distinguish between array elements. All elements are extracted into same
  keys independent of their indexes. It seems to have no change since I
 wrote
  hstore_hash_ops.  Could you share test case to illustrate what you mean?

 I don't have time to post that at the moment, but offhand I *think*
 your confusion may be due to the fact that the json_hash_ops opclass
 (as I call it) was previously consistent with the behavior of the
 other GIN opclass (the default). The problem is that they (well, at
 least the default GIN and GiST opclasses) were inconsistent with how
 the containment operator behaved in respect of jsonb array elements
 generally.

 Here is the commit on our feature branch where I fixed the problem for
 the default GIN opclass:


 https://github.com/feodor/postgres/commit/6f5e4fe9fc34f9512919b1c8b6a54952ab288640s

 If it doesn't explain the problem, you may still wish to comment on
 the correctness of this fix. I am still waiting on feedback from Oleg
 and Teodor.


Apparently, there is bug in calculation of hashes. Array elements were
hashed incrementally while each of them should be hashed separately. That
cause an effect of distinguishing array elements by their indexes. Not sure
about when this bug was added.
Fix is attached.

--
With best regards,
Alexander Korotkov.


jsonb-hash-ops-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Mon, Mar 10, 2014 at 3:47 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Fix is attached.

Could you post a patch with regression tests, please?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-10 Thread Alexander Korotkov

On Mon, Mar 10, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Mar 10, 2014 at 3:47 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  Fix is attached.

 Could you post a patch with regression tests, please?


Here it is.

--
With best regards,
Alexander Korotkov.


jsonb-hash-ops-fix-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-10 Thread Andrew Dunstan



On 03/10/2014 05:18 AM, Peter Geoghegan wrote:

On Fri, Mar 7, 2014 at 9:00 AM, Bruce Momjian br...@momjian.us wrote:

OK, it sounds like the adjustments are minimal, like not using the
high-order bit.

Attached patch is a refinement of the work of Oleg, Teodor and Andrew.
Revisions are mostly my own, although Andrew contributed too.

Changes include:

* Extensive relocation, and moderate restructuring of code. Many
comments added, while many existing comments were copy-edited. Nothing
remains in contrib. jsonb is a distinct, in-core type with no
user-visible relationship to hstore. There is no code dependency
between the two. The amount of code redundancy this turned out to
create (between jsonb and an unchanged hstore) is, in my estimation,
quite acceptable.

* B-Tree and hash operator classes for the core type are included. A
GiST operator class, and two GIN operator classes are also included.
Obviously this is where I spent most time by far.

* Everything else that was in hstore in the last revision (the
complement of the hstore2 opclasses) is removed entirely. The patch is
much smaller. If we just consider code (excluding tests and
documentation), the diffstat seems far more manageable:




Thanks for your work on this.

It's just occurred to me that we'll need to add hstore_to_jsonb 
functions and a cast to match the hstore_to_json functions and cast.


That should be fairly simple - I'll work on that. It need not hold up 
progress with what's in this patch.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-10 Thread Andrew Dunstan



On 03/10/2014 10:50 AM, Andrew Dunstan wrote:


Thanks for your work on this.

It's just occurred to me that we'll need to add hstore_to_jsonb 
functions and a cast to match the hstore_to_json functions and cast.


That should be fairly simple - I'll work on that. It need not hold up 
progress with what's in this patch.


Here's a patch sans docs for this, to be applied on top of Peter's 
patch. It's actually kinda useful as it demonstrates how non-jsonb code 
can construct jsonb values directy.


cheers

andrew

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index 43b7e5f..bf21c65 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,8 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
 
 EXTENSION = hstore
-DATA = hstore--1.2.sql hstore--1.1--1.2.sql hstore--1.0--1.1.sql \
+DATA = hstore--1.3.sql hstore--1.2--1.3.sql \
+	hstore--1.2.sql hstore--1.1--1.2.sql hstore--1.0--1.1.sql \
 	hstore--unpackaged--1.0.sql
 
 REGRESS = hstore
diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out
index 2114143..9749e45 100644
--- a/contrib/hstore/expected/hstore.out
+++ b/contrib/hstore/expected/hstore.out
@@ -1453,7 +1453,7 @@ select count(*) from testhstore where h = 'pos=98, line=371, node=CBA, indexe
  1
 (1 row)
 
--- json
+-- json and jsonb
 select hstore_to_json('a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4');
  hstore_to_json  
 -
@@ -1472,6 +1472,24 @@ select hstore_to_json_loose('a key =1, b = t, c = null, d= 12345, e = 012
  {b: true, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
 (1 row)
 
+select hstore_to_jsonb('a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4');
+ hstore_to_jsonb 
+-
+ {b: t, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
+(1 row)
+
+select cast( hstore  'a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4' as jsonb);
+  jsonb  
+-
+ {b: t, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
+(1 row)
+
+select hstore_to_jsonb_loose('a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4');
+ hstore_to_jsonb_loose 
+---
+ {b: true, c: null, d: 12345, e: 012345, f: 1.234, g: 23450, a key: 1}
+(1 row)
+
 create table test_json_agg (f1 text, f2 hstore);
 insert into test_json_agg values ('rec1','a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4'),
('rec2','a key =2, b = f, c = null, d= -12345, e = 012345.6, f= -1.234, g= 0.345e-4');
diff --git a/contrib/hstore/hstore--1.2--1.3.sql b/contrib/hstore/hstore--1.2--1.3.sql
new file mode 100644
index 000..0a70560
--- /dev/null
+++ b/contrib/hstore/hstore--1.2--1.3.sql
@@ -0,0 +1,17 @@
+/* contrib/hstore/hstore--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use ALTER EXTENSION hstore UPDATE TO '1.3' to load this file. \quit
+
+CREATE FUNCTION hstore_to_jsonb(hstore)
+RETURNS jsonb
+AS 'MODULE_PATHNAME', 'hstore_to_jsonb'
+LANGUAGE C IMMUTABLE STRICT;
+
+CREATE CAST (hstore AS jsonb)
+  WITH FUNCTION hstore_to_jsonb(hstore);
+
+CREATE FUNCTION hstore_to_jsonb_loose(hstore)
+RETURNS jsonb
+AS 'MODULE_PATHNAME', 'hstore_to_jsonb_loose'
+LANGUAGE C IMMUTABLE STRICT;
diff --git a/contrib/hstore/hstore--1.3.sql b/contrib/hstore/hstore--1.3.sql
new file mode 100644
index 000..995ade1
--- /dev/null
+++ b/contrib/hstore/hstore--1.3.sql
@@ -0,0 +1,550 @@
+/* contrib/hstore/hstore--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use CREATE EXTENSION hstore to load this file. \quit
+
+CREATE TYPE hstore;
+
+CREATE FUNCTION hstore_in(cstring)
+RETURNS hstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION hstore_out(hstore)
+RETURNS cstring
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION hstore_recv(internal)
+RETURNS hstore
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION hstore_send(hstore)
+RETURNS bytea
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE TYPE hstore (
+INTERNALLENGTH = -1,
+INPUT = hstore_in,
+OUTPUT = hstore_out,
+RECEIVE = hstore_recv,
+SEND = hstore_send,
+STORAGE =

Re: [HACKERS] jsonb and nested hstore

On Mon, Mar 10, 2014 at 4:19 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Here it is.

So it looks like what you have here is analogous to the other problems
that I fixed with both GiST and GIN. That isn't surprising, and this
does fix my test-case. I'm not terribly happy about the lack of
explanation for the hashing in that loop, though. Why use COMP_CRC32()
at all, for one thing?

Why do this for non-primitive jsonb hashing?

COMP_CRC32(stack-hash_state, PATH_SEPARATOR, 1);

Where PATH_SEPARATOR is:

#define PATH_SEPARATOR (\0)

Actually, come to think of it, why not just use one hashing function
everywhere? i.e., jsonb_hash(PG_FUNCTION_ARGS)? It's already very
similar. Pretty much every hash operator support function 1 (i.e. a
particular type's hash function) is implemented with hash_any(). Can't
we just do the same here? In any case it isn't obvious why the
requirements for those two things (the hashing mechanism used by the
jsonb_hash_ops GIN opclass, and the hash operator class support
function 1 hash function) cannot be the same thing.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Merlin Moncure

On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Mar  6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
 Hi there,

 Looks like consensus is done. I and Teodor are not happy with it, but
 what we can do :)   One thing I  want to do is to reserve our
 contribution to the flagship feature (jsonb), particularly, binary
 storage for nested structures and indexing. Their work was sponsored
 by Engine Yard.

 OK, if we are going with an unchanged hstore in contrib and a new JSONB,
 there is no reason to wack around JSONB to be binary compatible with the
 old hstore format.  What sacrifices did we need to make to have JSBONB
 be binary compatible with hstore, can those sacrifices be removed, and
 can that be done in time for 9.4?

Also,
*) what hstore2 features (if any) that are not already reflected in
the jsonb type are going to be moved to josnb for 9.4?
*) if the answer above is anything but 'nothing', what hstore-isms are
going to be adjusted in the process of doing so?  Presumably there
would be same function name changes to put them in the jsonb style but
also the hstore sytnax ('=') is going to be embedded in some of the
search operators and possibly other things.  Is that going change?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Andrew Dunstan



On 03/06/2014 11:33 PM, Bruce Momjian wrote:

On Thu, Mar  6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:

Hi there,

Looks like consensus is done. I and Teodor are not happy with it, but
what we can do :)   One thing I  want to do is to reserve our
contribution to the flagship feature (jsonb), particularly, binary
storage for nested structures and indexing. Their work was sponsored
by Engine Yard.

OK, if we are going with an unchanged hstore in contrib and a new JSONB,
there is no reason to wack around JSONB to be binary compatible with the
old hstore format.  What sacrifices did we need to make to have JSBONB
be binary compatible with hstore, can those sacrifices be removed, and
can that be done in time for 9.4?




IIRC The sacrifice was one bit in the header (i.e. in the first int 
after the varlena header). We could now repurpose that (for example if 
we ever decided to use a new format).


Oleg and Teodor made most of the adjustments on the hstore(2) side (e.g. 
providing for scalar roots, providing for json typing of scalars so 
everything isn't just a string).


Can the architecture be changed? No. If we think it's not good enough we 
would have to kiss jsonb goodbye for 9.4 and go back to the drawing 
board. But I haven't seen any such suggestion from anyone who has been 
reviewing it (e.g. Andres or Peter).


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Bruce Momjian

On Fri, Mar  7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:
 IIRC The sacrifice was one bit in the header (i.e. in the first int
 after the varlena header). We could now repurpose that (for example
 if we ever decided to use a new format).
 
 Oleg and Teodor made most of the adjustments on the hstore(2) side
 (e.g. providing for scalar roots, providing for json typing of
 scalars so everything isn't just a string).
 
 Can the architecture be changed? No. If we think it's not good
 enough we would have to kiss jsonb goodbye for 9.4 and go back to
 the drawing board. But I haven't seen any such suggestion from
 anyone who has been reviewing it (e.g. Andres or Peter).

We are going to be stuck with the JSONB binary format we ship in 9.4 so
I am asking if there are things we should do to improve it, now that we
know we don't need backward compatibility.

If they can be done for 9.4, great, if not, we have to decide if these
suboptimal cases are enough for us to delay the data type until 9.5.  I
don't know the answer, but I have to ask the question.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Andrew Dunstan



On 03/07/2014 11:45 AM, Bruce Momjian wrote:

On Fri, Mar  7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:

IIRC The sacrifice was one bit in the header (i.e. in the first int
after the varlena header). We could now repurpose that (for example
if we ever decided to use a new format).

Oleg and Teodor made most of the adjustments on the hstore(2) side
(e.g. providing for scalar roots, providing for json typing of
scalars so everything isn't just a string).

Can the architecture be changed? No. If we think it's not good
enough we would have to kiss jsonb goodbye for 9.4 and go back to
the drawing board. But I haven't seen any such suggestion from
anyone who has been reviewing it (e.g. Andres or Peter).

We are going to be stuck with the JSONB binary format we ship in 9.4 so
I am asking if there are things we should do to improve it, now that we
know we don't need backward compatibility.

If they can be done for 9.4, great, if not, we have to decide if these
suboptimal cases are enough for us to delay the data type until 9.5.  I
don't know the answer, but I have to ask the question.



AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were 
essentially two ways of spelling the same data, the domains were 
virtually identical (hstore might have been a bit more liberal about 
numeric input).


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread Bruce Momjian

On Fri, Mar  7, 2014 at 11:57:48AM -0500, Andrew Dunstan wrote:
 If they can be done for 9.4, great, if not, we have to decide if these
 suboptimal cases are enough for us to delay the data type until 9.5.  I
 don't know the answer, but I have to ask the question.
 
 
 AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were
 essentially two ways of spelling the same data, the domains were
 virtually identical (hstore might have been a bit more liberal about
 numeric input).

OK, it sounds like the adjustments are minimal, like not using the
high-order bit.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-07 Thread David E. Wheeler

On Mar 6, 2014, at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:

 It's true for perl. Syntax of hstore is close to hash/array syntax and it's
 easy serialize/deserialize hstore to/from perl. Syntax of hstore was
 inspired by perl.
 
 I understand that. There is a module on CPAN called Pg::hstore that
 will do this; it appears to have been around since 2011. I don't use
 Perl, so I don't know a lot about it. Perhaps David Wheeler has an
 opinion on the value of Perl-like syntax, as a long time Perl
 enthusiast?

HSTORE was inspired by the syntax of Perl hash declarations, but it is not 
compatible. Notably, HSTORE the HSTORE can have a value `NULL`, while in Perl 
hashes it’s `undef`. So you cannot simply `eval` an HSTORE to get a Perl hash 
unless you are certain there are no NULLs.

Besides, string eval in Perl is considered unsafe. Parsing is *much* safer.

 In any case, Perl has excellent support for JSON, just like every
 other language - you are at no particular advantage in Perl by having
 a format that happens to more closely resemble the format of Perl
 hashes and arrays. I really feel that we should concentrate our
 efforts on one standardized format here. It makes the effort to
 integrate your good work, in a way that makes it available to everyone
 so much easier.

I agree. I like HSTORE, but now that JSON is so standard (in fact, as of this 
week, a *real* standard! http://rfc7159.net/rfc7159), and its support is so 
much better than that of HSTORE, including in Perl, I believe that it should be 
priority over HSTORE. I’m happy if HSTORE has the same functionality as JSONB, 
but given the choice, all other things being equal, as a Perl hacker I will 
always choose JSONB.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Teodor Sigaev



Thank you for checking that.  Teodor's goal was that new-hstore be 100%
backwards-compatible with old-hstore.  If we're breaking APIs, then it


That's true. Binary format is fully compatible unless old hstore value has more 
than 2^28 key-value pairs (256 mln which is far from reachable by memory 
requirements). The single issue is a GiST index, GIN index should be recreated 
to utilize new features.



doesn't really work to force users to upgrade the type, no?

Teodor, are these output changes things that can be made consistent, or
do we need separate hstore and hstore2 datatypes?


Introducing types in hstore causes this incompatibility - but I don't think 
that's huge or even  big problem. In most cases application does quoting (sets 
1 instead of just 1) to preserve SQL-injection and to protect hstore-forbidden 
characters in hstore. Keys leaves untouched - it could be only a string.


That's possible to introduce GUC variable for i/o functions which will control 
old bug-to-bug behavior. IMHO, this is much better option that stopping hstore 
development or split hstore to two branches.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Peter Geoghegan

On Thu, Mar 6, 2014 at 12:23 AM, Teodor Sigaev teo...@sigaev.ru wrote:
 That's possible to introduce GUC variable for i/o functions which will
 control old bug-to-bug behavior. IMHO, this is much better option that
 stopping hstore development or split hstore to two branches.

A GUC that controls i/o functions is generally considered to be an
unacceptable hack.

In what sense are we really stopping hstore development if hstore2
lives as jsonb? I have a hard time imagining someone dealing with the
incompatibility that a user-facing hstore2 would introduce, while
still preferring hstore syntax over json syntax given the choice.
There are very rich facilities for manipulating json available in
every programming language. The same is not true of hstore.

Having looked at the issue today, I think that the amount of redundant
code between a hstore2 in core as jsonb and hstore1 will be
acceptable. The advantages of making a clean-break in having to
support the legacy hstore disk format strengthen the case for doing so
too.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Teodor Sigaev




In what sense are we really stopping hstore development if hstore2
lives as jsonb? I have a hard time imagining someone dealing with the
incompatibility that a user-facing hstore2 would introduce, while
still preferring hstore syntax over json syntax given the choice.
There are very rich facilities for manipulating json available in
every programming language. The same is not true of hstore.
It's true for perl. Syntax of hstore is close to hash/array syntax and it's easy 
serialize/deserialize hstore to/from perl. Syntax of hstore was inspired by perl.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Peter Geoghegan

On Thu, Mar 6, 2014 at 1:32 AM, Teodor Sigaev teo...@sigaev.ru wrote:
 It's true for perl. Syntax of hstore is close to hash/array syntax and it's
 easy serialize/deserialize hstore to/from perl. Syntax of hstore was
 inspired by perl.

I understand that. There is a module on CPAN called Pg::hstore that
will do this; it appears to have been around since 2011. I don't use
Perl, so I don't know a lot about it. Perhaps David Wheeler has an
opinion on the value of Perl-like syntax, as a long time Perl
enthusiast?

In any case, Perl has excellent support for JSON, just like every
other language - you are at no particular advantage in Perl by having
a format that happens to more closely resemble the format of Perl
hashes and arrays. I really feel that we should concentrate our
efforts on one standardized format here. It makes the effort to
integrate your good work, in a way that makes it available to everyone
so much easier.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

On Thu, Mar 6, 2014 at 12:43 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Mar 6, 2014 at 12:23 AM, Teodor Sigaev teo...@sigaev.ru wrote:
 That's possible to introduce GUC variable for i/o functions which will
 control old bug-to-bug behavior. IMHO, this is much better option that
 stopping hstore development or split hstore to two branches.

 A GUC that controls i/o functions is generally considered to be an
 unacceptable hack.

 In what sense are we really stopping hstore development if hstore2
 lives as jsonb? I have a hard time imagining someone dealing with the
 incompatibility that a user-facing hstore2 would introduce, while
 still preferring hstore syntax over json syntax given the choice.
 There are very rich facilities for manipulating json available in
 every programming language. The same is not true of hstore.

 Having looked at the issue today, I think that the amount of redundant
 code between a hstore2 in core as jsonb and hstore1 will be
 acceptable. The advantages of making a clean-break in having to
 support the legacy hstore disk format strengthen the case for doing so
 too.

Heh, let's not to do an implusive decision about hstore2. I agree,
that jsonb has
a lot of facilities, but don't forget, that json(b) has to follow standard and
in that sense it's more constrained than hstore, which we could further
develop to support some interesting features, which will never be implemented
in json(b).  Also,  it'd be a bit awkward after working on nested
hstore and declaring it
on several conferences (Engine Yard has sponsored part of our hstore
work), suddenly
break people expectation and say, that our work has moved to core to
provide json
some very cool features, good bye, hstore users :(   I'm afraid people
will not understand us.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Andrew Dunstan



On 03/06/2014 08:16 AM, Oleg Bartunov wrote:

On Thu, Mar 6, 2014 at 12:43 PM, Peter Geoghegan p...@heroku.com wrote:

On Thu, Mar 6, 2014 at 12:23 AM, Teodor Sigaev teo...@sigaev.ru wrote:

That's possible to introduce GUC variable for i/o functions which will
control old bug-to-bug behavior. IMHO, this is much better option that
stopping hstore development or split hstore to two branches.

A GUC that controls i/o functions is generally considered to be an
unacceptable hack.

In what sense are we really stopping hstore development if hstore2
lives as jsonb? I have a hard time imagining someone dealing with the
incompatibility that a user-facing hstore2 would introduce, while
still preferring hstore syntax over json syntax given the choice.
There are very rich facilities for manipulating json available in
every programming language. The same is not true of hstore.

Having looked at the issue today, I think that the amount of redundant
code between a hstore2 in core as jsonb and hstore1 will be
acceptable. The advantages of making a clean-break in having to
support the legacy hstore disk format strengthen the case for doing so
too.

Heh, let's not to do an implusive decision about hstore2. I agree,
that jsonb has
a lot of facilities, but don't forget, that json(b) has to follow standard and
in that sense it's more constrained than hstore, which we could further
develop to support some interesting features, which will never be implemented
in json(b).  Also,  it'd be a bit awkward after working on nested
hstore and declaring it
on several conferences (Engine Yard has sponsored part of our hstore
work), suddenly
break people expectation and say, that our work has moved to core to
provide json
some very cool features, good bye, hstore users :(   I'm afraid people
will not understand us.




Oleg,

I hear you, and largely agree, as long as the compatibility issue is 
solved. If it's not, I think inventing a new hstore2 type is probably a 
lousy way to go.


For good or ill, the world has pretty much settled on wanting to use 
json for lightweight treeish data. That's where we'll get the most 
impact. Virtually every programming language (including Perl) has good 
support for json.


I'm not sure what the constraints of json that you might want to break 
are. Perhaps you'd like to specify.


Whatever we do, rest assured your work won't go to waste.

cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Bruce Momjian

On Thu, Mar  6, 2014 at 09:33:18AM -0500, Andrew Dunstan wrote:
 I hear you, and largely agree, as long as the compatibility issue is
 solved. If it's not, I think inventing a new hstore2 type is
 probably a lousy way to go.
 
 For good or ill, the world has pretty much settled on wanting to use
 json for lightweight treeish data. That's where we'll get the most
 impact. Virtually every programming language (including Perl) has
 good support for json.
 
 I'm not sure what the constraints of json that you might want to
 break are. Perhaps you'd like to specify.
 
 Whatever we do, rest assured your work won't go to waste.

OK, just to summarize:

JSONB and everything it shares with hstore will be in core
hstore-specific code stays in contrib
hstore contrib will create an hstore type to call contrib and core code
9.4 hstore has some differences from pre-9.4

The question is whether we change/improve hstore in 9.4, or create an
hstore2 that is the improved hstore for 9.4 and keep hstore identical to
pre-9.4.  That last option looks an awful like the dreaded VARCHAR2.

What can we do to help people migrate to an hstore type that supports
data types?  Is there a function we can give them to flag possible
problem data, or give them some function to format things the old way
for migrations, etc.  If they are going to have to rewrite all their old
data, why bother with a backward-compatible binary format?  Is it only
the client applications that will need to be changed?  How would we
instruct users on the necessary changes?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Tom Lane

Bruce Momjian br...@momjian.us writes:
 OK, just to summarize:

   JSONB and everything it shares with hstore will be in core
   hstore-specific code stays in contrib
   hstore contrib will create an hstore type to call contrib and core code
   9.4 hstore has some differences from pre-9.4

I've got a problem with the last part of that.  AFAICS, the value
proposition for hstore2 largely fails if it's not 100% upward compatible
with existing hstore, both as to on-disk storage and as to application-
visible behavior.  If you've got to adapt your application anyway, why
not switch to JSONB which is going to offer a lot of benefits in terms
of available code you can work with?

Although I've not looked at the patch, it was claimed upthread that there
were changes in the I/O format for existing test cases, for example.
IMO, that's an absolute dead no-go.

 The question is whether we change/improve hstore in 9.4, or create an
 hstore2 that is the improved hstore for 9.4 and keep hstore identical to
 pre-9.4.  That last option looks an awful like the dreaded VARCHAR2.

I think hstore2 as a separate type isn't likely to be a win either.

The bottom line here is that hstore2 is more or less what we'd agreed to
doing back at the last PGCon, but that decision has now been obsoleted by
events in the JSON area.  If jsonb gets in, I think we probably end up
rejecting hstore2 as such.  Or at least, that's what we should do IMO.
contrib/hstore is now a legacy type and we shouldn't be putting additional
work into it, especially not work that breaks backwards compatibility.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Ronan Dunklau

Le jeudi 6 mars 2014 09:33:18 Andrew Dunstan a écrit :
 On 03/06/2014 08:16 AM, Oleg Bartunov wrote:
  On Thu, Mar 6, 2014 at 12:43 PM, Peter Geoghegan p...@heroku.com wrote:
  On Thu, Mar 6, 2014 at 12:23 AM, Teodor Sigaev teo...@sigaev.ru wrote:
  That's possible to introduce GUC variable for i/o functions which will
  control old bug-to-bug behavior. IMHO, this is much better option that
  stopping hstore development or split hstore to two branches.
  
  A GUC that controls i/o functions is generally considered to be an
  unacceptable hack.
  
  In what sense are we really stopping hstore development if hstore2
  lives as jsonb? I have a hard time imagining someone dealing with the
  incompatibility that a user-facing hstore2 would introduce, while
  still preferring hstore syntax over json syntax given the choice.
  There are very rich facilities for manipulating json available in
  every programming language. The same is not true of hstore.
  
  Having looked at the issue today, I think that the amount of redundant
  code between a hstore2 in core as jsonb and hstore1 will be
  acceptable. The advantages of making a clean-break in having to
  support the legacy hstore disk format strengthen the case for doing so
  too.
  
  Heh, let's not to do an implusive decision about hstore2. I agree,
  that jsonb has
  a lot of facilities, but don't forget, that json(b) has to follow standard
  and in that sense it's more constrained than hstore, which we could
  further develop to support some interesting features, which will never be
  implemented in json(b).  Also,  it'd be a bit awkward after working on
  nested
  hstore and declaring it
  on several conferences (Engine Yard has sponsored part of our hstore
  work), suddenly
  break people expectation and say, that our work has moved to core to
  provide json
  some very cool features, good bye, hstore users :(   I'm afraid people
  will not understand us.
 
 Oleg,
 
 I hear you, and largely agree, as long as the compatibility issue is
 solved. If it's not, I think inventing a new hstore2 type is probably a
 lousy way to go.
 
 For good or ill, the world has pretty much settled on wanting to use
 json for lightweight treeish data. That's where we'll get the most
 impact. Virtually every programming language (including Perl) has good
 support for json.
 
 I'm not sure what the constraints of json that you might want to break
 are. Perhaps you'd like to specify.

I haven't followed the whole thread, but json is really restrictive on the 
supported types: a hierarchical hstore could maybe support more types 
(timestamp comes to mind) as its values, which is not a valid data type in the 
json spec.

 
 Whatever we do, rest assured your work won't go to waste.
 
 cheers
 
 andrew

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

signature.asc
Description: This is a digitally signed message part.

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Magnus Hagander

On Thu, Mar 6, 2014 at 4:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Bruce Momjian br...@momjian.us writes:
  OK, just to summarize:

JSONB and everything it shares with hstore will be in core
hstore-specific code stays in contrib
hstore contrib will create an hstore type to call contrib and core
 code
9.4 hstore has some differences from pre-9.4

 I've got a problem with the last part of that.  AFAICS, the value
 proposition for hstore2 largely fails if it's not 100% upward compatible
 with existing hstore, both as to on-disk storage and as to application-
 visible behavior.  If you've got to adapt your application anyway, why
 not switch to JSONB which is going to offer a lot of benefits in terms
 of available code you can work with?

 Although I've not looked at the patch, it was claimed upthread that there
 were changes in the I/O format for existing test cases, for example.
 IMO, that's an absolute dead no-go.

  The question is whether we change/improve hstore in 9.4, or create an
  hstore2 that is the improved hstore for 9.4 and keep hstore identical to
  pre-9.4.  That last option looks an awful like the dreaded VARCHAR2.

 I think hstore2 as a separate type isn't likely to be a win either.

 The bottom line here is that hstore2 is more or less what we'd agreed to
 doing back at the last PGCon, but that decision has now been obsoleted by
 events in the JSON area.  If jsonb gets in, I think we probably end up
 rejecting hstore2 as such.  Or at least, that's what we should do IMO.
 contrib/hstore is now a legacy type and we shouldn't be putting additional
 work into it, especially not work that breaks backwards compatibility.


(not read up on the full details of the thread, sorry if I'm re-iterating
something)

I think we definitely want/need to maintain hstore compatibility. A
completely separate hstore2 type that's not backwards compatible makes very
little sense.

However, if the new hstore type (compatible with the old one) is the
wrapper around jsonb, rather than the other way around, I don't see any
problem with it at all. Most future users are almost certainly going to use
the json interfaces, but we don't want to leave upgraded users behind. (But
of course it has to actually maintain backwards compatibility for that
argument to hold)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Tom Lane

Magnus Hagander mag...@hagander.net writes:
 However, if the new hstore type (compatible with the old one) is the
 wrapper around jsonb, rather than the other way around, I don't see any
 problem with it at all. Most future users are almost certainly going to use
 the json interfaces, but we don't want to leave upgraded users behind. (But
 of course it has to actually maintain backwards compatibility for that
 argument to hold)

Yeah --- all of this turns on whether hstore improvements can be 100%
upwards compatible or not.  If they are, I don't object to including them;
I'd have said it was wasted effort, but if the work is already done then
that's moot.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Andrew Dunstan



On 03/06/2014 10:46 AM, Tom Lane wrote:

Magnus Hagander mag...@hagander.net writes:

However, if the new hstore type (compatible with the old one) is the
wrapper around jsonb, rather than the other way around, I don't see any
problem with it at all. Most future users are almost certainly going to use
the json interfaces, but we don't want to leave upgraded users behind. (But
of course it has to actually maintain backwards compatibility for that
argument to hold)

Yeah --- all of this turns on whether hstore improvements can be 100%
upwards compatible or not.  If they are, I don't object to including them;
I'd have said it was wasted effort, but if the work is already done then
that's moot.



Clearly there are people who want it, or else they would not have 
sponsored the work.


We seem to have an emerging consensus on the compatibility issue.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Heikki Linnakangas


On 03/06/2014 05:46 PM, Tom Lane wrote:

Magnus Hagander mag...@hagander.net writes:

However, if the new hstore type (compatible with the old one) is the
wrapper around jsonb, rather than the other way around, I don't see any
problem with it at all. Most future users are almost certainly going to use
the json interfaces, but we don't want to leave upgraded users behind. (But
of course it has to actually maintain backwards compatibility for that
argument to hold)


Yeah --- all of this turns on whether hstore improvements can be 100%
upwards compatible or not.  If they are, I don't object to including them;


There are reasons for *not* wanting the new hstore2 functionality. If 
you don't want nesting, for example, with the new type you're going to 
need to add a constraint to forbid that. Ugh. Many applications are 
happy with the current functionality, a simple string key/value 
dictionary, and for them the new features are not an improvement.


As an analogy, adding significant new functionality like nesting to the 
existing hstore type is like suddenly adding the time of day to the date 
datatype. It might be useful in many cases. And an existing application 
can leave the hour and minute fields zero, so it's backwards-compatible. 
But as soon as someone inserts a datum that uses the hour and minute 
fields, it will confuse the application that doesn't know about that.


I haven't been following these discussions closely, but for those 
reasons, I thought hstore2 was going to be a separate type. I don't 
think there are very many applications that would be interested in 
upgrading from the current hstore to the new hstore2 type. More 
likely, the new data type is useful for many applications that couldn't 
have used hstore before because it didn't support nesting or was too 
loosely typed. And old applications that are already using hstore are 
perfectly happy with the status quo.


Let's not mess with the existing hstore datatype. For what it does, it 
works great.


Likewise, jsonb is significantly different from hstore2, so it should be 
a separate data type. Frankly I don't understand what the problem is 
with doing that. I don't have a problem with copy-pasting the common parts.


BTW, now that I look at the nested hstore patch, I'm disappointed to see 
that it only supports a few hardcoded datatypes. Call me naive, but 
somehow I thought it would support *all* PostgreSQL datatypes, built-in 
or user-defined. I realize that's a tall order, but that's what I 
thought it did. Since it doesn't, color me unimpressed. It's really not 
any better than json, I don't see why anyone would prefer it over json. 
Not that I particularly like json, but it's a format a lot of people are 
familiar with.


So here my opinion on what we should do:

1. Forget about hstore2
2. Add GIN and GIST operator classes to jsonb, if they're ready for 
commit pretty darn soon. If not, punt them to next release.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Merlin Moncure

On Thu, Mar 6, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 So here my opinion on what we should do:

 1. Forget about hstore2
 2. Add GIN and GIST operator classes to jsonb, if they're ready for commit
 pretty darn soon. If not, punt them to next release.

For #2, would we maintain the hstore syntax for the searching
operators.  For example,

SELECT count(*) FROM jsonb_schema WHERE tabledata @ 'columns =
{{column_name=total_time}}';

Note the hstore-ish = in the searching operator.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

Hi there,

Looks like consensus is done. I and Teodor are not happy with it, but
what we can do :)   One thing I  want to do is to reserve our
contribution to the flagship feature (jsonb), particularly, binary
storage for nested structures and indexing. Their work was sponsored
by Engine Yard.

As for the old hstore I think it'd be nice to add gin_hstore_hash_ops,
so hstore users will benefit from 9.4 release. There is no
compatibiliy issue, so I think this could be harmless.

Oleg

On Thu, Mar 6, 2014 at 7:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 OK, just to summarize:

   JSONB and everything it shares with hstore will be in core
   hstore-specific code stays in contrib
   hstore contrib will create an hstore type to call contrib and core code
   9.4 hstore has some differences from pre-9.4

 I've got a problem with the last part of that.  AFAICS, the value
 proposition for hstore2 largely fails if it's not 100% upward compatible
 with existing hstore, both as to on-disk storage and as to application-
 visible behavior.  If you've got to adapt your application anyway, why
 not switch to JSONB which is going to offer a lot of benefits in terms
 of available code you can work with?

 Although I've not looked at the patch, it was claimed upthread that there
 were changes in the I/O format for existing test cases, for example.
 IMO, that's an absolute dead no-go.

 The question is whether we change/improve hstore in 9.4, or create an
 hstore2 that is the improved hstore for 9.4 and keep hstore identical to
 pre-9.4.  That last option looks an awful like the dreaded VARCHAR2.

 I think hstore2 as a separate type isn't likely to be a win either.

 The bottom line here is that hstore2 is more or less what we'd agreed to
 doing back at the last PGCon, but that decision has now been obsoleted by
 events in the JSON area.  If jsonb gets in, I think we probably end up
 rejecting hstore2 as such.  Or at least, that's what we should do IMO.
 contrib/hstore is now a legacy type and we shouldn't be putting additional
 work into it, especially not work that breaks backwards compatibility.

 regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Andrew Dunstan







On 03/06/2014 12:50 PM, Oleg Bartunov wrote:

Hi there,

Looks like consensus is done. I and Teodor are not happy with it, but
what we can do :)   One thing I  want to do is to reserve our
contribution to the flagship feature (jsonb), particularly, binary
storage for nested structures and indexing. Their work was sponsored
by Engine Yard.



We don't normally credit sponsors in commits, but if I'm doing the 
commit I promise you guys would certainly get major credit as authors.



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

I meant in Release Notes for 9.4

On Thu, Mar 6, 2014 at 10:26 PM, Andrew Dunstan and...@dunslane.net wrote:





 On 03/06/2014 12:50 PM, Oleg Bartunov wrote:

 Hi there,

 Looks like consensus is done. I and Teodor are not happy with it, but
 what we can do :)   One thing I  want to do is to reserve our
 contribution to the flagship feature (jsonb), particularly, binary
 storage for nested structures and indexing. Their work was sponsored
 by Engine Yard.



 We don't normally credit sponsors in commits, but if I'm doing the commit I
 promise you guys would certainly get major credit as authors.


 cheers

 andrew



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore

2014-03-06 Thread Josh Berkus

On 03/06/2014 07:00 AM, Bruce Momjian wrote:
 What can we do to help people migrate to an hstore type that supports
 data types?  Is there a function we can give them to flag possible
 problem data, or give them some function to format things the old way
 for migrations, etc.  If they are going to have to rewrite all their old
 data, why bother with a backward-compatible binary format?  Is it only
 the client applications that will need to be changed?  How would we
 instruct users on the necessary changes?

So, from what I've been able to check:

The actual storage upgrade of hstore--hstore2 is fairly painless from
the user perspective; they don't have to do anything.  The problem is
that the input/output strings are different, something which I didn't
think to check for (and Peter did), and which will break applications
relying on Hstore, since the drivers which support Hstore (like
psycopg2) rely on string-parsing to convert it.  I haven't
regression-tested hstore2 against psycopg2 since I don't have a good
test, but that would be a useful thing to do.

Hstore2 supports the same limited data types as JSON does, and not any
additional ones.

This makes an hstore2 of dubious value unless the compatibility issues
can be solved conclusively.

Is that all correct?  Have I missed something?

On 03/06/2014 09:50 AM, Oleg Bartunov wrote: Looks like consensus is
done. I and Teodor are not happy with it, but
 what we can do :)   One thing I  want to do is to reserve our
 contribution to the flagship feature (jsonb), particularly, binary
 storage for nested structures and indexing. Their work was sponsored
 by Engine Yard.

We don't generally credit companies in the release notes, since if we
started, where would we stop?  However, we *do* credit them in the press
release, and I'll make a note of the EY sponsorship, especially since
it's also good PR.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb and nested hstore