Re: [GENERAL] jsonb

2017-11-14 Thread Merlin Moncure
On Tue, Nov 14, 2017 at 3:32 PM, hmidi slim  wrote:
> I have a column named 'name' of type jsonb on my table named product. The
> format of the column:
> name: {"key1": "text1", "key2": "text2"}
>
> When I make a query to fetch data from the table I get this format:
> name: '{"key1": "text1", "key2": "text2"}'
>
> Why does PostgreSQL return the name as a string and not as jsonb? Is it a
> bug, or is there something else to add?

not quite following.  Can you paste the query you are trying to
execute along with the results vs. expectation?  thanks

merlin




Re: [GENERAL] jsonb

2017-11-14 Thread hmidi slim
I'm trying to use the function to_jsonb to create the name:
to_jsonb ('{"key1":" ' || 'text1' || '","key2":" ' || 'text2' || '"}');
But after that I used the Objection.js ORM to get the data with the query:
Product.query().where('id',1).then(prod => {console.log(prod)})
I think the problem may be with the usage of the to_jsonb function; maybe
I missed something. But when I fetch the data with the ORM, I found that
the type was a string and not jsonb.


2017-11-14 23:09 GMT+01:00 Merlin Moncure :

> On Tue, Nov 14, 2017 at 3:32 PM, hmidi slim  wrote:
> > I have a column named 'name' of type jsonb on my table named product.
> > The format of the column:
> > name: {"key1": "text1", "key2": "text2"}
> >
> > When I make a query to fetch data from the table I get this format:
> > name: '{"key1": "text1", "key2": "text2"}'
> >
> > Why does PostgreSQL return the name as a string and not as jsonb? Is
> > it a bug, or is there something else to add?
>
> not quite following.  Can you paste the query you are trying to
> execute along with the results vs. expectation?  thanks
>
> merlin
>


Re: [GENERAL] jsonb

2017-11-14 Thread Merlin Moncure
On Tue, Nov 14, 2017 at 4:30 PM, hmidi slim  wrote:
> I'm trying to use the function to_jsonb to create the name:
> to_jsonb ('{"key1":" ' || 'text1' || '","key2":" ' || 'text2' || '"}');
> But after that I used the Objection.js ORM to get the data with the query:
> Product.query().where('id',1).then(prod => {console.log(prod)})
> I think the problem may be with the usage of the to_jsonb function; maybe
> I missed something. But when I fetch the data with the ORM, I found that
> the type was a string and not jsonb.

Still not quite following. My advice would be to:

#1) work out the SQL you want the database to be running and verify
the results are correct

#2) figure out how to get the ORM to send that SQL

If you need help figuring out what SQL the ORM is actually running,
try turning on statement logging in postgresql.conf and watching the
log.
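
For reference, a minimal sketch of the postgresql.conf settings involved
(these are the standard parameter names; the values shown are illustrative):

  # log every statement the server receives
  log_statement = 'all'              # none | ddl | mod | all
  # additionally log each statement's duration; 0 logs all statements (ms)
  log_min_duration_statement = 0

After editing, reload the configuration (for example with
SELECT pg_reload_conf();) for the change to take effect.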

merlin




Re: [GENERAL] jsonb

2017-11-14 Thread Johannes Graën
On 11/14/2017 11:30 PM, hmidi slim wrote:
> I'm trying to use the function to_jsonb and create the name:
> to_jsonb ('{"key1":" ' || 'text1' || '","key2":" ' || 'text2' || '"}');

This converts the string into a single JSON string value consisting of
that text, not an object. I guess what you intend to accomplish is rather:

select jsonb_build_object('key1', 'text1', 'key2', 'text2');
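
The difference is easy to see in psql (a minimal illustration; the output
shown in the comments is hand-written):

  select to_jsonb('{"key1": "text1"}'::text);
  -- "{\"key1\": \"text1\"}"      (a single jsonb string wrapping the text)

  select '{"key1": "text1"}'::jsonb;
  -- {"key1": "text1"}            (an actual jsonb object)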






Re: [GENERAL] jsonb

2017-11-14 Thread John R Pierce

On 11/14/2017 2:30 PM, hmidi slim wrote:

I'm trying to use the function to_jsonb and create the name:
to_jsonb ('{"key1":" ' || 'text1' || '","key2":" ' || 'text2' || '"}');
But after that I used Objection.js ORM to get data using the query:
Product.query().where('id',1).then(prod => {console.log(prod)})
I think the problem may be with the usage of the to_jsonb function; maybe
I missed something. But when I fetch the data with the ORM, I found that
the type was a string and not jsonb.




never heard of your ORM... does it even know what Postgres jsonb is?
Do you know what actual SQL query that piece of ORMism generates?




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] jsonb search

2016-06-28 Thread Oleg Bartunov
On Tue, Jun 28, 2016 at 5:15 PM, Armand Pirvu (home)
 wrote:
> Hi
>
> In my quest of JSONB querying and searching without having to actually
> cast into text, I found JSQuery
>
> I do admit my JSONB knowledge shortcomings, and I am not a developer but
> a DBA. As such, some examples would be greatly appreciated since I tend
> to understand better that way
>
> I compiled and installed the extension
>
> 1 - Exact matching without knowing the hierarchy, just the key and element,
> I built a set like
>
> col1 |   col2
> --+--
>1 | {"Home Email": {"EmailAddress": "1...@yahoo.com"}}
>2 | {"Home Email": {"EmailAddress": "2...@yahoo.com"}}
>3 | {"Home Email": {"EmailAddress": "3...@yahoo.com"}}
>
>
> JSQuery is super
>
> SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress = "1...@yahoo.com"';
>
> Now I can do a performance boost using
>
> CREATE INDEX idx1 ON test1 USING GIN (col2 jsonb_value_path_ops);
>
> I see this yield
>
> from
>
> testdb=# explain analyze SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress
> = "1...@yahoo.com"';
> Seq Scan on test1  (cost=0.00..12423.00 rows=500 width=68) (actual
> time=0.016..160.777 rows=1 loops=1)
>   Filter: (col2 @@ '*."EmailAddress" = "1...@yahoo.com"'::jsquery)
>   Rows Removed by Filter: 49
> Planning time: 0.042 ms
> Execution time: 160.799 ms
> (5 rows)
>
>
> to
>
> testdb=# explain analyze SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress =
> "1...@yahoo.com"';
> Bitmap Heap Scan on test1  (cost=31.88..1559.32 rows=500 width=68) (actual
> time=0.018..0.019 rows=1 loops=1)
>   Recheck Cond: (col2 @@ '*."EmailAddress" = "1...@yahoo.com"'::jsquery)
>   Heap Blocks: exact=1
>   ->  Bitmap Index Scan on idx1  (cost=0.00..31.75 rows=500 width=0) (actual
> time=0.011..0.011 rows=1 loops=1)
> Index Cond: (col2 @@ '*."EmailAddress" = "1...@yahoo.com"'::jsquery)
> Planning time: 0.039 ms
> Execution time: 0.038 ms
> (7 rows)
>
> A whopping 4000-fold improvement
>
>
>
>
> But I also noticed a vodka index
>
>
> testdb=# CREATE INDEX idx2 ON
> testdb-# test1 USING vodka (col2);
> ERROR:  access method "vodka" does not exist
>
> What am I missing ?
>
> 2 - Is there any way I can accomplish a pattern and/or case-insensitive
> search using JSQuery, similar to
>
>
> select * from test2 where upper((col2 -> 'Home Email') ->> 'EmailAddress')
> ilike '%3%YAH%';
>
> select * from test2 where (col2 -> 'Home Email') ->> 'EmailAddress' like
> '%3%yah%';
>
>
> If so what indexing strategy can be used to have similar gains as above ?
>
>
> Many thanks for any help

Vodka is our experimental prototype of a next-generation access method,
and it doesn't exist in production-ready form. You can check our
presentation
http://www.sai.msu.su/~megera/postgres/talks/highload-2014-vodka.pdf
to understand jsquery's limitations and why we stopped its development.
Also, two years ago I wrote (in Russian)
http://obartunov.livejournal.com/179422.html about a jsonb query
language and our plans. Google Translate might help:

https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fobartunov.livejournal.com%2F179422.html&edit-text=&act=url


>
>
> Armand
>




Re: [GENERAL] jsonb search

2016-06-28 Thread Arthur Silva
On Tue, Jun 28, 2016 at 5:09 PM, Oleg Bartunov  wrote:

> On Tue, Jun 28, 2016 at 5:15 PM, Armand Pirvu (home) wrote:
> > [original question snipped; quoted in full above]
>
> Vodka is our experimental prototype of a next-generation access method,
> and it doesn't exist in production-ready form. You can check our
> presentation
> http://www.sai.msu.su/~megera/postgres/talks/highload-2014-vodka.pdf
> to understand jsquery's limitations and why we stopped its development.
> [rest of quoted message snipped]


Could you share your future plans for it (or its reincarnation), if any?

Even in the limited form, vodka is very impressive.

--
Arthur Silva


Re: [GENERAL] JSONB Overlap Operator.

2017-03-20 Thread Merlin Moncure
On Fri, Mar 17, 2017 at 9:43 AM, Eduardo Felipe  wrote:
> Hi there!
>
> In a project I was experimenting with replacing array columns with JSONB
> columns, to allow a greater flexibility of types.
>
> One thing that I found missing is the "overlap" operator (&&).
>
> JSONB includes all other operators from arrays, such as containment (@>,
> <@), comparison (>, >=, <, <=) and inclusion (?). The only one missing is
> the overlap operator (&&).
>
> Does anyone know of a technical reason that operator was not included on
> JSONB?

I really miss hstore's slice() (which is a function, not an operator,
but the distinction is minor IMO), where you can pass an array of keys
and get a smaller object back.  This would have to be redefined a bit,
since there are new edge cases with jsonb vs. hstore, but it ought to be
reasonably workable.
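
For illustration, a minimal sketch of a slice()-style helper for jsonb
(the name jsonb_slice is hypothetical, not a built-in):

  create function jsonb_slice(j jsonb, keys text[])
  returns jsonb language sql immutable as $$
    -- expand the object once and keep only the requested keys
    select coalesce(jsonb_object_agg(key, value), '{}'::jsonb)
    from jsonb_each(j)
    where key = any (keys)
  $$;

  select jsonb_slice('{"a":1, "b":2, "c":3}'::jsonb, array['a','c']);
  -- {"a": 1, "c": 3}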

merlin




Re: [GENERAL] jsonb creation functions?

2014-08-01 Thread Christoph Moench-Tegeder
Hi,

> There is a set of creation functions for json, such as:
> 
>  to_json(anyelement)
> 
> There doesn't seem to be an equivalent function for converting text to
> jsonb.
> 
> Is there a way to do this?

You can always cast json to jsonb:
test_db=# create table t (a integer primary key, b jsonb);
CREATE TABLE
test_db=# insert into t (a, b) values (1, to_json('a'::text)::jsonb);
INSERT 0 1
test_db=# select * from t;
 a |  b  
---+-----
 1 | "a"
(1 row)

test_db=# insert into t (a, b) values (2, 
to_json('{"a","b","c"}'::text[])::jsonb);
INSERT 0 1
test_db=# select * from t;
 a |        b
---+-----------------
 1 | "a"
 2 | ["a", "b", "c"]
(2 rows)

Regards,
Christoph

-- 
Spare Space




Re: [GENERAL] JSONB matching element count

2015-05-31 Thread Michael Paquier
On Sun, May 31, 2015 at 11:07 PM, Arup Rakshit
 wrote:
> Hi,
>
> This says if matched found or not against the input array :
>
> '{"a":1, "b":2, "c":3}'::jsonb ?| array['b', 'c']
>
> But how would I determine how many matched? For the above example, I'd
> expect to see 2 matches found.

You could use "?" to do the check for each individual key, and then
count how many matched...
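
That idea in SQL form (a minimal sketch):

  select count(*)
  from unnest(array['b', 'c']) as k
  where '{"a":1, "b":2, "c":3}'::jsonb ? k;
  -- returns 2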
-- 
Michael




Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

> and I am trying to get the value via jsonb->parentKey->childKey;
> it seems it is very slow.
> Would it actually be faster to use the top-level key only and parse it at
> the client side?

I suppose most of the time is spent decompressing the huge value, not on
the actual search inside the jsonb. If so, we need to implement some
search method that decompresses only some chunks of the jsonb.



Could you send me an example of that jsonb?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

> I suppose most of the time is spent decompressing the huge value, not on
> the actual search inside the jsonb. If so, we need to implement some
> search method that decompresses only some chunks of the jsonb.

On an artificial example:
%SAMP IMAGE  FUNCTION CALLERS
 92.9 postgres   pglz_decompress  toast_decompress_datum


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Tom Smith
It can be any jsonb, so I am asking a general question about the
implementation of jsonb storage (not about GIN or table-wide indexing,
but only within a single jsonb item in a single row).

A sample would be like (no quotes):

{
  a1: {b1: v1, b2: v2, ..., b100: v100},
  ...,
  a3000: {c1: x1, c2: x2, ..., c200: x200}
}

You get the idea. I wonder how PostgreSQL does it when evaluating

jsonb->a1->b1:

does it read the whole jsonb tree structure into memory and walk to v1,
or does it have some optimization so it only fetches v1 instead of
reading in the whole structure?









On Tue, Sep 8, 2015 at 11:58 AM, Teodor Sigaev  wrote:

> and I am trying to get value via  jsonb->parentKey->childKey
>> it seems it is very slow.
>> Would it be actually faster to use top level key only and parse it at
>> client side?
>>
>
> I suppose most of the time is spent decompressing the huge value, not on
> the actual search inside the jsonb. If so, we need to implement some
> search method that decompresses only some chunks of the jsonb.
>
>
> Could you send to me an example of that jsonb?
>
>
>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>


Re: [GENERAL] jsonb value retrieval performance

2015-09-09 Thread Teodor Sigaev

> does it read the whole jsonb tree structure into memory and walk to v1,
> or does it have some optimization so it only fetches v1 instead of
> reading in the whole structure?

It reads, detoasts, and decompresses the whole value, and then executes
the search. An idea to fix that is reading the jsonb value only in the
needed chunks.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [GENERAL] Jsonb extraction very slow

2016-08-09 Thread Tom Lane
"hari.prasath"  writes:
>  I am using jsonb for storing key-value pair information (500 keys), and
> it is a very big data set with some 10M rows. Whenever I try to extract
> some keys (say some 10 keys and their values), it is really very slow.
> Is this due to jsonb parsing, or is the whole json loaded from disk to
> memory each time for the 10 keys (mainly if my keys are at the end of the
> 500, this is very slow)?

It's probably mostly the cost to fetch and decompress the very wide json
field.  jsonb is pretty quick at finding an object key once it's got
the value available to look at.

You could possibly alleviate some of the speed issue by storing the column 
uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that would
bloat your disk space requirements so I'm not really sure it'd be a win.

regards, tom lane




Re: [GENERAL] Jsonb extraction very slow

2016-08-10 Thread Jim Nasby

On 8/9/16 9:29 AM, Tom Lane wrote:

"hari.prasath"  writes:

 I am using jsonb for storing key-value pair information (500 keys), and
it is a very big data set with some 10M rows. Whenever I try to extract
some keys (say some 10 keys and their values), it is really very slow.
Is this due to jsonb parsing, or is the whole json loaded from disk to
memory each time for the 10 keys (mainly if my keys are at the end of the
500, this is very slow)?


It's probably mostly the cost to fetch and decompress the very wide json
field.  jsonb is pretty quick at finding an object key once it's got
the value available to look at.

You could possibly alleviate some of the speed issue by storing the column
uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that would
bloat your disk space requirements so I'm not really sure it'd be a win.


Actually I've done some testing with this and there is a *significant* 
overhead in getting multiple keys from a large document. There's a 
significant extra cost for the first key, but there's also a non-trivial 
cost for every key after that.


I suspect the issue is the goofy logic used to store key name offsets 
(to improve compression), but I never got around to actually tracing it. 
I suspect there's a win to be had by having both json types use the 
ExpandedObject stuff.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] Jsonb extraction very slow

2016-08-11 Thread Jim Nasby

Please CC the list.

On 8/11/16 2:19 AM, hari.prasath wrote:

Actually I've done some testing with this and there is a *significant*
overhead in getting multiple keys from a large document. There's a
significant extra cost for the first key, but there's also a non-trivial
cost for every key after that.


Why does it take some extra cost for the first key and less for keys
after that? Is there a specific reason for this? If so, please explain.


I never dug into why. As Tom posited, decompression might explain the 
time to get a single key out. Getting 10 keys instead of just 1 wasn't 
10x more expensive, but it was significantly more expensive than just 
getting a single key.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] Jsonb extraction very slow

2016-08-11 Thread Tom Lane
Jim Nasby  writes:
> I never dug into why. As Tom posited, decompression might explain the 
> time to get a single key out. Getting 10 keys instead of just 1 wasn't 
> 10x more expensive, but it was significantly more expensive than just 
> getting a single key.

What were you doing to "get ten keys out"?  If those were ten separate
JSON operators, they'd likely have done ten separate decompressions.
You'd have saved something by having the TOAST data already fetched into
shared buffers, but it'd still hardly be free.

regards, tom lane




Re: [GENERAL] Jsonb extraction very slow

2016-08-11 Thread hari.prasath
> What were you doing to "get ten keys out"?  If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Now I get the point. Initially, I thought that to extract n keys from the
json, the full json is decompressed only once. But it's actually
decompressed n times for n keys.




Thanks

- Harry







Re: [GENERAL] Jsonb extraction very slow

2016-08-16 Thread Merlin Moncure
On Thu, Aug 11, 2016 at 8:45 AM, Tom Lane  wrote:
> Jim Nasby  writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"?  If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Huh -- FWICT there is no way to pull N values from a jsonb containing M
items, for any value of N other than 1 or M, with a single operation.

merlin




Re: [GENERAL] Jsonb extraction very slow

2016-08-16 Thread Jim Nasby

On 8/11/16 8:45 AM, Tom Lane wrote:

Jim Nasby  writes:

I never dug into why. As Tom posited, decompression might explain the
time to get a single key out. Getting 10 keys instead of just 1 wasn't
10x more expensive, but it was significantly more expensive than just
getting a single key.


What were you doing to "get ten keys out"?  If those were ten separate
JSON operators, they'd likely have done ten separate decompressions.
You'd have saved something by having the TOAST data already fetched into
shared buffers, but it'd still hardly be free.


Multiple -> or ->> operators, but all operating on the same field (which 
I thought would mean a single datum that would end up detoasted?).


Some of these would have been nested ->/->>. In essence, this was a set 
of nested views that ultimately pulled from a single JSONB field.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] Jsonb extraction very slow

2016-08-16 Thread Tom Lane
Jim Nasby  writes:
> On 8/11/16 8:45 AM, Tom Lane wrote:
>> What were you doing to "get ten keys out"?  If those were ten separate
>> JSON operators, they'd likely have done ten separate decompressions.
>> You'd have saved something by having the TOAST data already fetched into
>> shared buffers, but it'd still hardly be free.

> Multiple -> or ->> operators, but all operating on the same field (which 
> I thought would mean a single datum that would end up detoasted?).

No, that's going to work as I said.  It'd be a useful thing to be able to
amortize the decompression work across multiple references to the field,
but currently there's no way to do that.

[ thinks for a bit... ]  In principle we could have the planner notice
whether there are multiple references to the same Var of a varlena type,
and then cue the executor to do a pre-emptive detoasting of that field
of the input tuple slot.  But it would be hard to avoid introducing some
regressions along with the benefits, I'm afraid.

> Some of these would have been nested ->/->>.

In a chain of functions only the first one would be paying the overhead
we're talking about here; though I'm not sure how efficient the case is
overall in JSONB.

regards, tom lane




Re: [GENERAL] Jsonb extraction very slow

2016-08-16 Thread Jim Nasby

On 8/16/16 10:19 AM, Tom Lane wrote:

[ thinks for a bit... ]  In principle we could have the planner notice
whether there are multiple references to the same Var of a varlena type,
and then cue the executor to do a pre-emptive detoasting of that field
of the input tuple slot.  But it would be hard to avoid introducing some
regressions along with the benefits, I'm afraid.



I suspect that the ExpandedObject stuff makes this even more
appealing... it would certainly be nice if we only needed to pay the 
expansion cost once (assuming no one dirtied the expanded object). I 
certainly think there's more need for this kind of thing as the use of 
JSON expands.


Perhaps that's part of what Robert was suggesting recently with moving 
datums around the executor instead of tuples.



> Some of these would have been nested ->/->>.

In a chain of functions only the first one would be paying the overhead
we're talking about here; though I'm not sure how efficient the case is
overall in JSONB.


I've since heard that chaining -> is a really bad idea compared to #>,
which is unfortunate because -> is the normal idiom in other languages
(and what I suspect everyone will use by default). I've wondered if an
expanded-object version of json might expand only top-level keys
(and maybe only as needed), so that -> is actually just a pointer into
the originally expanded data. A chained -> then wouldn't need to
duplicate everything... and in fact might be able to do its expansion
in the original object, so that subsequent references to that key
wouldn't need to re-expand it. I don't think the current EO framework
supports that, but it doesn't seem impossible to add...
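
For reference, the two idioms being compared (a minimal illustration;
table t and column data are placeholders):

  -- chained ->: each step produces an intermediate jsonb value
  select data -> 'a' -> 'b' -> 'c' from t;

  -- #>: a single operator walks the whole path
  select data #> '{a,b,c}' from t;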

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [GENERAL] jsonb case insensitive search

2017-05-31 Thread David G. Johnston
On Wed, May 31, 2017 at 12:18 PM, armand pirvu 
wrote:

>
> For the example mentioned
>
> SELECT *
> FROM cfg_files_data
> WHERE cfg_files_data.show_id = 32
> AND cfg_files_data.file_id = 123
> AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
> ;
>
>
> create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id,
> lower(file_data_record::text));
>
>
> Not sure why the index is ignored
>

Because "lower((file_data_record ->> 'Company'))" is not the same
as "lower(file_data_record::text)"


> But is is possible to go for a broader search, aka being able to search by
> any key:value , efficient and case insensitive ? What am I missing in
> this picture ?
>
>
Use a trigger to maintain an all-lower-case copy of the json
file_data_record, and use the copy for predicates while using the
original for select-list outputs.
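
A minimal sketch of that trigger approach (the column name
file_data_record_lc and the trigger/function names are inventions for
illustration):

  alter table cfg_files_data add column file_data_record_lc jsonb;

  create function cfg_files_data_lower() returns trigger
  language plpgsql as $$
  begin
    -- maintain a case-folded copy of the document for searching
    new.file_data_record_lc := lower(new.file_data_record::text)::jsonb;
    return new;
  end $$;

  create trigger cfg_files_data_lower_t
    before insert or update on cfg_files_data
    for each row execute procedure cfg_files_data_lower();

  -- predicates then go against the lowered copy, e.g.:
  --   ... and file_data_record_lc ->> 'company' = lower('CompuTestSystems')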

David J.


Re: [GENERAL] jsonb case insensitive search

2017-05-31 Thread armand pirvu

> On May 31, 2017, at 2:32 PM, David G. Johnston  
> wrote:
> 
> On Wed, May 31, 2017 at 12:18 PM, armand pirvu wrote:
> 
> For the example mentioned
> 
> SELECT * 
> FROM cfg_files_data 
> WHERE cfg_files_data.show_id = 32 
> AND cfg_files_data.file_id = 123
> AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
> ;
> 
> 
> create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id, 
> lower(file_data_record::text));
> 
> 
> Not sure why the index is ignored
> 
> Because "lower((file_data_record ->> 'Company'))" is not the same as
> "lower(file_data_record::text)"


I see, I missed it.

Because this works:
create index fooidx on cfg_files_data (show_id, file_id,
lower(file_data_record ->> 'Company'));

The idea is that I would like to avoid having an index for each key I
might possibly want to search. Rather, index the whole record, then
search by the key and get the key's value, thus having one index serve
multiple purposes, so to speak.


I looked at JSQuery, but it does not seem that I can have a composite
index like fooidx?!


> 
> 
> But is is possible to go for a broader search, aka being able to search by 
> any key:value , efficient and case insensitive ? What am I missing in this 
> picture ?
> 
> 
> Use a trigger to maintain an all-lower-case copy of the json
> file_data_record, and use the copy for predicates while using the
> original for select-list outputs.
> 
> David J.



Uhh, that would be, let's just say, less than optimal. But maybe JSON
itself is not intended to be used this way? Or the functionality is just
not there yet?


Thanks
Armand





Re: [GENERAL] jsonb case insensitive search

2017-06-01 Thread Karl Czajkowski
On May 31, armand pirvu modulated:

> The idea is that I would like to avoid having an index for each key
> possibly wanted to search, but rather say have the whole record then
> search by the key and get the key value, thus having one index serving
> multiple purposes so to speak
> 

First, benchmarking would be important to figure out if any proposed
indexing actually speeds up the kinds of queries you want to perform.
With the recently added parallel query features, a simpler indexing
scheme with some brute-force search might be adequate?

But, you could use a search idiom like this:

 (lower(json_column::text)::jsonb) -> lower('key') = '"value"'::jsonb

This will down-convert the case on all values and keys.  The left-hand
parenthetic expression could be precomputed in an expression index to
avoid repeated case conversion. But, typical searches will still have
to scan the whole index to perform the projection and match the final
value tests on the right-hand side.

If you want to do things like substring matching on field values, you
might stick with text and using regexp matches:

 (lower(json_column::text)) ~ 'valuepattern'

or more structural searches:

 (lower(json_column::text)) ~ '"key": "[^"]*substring[^"]*"'

Here, the left-hand expression could be trigram indexed to help with
sparse, substring matching without a full index scan.  We've had good
luck using trigram indexing with regexp matching, though I've honestly
never used it for the purpose sketched above...

Karl




Re: [GENERAL] jsonb case insensitive search

2017-06-01 Thread armand pirvu
Thank you Karl and David

Ideally as far as I can tell the index would need to be show_id, file_id, 
lower(…)


The question is whether this is possible.


Thanks
Armand


> On Jun 1, 2017, at 12:24 PM, Karl Czajkowski  wrote:
> 
> On May 31, armand pirvu modulated:
> 
>> The idea is that I would like to avoid having an index for each key
>> possibly wanted to search, but rather say have the whole record then
>> search by the key and get the key value, thus having one index serving
>> multiple purposes so to speak
>> 
> 
> First, benchmarking would be important to figure out if any proposed
> indexing actually speeds up the kinds of queries you want to perform.
> With the recently added parallel query features, a simpler indexing
> scheme with some brute-force search might be adequate?
> 
> But, you could use a search idiom like this:
> 
> (lower(json_column::text)::jsonb) -> lower('key') = '"value"'::jsonb
> 
> This will down-convert the case on all values and keys.  The left-hand
> parenthetic expression could be precomputed in an expression index to
> avoid repeated case conversion. But, typical searches will still have
> to scan the whole index to perform the projection and match the final
> value tests on the right-hand side.
> 
> If you want to do things like substring matching on field values, you
> might stick with text and using regexp matches:
> 
> (lower(json_column::text)) ~ 'valuepattern'
> 
> or more structural searches:
> 
> (lower(json_column::text)) ~ '"key": "[^"]*substring[^"]*"'
> 
> Here, the left-hand expression could be trigram indexed to help with
> sparse, substring matching without a full index scan.  We've had good
> luck using trigram indexing with regexp matching, though I've honestly
> never used it for the purpose sketched above...
> 
> Karl





Re: [GENERAL] jsonb case insensitive search

2017-06-01 Thread armand pirvu
I apologize beforehand for replying to my own reply; I know it is frowned
upon. My inline comments follow.

> On Jun 1, 2017, at 2:05 PM, armand pirvu  wrote:
> 
> Thank you Karl and David
> 
> Ideally as far as I can tell the index would need to be show_id, file_id, 
> lower(…)
> 
> 
> The question is if this is  possible ?
> 
> 
> Thanks
> Armand
> 
> 
>> On Jun 1, 2017, at 12:24 PM, Karl Czajkowski  wrote:
>> 
>> On May 31, armand pirvu modulated:
>> 
>>> The idea is that I would like to avoid having an index for each key
>>> possibly wanted to search, but rather say have the whole record then
>>> search by the key and get the key value, thus having one index serving
>>> multiple purposes so to speak
>>> 
>> 
>> First, benchmarking would be important to figure out if any proposed
>> indexing actually speeds up the kinds of queries you want to perform.
>> With the recently added parallel query features, a simpler indexing
>> scheme with some brute-force search might be adequate?
>> 

Not sure what you mean by benchmarking, but I think comparative times,
aka 2 seconds vs. a couple of milliseconds, is quite a difference.
A table scan, while okay in certain cases, will become a problem when
there is heavy usage on the same part/area.


>> But, you could use a search idiom like this:
>> 
>>(lower(json_column::text)::jsonb) -> lower('key') = '"value"'::jsonb
>> 
>> This will down-convert the case on all values and keys.  The left-hand
>> parenthetic expression could be precomputed in an expression index to
>> avoid repeated case conversion. But, typical searches will still have
>> to scan the whole index to perform the projection and match the final
>> value tests on the right-hand side.
>> 
>> If you want to do things like substring matching on field values, you
>> might stick with text and using regexp matches:
>> 
>>(lower(json_column::text)) ~ 'valuepattern'

In this case a regular index will be ignored, even though IMO it should
scan the index and get the needed information. The criteria I am after
gets back 9 rows max out of 100k+ records, so I'd say the restriction is
darn good. Wouldn't the optimizer pick the path of least resistance, aka
the best restriction? Granted, it uses a lower() function, and the search
on the text column, which is third in the index, does not really start
from the left. But the index starts with show_id, file_id, and those are
always part of the key. I can see, though, that once show_id, file_id is
NOT a good restriction anymore, the last column will make the difference.
Either way, will that not translate into an index scan? Or, for the index
to be considered in this case, does even the last-column search have to
follow left-to-right, aka no in-between search?


>> 
>> or more structural searches:
>> 
>>(lower(json_column::text)) ~ '"key": "[^"]*substring[^"]*"'
>> 
>> Here, the left-hand expression could be trigram indexed to help with
>> sparse, substring matching without a full index scan.  We've had good
>> luck using trigram indexing with regexp matching, though I've honestly
>> never used it for the purpose sketched above...
>> 
>> Karl
> 


It seems to me trigram could be the answer, since I got some decent
results once I applied it; more to dig into.

Overall, could it be that the optimizer blatantly ignores an index scan
which is cheaper than a table scan, or that the jsonb implementation still
has a long way to go, or that the way it is used in my case is not what it
was designed for?


thanks
Armand








Re: [GENERAL] jsonb case insensitive search

2017-06-01 Thread Karl Czajkowski
On Jun 01, armand pirvu modulated:

> Overall, could it be that the optimizer blatantly ignores an index scan
> which is cheaper than a table scan, or that the jsonb implementation
> still has a long way to go, or that the way it is used in my case is not
> what it was designed for?
> 

If I remember correctly, isn't a compound index always just using
btree?  In general, I have found better luck using several smaller
btree indices than one large compound one.  Unless your entire query
can be answered from an index-only lookup, the extra columns just
bloat the btree index.

So, you might as well use a simpler compound index for the regular
scalar row keys, and this index will be much smaller without the
baggage of the jsonb values at its leaves.  The planner can use the
jsonb from the actual candidate rows if it is going to have to visit
them anyway for other WHERE or SELECT clauses.

If the sparseness of your query is due to the content within the jsonb
values rather than the other scalar row keys, I think you'd need some
kind of GIN index over the contents of the jsonb documents to find the
small subset of candidate rows by these sparse criteria.  Trigram is
just one example of a GIN indexing scheme.

If your jsonb documents are "flat", i.e. just a bag of key value pairs
and not arbitrary nested jsonb structures, you might also explode them
into arrays of keys or values as separate indexed expressions?  Then,
you could GIN index the arrays and quickly find the subset of rows with
certain unusual keys or unusual values, but would still have to follow
up with a more exact check for the combination of key and value.
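
A minimal sketch of that last idea for flat documents (the helper
jsonb_object_keys_arr is an invention for illustration; docs and data are
placeholder names):

  create function jsonb_object_keys_arr(j jsonb)
  returns text[] language sql immutable as $$
    select coalesce(array_agg(k), '{}') from jsonb_object_keys(j) as t(k)
  $$;

  create index on docs using gin (jsonb_object_keys_arr(data));

  -- find candidate rows having an unusual key, then re-check exactly
  select * from docs
  where jsonb_object_keys_arr(data) @> array['SomeRareKey']
    and data ->> 'SomeRareKey' = 'some value';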


Karl





Re: [GENERAL] jsonb case insensitive search

2017-06-01 Thread Guyren Howe
On Jun 1, 2017, at 13:44 , Karl Czajkowski  wrote:
> If I remember correctly, isn't a compound index always just using
> btree?  In general, I have found better luck using several smaller
> btree indices than one large compound one.  Unless your entire query
> can be answered from an index-only lookup, the extra columns just
> bloat the btree index.

The issue is *much* more subtle than this. For repetitive queries, a compound 
index can be a great speedup.

Best discussion I’ve seen around index design is on the website 
use-the-index-luke.com.


Re: [GENERAL] jsonb and comparison operators

2014-09-02 Thread Peter Geoghegan
On Tue, Sep 2, 2014 at 9:38 PM, Joe Van Dyk  wrote:
> I want to return all rows that have a value of less than 10. I have
> arbitrary keys I want to check (not just 'a').


If you created an expression B-Tree index on 'a' it would work for
'a', but you'd have to use a jsonb literal, not a json/int4 literal.
If you want to be able to query every key at the top nesting level of
an object, such that all rows are returned with jsonbs that have
object values of which in each case one of them is, say, below 10,
then that's something that no existing opclass can support. But, why
should it be supported? That's a very fuzzy criterion to search on.
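
As a concrete sketch of the first point (assuming the original post's
table, create table test (j jsonb), quoted later in the thread):

  -- index and compare as jsonb, with a jsonb literal on the right
  create index on test ((j -> 'a'));
  select * from test where j -> 'a' < '10'::jsonb;

  -- or extract the value and compare as an integer
  create index on test (((j ->> 'a')::int));
  select * from test where (j ->> 'a')::int < 10;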

-- 
Regards,
Peter Geoghegan




Re: [GENERAL] jsonb and comparison operators

2014-09-02 Thread Joe Van Dyk
On Tue, Sep 2, 2014 at 9:55 PM, Peter Geoghegan  wrote:

> On Tue, Sep 2, 2014 at 9:38 PM, Joe Van Dyk  wrote:
> > I want to return all rows that have a value of less than 10. I have
> > arbitrary keys I want to check (not just 'a').
>
>
> If you created an expression B-Tree index on 'a' it would work for
> 'a', but you'd have to use a jsonb literal, not a json/int4 literal.
> If you want to be able to query every key at the top nesting level of
> an object, such that all rows are returned with jsonbs that have
> object values of which in each case one of them is, say, below 10,
>

Just a particular key specified in the query, not just any of them.

I may want key 'a' one time, and 'b' the next time. Not sure if that's what
you meant.

I figured since I could do equality, I should be able to do less than and
greater than.

Joe



> then that's something that no existing opclass can support. But, why
> should it be supported? That's a very fuzzy criteria to search on.
>
> --
> Regards,
> Peter Geoghegan
>


Re: [GENERAL] jsonb and comparison operators

2014-09-03 Thread Oleg Bartunov
jsquery (https://github.com/akorotkov/jsquery) should work for you.


On Wed, Sep 3, 2014 at 8:38 AM, Joe Van Dyk  wrote:

> Is it possible to get this query (or a similar one) to use an index?
>
> I want to return all rows that have a value of less than 10. I have
> arbitrary keys I want to check (not just 'a').
>
> drop table if exists test;
>
> create table test (j jsonb);
>
> insert into test select json_build_object('a', i)::jsonb from
> generate_series(1, 10) i;
> create index on test using gin(j);
>
> vacuum analyze test;
>
>
>
> select * from test where (j->>'a')::int < 10;
>
> I tried
> select * from test where j->'a' < 10::json::jsonb;
> but didn't seem to use the index.
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-28 Thread John R Pierce

On 11/28/2015 6:27 PM, Tom Smith wrote:
Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
of jsonb fields when there are large numbers (maybe several thousands) of
top-level keys?
Currently, if I save a large json document with thousands of top-level
keys and query/retrieve field values, the whole document has to be first
decompressed and loaded into memory before searching for the specific
field key/value.


If it was my data, I'd be decomposing that large JSON thing into 
multiple SQL records, and storing as much stuff as possible in named SQL 
fields, using JSON in the database only for things that are too 
ambiguous for SQL.




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Bill Moran
On Sat, 28 Nov 2015 21:27:51 -0500
Tom Smith  wrote:
> 
> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
> of jsonb fields
> when there are large number (maybe several thousands) of top level keys.
> Currently, if I save a large json document with top level keys of thousands
> and query/retrieve
> field values,  the whole document has to be first decompressed and load to
> memory
> before searching for the specific field key/value.

I could be off-base here, but have you tried:

ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;

?

The default storage for a JSONB field is EXTENDED. Switching it to
EXTERNAL will disable compression. You'll have to insert your data over
again, since this change doesn't alter any existing data, but see
if that change improves performance.

-- 
Bill Moran




Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Francisco Olarte
Hi:
On Sun, Nov 29, 2015 at 1:09 PM, Bill Moran  wrote:
> On Sat, 28 Nov 2015 21:27:51 -0500
>> Currently, if I save a large json document with top level keys of thousands
** LARGE **
> The default storage for a JSONB field is EXTENDED. Switching it to
> EXTERNAL will disable compression. You'll have to insert your data over
> again, since this change doesn't alter any existing data, but see
> if that change improves performance.

Good recommendation, but see if it improves things AND whether it affects
other queries in the system. Turning off compression CAN decrease the
efficiency (hit ratio) of the shared buffers and the cache, IIRC
(but worth testing anyway).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Thomas Kellerer

Tom Smith schrieb am 29.11.2015 um 03:27:

Hello:

Is there a plan for 9.6 to resolve the issue of very slow query/retrieval of 
jsonb fields
when there are large number (maybe several thousands) of top level keys.
Currently, if I save a large json document with top level keys of thousands and 
query/retrieve
field values,  the whole document has to be first decompressed and load to 
memory
before searching for the specific field key/value.

Thanks in Advance


If you are concerned about the compression overhead, then why don't you use (or 
try) JSON instead?








Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Hi, Thanks for everyone's response.

The issue is not just compression, but lack of "indexing" or
"segmentation" when a single doc has, say, 2000 top-level keys (with
multiple levels of subkeys). Right now, if I query for one key, the whole
doc has to be first uncompressed and loaded, and then searched for the
single key.

Compared to the traditional way of storing each top-level key in a
separate column, this is huge overhead when a table scan is required.
Some kind of "keyed/slotted" storage for the doc could help: for
illustration, all keys starting with 'A' would have their own storage
unit, and so on, so when I search for key "A1" only that unit would be
unpacked and traversed to get the "A1" value. It is like PostgreSQL
predefining 26 columns/slots for the whole doc, an internal index within
each doc for fast retrieval of individual field values.

Someone mentioned a roadmap plan for this route, but I'd like to know if
it is in the 9.6 plan.

The URL below mentions a similar issue; I am not sure if it has been
completely resolved.

http://stackoverflow.com/questions/26259374/jsonb-performance-degrades-as-number-of-keys-increase

The URL below mentions the potential issue.

https://www.reddit.com/r/PostgreSQL/comments/36rdlr/improving_select_performance_with_jsonb_vs_hstore/

Thanks



On Sun, Nov 29, 2015 at 7:35 AM, Thomas Kellerer  wrote:

> Tom Smith schrieb am 29.11.2015 um 03:27:
>
>> Hello:
>>
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields
>> when there are large number (maybe several thousands) of top level keys.
>> Currently, if I save a large json document with top level keys of
>> thousands and query/retrieve
>> field values,  the whole document has to be first decompressed and load
>> to memory
>> before searching for the specific field key/value.
>>
>> Thanks in Advance
>>
>
> If you are concerned about the compression overhead, then why don't you
> use (or try) JSON instead?


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Unfortunately, the keys cannot be predefined or fixed; it is a doc, which
is the reason jsonb is used. It works well for small docs with a small
number of keys, but is really slow with a large number of keys. If this
issue is resolved, I think PostgreSQL would be an absolutely superior
choice over MongoDB for document data.

On Sun, Nov 29, 2015 at 12:37 AM, John R Pierce  wrote:

> On 11/28/2015 6:27 PM, Tom Smith wrote:
>
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields
>> when there are large number (maybe several thousands) of top level keys.
>> Currently, if I save a large json document with top level keys of
>> thousands and query/retrieve
>> field values,  the whole document has to be first decompressed and load
>> to memory
>> before searching for the specific field key/value.
>>
>
> If it was my data, I'd be decomposing that large JSON thing into multiple
> SQL records, and storing as much stuff as possible in named SQL fields,
> using JSON in the database only for things that are too ambiguous for SQL.
>
>
>
> --
> john r pierce, recycling bits in santa cruz


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Bill Moran
On Sun, 29 Nov 2015 08:24:12 -0500
Tom Smith  wrote:

> Hi, Thanks for everyone's response.
> 
> The issue is not just compression, but lack of "indexing" or "segmentation"
> when a
> single doc has, say 2000 top level keys (with multiple levels of subkeys).
>  right now, if I query for one key,  the whole doc
> has to be first uncompressed and loaded and then search for the single key.
> 
> Compared to traditional way of storing each top level key with a separate
> column, this is huge overhead when table scan is required.  Some kind of
> "keyed/slotted" storage for the doc could
> help, (for illustration, all keys starting with 'A' would have its own
> storage unit, so on,
> so when I search for key  "A1" only that unit would be unpacked and
> traversed to get :"A1" value". it is like postgresql predfine 26
> columns/slots for the whole doc. an internal indexing
> within each doc for fast retrieval of individual field values.

Sounds like you're pushing the limits of what JSONB is designed to do
(at this stage, at least). I'm not aware of any improvements in recent
versions (or head) that would do much to improve the situation, but I
don't track every commit either. If you really need this improvement and
you're willing to wait for 9.6, then I suggest you check out the latest
git version and test on that to see if anything has been done.

I doubt you'll see much, though. As a thought experiment, the only way
I can think to improve this use case is to ditch the current TOAST
system and replace it with something that stores large JSON values in
a form optimized for indexed access. That's a pretty massive change
to some fairly core stuff just to optimize a single use-case of a
single data type. Not saying it won't happen ... in fact, all things
considered, it's pretty likely to happen at some point.

As far as a current solution: my solution would be to decompose the
JSON into an optimized table. I.e.:

CREATE TABLE store1 (
 id SERIAL PRIMARY KEY,
 data JSONB
);

CREATE TABLE store2 (
 id INT NOT NULL REFERENCES store1(id),
 top_level_key VARCHAR(1024),
 data JSONB,
 PRIMARY KEY(top_level_key, id)
);

You can then use a trigger to ensure that store2 is always in sync with
store1. Lookups can then use store2 and will be quite fast because of
the index. A lot of the design is conjectural: do you even still need
the data column on store1? Are there other useful indexes? etc. But,
hopefully the general idea is made clear.
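
A minimal sketch of such a sync trigger (insert-only for brevity; assumes
the store1/store2 definitions above):

  create function store1_sync() returns trigger
  language plpgsql as $$
  begin
    -- explode the document into one store2 row per top-level key
    insert into store2 (id, top_level_key, data)
    select new.id, key, value
    from jsonb_each(new.data);
    return new;
  end $$;

  create trigger store1_sync_t
    after insert on store1
    for each row execute procedure store1_sync();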

These probably aren't the answers you want, but (to the best of my
knowledge) they're the best answers available at this time. I'd really
like to build the alternate TOAST storage, but I'm not in a position to
start on a project that ambitious right now... I'm not even really
keeping up with the project I'm currently supposed to be doing.

-- 
Bill Moran




Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Arthur Silva
Is this correct? I'm fairly sure jsonb supports lazily parsing objects and
each object level is actually searched using binary search.
On 29/11/2015 11:25 AM, "Tom Smith"  wrote:

> Hi, Thanks for everyone's response.
>
> The issue is not just compression, but lack of "indexing" or
> "segmentation" when a single doc has, say, 2000 top-level keys (with
> multiple levels of subkeys). Right now, if I query for one key, the
> whole doc has to be first uncompressed and loaded, and then searched
> for the single key.
>
> [rest of quoted message snipped]


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Lane
Bill Moran  writes:
> Tom Smith  wrote:
>> Is there a plan for 9.6 to resolve the issue of very slow
>> query/retrieval of jsonb fields when there are large number (maybe
>> several thousands) of top level keys.  Currently, if I save a large
>> json document with top level keys of thousands and query/retrieve field
>> values, the whole document has to be first decompressed and load to
>> memory before searching for the specific field key/value.

> I could be off-base here, but have you tried:
> ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;

There is just about zero chance we'll ever worry about this for compressed
columns.  However, even the uncompressed case does currently involve
loading the whole column value, as Tom says.  We did consider the
possibility of such an optimization when designing the JSONB storage
format, but I don't know of anyone actively working on it.

In any case, it's unlikely that it'd ever be super fast, since it's
certainly going to involve at least a couple of TOAST fetches.
Personally I'd be looking for a different representation.  If there
are specific fields that are known to be needed a lot, maybe functional
indexes would help?
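
For instance (table and key names here are hypothetical), an expression
index on a hot field lets lookups avoid detoasting the whole document:

-- Index one frequently-queried key of a jsonb column "data".
CREATE INDEX docs_speed_idx ON docs (((data ->> 'speed')::numeric));

-- A query written with the same expression can then use the index:
SELECT id FROM docs WHERE (data ->> 'speed')::numeric > 80;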

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Hi:

The goal is fast retrieval of a field value within a row once the row
is already picked; one scenario is downloading a particular field value
(if it exists) from all rows in the table.
It is actually a very common use case: exporting data for several
user-selected fields.
The performance is extremely slow.

Thanks




On Sun, Nov 29, 2015 at 12:54 PM, Tom Lane  wrote:

> Bill Moran  writes:
> > Tom Smith  wrote:
> >> Is there a plan for 9.6 to resolve the issue of very slow
> >> query/retrieval of jsonb fields when there are large number (maybe
> >> several thousands) of top level keys.  Currently, if I save a large
> >> json document with top level keys of thousands and query/retrieve field
> >> values, the whole document has to be first decompressed and load to
> >> memory before searching for the specific field key/value.
>
> > I could be off-base here, but have you tried:
> > ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;
>
> There is just about zero chance we'll ever worry about this for compressed
> columns.  However, even the uncompressed case does currently involve
> loading the whole column value, as Tom says.  We did consider the
> possibility of such an optimization when designing the JSONB storage
> format, but I don't know of anyone actively working on it.
>
> In any case, it's unlikely that it'd ever be super fast, since it's
> certainly going to involve at least a couple of TOAST fetches.
> Personally I'd be looking for a different representation.  If there
> are specific fields that are known to be needed a lot, maybe functional
> indexes would help?
>
> regards, tom lane
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-12-03 Thread Jim Nasby

On 11/29/15 9:30 AM, Arthur Silva wrote:

Is this correct? I'm fairly sure jsonb supports lazily parsing objects
and each object level is actually searched using binary search.


The problem is there's no support for loading just part of a TOASTed 
field. Even if that existed, we'd still need a way to know what byte 
position in the TOASTed field a key lived at.


It's possible to add all that, but I think it'd be a serious amount of work.

Since someone else was just wondering about storing more specific types 
in JSON, it might be more useful/interesting to devise a 
Postgres-specific way to store variable schema documents. That would 
give us a lot more flexibility over implementation than the JSON type 
ever will.


Or think about it this way: there's really no great reason why everyone 
chose JSON. There's tons of other serialization storage formats for 
other languages out there, so why not one specialized to Postgres? (And 
of course we'd want to be able to cast from that to JSON and back...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-11 Thread Tom Smith
Hi,

Congrats on the official release of 9.5

I'd like to bring up the issue again of whether 9.6 will address the
jsonb performance issue with a large number of top-level keys.
It is true that it does not have to use the JSON format; it is about
serialization and fast retrieval of dynamic tree-structured objects
(at the top level, they might be called dynamic columns).
So if PostgreSQL has its own way, that would work out too, as long as
it has intuitive queries (like those implemented for json and jsonb)
and fast retrieval of a tree-like object; it could be called a no-sql
data type. After all, the main motivation for using no-sql DBs like
MongoDB is working with dynamic tree objects.

If PostgreSQL can achieve high performance on this, then many no-sql
DBs would become history.

Thanks








On Thu, Dec 3, 2015 at 5:31 PM, Jim Nasby  wrote:

> On 11/29/15 9:30 AM, Arthur Silva wrote:
>
>> Is this correct? I'm fairly sure jsonb supports lazily parsing objects
>> and each object level is actually searched using binary search.
>>
>
> The problem is there's no support for loading just part of a TOASTed
> field. Even if that existed, we'd still need a way to know what byte
> position in the TOASTed field a key lived at.
>
> It's possible to add all that, but I think it'd be a serious amount of
> work.
>
> Since someone else was just wondering about storing more specific types in
> JSON, it might be more useful/interesting to devise a Postgres-specific way
> to store variable schema documents. That would give us a lot more
> flexibility over implementation than the JSON type ever will.
>
> Or think about it this way: there's really no great reason why everyone
> chose JSON. There's tons of other serialization storage formats for other
> languages out there, so why not one specialized to Postgres? (And of course
> we'd want to be able to cast from that to JSON and back...)
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-19 Thread Bruce Momjian
On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
> Hi,
> 
> Congrats on the official release of 9.5
> 
> And I'd like bring up the issue again about if 9.6 would address the jsonb
> performance issue
> with large number of top level keys.
> It is true that it does not have to use JSON format. it is about serialization
> and fast retrieval
> of dynamic tree structure objects. (at top level, it might be called dynamic
> columns)
> So if postgresql can have its own way, that would work out too as long as it
> can have intuitive query
> (like what are implemented for json and jsonb) and fast retrieval of a tree
> like object,
> it can be called no-sql data type. After all, most motivations of using no-sql
> dbs like MongoDB
> is about working with dynamic tree object.
> 
> If postgresql can have high performance on this, then many no-sql dbs would
> become history.

I can give you some backstory on this.  TOAST was designed in 2001 as a
way to store, in a data-type-agnostic way, compressed long strings and
any other long data type, e.g. long arrays.

In all previous cases, _part_ of the value wasn't useful.  JSONB is a
unique case because it is one of the few types that can be processed
without reading the entire value, e.g. it has an index.

We are going to be hesitant to do something data-type-specific for
JSONB.  It would be good if we could develop a data-type-agnostic
approach to how TOAST can be improved.  I know of no such work for 9.6,
and it is unlikely it will be done in time for 9.6.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-19 Thread Alvaro Herrera
Bill Moran wrote:

> As far as a current solution: my solution would be to decompose the
> JSON into an optimized table. I.e.:
> 
> CREATE TABLE store1 (
>  id SERIAL PRIMARY KEY,
>  data JSONB
> );
> 
> CREATE TABLE store2 (
>  id INT NOT NULL REFERENCES store1(id),
>  top_level_key VARCHAR(1024),
>  data JSONB,
>  PRIMARY KEY(top_level_key, id)
> );

Isn't this what ToroDB already does?
https://www.8kdata.com/torodb/

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-20 Thread Oleg Bartunov
On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian  wrote:

> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
> > Hi,
> >
> > Congrats on the official release of 9.5
> >
> > And I'd like bring up the issue again about if 9.6 would address the
> jsonb
> > performance issue
> > with large number of top level keys.
> > It is true that it does not have to use JSON format. it is about
> serialization
> > and fast retrieval
> > of dynamic tree structure objects. (at top level, it might be called
> dynamic
> > columns)
> > So if postgresql can have its own way, that would work out too as long
> as it
> > can have intuitive query
> > (like what are implemented for json and jsonb) and fast retrieval of a
> tree
> > like object,
> > it can be called no-sql data type. After all, most motivations of using
> no-sql
> > dbs like MongoDB
> > is about working with dynamic tree object.
> >
> > If postgresql can have high performance on this, then many no-sql dbs
> would
> > become history.
>
> I can give you some backstory on this.  TOAST was designed in 2001 as a
> way to store, in a data-type-agnostic way, long strings compressed and
> any other long data type, e.g. long arrays.
>
> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
> unique case because it is one of the few types that can be processed
> without reading the entire value, e.g. it has an index.
>
> We are going to be hesitant to do something data-type-specific for
> JSONB.  It would be good if we could develop a data-type-agnostic
> approach to has TOAST can be improved.  I know of no such work for 9.6,
> and it is unlikely it will be done in time for 9.6.
>

I'm looking at this from time to time.


>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-20 Thread Dorian Hoxha
Is there any database that actually supports what the original poster
wanted?

The only thing I know of that's similar is the bigtable/hbase/hypertable
wide column store.
The way it works is:
break the lexicographically sorted rows into compressed blocks of XX KB,
and then keep an index on the start_key+end_key of each block.

This way we can store the index (which links to several toast values) on
the row, and depending on which key you need, it will fetch and
decompress only the required block.
You can represent nested values by using a separator in the key, like
"first_level:2ndlevel:3rd_level:value".
If the index is too big, you can store the index itself in a toast value.

Note: I have no idea how to actually code this (or whether it can be
done).
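
At the schema level, the idea could be sketched like this (purely
illustrative; this is not how PostgreSQL or any of the systems above
actually store data):

-- Each document is split into compressed blocks of sorted keys, with a
-- small range index (start_key/end_key) kept per block.
CREATE TABLE doc_blocks (
    doc_id    int   NOT NULL,
    start_key text  NOT NULL,  -- first key stored in this block
    end_key   text  NOT NULL,  -- last key stored in this block
    block     bytea NOT NULL,  -- compressed key/value pairs
    PRIMARY KEY (doc_id, start_key)
);

-- Fetching key 'A1' of document 42 locates and decompresses one block:
SELECT block
FROM doc_blocks
WHERE doc_id = 42
  AND start_key <= 'A1'
  AND end_key >= 'A1';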

On Wed, Jan 20, 2016 at 9:32 AM, Oleg Bartunov  wrote:

>
>
> On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian  wrote:
>
>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> > Hi,
>> >
>> > Congrats on the official release of 9.5
>> >
>> > And I'd like bring up the issue again about if 9.6 would address the
>> jsonb
>> > performance issue
>> > with large number of top level keys.
>> > It is true that it does not have to use JSON format. it is about
>> serialization
>> > and fast retrieval
>> > of dynamic tree structure objects. (at top level, it might be called
>> dynamic
>> > columns)
>> > So if postgresql can have its own way, that would work out too as long
>> as it
>> > can have intuitive query
>> > (like what are implemented for json and jsonb) and fast retrieval of a
>> tree
>> > like object,
>> > it can be called no-sql data type. After all, most motivations of using
>> no-sql
>> > dbs like MongoDB
>> > is about working with dynamic tree object.
>> >
>> > If postgresql can have high performance on this, then many no-sql dbs
>> would
>> > become history.
>>
>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>> way to store, in a data-type-agnostic way, long strings compressed and
>> any other long data type, e.g. long arrays.
>>
>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>> unique case because it is one of the few types that can be processed
>> without reading the entire value, e.g. it has an index.
>>
>> We are going to be hesitant to do something data-type-specific for
>> JSONB.  It would be good if we could develop a data-type-agnostic
>> approach to has TOAST can be improved.  I know of no such work for 9.6,
>> and it is unlikely it will be done in time for 9.6.
>>
>
> I'm looking on this time to time.
>
>
>>
>> --
>>   Bruce Momjian  http://momjian.us
>>   EnterpriseDB http://enterprisedb.com
>>
>> + As you are, so once was I. As I am, so you will be. +
>> + Roman grave inscription +
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-20 Thread Bill Moran
On Tue, 19 Jan 2016 23:53:19 -0300
Alvaro Herrera  wrote:

> Bill Moran wrote:
> 
> > As far as a current solution: my solution would be to decompose the
> > JSON into an optimized table. I.e.:
> > 
> > CREATE TABLE store1 (
> >  id SERIAL PRIMARY KEY,
> >  data JSONB
> > );
> > 
> > CREATE TABLE store2 (
> >  id INT NOT NULL REFERENCES store1(id),
> >  top_level_key VARCHAR(1024),
> >  data JSONB,
> >  PRIMARY KEY(top_level_key, id)
> > );
> 
> Isn't this what ToroDB already does?
> https://www.8kdata.com/torodb/

Looks like. I wasn't aware of ToroDB, thanks for the link.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-02-08 Thread Tom Smith
Using the JSON/JSONB type in PostgreSQL is usually due to the use case
where the keys (top level included) cannot be predefined. This is the
major difference between NoSQL/document stores and an RDBMS.

Why would TOAST have to be used? Could some specially structured "raw"
files be used outside the current database files, with the jsonb column
value being a pointer into that file?


On Wed, Jan 20, 2016 at 7:32 AM, Bill Moran 
wrote:

> On Tue, 19 Jan 2016 23:53:19 -0300
> Alvaro Herrera  wrote:
>
> > Bill Moran wrote:
> >
> > > As far as a current solution: my solution would be to decompose the
> > > JSON into an optimized table. I.e.:
> > >
> > > CREATE TABLE store1 (
> > >  id SERIAL PRIMARY KEY,
> > >  data JSONB
> > > );
> > >
> > > CREATE TABLE store2 (
> > >  id INT NOT NULL REFERENCES store1(id),
> > >  top_level_key VARCHAR(1024),
> > >  data JSONB,
> > >  PRIMARY KEY(top_level_key, id)
> > > );
> >
> > Isn't this what ToroDB already does?
> > https://www.8kdata.com/torodb/
>
> Looks like. I wasn't aware of ToroDB, thanks for the link.
>
> --
> Bill Moran
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-02-08 Thread Álvaro Hernández Tortosa



On 20/01/16 13:32, Bill Moran wrote:

On Tue, 19 Jan 2016 23:53:19 -0300
Alvaro Herrera  wrote:


Bill Moran wrote:


As far as a current solution: my solution would be to decompose the
JSON into an optimized table. I.e.:

CREATE TABLE store1 (
  id SERIAL PRIMARY KEY,
  data JSONB
);

CREATE TABLE store2 (
  id INT NOT NULL REFERENCES store1(id),
  top_level_key VARCHAR(1024),
  data JSONB,
  PRIMARY KEY(top_level_key, id)
);

Isn't this what ToroDB already does?
https://www.8kdata.com/torodb/

Looks like. I wasn't aware of ToroDB, thanks for the link.



Hi Bill.

Effectively, that's what ToroDB does. You get a dynamic schema,
but automatically created for you. It will be a relational schema;
no json/jsonb needed for your data.

Please read the FAQ: https://github.com/torodb/torodb/wiki/FAQ and
let us know (https://groups.google.com/forum/#!forum/torodb-dev) if you
have any additional questions.


Thanks, Álvaro, for the reference :)

Cheers,

Álvaro


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-04-30 Thread Tom Smith
Hello:

I'd like to bring up this JSONB performance issue again.
Below is a link to MySQL's way of storing/retrieving JSON key/values:

https://dev.mysql.com/doc/refman/5.7/en/json.html

Instead of providing column indexing (like GIN for JSONB in PostgreSQL),
it provides only internal, data-structure-level indexing within each
individual json object for fast retrieval. Compression is not used.

Perhaps, without implementing complicated column-level GIN indexing,
implementing a new variant JSON type that only handles individual json
object indexing would be feasible? Combined with the current JSONB
implementation, both common use cases (one being global doc indexing,
the other fast retrieval of individual values) would work out and make
PostgreSQL unbeatable.









On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian  wrote:

> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
> > Hi,
> >
> > Congrats on the official release of 9.5
> >
> > And I'd like bring up the issue again about if 9.6 would address the
> jsonb
> > performance issue
> > with large number of top level keys.
> > It is true that it does not have to use JSON format. it is about
> serialization
> > and fast retrieval
> > of dynamic tree structure objects. (at top level, it might be called
> dynamic
> > columns)
> > So if postgresql can have its own way, that would work out too as long
> as it
> > can have intuitive query
> > (like what are implemented for json and jsonb) and fast retrieval of a
> tree
> > like object,
> > it can be called no-sql data type. After all, most motivations of using
> no-sql
> > dbs like MongoDB
> > is about working with dynamic tree object.
> >
> > If postgresql can have high performance on this, then many no-sql dbs
> would
> > become history.
>
> I can give you some backstory on this.  TOAST was designed in 2001 as a
> way to store, in a data-type-agnostic way, long strings compressed and
> any other long data type, e.g. long arrays.
>
> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
> unique case because it is one of the few types that can be processed
> without reading the entire value, e.g. it has an index.
>
> We are going to be hesitant to do something data-type-specific for
> JSONB.  It would be good if we could develop a data-type-agnostic
> approach to has TOAST can be improved.  I know of no such work for 9.6,
> and it is unlikely it will be done in time for 9.6.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-05-01 Thread Oleg Bartunov
On Sun, May 1, 2016 at 6:46 AM, Tom Smith  wrote:

> Hello:
>
> I'd like to bring this JSONB performance issue again.
> Below is a link of MySQL way of storing/retrieving Json key/value
>
> https://dev.mysql.com/doc/refman/5.7/en/json.html
>
> Instead of providing column indexing(like GIN for JSONB in Postgresql).
> it provides only internal data structure level indexing within each
> individual json object
> for fast retrieval.  compression is not used.
>
> Perhaps without implementing  complicated column level GIN indexing,
> implementing
> a new variant JSON type that only handle  individual json object indexing
> would be
> feasible?  Combined with current JSONB implementation,   both common use
> cases
> (one is global doc indexing, the other is fast retrieval of individual
> values)
> would work out and make postgresql unbeatable.
>

It's called an expression index?


>
>
>
>
>
>
>
>
>
> On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian  wrote:
>
>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> > Hi,
>> >
>> > Congrats on the official release of 9.5
>> >
>> > And I'd like bring up the issue again about if 9.6 would address the
>> jsonb
>> > performance issue
>> > with large number of top level keys.
>> > It is true that it does not have to use JSON format. it is about
>> serialization
>> > and fast retrieval
>> > of dynamic tree structure objects. (at top level, it might be called
>> dynamic
>> > columns)
>> > So if postgresql can have its own way, that would work out too as long
>> as it
>> > can have intuitive query
>> > (like what are implemented for json and jsonb) and fast retrieval of a
>> tree
>> > like object,
>> > it can be called no-sql data type. After all, most motivations of using
>> no-sql
>> > dbs like MongoDB
>> > is about working with dynamic tree object.
>> >
>> > If postgresql can have high performance on this, then many no-sql dbs
>> would
>> > become history.
>>
>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>> way to store, in a data-type-agnostic way, long strings compressed and
>> any other long data type, e.g. long arrays.
>>
>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>> unique case because it is one of the few types that can be processed
>> without reading the entire value, e.g. it has an index.
>>
>> We are going to be hesitant to do something data-type-specific for
>> JSONB.  It would be good if we could develop a data-type-agnostic
>> approach to has TOAST can be improved.  I know of no such work for 9.6,
>> and it is unlikely it will be done in time for 9.6.
>>
>> --
>>   Bruce Momjian  http://momjian.us
>>   EnterpriseDB http://enterprisedb.com
>>
>> + As you are, so once was I. As I am, so you will be. +
>> + Roman grave inscription +
>>
>
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-05-01 Thread Tom Smith
No, it is within the individual json object's storage. In a way, it
would be part of the query plan, but strictly for the individual json
object's storage structure it is not necessarily an "index".
One possible (but primitive) implementation could be having multiple
"segments" in the storage: all keys starting with "a" go in the first
segment, and so on.

On Sun, May 1, 2016 at 4:14 PM, Oleg Bartunov  wrote:

>
>
> On Sun, May 1, 2016 at 6:46 AM, Tom Smith 
> wrote:
>
>> Hello:
>>
>> I'd like to bring this JSONB performance issue again.
>> Below is a link of MySQL way of storing/retrieving Json key/value
>>
>> https://dev.mysql.com/doc/refman/5.7/en/json.html
>>
>> Instead of providing column indexing(like GIN for JSONB in Postgresql).
>> it provides only internal data structure level indexing within each
>> individual json object
>> for fast retrieval.  compression is not used.
>>
>> Perhaps without implementing  complicated column level GIN indexing,
>> implementing
>> a new variant JSON type that only handle  individual json object indexing
>> would be
>> feasible?  Combined with current JSONB implementation,   both common use
>> cases
>> (one is global doc indexing, the other is fast retrieval of individual
>> values)
>> would work out and make postgresql unbeatable.
>>
>
> It's called expression index ?
>
>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian  wrote:
>>
>>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>>> > Hi,
>>> >
>>> > Congrats on the official release of 9.5
>>> >
>>> > And I'd like bring up the issue again about if 9.6 would address the
>>> jsonb
>>> > performance issue
>>> > with large number of top level keys.
>>> > It is true that it does not have to use JSON format. it is about
>>> serialization
>>> > and fast retrieval
>>> > of dynamic tree structure objects. (at top level, it might be called
>>> dynamic
>>> > columns)
>>> > So if postgresql can have its own way, that would work out too as long
>>> as it
>>> > can have intuitive query
>>> > (like what are implemented for json and jsonb) and fast retrieval of a
>>> tree
>>> > like object,
>>> > it can be called no-sql data type. After all, most motivations of
>>> using no-sql
>>> > dbs like MongoDB
>>> > is about working with dynamic tree object.
>>> >
>>> > If postgresql can have high performance on this, then many no-sql dbs
>>> would
>>> > become history.
>>>
>>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>>> way to store, in a data-type-agnostic way, long strings compressed and
>>> any other long data type, e.g. long arrays.
>>>
>>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>>> unique case because it is one of the few types that can be processed
>>> without reading the entire value, e.g. it has an index.
>>>
>>> We are going to be hesitant to do something data-type-specific for
>>> JSONB.  It would be good if we could develop a data-type-agnostic
>>> approach to has TOAST can be improved.  I know of no such work for 9.6,
>>> and it is unlikely it will be done in time for 9.6.
>>>
>>> --
>>>   Bruce Momjian  http://momjian.us
>>>   EnterpriseDB http://enterprisedb.com
>>>
>>> + As you are, so once was I. As I am, so you will be. +
>>> + Roman grave inscription +
>>>
>>
>>
>


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread John R Pierce

On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:


Are the spaces necessary in the text representation of JSONB?
In my data, the resulting text is ~12% spaces.


can you show us an example of this?


--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Ilya I. Ashchepkov
I'm sorry about sending the email several times. I couldn't tell
whether Gmail had sent it or not.


On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce  wrote:

> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>
>>
>> Is spaces is necessary in text presentation of JSONB?
>> In my data resulting text contains ~12% of spaces.
>>
>
> can you show us an example of this?
>

One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
"runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
"gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
{"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
"2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
"altitude": 143, "latitude": 55.127888997395836, "longitude":
80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
"receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}

Whitespace percentage in this record:
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
      ?column?
---------------------
 12.3417721518987342

Whitespace in the test data:
 # select count(*),avg(array_length(regexp_split_to_array(data::text, text
 ' '), 1)*100./length(data::text)) from events.data ;
 count |         avg
-------+---------------------
    24 | 12.3649234646118312



>
> --
> john r pierce  37N 122W
> somewhere on the middle of the left coast
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
Best regards,
Ilya Ashchepkov koc...@gmail.com


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Merlin Moncure
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov  wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce  wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>   ?column?
> -
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  | avg
> +-
>  24 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation, in fact, but in exchange many operations will be
faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
conversion to text happens (output functions, xxxto_json, etc.)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.
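
A quick way to compare the two representations for a given value is
pg_column_size(); for small documents the jsonb form is typically the
larger of the two (exact sizes vary by version and value):

-- Compare stored sizes of the same small document as jsonb vs. json.
SELECT pg_column_size('{"gps": 1, "speed": 87.4}'::jsonb) AS jsonb_bytes,
       pg_column_size('{"gps": 1, "speed": 87.4}'::json)  AS json_bytes;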

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Adrian Klaver

On 09/24/2014 12:44 AM, Ilya I. Ashchepkov wrote:

I'm sorry about sending email several times. I haven't understand, was
it sent by gmail or not.


On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pie...@hogranch.com> wrote:

On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:


Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.


can you show us an example of this?


One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
74, "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
"gps": 1, "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used":
19, "speed": 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no":
256, "digital": {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0},
"out": {"1": 0, "2": 0}}, "visible": 20, "ignition": 1, "location":
{"course": 265, "altitude": 143, "latitude": 55.127888997395836,
"longitude": 80.8046142578125}, "protocol": 4, "coldstart": 1,
"timesource": "terminal", "receiver_on": 1, "external_power": 28.07,
"internal_power": 4.19}

Whitespacis percents in this record:
# select array_length(regexp_split_to_array(data::text, text ' '),
1)*100./length(data::text) from events.data limit 1;
   ?column?
-
  12.3417721518987342

Whitespace in test data
  # select count(*),avg(array_length(regexp_split_to_array(data::text,
text ' '), 1)*100./length(data::text)) from events.data ;
  count  | avg
+-
  24 | 12.3649234646118312



The only thing I can think of is to use json, not jsonb. Modified
example taken from the docs:


http://www.postgresql.org/docs/9.4/static/datatype-json.html

test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::jsonb;
                       jsonb
--------------------------------------------------
 {"bar": "baz", "active": false, "balance": 7.77}


test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::json;
                     json
----------------------------------------------
 {"bar":"baz","balance":7.77,"active":false}


json will return exactly what was put in.


--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Seref Arikan
This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to expect the
same from jsonb. I did a quick Google search to see if it is wrong to
expect binary encoding to decrease size and saw that I'm not alone (which
still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref


On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure  wrote:

> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov 
> wrote:
> > I'm sorry about sending email several times. I haven't understand, was it
> > sent by gmail or not.
> >
> >
> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce 
> wrote:
> >>
> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
> >>>
> >>>
> >>> Is spaces is necessary in text presentation of JSONB?
> >>> In my data resulting text contains ~12% of spaces.
> >>
> >>
> >> can you show us an example of this?
> >
> >
> > One record
> > # select data from events.data limit 1;
> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps":
> 1,
> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
> "digital":
> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
> "terminal",
> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
> >
> > Whitespacis percents in this record:
> > # select array_length(regexp_split_to_array(data::text, text ' '),
> > 1)*100./length(data::text) from events.data limit 1;
> >   ?column?
> > -
> >  12.3417721518987342
> >
> > Whitespace in test data
> >  # select count(*),avg(array_length(regexp_split_to_array(data::text,
> text '
> > '), 1)*100./length(data::text)) from events.data ;
> >  count  | avg
> > +-
> >  24 | 12.3649234646118312
>
>
> For jsonb (unlike json), data is not actually stored as json but in a
> binary format.  It will generally be much larger than the text
> representation in fact but in exchange for that many operations will
> be faster.  The spaces you see are generated when the jsonb type is
> converted to text for output.  I actually think it's pretty reasonable
> to want to redact all spaces from such objects in all cases where
> converstion to text happens (output functions, xxxto_json, etc)
> because ~12% savings are nothing to sneeze at when moving large
> documents in and out of the database.
>
> On the flip side, a more verbose prettification would be pretty nice
> too.  I wonder if a hypothetical GUC is the best way to control this
> behavior...
>
> merlin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Adrian Klaver

On 09/24/2014 07:22 AM, Seref Arikan wrote:

This is interesting. Most binary encoding methods I use produce smaller
files than the text files for the same content.
Having read your mail, I've realized that I have no reason to accept the
same from the jsonb. I did a quick google search to see if it is wrong
to expect binary encoding to decrease size and saw that I'm not alone
(which still does not mean I'm being reasonable).
This project: http://ubjson.org/#size is one of the hits which mentions
some nice space gains thanks to binary encoding.

The "much larger" part is a bit scary. Is this documented somewhere?


I believe Merlin is referring to the issue in this thread:

http://www.postgresql.org/message-id/27839.1407467...@sss.pgh.pa.us



Best regards
Seref




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Ilya I. Ashchepkov
IMHO, prettification is useful only for debugging.
It would be nice to have a session variable that makes the debug output
include spaces, newlines, and indentation.
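
PostgreSQL 9.5 later added jsonb_pretty() for exactly this debugging
case, so the indented form can be requested per query rather than per
session:

-- jsonb_pretty() (9.5+) renders an indented form on demand, while the
-- default text output stays compact.
SELECT jsonb_pretty('{"gps": 1, "gsm": {"signal": 100}}'::jsonb);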

On Wed, Sep 24, 2014 at 8:44 PM, Merlin Moncure  wrote:

> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov 
> wrote:
> > I'm sorry about sending email several times. I haven't understand, was it
> > sent by gmail or not.
> >
> >
> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce 
> wrote:
> >>
> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
> >>>
> >>>
> >>> Is spaces is necessary in text presentation of JSONB?
> >>> In my data resulting text contains ~12% of spaces.
> >>
> >>
> >> can you show us an example of this?
> >
> >
> > One record
> > # select data from events.data limit 1;
> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps":
> 1,
> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
> "digital":
> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
> "terminal",
> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
> >
> > Whitespacis percents in this record:
> > # select array_length(regexp_split_to_array(data::text, text ' '),
> > 1)*100./length(data::text) from events.data limit 1;
> >   ?column?
> > -
> >  12.3417721518987342
> >
> > Whitespace in test data
> >  # select count(*),avg(array_length(regexp_split_to_array(data::text,
> text '
> > '), 1)*100./length(data::text)) from events.data ;
> >  count  | avg
> > +-
> >  24 | 12.3649234646118312
>
>
> For jsonb (unlike json), data is not actually stored as json but in a
> binary format.  It will generally be much larger than the text
> representation in fact but in exchange for that many operations will
> be faster.  The spaces you see are generated when the jsonb type is
> converted to text for output.  I actually think it's pretty reasonable
> to want to redact all spaces from such objects in all cases where
> converstion to text happens (output functions, xxxto_json, etc)
> because ~12% savings are nothing to sneeze at when moving large
> documents in and out of the database.
>
> On the flip side, a more verbose prettification would be pretty nice
> too.  I wonder if a hypothetical GUC is the best way to control this
> behavior...
>
> merlin
>



-- 
Best regards,
Ilya Ashchepkov koc...@gmail.com


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Ilya I. Ashchepkov
With the same data:

# create cast (jsonb as bytea) without function;
# select
sum(length(data::text))::float/sum(octet_length((data::jsonb)::bytea)) from
data.packets;
 ?column?
---
 0.630663654967513

and 0.554666142734544 with the spaces removed

On Wed, Sep 24, 2014 at 9:22 PM, Seref Arikan  wrote:

> This is interesting. Most binary encoding methods I use produce smaller
> files than the text files for the same content.
> Having read your mail, I've realized that I have no reason to accept the
> same from the jsonb. I did a quick google search to see if it is wrong to
> expect binary encoding to decrease size and saw that I'm not alone (which
> still does not mean I'm being reasonable).
> This project: http://ubjson.org/#size is one of the hits which mentions
> some nice space gains thanks to binary encoding.
>
> The "much larger" part is a bit scary. Is this documented somewhere?
>
> Best regards
> Seref
>
>
> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure 
> wrote:
>
>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov 
>> wrote:
>> > I'm sorry about sending email several times. I haven't understand, was
>> it
>> > sent by gmail or not.
>> >
>> >
>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce 
>> wrote:
>> >>
>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>> >>>
>> >>>
>> >>> Is spaces is necessary in text presentation of JSONB?
>> >>> In my data resulting text contains ~12% of spaces.
>> >>
>> >>
>> >> can you show us an example of this?
>> >
>> >
>> > One record
>> > # select data from events.data limit 1;
>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
>> 74,
>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps":
>> 1,
>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
>> "speed":
>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
>> "digital":
>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
>> "terminal",
>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>> >
>> > Whitespacis percents in this record:
>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>> > 1)*100./length(data::text) from events.data limit 1;
>> >   ?column?
>> > -
>> >  12.3417721518987342
>> >
>> > Whitespace in test data
>> >  # select count(*),avg(array_length(regexp_split_to_array(data::text,
>> text '
>> > '), 1)*100./length(data::text)) from events.data ;
>> >  count  | avg
>> > +-
>> >  24 | 12.3649234646118312
>>
>>
>> For jsonb (unlike json), data is not actually stored as json but in a
>> binary format.  It will generally be much larger than the text
>> representation in fact but in exchange for that many operations will
>> be faster.  The spaces you see are generated when the jsonb type is
>> converted to text for output.  I actually think it's pretty reasonable
>> to want to redact all spaces from such objects in all cases where
>> converstion to text happens (output functions, xxxto_json, etc)
>> because ~12% savings are nothing to sneeze at when moving large
>> documents in and out of the database.
>>
>> On the flip side, a more verbose prettification would be pretty nice
>> too.  I wonder if a hypothetical GUC is the best way to control this
>> behavior...
>>
>> merlin
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


-- 
Best regards,
Ilya Ashchepkov koc...@gmail.com


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Oleg Bartunov
Check slides 17-20 of
http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to
understand, what 'binary format' means. The slides describes binary storage
for nested hstore, not jsonb, but you'll get the idea.

On Wed, Sep 24, 2014 at 6:22 PM, Seref Arikan  wrote:

> This is interesting. Most binary encoding methods I use produce smaller
> files than the text files for the same content.
> Having read your mail, I've realized that I have no reason to accept the
> same from the jsonb. I did a quick google search to see if it is wrong to
> expect binary encoding to decrease size and saw that I'm not alone (which
> still does not mean I'm being reasonable).
> This project: http://ubjson.org/#size is one of the hits which mentions
> some nice space gains thanks to binary encoding.
>
> The "much larger" part is a bit scary. Is this documented somewhere?
>
> Best regards
> Seref
>
>
> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure 
> wrote:
>
>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov 
>> wrote:
>> > I'm sorry about sending email several times. I haven't understand, was
>> it
>> > sent by gmail or not.
>> >
>> >
>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce 
>> wrote:
>> >>
>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>> >>>
>> >>>
>> >>> Is spaces is necessary in text presentation of JSONB?
>> >>> In my data resulting text contains ~12% of spaces.
>> >>
>> >>
>> >> can you show us an example of this?
>> >
>> >
>> > One record
>> > # select data from events.data limit 1;
>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
>> 74,
>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps":
>> 1,
>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
>> "speed":
>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
>> "digital":
>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
>> "terminal",
>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>> >
>> > Whitespacis percents in this record:
>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>> > 1)*100./length(data::text) from events.data limit 1;
>> >   ?column?
>> > -
>> >  12.3417721518987342
>> >
>> > Whitespace in test data
>> >  # select count(*),avg(array_length(regexp_split_to_array(data::text,
>> text '
>> > '), 1)*100./length(data::text)) from events.data ;
>> >  count  | avg
>> > +-
>> >  24 | 12.3649234646118312
>>
>>
>> For jsonb (unlike json), data is not actually stored as json but in a
>> binary format.  It will generally be much larger than the text
>> representation in fact but in exchange for that many operations will
>> be faster.  The spaces you see are generated when the jsonb type is
>> converted to text for output.  I actually think it's pretty reasonable
>> to want to redact all spaces from such objects in all cases where
>> converstion to text happens (output functions, xxxto_json, etc)
>> because ~12% savings are nothing to sneeze at when moving large
>> documents in and out of the database.
>>
>> On the flip side, a more verbose prettification would be pretty nice
>> too.  I wonder if a hypothetical GUC is the best way to control this
>> behavior...
>>
>> merlin
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread John R Pierce

On 9/24/2014 7:22 AM, Seref Arikan wrote:
This is interesting. Most binary encoding methods I use produce 
smaller files than the text files for the same content. 


'1' vs. INTEGER 1 ... 1 byte vs. 4 bytes.

Now add the metadata necessary to represent the original json structure.



--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB spaces in text presentation

2014-09-24 Thread Seref Arikan
Thanks Oleg, I'll check the slides.


On Wed, Sep 24, 2014 at 8:07 PM, Oleg Bartunov  wrote:

> Check slides 17-20 of
> http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to
> understand, what 'binary format' means. The slides describes binary storage
> for nested hstore, not jsonb, but you'll get the idea.
>
> On Wed, Sep 24, 2014 at 6:22 PM, Seref Arikan 
> wrote:
>
>> This is interesting. Most binary encoding methods I use produce smaller
>> files than the text files for the same content.
>> Having read your mail, I've realized that I have no reason to accept the
>> same from the jsonb. I did a quick google search to see if it is wrong to
>> expect binary encoding to decrease size and saw that I'm not alone (which
>> still does not mean I'm being reasonable).
>> This project: http://ubjson.org/#size is one of the hits which mentions
>> some nice space gains thanks to binary encoding.
>>
>> The "much larger" part is a bit scary. Is this documented somewhere?
>>
>> Best regards
>> Seref
>>
>>
>> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure 
>> wrote:
>>
>>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov 
>>> wrote:
>>> > I'm sorry about sending email several times. I haven't understand, was
>>> it
>>> > sent by gmail or not.
>>> >
>>> >
>>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce 
>>> wrote:
>>> >>
>>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>> >>>
>>> >>>
>>> >>> Is spaces is necessary in text presentation of JSONB?
>>> >>> In my data resulting text contains ~12% of spaces.
>>> >>
>>> >>
>>> >> can you show us an example of this?
>>> >
>>> >
>>> > One record
>>> > # select data from events.data limit 1;
>>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
>>> 74,
>>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
>>> "gps": 1,
>>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19,
>>> "speed":
>>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256,
>>> "digital":
>>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1":
>>> 0,
>>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource":
>>> "terminal",
>>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>>> >
>>> > Whitespacis percents in this record:
>>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>>> > 1)*100./length(data::text) from events.data limit 1;
>>> >   ?column?
>>> > -
>>> >  12.3417721518987342
>>> >
>>> > Whitespace in test data
>>> >  # select count(*),avg(array_length(regexp_split_to_array(data::text,
>>> text '
>>> > '), 1)*100./length(data::text)) from events.data ;
>>> >  count  | avg
>>> > +-
>>> >  24 | 12.3649234646118312
>>>
>>>
>>> For jsonb (unlike json), data is not actually stored as json but in a
>>> binary format.  It will generally be much larger than the text
>>> representation in fact but in exchange for that many operations will
>>> be faster.  The spaces you see are generated when the jsonb type is
>>> converted to text for output.  I actually think it's pretty reasonable
>>> to want to redact all spaces from such objects in all cases where
>>> converstion to text happens (output functions, xxxto_json, etc)
>>> because ~12% savings are nothing to sneeze at when moving large
>>> documents in and out of the database.
>>>
>>> On the flip side, a more verbose prettification would be pretty nice
>>> too.  I wonder if a hypothetical GUC is the best way to control this
>>> behavior...
>>>
>>> merlin
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>
>>
>


Re: [GENERAL] JsonB Gin Index is very large; is there a work around?

2014-08-20 Thread Peter Geoghegan
On Wed, Aug 20, 2014 at 1:53 PM, Larry White  wrote:
> Is there anyway to index a subset of the data in a JSONB column? I'm
> thinking of something like declaring certain paths to be indexed?


Yes. See the expression index example in the jsonb documentation.
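
A minimal sketch of that approach, with hypothetical table and path
names, is to build the GIN index over one sub-object rather than the
whole column:

-- Index only the "attrs" sub-object of docs.data, keeping the GIN
-- index much smaller than one over the entire document.
CREATE INDEX docs_attrs_idx ON docs USING gin ((data -> 'attrs'));

-- Containment queries against that sub-object can use the index:
SELECT id FROM docs WHERE data -> 'attrs' @> '{"color": "red"}';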

-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JsonB Gin Index is very large; is there a work around?

2014-08-20 Thread Larry White
OK, thank you.

FWIW, the documents (which I can't share) consist mainly of a long list of
integers in the form {"n":"41515920318427252715"}, so they really are
outliers.


On Wed, Aug 20, 2014 at 5:09 PM, Peter Geoghegan <
peter.geoghega...@gmail.com> wrote:

> On Wed, Aug 20, 2014 at 1:53 PM, Larry White  wrote:
> > Is there anyway to index a subset of the data in a JSONB column? I'm
> > thinking of something like declaring certain paths to be indexed?
>
>
> Yes. See the expression index example in the jsonb documentation.
>
> --
> Regards,
> Peter Geoghegan
>