Re: [PERFORM] array size exceeds the maximum allowed (1073741823) when building a json

2016-06-09 Thread Merlin Moncure
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus  wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Will this 1GB restriction be increased in the near future?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desirable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or have exponential behaviors, especially when you consider that
you are doing the transformation as well.  Just prettifying json
documents over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is but remove the final aggregation step so that it returns
a set.  Next, make a small application that runs this query and does
the array bits around each row (basically prepending the overall result
with [, appending it with ], and putting , between rows).
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.  This can be done in
psql (FETCH_COUNT), Java, libpq (single-row mode), etc.  I suspect the
node.js pg module can do this as well, and there will certainly be
others.

The basic objective is that the rows get streamed out of the database
without being buffered.  If you do that, you should be able to stream
arbitrarily large datasets out of the database into a json document,
assuming the server can produce the query.
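
For example, with psql it could look something like this (a rough,
untested sketch; the table name, per-row query, and output path are
just placeholders since I don't know your actual query):

-- have psql fetch the result through a cursor, 1000 rows at a time,
-- instead of buffering the whole result set in memory
\set FETCH_COUNT 1000
-- no column header, row-count footer, or padding
\pset tuples_only on
\pset format unaligned
\o /tmp/rows.json
-- placeholder per-row query; emits one json object per line
select row_to_json(t) from big_table t;
\o

You would then wrap the resulting stream with [ and ] and add the
commas in between, either in the client or as a post-processing step.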

merlin




Re: [PERFORM] array size exceeds the maximum allowed (1073741823) when building a json

2016-06-09 Thread Nicolas Paris
2016-06-09 15:31 GMT+02:00 Merlin Moncure :

> Fortunately the workaround here is pretty easy.  Keep your query
> exactly as is but remove the final aggregation step so that it returns
> a set.  Next, make a small application that runs this query and does
> the array bits around each row (basically prepending the overall result
> with [, appending it with ], and putting , between rows).
>

The point is that prepending/appending means dealing with strings.
Transforming each value of the result set into a string implies escaping the
double quotes.
Then:
row 1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on
and the json is corrupted.  Hopefully I am just on the wrong track about this.


Re: [PERFORM] slony rpm help slony1-95-2.2.2-1.rhel6.x86_64

2016-06-09 Thread Glyn Astill
> From: avi Singh 
>To: pgsql-performance@postgresql.org 
>Sent: Saturday, 4 June 2016, 0:03
>Subject: [PERFORM] slony rpm help slony1-95-2.2.2-1.rhel6.x86_64
> 
>
>
>Hi All,
> Can anyone please point me to a location where I can get the
> slony1-95-2.2.2-1.rhel5.x86_64 rpm?  I'm upgrading a database from version
> 9.3 to 9.5.  The current version of the rpm we are using is
> slony1-93-2.2.2-1.el5.x86_64, and the one that is available on the
> postgresql website for 9.5 is slony1-95-2.2.4-4.rhel5.x86_64, which is not
> compatible and throws an error when I test the upgrade.  In the past I was
> able to find the 2.2.2-1 version rpm for previous versions on the postgres
> website, but not this time for postgresql 9.5.
>
>


What you'd be better off doing is installing Slony 2.2.4 on all your servers 
(or better, 2.2.5) rather than trying to get the older version.  If you can't 
get a package, you could compile Slony yourself.


The "not compatible" error you mention is most likely because you haven't 
updated the Slony functions.  See:


http://slony.info/documentation/2.2/stmtupdatefunctions.html

Glyn




Re: [PERFORM] Performance of LIKE/NOT LIKE when used in single query

2016-06-09 Thread Jeff Janes
On Tue, Jun 7, 2016 at 9:57 PM, Ed Felstein  wrote:
> Hello,
> First time poster here.  Bear with me.
> Using PostgreSQL 9.5
> I have a situation where I have a LIKE and a NOT LIKE in the same query to
> identify strings in a varchar field.  Since I am using wildcards, I have
> created a GIN index on the field in question, which makes LIKE '%%'
> searches run very fast.  The problem is the NOT LIKE phrases, which (as
> would be expected) force a sequential scan.  Being that we're talking about
> millions of records, this is not desirable.
> Here's the question...
> Is there a way, using a single query, to emulate the process of running the
> LIKE part first, then running the NOT LIKE just on those results?

Just do it.  In my hands, the planner is smart enough to figure it out
for itself.
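
For reference, I'm assuming the GIN index in question is a pg_trgm
trigram index, since that's what makes LIKE '%...%' searches indexable;
something along these lines:

create extension if not exists pg_trgm;
create index on stuff using gin (synonym gin_trgm_ops);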

explain analyze select * from stuff where synonym like '%BAT%' and
synonym not like '%col not like%' ;

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on stuff  (cost=16.10..63.08 rows=13 width=14) (actual time=9.465..10.642 rows=23 loops=1)
   Recheck Cond: (synonym ~~ '%BAT%'::text)
   Rows Removed by Index Recheck: 76
   Filter: (synonym !~~ '%col not like%'::text)
   Heap Blocks: exact=57
   ->  Bitmap Index Scan on integrity_synonym_synonym_idx  (cost=0.00..16.10 rows=13 width=0) (actual time=8.847..8.847 rows=99 loops=1)
         Index Cond: (synonym ~~ '%BAT%'::text)
 Planning time: 18.261 ms
 Execution time: 10.932 ms


So it is using the index for the positive match, and filtering those
results for the negative match, just as you wanted.

Cheers,

Jeff




Re: [PERFORM] array size exceeds the maximum allowed (1073741823) when building a json

2016-06-09 Thread Merlin Moncure
On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris  wrote:
> The point is that prepending/appending means dealing with strings.
> Transforming each value of the result set into a string implies escaping the
> double quotes.
> Then:
> row 1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

Right, say 3 rows each contain {"hello":"world"}.

before iteration: emit '['
for each row: emit the row, prepending ',' to every row except the first
after iteration: emit ']'

you end up with:
[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid json as long as each row of the set is a json object.

in SQL, the technique is like this:
select ('[' || string_agg(j::text, ',') || ']')::json
from (select json_build_object('hello', 'world') j
      from generate_series(1,3)) q;

the difference is, instead of having the database do the string_agg
step, it's handled on the client during iteration over the output of
generate_series.
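
For what it's worth, you can approximate the client-side iteration from
psql alone; a rough, untested sketch (the output path is arbitrary, and
the row_number() trick for the leading commas is just one way to do it,
not part of the original suggestion):

\set FETCH_COUNT 1000
\pset tuples_only on
\pset format unaligned
\o /tmp/big.json
\qecho [
-- every row except the first gets a leading comma
select case when row_number() over () = 1 then '' else ',' end || j::text
from (select json_build_object('hello', 'world') j
      from generate_series(1,3)) q;
\qecho ]
\o

Same [ {...} ,{...} ,{...} ] shape as above, just written to the file
incrementally instead of being assembled in memory.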

merlin




Re: [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

2016-06-09 Thread Andres Freund
Hi,

On 2016-06-02 14:18:26 +0300, Антон Бушмелев wrote:
> Update: repeated the tests on a local VM; the results are discouraging:
> OS        PG      TPS         AVG latency
> Centos 7  9.5.3   23.711023   168.421
> Centos 7  9.5.3   26.609271   150.188
> Centos 7  9.5.3   25.220044   158.416
> Centos 7  9.5.3   25.598977   156.047
> Centos 7  9.4.8   278.572191  14.077
> Centos 7  9.4.8   247.237755  16.177
> Centos 7  9.4.8   240.007524  16.276
> Centos 7  9.4.8   237.862238  16.596

Could you provide profiles on 9.4 and 9.5?  Which kernel were you
running?  Is /proc/sys/kernel/sched_autogroup_enabled set to 1 or 0?

Regards,

Andres

