Re: [PERFORM] query performance issue

2017-11-15 Thread Pavel Stehule
2017-11-15 20:58 GMT+01:00 Gunther <r...@gusw.net>:

>
> On 11/15/2017 8:12, Pavel Stehule wrote:
>
> There is a wrong plan due to a wrong estimation.
>
> For this query you should penalize the nested loop:
>
> set enable_nestloop to off;
>
> before evaluating this query.
>
>
> You are not the only one with this issue. May I suggest to look at this
> thread a little earlier this month.
>
> http://www.postgresql-archive.org/OLAP-reporting-queries-fall-into-nested-loops-over-seq-scans-or-other-horrible-planner-choices-tp5990160.html
>
> where this has been discussed in some length.
>

It is a typical issue. The source of these problems is correlation between
columns (it can be partially fixed by the multicolumn statistics in PostgreSQL
10). Another problem is missing multi-table statistics - the PostgreSQL planner
assumes that every value from a dimension (dictionary) table has the same
probability, which is usually not true. Some OLAP techniques, like calendar
tables, usually have a very bad impact on estimates as a result.
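
For illustration, PostgreSQL 10 can be told about such a correlation with extended
statistics (a minimal sketch; the table and column names here are hypothetical, not
from the reported query):

-- declare a functional dependency between two correlated columns
CREATE STATISTICS sales_region_city_stats (dependencies) ON region, city FROM sales;
ANALYZE sales;  -- collect the new statistics so the planner can use them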

Regards

Pavel


> regards,
> -Gunther
>
>
>


Re: [PERFORM] query performance issue

2017-11-15 Thread Pavel Stehule
2017-11-15 13:54 GMT+01:00 Samir Magar <samirmag...@gmail.com>:

> please find the EXPLAIN ANALYZE output.
>
> On Wed, Nov 15, 2017 at 3:13 PM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>> Hi
>>
>> please send EXPLAIN ANALYZE output.
>>
>> Regards
>>
>> Pavel
>>
>> 2017-11-15 10:33 GMT+01:00 Samir Magar <samirmag...@gmail.com>:
>>
>>> Hello,
>>> I am having performance issues with one of the queries.
>>> The query is taking 39 min to fetch 3.5 mil records.
>>>
>>> I want to reduce that time to 15 mins.
>>> Could you please suggest something to improve its performance?
>>>
>>> server configuration:
>>>  CPUs = 4
>>> memory = 16 GB
>>> shared_buffers = 3 GB
>>> work_mem = 100MB
>>> effective_cache_size = 12 GB
>>>
>>> we are doing the vacuum/analyze regularly on the database.
>>>
>>> attached is the query with its explain plan.
>>>
>>>

There is a wrong plan due to a wrong estimation.

For this query you should penalize the nested loop:

set enable_nestloop to off;

before evaluating this query.
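
A minimal sketch of doing that for a single session only, so the setting does not
leak into other queries:

SET enable_nestloop TO off;
-- run (or EXPLAIN ANALYZE) the slow query here
RESET enable_nestloop;  -- restore the planner default afterwards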


Thanks,
>>> Samir Magar
>>>
>>>
>>>
>>>
>>
>


Re: [PERFORM] query performance issue

2017-11-15 Thread Pavel Stehule
Hi

please send EXPLAIN ANALYZE output.

Regards

Pavel

2017-11-15 10:33 GMT+01:00 Samir Magar :

> Hello,
> I am having performance issues with one of the queries.
> The query is taking 39 min to fetch 3.5 mil records.
>
> I want to reduce that time to 15 mins.
> Could you please suggest something to improve its performance?
>
> server configuration:
>  CPUs = 4
> memory = 16 GB
> shared_buffers = 3 GB
> work_mem = 100MB
> effective_cache_size = 12 GB
>
> we are doing the vacuum/analyze regularly on the database.
>
> attached is the query with its explain plan.
>
> Thanks,
> Samir Magar
>
>
>
>


Re: [PERFORM] Stored Procedure Performance

2017-10-11 Thread Pavel Stehule
2017-10-11 18:52 GMT+02:00 Purav Chovatia :

> Yes, there is some code to catch exceptions like unique constraint
> violation and no data found. Do you suggest we try commenting that
> part out? Btw, the dataset is a controlled one, so what I can confirm is we are
> not hitting any exceptions.
>

If possible, don't do it inside the loop, or use exception handling only
where it is really necessary, not just for convenience.
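
A hedged sketch of why (the table name is hypothetical): every EXCEPTION block
starts a subtransaction, so putting one inside a loop pays that cost on every
iteration.

CREATE TABLE demo_unique (id int PRIMARY KEY);

DO $$
BEGIN
  FOR i IN 1..100000 LOOP
    BEGIN                                      -- a subtransaction on every iteration
      INSERT INTO demo_unique(id) VALUES (i);
    EXCEPTION WHEN unique_violation THEN
      NULL;                                    -- swallow duplicates
    END;
  END LOOP;
END $$;

-- often faster (PostgreSQL 9.5+): no handler at all
-- INSERT INTO demo_unique(id) VALUES (1) ON CONFLICT DO NOTHING;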

Regards

Pavel


> Thanks
>
> On 11 October 2017 at 22:07, Adam Brusselback 
> wrote:
>
>> Is there any error handling in there?  I remember seeing performance
>> issues if you put in any code to catch exceptions.
>>
>
>


Re: [PERFORM] Stored Procedure Performance

2017-10-11 Thread Pavel Stehule
2017-10-11 15:59 GMT+02:00 Purav Chovatia :

> Thanks Laurenz, am having a look at perf.
>
> Can you please help me understand what exactly you mean when you say "PL/pgSQL
> is not optimized for performance like PL/SQL"? Do you mean to indicate that
> the app firing queries/DML directly would be a better option as compared to
> putting those in stored procs?
>

PL/pgSQL is a perfect glue for SQL. SQL queries have the same speed regardless
of the environment that executes them.

That sentence means that PL/pgSQL is not designed for intensive mathematical
calculation. PL/SQL is a self-governing environment ... it has its own data
types and its own implementation of logical and mathematical operators.
PL/pgSQL is a layer over the SQL engine - it has no types or operators of its
own. Every expression is translated to SQL and then interpreted by the SQL
expression interpreter. Maybe in the next few years there will be a JIT
compiler, but there is none now. This is the current bottleneck of PL/pgSQL.
If your PL code is glue for SQL queries (an implementation of some business
processes), then PL/pgSQL is fast enough. If you try to calculate a numeric
integration or the derivative of some function, then PL/pgSQL is slow. It is
not too slow - the speed is comparable with PHP - but it is significantly
slower than C.

PostgreSQL has a perfect C API - so intensive numeric calculations are
usually implemented as C extensions.
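
To make the distinction concrete, a computation-heavy function like the sketch
below (an illustrative Leibniz series, not something from this thread) is the kind
of code where PL/pgSQL is slowest, because every expression in the loop goes
through the SQL expression evaluator:

CREATE OR REPLACE FUNCTION approx_pi(n int) RETURNS double precision AS $$
DECLARE
  s double precision := 0;
BEGIN
  FOR k IN 0..n LOOP
    -- each term is evaluated as a tiny SQL expression
    s := s + CASE WHEN k % 2 = 0 THEN 1.0 ELSE -1.0 END / (2*k + 1);
  END LOOP;
  RETURN 4 * s;
END;
$$ LANGUAGE plpgsql;

-- SELECT approx_pi(1000000);  -- far slower than the same loop written in C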

Regards

Pavel


>
> Regards
>
> On 3 October 2017 at 20:24, Laurenz Albe  wrote:
>
>> Purav Chovatia wrote:
>> > I come from Oracle world and we are porting all our applications to
>> postgresql.
>> >
>> > The application calls 2 stored procs,
>> > - first one does a few selects and then an insert
>> > - second one does an update
>> >
>> > The main table on which the insert and the update happens is truncated
>> before every performance test.
>> >
>> > We are doing about 100 executions of both of these stored proc per
>> second.
>> >
>> > In Oracle each exec takes about 1millisec whereas in postgres its
>> taking 10millisec and that eventually leads to a queue build up in our
>> application.
>> >
>> > All indices are in place. The select, insert & update are all single
>> row operations and use the PK.
>> >
>> > It does not look like any query taking longer but something else. How
>> can I check where is the time being spent? There are no IO waits, so its
>> all on the CPU.
>>
>> You could profile the PostgreSQL server while it is executing the
>> workload,
>> see for example https://wiki.postgresql.org/wiki/Profiling_with_perf
>>
>> That way you could see where the time is spent.
>>
>> PL/pgSQL is not optimized for performance like PL/SQL.
>>
>> Yours,
>> Laurenz Albe
>>
>
>


Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-11 Thread Pavel Stehule
2017-10-11 13:06 GMT+02:00 johannes graën :

> Hi,
>
> I wrote a query that joins several tables usually returning less than
> 1000 rows, groups them and generates a JSON object of the result. In
> 9.6 is was a question of milliseconds for that query to return the
> requested data. Now, after upgrading to 10, the query never returns -
> at least it hasn't returned in the last hour.
>
> To see what happens, I requested the query plan [1]. It looks complex
> and shows a lot of parallelization. I don't have the query plan from
> 9.6, but I remember it being considerably simpler.
>
> Can anyone have a guess what altered the performance here so
> dramatically? Is there a way to disable new parallelization features
> just for this query to see if it makes any difference?
>
>

Do you have fresh statistics? After an upgrade it is necessary to run the ANALYZE command.
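
For example, after the upgrade you can refresh the statistics for the whole
database in one step:

ANALYZE;  -- rebuilds planner statistics for every table in the current database
-- (vacuumdb --analyze-only does the same from the shell)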

Regards

Pavel


Best
>   Johannes
>
>
> [1] https://explain.depesz.com/s/xsPP
>
>
>


Re: [PERFORM] Stored Procedure Performance

2017-10-03 Thread Pavel Stehule
2017-10-03 17:17 GMT+02:00 Adam Brusselback :

> There is also the option of pg_stat_statements:
> https://www.postgresql.org/docs/current/static/pgstatstatements.html and
> auto_explain: https://www.postgresql.org/docs/current/static/auto-explain.html
>
> These should help you identify what is slowing things down.  There is no
> reason I could think of you should be seeing a 10x slowdown between
> Postgres and Oracle, so you'll likely have to just profile it to find out.
>

It depends what is inside.

A 10x slowdown is possible if you hit some unoptimized cases. Times of about
1 ms - 10 ms show that the procedure (code) can be very sensitive to such
effects.

Regards

Pavel


Re: [PERFORM] Query regarding EXPLAIN (ANALYZE,BUFFERS)

2017-09-21 Thread Pavel Stehule
2017-09-21 12:52 GMT+02:00 Subramaniam C :

> Hi
>
> I wanted to query top 20 rows by joining two tables, one table having
> around 1 lac rows and other table having 5 lac rows. Since I am using ORDER
> BY in the query so I created compound index with the columns being used in
> ORDER BY. Initially index size was  939 MB.
>
> Then I ran EXPLAIN(ANALYZE,BUFFERS) for this query which took around 20
> secs as it was not using the compound index for this query. So I drop this
> index and created again. The index size now got reduced to 559 MB.
>
> After this if I ran the EXPLAIN(ANALYZE,BUFFERS) for this query it was
> using the index and took only 5 secs.
>
> Can you please explain how the index size got reduced after recreating it
> and how the query started using the index after recreating?
>
>
The index can be bloated - when you recreate it, or when you use the REINDEX
command, you remove the bloat. VACUUM FULL recreates indexes too.

A fresh index needs less space on disk (so reads are faster) and in memory,
and it has a better structure - access should be faster.
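
A hedged sketch (the index and table names are hypothetical) of the two usual
ways to remove the bloat:

REINDEX INDEX orders_customer_created_idx;  -- rebuilds in place, but blocks writes

-- or rebuild without blocking writes, then swap the names:
CREATE INDEX CONCURRENTLY orders_customer_created_idx_new
    ON orders (customer_id, created_at DESC);
DROP INDEX CONCURRENTLY orders_customer_created_idx;
ALTER INDEX orders_customer_created_idx_new RENAME TO orders_customer_created_idx;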



> Thanks and Regards
> Subramaniam
>


Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Pavel Stehule
2017-08-15 18:13 GMT+02:00 Jeff Janes :

> On Tue, Aug 15, 2017 at 3:06 AM, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> wrote:
>
>> Hi,
>> So I I run the cheks that jeff mentioned :
>> \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
>> hour and 35 minutes
>> \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
>> the remote oracle database is currently under maintenance work.
>>
>
> The "\copy...from" doesn't depend on oracle, it would be only depend on
> local file system (/tmp/tmp), provided that the "\copy...to" finished.
> Anyway, given the length of time it took, I think you can conclude the
> bottleneck is in oracle_fdw itself, or in Oracle, or the network.
>

Dumping from Oracle is not fast - I have seen it in both oracle_fdw and ora2pg cases.

Regards

Pavel



>
> Cheers,
>
> Jeff
>


Re: [PERFORM] Create view

2017-08-03 Thread Pavel Stehule
Hi

This is the wrong mailing list for this question - please use pgsql-general
for similar questions. I don't see any relation to performance.

2017-08-03 9:18 GMT+02:00 Daulat Ram <daulat@cyient.com>:

> Dear team,
>
>
>
> Can you please let me know how we can create a view using db link,
>
> A base table column having serial datatype. And we want to create a view
> of that table on server B. But unable to create and getting the below issue.
>
>
>
> *Error*:
>
>
>
> ERROR:  type "serial" does not exist
>
> LINE 17: as roaster_test ( roaster_id serial,
>
>   ^
>
> ** Error **
>
>
>
> ERROR: type "serial" does not exist
>
> SQL state: 42704
>
> Character: 432
>
>
>
> *Script*:
>
>
>
> create or replace view roaster_test as
>
> select * from  dblink('port=5433 host=INN14U-DW1427 dbname=postgres
> user=postgres password=postgres94',
>
> 'select
>
>  roaster_id,  roaster_date,  pickdrop,  roaster_state,  cab_id,
> shift_key,  roaster_creation_date,
>
>   status integer,
>
>   notificationcount,  totaltraveldistance,  start_trip,  end_trip,
> trip_duration from public.roaster')
>
> *as roaster_test* ( roaster_id serial,
>
>   roaster_date date,
>
>   pickdrop "char",
>
>   roaster_state character varying,
>
>   cab_id character varying,
>
>   shift_key integer,
>
>   roaster_creation_date date,
>
>   status integer,
>
>   notificationcount integer,
>
>   totaltraveldistance double precision,
>
>   start_trip text,
>
>   end_trip text,
>
>   trip_duration text)
>
>
>
>
>
>
>
> Suggest me if there is any alternate way for the same.
>

Serial is "pseudotype" and can be used only for CREATE TABLE command. This
pseudotype is translated to "int DEFAULT nextval(automatic_sequence)"

Use int instead in your case.
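
A trimmed sketch of the fix, reusing the connection string from the script above
and showing only two columns of the definition list:

create or replace view roaster_test as
select * from dblink('port=5433 host=INN14U-DW1427 dbname=postgres user=postgres password=postgres94',
                     'select roaster_id, roaster_date from public.roaster')
       as roaster_test ( roaster_id integer,   -- was: serial
                         roaster_date date );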

Regards

Pavel Stehule

>
>
> Regards,
>
> Daulat
>
> --
>
> DISCLAIMER:
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Check all attachments for viruses
> before opening them. All views or opinions presented in this e-mail are
> those of the author and may not reflect the opinion of Cyient or those of
> our affiliates.
>


Re: [PERFORM]

2017-06-29 Thread Pavel Stehule
2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov <yevhenii.kur...@gmail.com>:

> Hello folks,
>
> Thank you very much for analysis and suggested - there is a lot to learn
> here. I just  tried UNION queries and got following error:
>
> ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
>

it is sad :(

Maybe a bitmap index scan can work:

postgres=# create table test(id int, started date, failed date, status int);
CREATE TABLE
postgres=# create index on test(id) where status = 0;
CREATE INDEX
postgres=# create index on test(started) where status = 1;
CREATE INDEX
postgres=# create index on test(failed ) where status = 2;
CREATE INDEX
postgres=# explain select id from test where (status = 0 and id in
(1,2,3,4,5)) or (status = 1 and started < current_date) or (status = 2 and
failed > current_date);
                                QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.93..22.50 rows=6 width=4)
   Recheck Cond: (((id = ANY ('{1,2,3,4,5}'::integer[])) AND (status = 0)) OR ((started < CURRENT_DATE) AND (status = 1)) OR ((failed > CURRENT_DATE) AND (status = 2)))
   Filter: (((status = 0) AND (id = ANY ('{1,2,3,4,5}'::integer[]))) OR ((status = 1) AND (started < CURRENT_DATE)) OR ((status = 2) AND (failed > CURRENT_DATE)))
   ->  BitmapOr  (cost=12.93..12.93 rows=6 width=0)
         ->  Bitmap Index Scan on test_id_idx  (cost=0.00..4.66 rows=1 width=0)
               Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
         ->  Bitmap Index Scan on test_started_idx  (cost=0.00..4.13 rows=3 width=0)
               Index Cond: (started < CURRENT_DATE)
         ->  Bitmap Index Scan on test_failed_idx  (cost=0.00..4.13 rows=3 width=0)
               Index Cond: (failed > CURRENT_DATE)
(10 rows)



>
> I made a table dump for anyone who wants to give it a spin
> https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr
> and here is the gist for the original commands https://gist.github.
> com/lessless/33215d0c147645db721e74e07498ac53
>
> On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong <brad.dej...@infor.com>
> wrote:
>
>>
>>
>> On 2017-06-28, Pavel Stehule wrote ...
>> > On 2017-06-28, Yevhenii Kurtov wrote ...
>> >> On 2017-06-28, Pavel Stehule wrote ...
>> >>> On 2017-06-28, Yevhenii Kurtov wrote ...
>> >>>> We have a query that is run almost each second and it's very
>> important to squeeze every other ms out of it. The query is:
>> >>>> ...
>> >>>> I added following index: CREATE INDEX ON campaign_jobs(id, status,
>> failed_at, started_at, priority DESC, times_failed);
>> >>>> ...
>> >>> There are few issues
>> >>> a) parametrized LIMIT
>> >>> b) complex predicate with lot of OR
>> >>> c) slow external sort
>> >>>
>> >>> b) signalize maybe some strange in design .. try to replace "OR" by
>> "UNION" query
>> >>> c) if you can and you have good enough memory .. try to increase
>> work_mem .. maybe 20MB
>> >>>
>> >>> if you change query to union queries, then you can use conditional
>> indexes
>> >>>
>> >>> create index(id) where status = 0;
>> >>> create index(failed_at) where status = 2;
>> >>> create index(started_at) where status = 1;
>> >>
>> >> Can you please give a tip how to rewrite the query with UNION clause?
>> >
>> > SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
>> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
>> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0
>> > WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
>> > ORDER BY c0."priority" DESC, c0."times_failed"
>> > LIMIT $7
>> > FOR UPDATE SKIP LOCKED
>>
>>
>> Normally (at least for developers I've worked with), that kind of query
>> structure is used when the "status" values don't overlap and don't change
>> from query to query. Judging from Pavel's suggested conditional indexes
>> (i.e. "where status = "), he also thinks that is lik

Re: [PERFORM]

2017-06-28 Thread Pavel Stehule
2017-06-28 9:28 GMT+02:00 Yevhenii Kurtov <yevhenii.kur...@gmail.com>:

> Hello Pavel,
>
> Can you please give a tip how to rewrite the query with UNION clause? I
> didn't use it at all before actually and afraid that will not get it
> properly from the first time :)
>

SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
UNION SELECT c0."id" FROM "campaign_jobs" AS c0
WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT $7
FOR UPDATE SKIP LOCKED

Something like this

Pavel


>
> On Wed, Jun 28, 2017 at 2:12 PM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>>
>>
>> 2017-06-28 8:47 GMT+02:00 Yevhenii Kurtov <yevhenii.kur...@gmail.com>:
>>
>>> Hello,
>>>
>>> We have a query that is run almost each second and it's very important
>>> to squeeze every other ms out of it. The query is:
>>>
>>> SELECT c0."id" FROM "campaign_jobs" AS c0
>>> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
>>> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
>>> OR ((c0."status" = $5) AND (c0."started_at" < $6))
>>> ORDER BY c0."priority" DESC, c0."times_failed"
>>> LIMIT $7
>>> FOR UPDATE SKIP LOCKED
>>>
>>> I added following index:
>>>
>>> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at,
>>> priority DESC, times_failed);
>>>
>>> And it didn't help at all, even opposite - the planning phase time grew
>>> up from ~2ms  up to ~40 ms leaving execution time intact:
>>>
>>>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
>>> time=827.753..828.113 rows=100 loops=1)
>>>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
>>> time=827.752..828.096 rows=100 loops=1)
>>>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18)
>>> (actual time=827.623..827.653 rows=100 loops=1)
>>>Sort Key: priority DESC, times_failed
>>>Sort Method: external sort  Disk: 5472kB
>>>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
>>> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>>>  Filter: (((status = 0) AND (id <> ALL
>>> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
>>> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
>>> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
>>> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
>>> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>>>  Planning time: 40.734 ms
>>>  Execution time: 913.638 ms
>>> (9 rows)
>>>
>>>
>>> I see that query still went through the Seq Scan instead of Index Scan.
>>> Is it due to poorly crafted index or because of query structure? Is it
>>> possible to make this query faster?
>>>
>>
>> There are few issues
>>
>> a) parametrized LIMIT
>> b) complex predicate with lot of OR
>> c)  slow external sort
>>
>> b) signalize maybe some strange in design .. try to replace "OR" by
>> "UNION" query
>> c) if you can and you have good enough memory .. try to increase work_mem
>> .. maybe 20MB
>>
>> if you change query to union queries, then you can use conditional indexes
>>
>> create index(id) where status = 0;
>> create index(failed_at) where status = 2;
>> create index(started_at) where status = 1;
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Thanks
>>>
>>
>>
>


Re: [PERFORM]

2017-06-28 Thread Pavel Stehule
2017-06-28 8:47 GMT+02:00 Yevhenii Kurtov :

> Hello,
>
> We have a query that is run almost each second and it's very important to
> squeeze every other ms out of it. The query is:
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2
> OR ((c0."status" = $3) AND (c0."failed_at" > $4))
> OR ((c0."status" = $5) AND (c0."started_at" < $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED
>
> I added following index:
>
> CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority
> DESC, times_failed);
>
> And it didn't help at all, even opposite - the planning phase time grew up
> from ~2ms  up to ~40 ms leaving execution time intact:
>
>  Limit  (cost=29780.02..29781.27 rows=100 width=18) (actual
> time=827.753..828.113 rows=100 loops=1)
>->  LockRows  (cost=29780.02..32279.42 rows=199952 width=18) (actual
> time=827.752..828.096 rows=100 loops=1)
>  ->  Sort  (cost=29780.02..30279.90 rows=199952 width=18) (actual
> time=827.623..827.653 rows=100 loops=1)
>Sort Key: priority DESC, times_failed
>Sort Method: external sort  Disk: 5472kB
>->  Seq Scan on campaign_jobs c0  (cost=0.00..22138.00
> rows=199952 width=18) (actual time=1.072..321.410 rows=20 loops=1)
>  Filter: (((status = 0) AND (id <> ALL
> ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
> 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,
> 43,44,45,46,47,48}'::integer[]))) OR ((status = 2) AND (failed_at >
> '2017-06-22 03:18:09'::timestamp without time zone)) OR ((status = 1) AND
> (started_at < '2017-06-23 03:11:09'::timestamp without time zone)))
>  Planning time: 40.734 ms
>  Execution time: 913.638 ms
> (9 rows)
>
>
> I see that query still went through the Seq Scan instead of Index Scan. Is
> it due to poorly crafted index or because of query structure? Is it
> possible to make this query faster?
>

There are a few issues:

a) the parametrized LIMIT
b) the complex predicate with lots of ORs
c) the slow external sort

b) may signal something strange in the design .. try to replace the "OR"s
with a "UNION" query
c) if you can, and you have enough memory .. try to increase work_mem ..
maybe to 20MB

If you change the query to union queries, then you can use conditional
(partial) indexes:

create index on campaign_jobs(id) where status = 0;
create index on campaign_jobs(failed_at) where status = 2;
create index on campaign_jobs(started_at) where status = 1;
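
For point c), an illustrative way to check whether more memory removes the
external sort (the right value depends on your data set):

SET work_mem = '20MB';  -- per sort/hash operation, session scope
-- re-run EXPLAIN ANALYZE and check that "Sort Method" no longer says "external sort"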

Regards

Pavel


>
> Thanks
>


Re: [PERFORM] Can postgresql plan a query using multiple CPU cores?

2017-05-22 Thread Pavel Stehule
Hi

2017-05-22 22:21 GMT+02:00 Clemens Eisserer :

> Hi,
>
> I have a letancy-sensitive legacy application, where the time consumed
> by query planning was always causing some headaches.
> Currently it is running on postgresql-8.4 - will postgresql-10 support
> generating plans using multiple CPU cores to reduce the time required
> to generate a single plan?
>

No. PostgreSQL 9.6 and higher uses more CPU cores only for execution.

The important GUC parameters for planner speed are join_collapse_limit,
from_collapse_limit and geqo_threshold.

You can try to decrease geqo_threshold - with a low geqo_threshold you can
increase join_collapse_limit and from_collapse_limit.
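
An illustrative session-level experiment (the values are just a starting point,
not a recommendation for your schema):

SET geqo_threshold = 8;          -- default 12: switch to genetic planning for smaller joins
SET join_collapse_limit = 12;    -- default 8
SET from_collapse_limit = 12;    -- default 8
-- then re-run EXPLAIN on the latency-sensitive query and compare planning time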

Regards

Pavel

>
> Thank you in advance and best regards, Clemens
>
>
>


Re: [PERFORM] Query with no result set, really really slow adding ORBDER BY / LIMIT clause

2017-04-21 Thread Pavel Stehule
2017-04-21 9:05 GMT+02:00 Marco Renzi :

>
>
>> I am thinking a limit of 1 should be OK. Too big a number can be messy
>> for the optimizer, similarly to too small a number.
>>
>> The planner is driven by statistics - and the statistics are not perfect
>> - usually they work about 80% of the time - like weather forecasting.
>>
>> Usually it works, but sometimes not.
>>
>> Regards
>>
>> Pavel
>>
>>
>>
> Thanks Pavel, i almost found two solutions at the end:
> One is to use an inner limit as you said, and the other, when you just
> know what the filter is,
> is to try to join with SELECTS that have to be executed first from the
> planner.
>
> Eg
> SELECT  fase.id
> FROMtipofase
> JOIN   fase
> ON (fase.tipofase = (SELECT tipofase.id FROM tipofase WHERE
> tipofase.agendafrontoffice = true))
>
> ORDER BYfase.id DESC   limit 10 offset 0
>
> Thanks for the help
>

Yes, sometimes when the data are not homogeneous, more queries are necessary.

Regards

Pavel


>
> --
> 
> 
> ---
> Ing. Marco Renzi
> OCA - Oracle Certified Associate Java SE7 Programmer
> OCP - Oracle Certified Mysql 5 Developer
>
> via Zegalara 57
> 62014 Corridonia(MC)
> Mob: 3208377271 <(320)%20837-7271>
>
>
> "The fastest way to change yourself is to hang out with people who are
> already the way you want to be" Reid Hoffman
>


Re: [PERFORM] Query with no result set, really really slow adding ORBDER BY / LIMIT clause

2017-04-21 Thread Pavel Stehule
2017-04-21 8:49 GMT+02:00 Marco Renzi <renzi@gmail.com>:

> This could look strange, but is fast as hell!
> The main problem is:
> Is it always OK doing a query like this with order by and limit? Is it OK using
> an upper limit of 1.000.000.000 records?
>

I am thinking a limit of 1 should be OK. Too big a number can be messy for
the optimizer, similarly to too small a number.

The planner is driven by statistics - and the statistics are not perfect -
usually they work about 80% of the time - like weather forecasting.

Usually it works, but sometimes not.

Regards

Pavel


>
> SELECT * FROM (
> SELECT  fase.id
> FROMtipofase
> JOIN   fase
> ON (fase.tipofase = tipofase.id)
> WHERE agendafrontoffice = true
> ORDER BYfase.id DESC   limit 10 offset 0
> ) A
> ORDER BYA.id DESC   limit 10 offset 0
>
> 2017-04-20 18:05 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>
>>
>>
>>
>>> I am afraid it is not possible to solve this issue with one query. In this
>>> case the planner expects an early stop due to finding few values. But because
>>> there are not any matching values, the LIMIT clause has no benefit in executor
>>> time, but the planner is misled. Maybe try to increase the LIMIT to some higher
>>> value .. 1000, 1 so the planner doesn't fall into this trap. PostgreSQL
>>> statistics are about the most common values, but values without any
>>> occurrence are not well registered by the statistics.
>>>
>>> Regards
>>>
>>
>> It can look strange, but it can work:
>>
>> SELECT *
>>FROM (your query ORDER BY .. OFFSET 0 LIMIT 1) s
>>   ORDER BY ...
>>   LIMIT 10;
>>
>> Regards
>>
>> Pavel
>>
>


Re: [PERFORM] Query with no result set, really really slow adding ORBDER BY / LIMIT clause

2017-04-20 Thread Pavel Stehule
2017-04-20 17:57 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:

>
>
> 2017-04-20 9:19 GMT+02:00 Marco Renzi <renzi@gmail.com>:
>
>> Hi!, i've currently a big problem using  ORBDER BY / LIMIT in a query
>> with no result set.
>> If i add the order by/limit clause it runs really really slow.
>>
>>
>>
>> QUERY 1 FAST:
>> 
>>
>> SELECT  fase.id
>> FROMtipofase
>> JOIN   fase
>> ON (fase.tipofase = tipofase.id)
>> WHERE   tipofase.agendafrontoffice = true
>>
>> EXPLAIN ANALYZE:
>>
>> Nested Loop  (cost=0.43..790.19 rows=14462 width=4) (actual 
>> time=0.079..0.079 rows=0 loops=1)
>>
>>   ->  Seq Scan on tipofase  (cost=0.00..3.02 rows=1 width=4) (actual 
>> time=0.077..0.077 rows=0 loops=1)
>> Filter: agendafrontoffice
>> Rows Removed by Filter: 102
>>   ->  Index Only Scan using fase_test_prova_4 on fase  (cost=0.43..595.59 
>> rows=19158 width=8) (never executed)
>> Index Cond: (tipofase = tipofase.id)
>> Heap Fetches: 0
>> Planning time: 0.669 ms
>> Execution time: 0.141 ms
>>
>> ---
>>
>> It's perfect because it starts from tipofase, where there are no 
>> agendafrontoffice = true
>>
>> fase_test_prova_4 is a btree index ON (fase.tipofase, fase.id)
>> fase.id is PRIMARY key on fase,
>> tipofase.id is PRIMARY key on tipofase,
>> fase.tipofase is FK on tipofase.id
>> and tipofase.agendafrontoffice is a boolean.
>>
>> I've also created a btree index on tipofase.agendafrontoffice.
>>
>> **fase** is a large table with 1.475.146 records. There are no rows in
>> the table matching tipofase.agendafrontoffice = true, so the result set is
>> empty(QUERY 1)
>>
>>
>>
>>
>> QUERY 2 SLOW(WITH limit and order by):
>> 
>>
>>
>> SELECT  fase.id
>> FROMtipofase
>> JOIN   fase
>> ON (fase.tipofase = tipofase.id)
>> WHERE   tipofase.agendafrontoffice = true
>> ORDER BYfase.id DESC limit 10 offset 0
>>
>> Limit  (cost=0.43..149.66 rows=10 width=4) (actual 
>> time=173853.131..173853.131 rows=0 loops=1)
>>   ->  Nested Loop  (cost=0.43..215814.25 rows=14462 width=4) (actual 
>> time=173853.130..173853.130 rows=0 loops=1)
>> Join Filter: (fase.tipofase = tipofase.id)
>> ->  Index Scan Backward using test_prova_2 on fase  
>> (cost=0.43..193684.04 rows=1475146 width=8) (actual time=1.336..173128.418 
>> rows=1475146 loops=1)
>> ->  Materialize  (cost=0.00..3.02 rows=1 width=4) (actual 
>> time=0.000..0.000 rows=0 loops=1475146)
>>   ->  Seq Scan on tipofase  (cost=0.00..3.02 rows=1 width=4) 
>> (actual time=0.000..0.000 rows=0 loops=1)
>> Filter: agendafrontoffice
>> Rows Removed by Filter: 102
>> Planning time: 0.685 ms
>> Execution time: 173853.221 ms
>>
>>
>>
> I am afraid it is not possible to solve this issue with one query. In this
> case the planner expects an early stop due to finding few values. But because
> there are not any matching values, the LIMIT clause has no benefit in executor
> time, but the planner is misled. Maybe try to increase the LIMIT to some higher
> value .. 1000, 1 so the planner doesn't fall into this trap. PostgreSQL
> statistics are about the most common values, but values without any
> occurrence are not well registered by the statistics.
>
> Regards
>

It can look strange, but it can work:

SELECT *
   FROM (your query ORDER BY .. OFFSET 0 LIMIT 1) s
  ORDER BY ...
  LIMIT 10;

Regards

Pavel



>
> Pavel
>
>
>> Really really slow. looks like the planner is not doing a good job.
>> PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit
>>
>>
>> I also run VACUUM  AND VACUUM ANALYZE on both table
>> I tried to play with the
>> "alter table tipofase alter column agendafrontoffice set statistics 2"
>> but nothing.
>>
>> Thanks in advance Marco
>>
>>
>>
>> --
>> 
>> 
>> ---
>> Ing. Marco Renzi
>> OCA - Oracle Certified Associate Java SE7 Programmer
>> OCP - Oracle Certified Mysql 5 Developer
>>
>> via Zegalara 57
>> 62014 Corridonia(MC)
>> Mob: 3208377271 <(320)%20837-7271>
>>
>>
>> "The fastest way to change yourself is to hang out with people who are
>> already the way you want to be" Reid Hoffman
>>
>
>


Re: [PERFORM] Query with no result set, really really slow adding ORBDER BY / LIMIT clause

2017-04-20 Thread Pavel Stehule
2017-04-20 9:19 GMT+02:00 Marco Renzi :

> Hi!, i've currently a big problem using  ORBDER BY / LIMIT in a query with
> no result set.
> If i add the order by/limit clause it runs really really slow.
>
>
>
> QUERY 1 FAST:
> 
>
> SELECT  fase.id
> FROMtipofase
> JOIN   fase
> ON (fase.tipofase = tipofase.id)
> WHERE   tipofase.agendafrontoffice = true
>
> EXPLAIN ANALYZE:
>
> Nested Loop  (cost=0.43..790.19 rows=14462 width=4) (actual time=0.079..0.079 
> rows=0 loops=1)
>
>   ->  Seq Scan on tipofase  (cost=0.00..3.02 rows=1 width=4) (actual 
> time=0.077..0.077 rows=0 loops=1)
> Filter: agendafrontoffice
> Rows Removed by Filter: 102
>   ->  Index Only Scan using fase_test_prova_4 on fase  (cost=0.43..595.59 
> rows=19158 width=8) (never executed)
> Index Cond: (tipofase = tipofase.id)
> Heap Fetches: 0
> Planning time: 0.669 ms
> Execution time: 0.141 ms
>
> ---
>
> It's perfect because it starts from tipofase, where there are no 
> agendafrontoffice = true
>
> fase_test_prova_4 is a btree index ON (fase.tipofase, fase.id)
> fase.id is PRIMARY key on fase,
> tipofase.id is PRIMARY key on tipofase,
> fase.tipofase is FK on tipofase.id
> and tipofase.agendafrontoffice is a boolean.
>
> I've also created a btree index on tipofase.agendafrontoffice.
>
> **fase** is a large table with 1.475.146 records. There are no rows in the
> table matching tipofase.agendafrontoffice = true, so the result set is
> empty(QUERY 1)
>
>
>
>
> QUERY 2 SLOW(WITH limit and order by):
> 
>
>
> SELECT  fase.id
> FROMtipofase
> JOIN   fase
> ON (fase.tipofase = tipofase.id)
> WHERE   tipofase.agendafrontoffice = true
> ORDER BYfase.id DESC limit 10 offset 0
>
> Limit  (cost=0.43..149.66 rows=10 width=4) (actual 
> time=173853.131..173853.131 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.43..215814.25 rows=14462 width=4) (actual 
> time=173853.130..173853.130 rows=0 loops=1)
> Join Filter: (fase.tipofase = tipofase.id)
> ->  Index Scan Backward using test_prova_2 on fase  
> (cost=0.43..193684.04 rows=1475146 width=8) (actual time=1.336..173128.418 
> rows=1475146 loops=1)
> ->  Materialize  (cost=0.00..3.02 rows=1 width=4) (actual 
> time=0.000..0.000 rows=0 loops=1475146)
>   ->  Seq Scan on tipofase  (cost=0.00..3.02 rows=1 width=4) 
> (actual time=0.000..0.000 rows=0 loops=1)
> Filter: agendafrontoffice
> Rows Removed by Filter: 102
> Planning time: 0.685 ms
> Execution time: 173853.221 ms
>
>
>
I am afraid it is not possible to solve this issue with one query. In this
case the planner expects an early stop due to finding few values. But because
there are not any matching values, the LIMIT clause has no benefit in executor
time, but the planner is misled. Maybe try to increase the LIMIT to some higher
value .. 1000, 1 so the planner doesn't fall into this trap. PostgreSQL
statistics are about the most common values, but values without any
occurrence are not well registered by the statistics.

Regards

Pavel


> Really really slow. looks like the planner is not doing a good job.
> PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit
>
>
> I also run VACUUM  AND VACUUM ANALYZE on both table
> I tried to play with the
> "alter table tipofase alter column agendafrontoffice set statistics 2"
> but nothing.
>
> Thanks in advance Marco
>
>
>
> --
> 
> 
> ---
> Ing. Marco Renzi
> OCA - Oracle Certified Associate Java SE7 Programmer
> OCP - Oracle Certified Mysql 5 Developer
>
> via Zegalara 57
> 62014 Corridonia(MC)
> Mob: 3208377271 <(320)%20837-7271>
>
>
> "The fastest way to change yourself is to hang out with people who are
> already the way you want to be" Reid Hoffman
>


Re: [PERFORM] Query performance changes significantly depending on limit value

2017-02-23 Thread Pavel Stehule
2017-02-23 17:45 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:

> Not sure what other options we have other than an EAV approach since we
> allow users to define their own attribute types (attribute type is in
> contacts_contactfield, attribute value is in values_value). Would you
> expect modelling that with a JSON column to perform better?
>

It should - maybe hstore, or jsonb with a suitable index. EAV works if you
don't do massive operations on it.

Usually the best approach is a mixed design - whatever can be relational
(often the attributes used in filters) should be relational (columnar), and
the others can live in some non-relational type - XML, JSON, ...
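
A minimal sketch of the jsonb variant (the new column name is hypothetical;
jsonb needs PostgreSQL 9.4+):

ALTER TABLE contacts_contact ADD COLUMN fields jsonb;
CREATE INDEX contacts_contact_fields_gin
    ON contacts_contact USING gin (fields jsonb_path_ops);
-- containment filters such as  WHERE fields @> '{"gender": "F"}'  can then use the GIN index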

Regards

Pavel


>
> Thanks for the tips!
>
> On 23 February 2017 at 17:35, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
>
>>
>>
>> 2017-02-23 15:02 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:
>>
>>> Hi Pavel. That suggestion gets me as far as LIMIT 694 with the fast plan
>>> then things get slow again. This is now what happens at LIMIT 695:
>>>
>>> Limit  (cost=35945.78..50034.52 rows=695 width=88) (actual
>>> time=12852.580..12854.382 rows=695 loops=1)
>>>   Buffers: shared hit=6 read=66689
>>>   ->  Merge Join  (cost=35945.78..56176.80 rows=998 width=88) (actual
>>> time=12852.577..12854.271 rows=695 loops=1)
>>> Merge Cond: (contacts_contact.id =
>>> contacts_contactgroup_contacts.contact_id)
>>> Buffers: shared hit=6 read=66689
>>> ->  Sort  (cost=35944.53..35949.54 rows=2004 width=92) (actual
>>> time=12852.486..12852.577 rows=710 loops=1)
>>>   Sort Key: contacts_contact.id
>>>   Sort Method: quicksort  Memory: 34327kB
>>>   Buffers: shared hit=6 read=66677
>>>   ->  Hash Join  (cost=6816.19..35834.63 rows=2004 width=92)
>>> (actual time=721.293..12591.204 rows=200412 loops=1)
>>> Hash Cond: (contacts_contact.id = u0.contact_id)
>>> Buffers: shared hit=6 read=66677
>>> ->  Seq Scan on contacts_contact
>>>  (cost=0.00..25266.00 rows=100 width=88) (actual time=0.003..267.258
>>> rows=100 loops=1)
>>>   Buffers: shared hit=1 read=15265
>>> ->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
>>> (actual time=714.373..714.373 rows=200412 loops=1)
>>>   Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>>>   Buffers: shared hit=5 read=51412
>>>   ->  HashAggregate  (cost=6810.70..6813.14
>>> rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
>>> Buffers: shared hit=5 read=51412
>>> ->  Bitmap Heap Scan on values_value u0
>>>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976
>>> rows=200412 loops=1)
>>>   Recheck Cond: ((contact_field_id =
>>> 1) AND (upper(string_value) = 'F'::text))
>>>   Buffers: shared hit=5 read=51412
>>>   ->  Bitmap Index Scan on
>>> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
>>> width=0) (actual time=57.642..57.642 rows=200412 loops=1)
>>> Index Cond:
>>> ((contact_field_id = 1) AND (upper(string_value) = 'F'::text))
>>> Buffers: shared hit=5
>>> read=765
>>> ->  Index Only Scan Backward using 
>>> contacts_contactgroup_contacts_contactgroup_id_0f909f73_uniq
>>> on contacts_contactgroup_contacts  (cost=0.43..18967.29 rows=497992
>>> width=4) (actual time=0.080..0.651 rows=1707 loops=1)
>>>   Index Cond: (contactgroup_id = 1)
>>>   Heap Fetches: 0
>>>   Buffers: shared read=12
>>> Total runtime: 12863.938 ms
>>>
>>> https://explain.depesz.com/s/nfw1
>>>
>>> Can you explain a bit more about what you mean about " dependency
>>> between contact_field_id = 1 and upper(string_value) = 'F'::text"?
>>>
>>
>> look to related node in plan
>>
>>
>> ->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
>> (actual time=714.373..714.373 rows=200412 loops=1)
>>   Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>>   

Re: [PERFORM] Query performance changes significantly depending on limit value

2017-02-23 Thread Pavel Stehule
2017-02-23 15:02 GMT+01:00 Rowan Seymour <rowanseym...@gmail.com>:

> Hi Pavel. That suggestion gets me as far as LIMIT 694 with the fast plan
> then things get slow again. This is now what happens at LIMIT 695:
>
> Limit  (cost=35945.78..50034.52 rows=695 width=88) (actual
> time=12852.580..12854.382 rows=695 loops=1)
>   Buffers: shared hit=6 read=66689
>   ->  Merge Join  (cost=35945.78..56176.80 rows=998 width=88) (actual
> time=12852.577..12854.271 rows=695 loops=1)
> Merge Cond: (contacts_contact.id = contacts_contactgroup_
> contacts.contact_id)
> Buffers: shared hit=6 read=66689
> ->  Sort  (cost=35944.53..35949.54 rows=2004 width=92) (actual
> time=12852.486..12852.577 rows=710 loops=1)
>   Sort Key: contacts_contact.id
>   Sort Method: quicksort  Memory: 34327kB
>   Buffers: shared hit=6 read=66677
>   ->  Hash Join  (cost=6816.19..35834.63 rows=2004 width=92)
> (actual time=721.293..12591.204 rows=200412 loops=1)
> Hash Cond: (contacts_contact.id = u0.contact_id)
> Buffers: shared hit=6 read=66677
> ->  Seq Scan on contacts_contact  (cost=0.00..25266.00
> rows=100 width=88) (actual time=0.003..267.258 rows=100 loops=1)
>   Buffers: shared hit=1 read=15265
> ->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
> (actual time=714.373..714.373 rows=200412 loops=1)
>   Buckets: 1024  Batches: 1  Memory Usage: 7046kB
>   Buffers: shared hit=5 read=51412
>   ->  HashAggregate  (cost=6810.70..6813.14
> rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
> Buffers: shared hit=5 read=51412
> ->  Bitmap Heap Scan on values_value u0
>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976
> rows=200412 loops=1)
>   Recheck Cond: ((contact_field_id =
> 1) AND (upper(string_value) = 'F'::text))
>   Buffers: shared hit=5 read=51412
>   ->  Bitmap Index Scan on
> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
> width=0) (actual time=57.642..57.642 rows=200412 loops=1)
> Index Cond: ((contact_field_id
> = 1) AND (upper(string_value) = 'F'::text))
> Buffers: shared hit=5 read=765
> ->  Index Only Scan Backward using contacts_contactgroup_
> contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts
>  (cost=0.43..18967.29 rows=497992 width=4) (actual time=0.080..0.651
> rows=1707 loops=1)
>   Index Cond: (contactgroup_id = 1)
>   Heap Fetches: 0
>   Buffers: shared read=12
> Total runtime: 12863.938 ms
>
> https://explain.depesz.com/s/nfw1
>
> Can you explain a bit more about what you mean about " dependency between
> contact_field_id = 1 and upper(string_value) = 'F'::text"?
>

look to related node in plan


->  Hash  (cost=6813.14..6813.14 rows=244 width=4)
(actual time=714.373..714.373 rows=200412 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 7046kB
  Buffers: shared hit=5 read=51412
  ->  HashAggregate  (cost=6810.70..6813.14
rows=244 width=4) (actual time=561.099..644.822 rows=200412 loops=1)
Buffers: shared hit=5 read=51412
->  Bitmap Heap Scan on values_value u0
 (cost=60.98..6805.69 rows=2004 width=4) (actual time=75.410..364.976
rows=200412 loops=1)
  Recheck Cond: ((contact_field_id = 1)
AND (upper(string_value) = 'F'::text))
  Buffers: shared hit=5 read=51412

There is a significant difference between the estimate (2004) and
reality (200412) - two orders of magnitude - so the plan must be suboptimal.

I am looking at your schema - it is a variant of an EAV table - this is an
antipattern, and for anything more than a small number of returned rows it
will be slow.

Regards

Pavel



> Btw I created the index values_value_field_string_value_contact as
>
> CREATE INDEX values_value_field_string_value_contact
> ON values_value(contact_field_id, UPPER(string_value), contact_id DESC)
> WHERE contact_field_id IS NOT NULL;
>
> I'm not sure why it needs the contact_id column but without it the planner
> picks a slow approach for even smaller LIMIT values.
>
>
> On 23 February 2017 at 15:32, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:

Re: [PERFORM] Query performance changes significantly depending on limit value

2017-02-23 Thread Pavel Stehule
2017-02-23 14:11 GMT+01:00 Rowan Seymour :

> Hi guys
>
> I'm a bit stuck on a query that performs fantastically up to a certain
> limit value, after which the planner goes off in a completely different
> direction and performance gets dramatically worse. Am using Postgresql 9.3
>
> You can see all the relevant schemas at http://pastebin.com/PNEqw2id and
> in the test database there are 1,000,000 records in contacts_contact, and
> about half of those will match the subquery on values_value.
>
> The query in question is:
>
> SELECT "contacts_contact".* FROM "contacts_contact"
> INNER JOIN "contacts_contactgroup_contacts" ON ("contacts_contact"."id" =
> "contacts_contactgroup_contacts"."contact_id")
> WHERE ("contacts_contactgroup_contacts"."contactgroup_id" = 1
>AND "contacts_contact"."id" IN (
>  SELECT U0."contact_id" FROM "values_value" U0 WHERE
> (U0."contact_field_id" = 1 AND UPPER(U0."string_value"::text) = UPPER('F'))
>)
> ) ORDER BY "contacts_contact"."id" DESC LIMIT 222;
>
> With that limit of 222, it performs like:
>
> Limit  (cost=3.09..13256.36 rows=222 width=88) (actual time=0.122..3.358
> rows=222 loops=1)
>   Buffers: shared hit=708 read=63
>   ->  Nested Loop  (cost=3.09..59583.10 rows=998 width=88) (actual
> time=0.120..3.304 rows=222 loops=1)
> Buffers: shared hit=708 read=63
> ->  Merge Semi Join  (cost=2.65..51687.89 rows=2004 width=92)
> (actual time=0.103..1.968 rows=227 loops=1)
>   Merge Cond: (contacts_contact.id = u0.contact_id)
>   Buffers: shared hit=24 read=63
>   ->  Index Scan Backward using contacts_contact_pkey on
> contacts_contact  (cost=0.42..41249.43 rows=100 width=88) (actual
> time=0.008..0.502 rows=1117 loops=1)
> Buffers: shared hit=22 read=2
>   ->  Index Scan using values_value_field_string_value_contact
> on values_value u0  (cost=0.43..7934.72 rows=2004 width=4) (actual
> time=0.086..0.857 rows=227 loops=1)
> Index Cond: ((contact_field_id = 1) AND
> (upper(string_value) = 'F'::text))
> Buffers: shared hit=2 read=61
> ->  Index Only Scan using contacts_contactgroup_
> contacts_contactgroup_id_0f909f73_uniq on contacts_contactgroup_contacts
>  (cost=0.43..3.93 rows=1 width=4) (actual time=0.005..0.005 rows=1
> loops=227)
>   Index Cond: ((contactgroup_id = 1) AND (contact_id =
> contacts_contact.id))
>   Heap Fetches: 0
>   Buffers: shared hit=684
> Total runtime: 3.488 ms
>
> https://explain.depesz.com/s/iPPJ
>
> But if increase the limit to 223 then it performs like:
>
> Limit  (cost=8785.68..13306.24 rows=223 width=88) (actual
> time=2685.830..2686.534 rows=223 loops=1)
>   Buffers: shared hit=767648 read=86530
>   ->  Merge Join  (cost=8785.68..29016.70 rows=998 width=88) (actual
> time=2685.828..2686.461 rows=223 loops=1)
> Merge Cond: (contacts_contact.id = contacts_contactgroup_
> contacts.contact_id)
> Buffers: shared hit=767648 read=86530
> ->  Sort  (cost=8784.44..8789.45 rows=2004 width=92) (actual
> time=2685.742..2685.804 rows=228 loops=1)
>   Sort Key: contacts_contact.id
>   Sort Method: quicksort  Memory: 34327kB
>   Buffers: shared hit=767648 read=86524
>   ->  Nested Loop  (cost=6811.12..8674.53 rows=2004 width=92)
> (actual time=646.573..2417.291 rows=200412 loops=1)
>

There is a pretty bad estimation, probably due to a dependency between
contact_field_id = 1 and upper(string_value) = 'F'::text.

The simplest solution is to disable nested loops - set enable_nestloop to off.

Regards

Pavel


> Buffers: shared hit=767648 read=86524
> ->  HashAggregate  (cost=6810.70..6813.14 rows=244
> width=4) (actual time=646.532..766.200 rows=200412 loops=1)
>   Buffers: shared read=51417
>   ->  Bitmap Heap Scan on values_value u0
>  (cost=60.98..6805.69 rows=2004 width=4) (actual time=92.016..433.709
> rows=200412 loops=1)
> Recheck Cond: ((contact_field_id = 1) AND
> (upper(string_value) = 'F'::text))
> Buffers: shared read=51417
> ->  Bitmap Index Scan on
> values_value_field_string_value_contact  (cost=0.00..60.47 rows=2004
> width=0) (actual time=70.647..70.647 rows=200412 loops=1)
>   Index Cond: ((contact_field_id = 1)
> AND (upper(string_value) = 'F'::text))
>   Buffers: shared read=770
> ->  Index Scan using contacts_contact_pkey on
> contacts_contact  (cost=0.42..7.62 rows=1 width=88) (actual
> time=0.007..0.007 rows=1 loops=200412)
>   Index Cond: (id = u0.contact_id)
>   Buffers: shared hit=767648 read=35107
> ->  Index Only Scan Backward using 

Re: [PERFORM] Backup taking long time !!!

2017-01-20 Thread Pavel Stehule
2017-01-20 12:53 GMT+01:00 Dinesh Chandra 12108 <dinesh.chan...@cyient.com>:

> Dear Pavel,
>
>
>
> Thanks for quick response.
>
> May I know how can I use physical full backup with export transaction
> segments.
>

https://www.postgresql.org/docs/9.1/static/continuous-archiving.html

This process can be automated by applications like barman
http://www.pgbarman.org/
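
A rough outline of the low-level steps from that documentation (the label is
arbitrary; the file copy itself happens outside SQL, and WAL archiving must
already be configured as described in the link):

SELECT pg_start_backup('nightly_base');   -- mark the start of the base backup
-- copy the whole data directory with filesystem tools (tar, rsync, ...)
SELECT pg_stop_backup();                  -- finish and archive the required WAL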

Regards

Pavel


>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 <+91%2099539%2075849> | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* 20 January, 2017 5:19 PM
> *To:* Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
> *Cc:* Madusudanan.B.N <b.n.madusuda...@gmail.com>;
> pgsql-performance@postgresql.org
>
> *Subject:* Re: [PERFORM] Backup taking long time !!!
>
>
>
> Hi
>
>
>
> 2017-01-20 12:43 GMT+01:00 Dinesh Chandra 12108 <dinesh.chan...@cyient.com
> >:
>
> Exactly parallel option is there in version 9.3 but I can’t upgrade new
> version due to some concerns.
>
> Could you suggest in 9.1 how may I fix it.
>
>
>
> 1. don't use it - you can use physical full backup with export transaction
> segments.
>
>
>
> or
>
>
>
> 2. buy faster IO
>
>
>
> Regards
>
>
>
> Pavel Stehule
>
>
>
>
>
>
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 <+91%2099539%2075849> | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
> *From:* Madusudanan.B.N [mailto:b.n.madusuda...@gmail.com]
> *Sent:* 20 January, 2017 5:04 PM
> *To:* Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Backup taking long time !!!
>
>
>
> If you can upgrade to a newer version, there is parallel pg dump.
>
>
>
> Documentation - https://www.postgresql.org/docs/current/static/backup-
> dump.html
>
>
>
> Related blog - http://paquier.xyz/postgresql-2/postgres-9-3-
> feature-highlight-parallel-pg_dump/
>
>
>
> Which can give significant speed up depending on your machine's I/O
> capabilities.
>
>
>
>
>
>
>
> On Fri, Jan 20, 2017 at 4:54 PM, Dinesh Chandra 12108 <
> dinesh.chan...@cyient.com> wrote:
>
> Hi Expert,
>
>
>
> I have a database having size around 1350 GB, created in PostgreSQL-9.1 in
> Linux platform.
>
> I am using pg_dump to take backup which takes around 12 hours to complete.
>
> Could you please suggest me how I can make my backup fast so that it
> complete in less hours?
>
>
>
> Thanks in advance.
>
>
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 <+91%2099539%2075849> | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
>
> --
>
>
> DISCLAIMER:
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Check all attachments for viruses
> before opening them. All views or opinions presented in this e-mail are
> those of the author and may not reflect the opinion of Cyient or those of
> our affiliates.
>
>
>
>
>
> --
>
> Regards,
> Madusudanan.B.N <http://madusudanan.com>
>
>
>
>
>


Re: [PERFORM] Backup taking long time !!!

2017-01-20 Thread Pavel Stehule
Hi

2017-01-20 12:43 GMT+01:00 Dinesh Chandra 12108 <dinesh.chan...@cyient.com>:

> Exactly parallel option is there in version 9.3 but I can’t upgrade new
> version due to some concerns.
>
> Could you suggest in 9.1 how may I fix it.
>

1. Don't use pg_dump for this - you can use a physical full backup with
archived transaction log (WAL) segments.

or

2. Buy faster IO.

Regards

Pavel Stehule



>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 <+91%2099539%2075849> | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
> *From:* Madusudanan.B.N [mailto:b.n.madusuda...@gmail.com]
> *Sent:* 20 January, 2017 5:04 PM
> *To:* Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Backup taking long time !!!
>
>
>
> If you can upgrade to a newer version, there is parallel pg dump.
>
>
>
> Documentation - https://www.postgresql.org/docs/current/static/backup-
> dump.html
>
>
>
> Related blog - http://paquier.xyz/postgresql-2/postgres-9-3-
> feature-highlight-parallel-pg_dump/
>
>
>
> Which can give significant speed up depending on your machine's I/O
> capabilities.
>
>
>
>
>
>
>
> On Fri, Jan 20, 2017 at 4:54 PM, Dinesh Chandra 12108 <
> dinesh.chan...@cyient.com> wrote:
>
> Hi Expert,
>
>
>
> I have a database having size around 1350 GB, created in PostgreSQL-9.1 in
> Linux platform.
>
> I am using pg_dump to take backup which takes around 12 hours to complete.
>
> Could you please suggest me how I can make my backup fast so that it
> complete in less hours?
>
>
>
> Thanks in advance.
>
>
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 <+91%2099539%2075849> | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
>
> --
>
>
> DISCLAIMER:
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Check all attachments for viruses
> before opening them. All views or opinions presented in this e-mail are
> those of the author and may not reflect the opinion of Cyient or those of
> our affiliates.
>
>
>
>
>
> --
>
> Regards,
> Madusudanan.B.N <http://madusudanan.com>
>
>
>


Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Pavel Stehule
Hi

2017-01-02 15:34 GMT+01:00 Андрей Хозов :

> Hello there!
>
> I have an performance issue with functions and args type.
>
> Table and data:
> create table t1 (id serial, str char(32));
> insert into t1 (str) select md5(s::text) from generate_series(1, 100)
> as s;
>
> And simple functions:
> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> ​Query:
> test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010
> rows=1 loops=1)
>  Planning time: 0.039 ms
>  Execution time: 189.039 ms
> (3 rows)
>
> Time: 189,524 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735
> rows=1 loops=1)
>  Planning time: 0.024 ms
>  Execution time: 513.757 ms
> (3 rows)
>
> Time: 514,125 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d2
> 8f33cf'::char(32));
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509
> rows=1 loops=1)
>  Planning time: 0.074 ms
>  Execution time: 513.535 ms
> (3 rows)
>

This EXPLAIN shows nothing - you need to see the nested (per-statement) plans.

Look at auto_explain:
https://www.postgresql.org/docs/current/static/auto-explain.html

Maybe the index was not used due to the different types.
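
For example, a session-level auto_explain setup (the settings shown are just one
illustrative combination) will log the plans of the statements executed inside
the functions:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;       -- log every statement's plan
SET auto_explain.log_nested_statements = on; -- include statements run inside PL/pgSQL
SET auto_explain.log_analyze = on;           -- include actual run times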

Regards

Pavel


> Time: 514,104 ms
> test=>
> ​
> ​Seems that casting param from text to char(32) needs to be done only once
> and​ f1 and f2 must be identical on performance. But function f2 with text
> param significantly slower, even with casting arg while pass it to function.
>
> Tested postgresql versions 9.5.5 and 9.6.1 on Ubuntu 16.04. It's normal
> behavior or it's can be fixed?
>
> --
> ​Andrey Khozov
>


Re: [PERFORM] Why query plan is different?

2016-10-11 Thread Pavel Stehule
2016-10-11 13:19 GMT+02:00 Andrzej Zawadzki <zawa...@gmail.com>:

> On 11.10.2016 03:47, Pavel Stehule wrote:
>
>
>
> 2016-10-10 23:17 GMT+02:00 Andrzej Zawadzki <zawa...@gmail.com>:
>
>> On 10.10.2016 17:31, Andrzej Zawadzki wrote:
>>
>> Hi,
>> Today, I noticed strange situation:
>>
>> The same query run on different servers has very different plan:
>>
>> Q: SELECT b.* FROM kredytob b  WHERE pesel = '222'  ORDER BY b.id
>> DESC LIMIT 1
>>
>> Slow plan:
>>
>> "Limit  (cost=0.43..28712.33 rows=1 width=4) (actual
>> time=2574.041..2574.044 rows=1 loops=1)"
>> "  Output: id"
>> "  Buffers: shared hit=316132 read=110001"
>> "  ->  Index Scan Backward using kredytob_pkey on public.kredytob b
>> (cost=0.43..324.80 rows=113 width=4) (actual time=2574.034..2574.034
>> rows=1 loops=1)"
>> "Output: id"
>> "Filter: (b.pesel = '222'::bpchar)"
>> "Rows Removed by Filter: 433609"
>> "Buffers: shared hit=316132 read=110001"
>> "Planning time: 0.414 ms"
>> "Execution time: 2574.139 ms"
>>
>>
>> Fast plan:
>> "Limit  (cost=115240.66..115240.66 rows=1 width=4) (actual
>> time=463.275..463.276 rows=1 loops=1)"
>> "  Output: id"
>> "  Buffers: shared hit=14661 read=4576"
>> "  ->  Sort  (cost=115240.66..115240.94 rows=112 width=4) (actual
>> time=463.271..463.271 rows=1 loops=1)"
>> "Output: id"
>> "Sort Key: b.id DESC"
>> "Sort Method: top-N heapsort  Memory: 25kB"
>> "Buffers: shared hit=14661 read=4576"
>> "->  Index Scan using kredytob_pesel_typkred_opclass_idx on
>> public.kredytob b  (cost=0.43..115240.10 rows=112 width=4) (actual
>> time=311.347..463.183 rows=5 loops=1)"
>> "  Output: id"
>> "  Index Cond: (b.pesel = '222'::bpchar)"
>> "  Buffers: shared hit=14661 read=4576"
>> "Planning time: 0.383 ms"
>> "Execution time: 463.324 ms"
>>
>> Data is almost equal - "slow" has a few more rows in table. ("Fast" is a
>> copy from 1 am today).
>> Why runtime is slower?
>>
>>
>> I made another INDEX, without opclass:
>>
>> CREATE INDEX kredytob_pesel_typkred_idx
>>   ON public.kredytob
>>   USING btree
>>   (pesel COLLATE pg_catalog."default", typkred);
>>
>> after that: analyze kredytob;
>>
>> And now:
>> "Limit  (cost=333.31..333.31 rows=1 width=4) (actual time=0.100..0.102
>> rows=1 loops=1)"
>> "  Output: id"
>> "  Buffers: shared hit=8"
>> "  ->  Sort  (cost=333.31..333.59 rows=114 width=4) (actual
>> time=0.095..0.095 rows=1 loops=1)"
>> "Output: id"
>> "Sort Key: b.id DESC"
>> "Sort Method: top-N heapsort  Memory: 25kB"
>> "Buffers: shared hit=8"
>> "->  Index Scan using kredytob_pesel_typkred_idx on
>> public.kredytob b  (cost=0.43..332.74 rows=114 width=4) (actual
>> time=0.046..0.065 rows=5 loops=1)"
>> "  Output: id"
>> "  Index Cond: (b.pesel = '222'::bpchar)"
>> "  Buffers: shared hit=8"
>> "Planning time: 0.438 ms"
>> "Execution time: 0.154 ms"
>>
>> So, what is a reason that "SLOW" server doesn't like opclass index?
>>
>
> what is default locales?
>
> LATIN2 - that's why I use opclass.
>

Is it this locale in both cases?
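
A quick way to verify, on both servers (a sketch):

SELECT datname, datcollate, datctype FROM pg_database;
-- or, for the current database only:
SHOW lc_collate;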

Regards

Pavel


>
> --
> Andrzej
>


Re: [PERFORM] Why query plan is different?

2016-10-10 Thread Pavel Stehule
2016-10-10 23:17 GMT+02:00 Andrzej Zawadzki :

> On 10.10.2016 17:31, Andrzej Zawadzki wrote:
>
> Hi,
> Today, I noticed strange situation:
>
> The same query run on different servers has very different plan:
>
> Q: SELECT b.* FROM kredytob b  WHERE pesel = '222'  ORDER BY b.id
> DESC LIMIT 1
>
> Slow plan:
>
> "Limit  (cost=0.43..28712.33 rows=1 width=4) (actual
> time=2574.041..2574.044 rows=1 loops=1)"
> "  Output: id"
> "  Buffers: shared hit=316132 read=110001"
> "  ->  Index Scan Backward using kredytob_pkey on public.kredytob b
> (cost=0.43..324.80 rows=113 width=4) (actual time=2574.034..2574.034
> rows=1 loops=1)"
> "Output: id"
> "Filter: (b.pesel = '222'::bpchar)"
> "Rows Removed by Filter: 433609"
> "Buffers: shared hit=316132 read=110001"
> "Planning time: 0.414 ms"
> "Execution time: 2574.139 ms"
>
>
> Fast plan:
> "Limit  (cost=115240.66..115240.66 rows=1 width=4) (actual
> time=463.275..463.276 rows=1 loops=1)"
> "  Output: id"
> "  Buffers: shared hit=14661 read=4576"
> "  ->  Sort  (cost=115240.66..115240.94 rows=112 width=4) (actual
> time=463.271..463.271 rows=1 loops=1)"
> "Output: id"
> "Sort Key: b.id DESC"
> "Sort Method: top-N heapsort  Memory: 25kB"
> "Buffers: shared hit=14661 read=4576"
> "->  Index Scan using kredytob_pesel_typkred_opclass_idx on
> public.kredytob b  (cost=0.43..115240.10 rows=112 width=4) (actual
> time=311.347..463.183 rows=5 loops=1)"
> "  Output: id"
> "  Index Cond: (b.pesel = '222'::bpchar)"
> "  Buffers: shared hit=14661 read=4576"
> "Planning time: 0.383 ms"
> "Execution time: 463.324 ms"
>
> Data is almost equal - "slow" has a few more rows in table. ("Fast" is a
> copy from 1 am today).
> Why runtime is slower?
>
>
> I made another INDEX, without opclass:
>
> CREATE INDEX kredytob_pesel_typkred_idx
>   ON public.kredytob
>   USING btree
>   (pesel COLLATE pg_catalog."default", typkred);
>
> after that: analyze kredytob;
>
> And now:
> "Limit  (cost=333.31..333.31 rows=1 width=4) (actual time=0.100..0.102
> rows=1 loops=1)"
> "  Output: id"
> "  Buffers: shared hit=8"
> "  ->  Sort  (cost=333.31..333.59 rows=114 width=4) (actual
> time=0.095..0.095 rows=1 loops=1)"
> "Output: id"
> "Sort Key: b.id DESC"
> "Sort Method: top-N heapsort  Memory: 25kB"
> "Buffers: shared hit=8"
> "->  Index Scan using kredytob_pesel_typkred_idx on
> public.kredytob b  (cost=0.43..332.74 rows=114 width=4) (actual
> time=0.046..0.065 rows=5 loops=1)"
> "  Output: id"
> "  Index Cond: (b.pesel = '222'::bpchar)"
> "  Buffers: shared hit=8"
> "Planning time: 0.438 ms"
> "Execution time: 0.154 ms"
>
> So, what is a reason that "SLOW" server doesn't like opclass index?
>

What is the default locale?

Pavel


>
> --
> Andrzej
>


Re: [PERFORM] Why query plan is different?

2016-10-10 Thread Pavel Stehule
2016-10-10 17:31 GMT+02:00 Andrzej Zawadzki :

> Hi,
> Today, I noticed strange situation:
>
> The same query run on different servers has very different plan:
>
> Q: SELECT b.* FROM kredytob b  WHERE pesel = '222'  ORDER BY b.id
> DESC LIMIT 1
>
> Slow plan:
>
> "Limit  (cost=0.43..28712.33 rows=1 width=4) (actual
> time=2574.041..2574.044 rows=1 loops=1)"
> "  Output: id"
> "  Buffers: shared hit=316132 read=110001"
> "  ->  Index Scan Backward using kredytob_pkey on public.kredytob b
> (cost=0.43..324.80 rows=113 width=4) (actual time=2574.034..2574.034
> rows=1 loops=1)"
> "Output: id"
> "Filter: (b.pesel = '222'::bpchar)"
> "Rows Removed by Filter: 433609"
>

Here a backward index scan is used and a lot of rows are thrown away by the
filter:

Rows Removed by Filter: 433609

Probably the index definitions on these servers are different.
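
One quick way to compare them (a sketch - run on both servers and diff the
output):

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'kredytob'
ORDER BY indexname;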

regards

Pavel



> "Buffers: shared hit=316132 read=110001"
> "Planning time: 0.414 ms"
> "Execution time: 2574.139 ms"
>
>
> Fast plan:
> "Limit  (cost=115240.66..115240.66 rows=1 width=4) (actual
> time=463.275..463.276 rows=1 loops=1)"
> "  Output: id"
> "  Buffers: shared hit=14661 read=4576"
> "  ->  Sort  (cost=115240.66..115240.94 rows=112 width=4) (actual
> time=463.271..463.271 rows=1 loops=1)"
> "Output: id"
> "Sort Key: b.id DESC"
> "Sort Method: top-N heapsort  Memory: 25kB"
> "Buffers: shared hit=14661 read=4576"
> "->  Index Scan using kredytob_pesel_typkred_opclass_idx on
> public.kredytob b  (cost=0.43..115240.10 rows=112 width=4) (actual
> time=311.347..463.183 rows=5 loops=1)"
> "  Output: id"
> "  Index Cond: (b.pesel = '222'::bpchar)"
> "  Buffers: shared hit=14661 read=4576"
> "Planning time: 0.383 ms"
> "Execution time: 463.324 ms"
>
> Data is almost equal - "slow" has a few more rows in table. ("Fast" is a
> copy from 1 am today).
> Why runtime is slower?
>
> --
> Andrzej Zawadzki
>


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-29 Thread Pavel Stehule
2016-09-29 20:49 GMT+02:00 Sven R. Kunze <srku...@mail.de>:

> On 29.09.2016 20:12, Pavel Stehule wrote:
>
>> In ideal world then plan should be independent on used form. The most
>> difficult is safe estimation of OR predicates. With correct estimation the
>> transformation to UNION form should not be necessary I am think.
>>
>
> Ah, okay. That's interesting.
>
> So how can I help here?
>

Try to write a patch :) Or better, help with enhancing PostgreSQL's
estimation model. Tomas Vondra has been working on multicolumn statistics for
two years. He needs help with reviews.

Regards

Pavel

>
> Regards,
> Sven
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-29 Thread Pavel Stehule
2016-09-29 14:20 GMT+02:00 Sven R. Kunze <srku...@mail.de>:

> On 23.09.2016 11:00, Pavel Stehule wrote:
>
> 2016-09-23 8:35 GMT+02:00 Sven R. Kunze <srku...@mail.de>:
>
>> I was wondering: would it be possible for PostgreSQL to rewrite the query
>> to generate the UNION (or subquery plan if it's also fast) on it's own?
>>
>
> It depends on real data. On your specific data the UNION variant is pretty
> fast, on different set, the UNION can be pretty slow. It is related to
> difficult OR predicate estimation.
>
>
> I figure that the UNION is fast if the sub-results are small (which they
> are in our case). On the contrary, when they are huge, the OUTER JOIN
> variant might be preferable.
>
>
> Is there something I can do to help here?
>
> Or do you think it's naturally application-dependent and thus should be
> solved with application logic just as we did?
>

In an ideal world the plan should be independent of the form used. The most
difficult part is safe estimation of OR predicates. With correct estimates,
the transformation to the UNION form should not be necessary, I think.
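
As a generic illustration of the rewrite being discussed (the table and
column names below are hypothetical, not the schema from this thread):

-- OR across two joined tables: hard to estimate
SELECT b.*
FROM   book b
LEFT   JOIN author a    ON a.id = b.author_id
LEFT   JOIN publisher p ON p.id = b.publisher_id
WHERE  a.name ILIKE '%term%' OR p.name ILIKE '%term%';

-- UNION form: each branch is planned and estimated independently
SELECT b.* FROM book b JOIN author a    ON a.id = b.author_id    WHERE a.name ILIKE '%term%'
UNION
SELECT b.* FROM book b JOIN publisher p ON p.id = b.publisher_id WHERE p.name ILIKE '%term%';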

Regards

Pavel


>
> Cheers,
> Sven
>


Re: [PERFORM] [HACKERS] temporary table vs array performance

2016-09-26 Thread Pavel Stehule
2016-09-26 17:39 GMT+02:00 dby...@163.com :

> test:
> create type  h3 as (id int,name char(10));
>
> CREATE or replace FUNCTION proc17()
> RETURNS SETOF h3  AS $$
> DECLARE
> v_rec h3;
> BEGIN
> create temp table abc(id int,name varchar) on commit drop;
> insert into abc select 1,'lw';
> insert into abc select 2,'lw2';
> for v_rec in
> select * from abc loop
> return next v_rec;
> end loop;
> END;
> $$
> LANGUAGE plpgsql;
>
>
> CREATE or replace FUNCTION proc16()
> RETURNS   SETOF h3 AS $$
> DECLARE
>  id_array int[];
>  name_arr varchar[];
>  v_rec h3;
> BEGIN
> id_array =array[1,2];
> name_arr=array['lw','lw2'];
> for v_rec in
> select unnest(id_array)  ,unnest(name_arr) loop
> return next v_rec;
> end loop;
> END;
> $$
> LANGUAGE plpgsql;
> postgres=# select * from proc17();
>  id |name
> +
>   1 | lw
>   2 | lw2
> (2 rows)
>
> Time: 68.372 ms
> postgres=# select * from proc16();
>  id |name
> +
>   1 | lw
>   2 | lw2
> (2 rows)
>
> Time: 1.357 ms
>
> temp table result:
> [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c
> 2 -j 2 -T 10 -f temporary_test_1.sql
> transaction type: Custom query
> scaling factor: 1
> query mode: prepared
> number of clients: 2
> number of threads: 2
> duration: 10 s
> number of transactions actually processed: 5173
> latency average: 3.866 ms
> tps = 517.229191 (including connections establishing)
> tps = 517.367956 (excluding connections establishing)
> statement latencies in milliseconds:
> 3.863798 select * from proc17();
>
> array result:
> [postgres@pg95 test_sql]$ pgbench -M prepared -n -r -c
> 2 -j 2 -T 10 -f arrary_test_1.sql
> transaction type: Custom query
> scaling factor: 1
> query mode: prepared
> number of clients: 2
> number of threads: 2
> duration: 10 s
> number of transactions actually processed: 149381
> latency average: 0.134 ms
> tps = 14936.875176 (including connections establishing)
> tps = 14940.234960 (excluding connections establishing)
> statement latencies in milliseconds:
> 0.132983 select * from proc16();
>
> Arrays are not convenient to use in a function; are there
> other methods that can replace a temp table in a function?
>
>
Temporary tables are pretty expensive - for several reasons - and a horrible
fit when you use a fresh table for only two rows, even more so when you
recreate it in every transaction.

The more common pattern is to create the temp table once and then delete from
it repeatedly. Better not to use temp tables unless they are really
necessary. That is one reason why PostgreSQL supports arrays - partially,
PostgreSQL arrays are an analogy to T-SQL memory tables.
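
As a rough sketch, the same result without the explicit loop (multi-argument
unnest requires PostgreSQL 9.4 or later; the function name proc18 and the
cast are just for illustration):

CREATE OR REPLACE FUNCTION proc18()
RETURNS SETOF h3 AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM unnest(ARRAY[1,2], ARRAY['lw','lw2']::char(10)[]);
END;
$$ LANGUAGE plpgsql;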

Regards

Pavel




>
> --
> dby...@163.com
>


Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Pavel Stehule
2016-08-26 22:26 GMT+02:00 Mike Sofen :

>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Tommi K
> *Sent:* Friday, August 26, 2016 7:25 AM
> *To:* Craig James 
> *Cc:* andreas kretschmer ;
> pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Slow query with big tables
>
>
>
> Ok, sorry that I did not add the original message. I thought that it would
> be automatically added to the message thread.
>
>
>
> Here is the question again:
>
>
>
> Is there way to keep query time constant as the database size grows.
> Should I use partitioning or partial indexes?
>

Try to disable nested loops - the estimates are bad.

This query cannot be very fast in any case - there are two ILIKE filters,
which have a negative impact on the estimates.
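
A low-risk way to test that for a single query, without touching the server
configuration (a sketch):

BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- the original query goes here
ROLLBACK;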

Regards

Pavel


>
>
> Thanks,
>
> Tommi Kaksonen
>
>
>
>
>
>
>
> > Hello,
>
> >
>
> > I have the following tables and query. I would like to get some help to
> find out why it is slow and how its performance could be improved.
>
> >
>
> > Thanks,
>
> > Tommi K.
>
> >
>
> >
>
> > --Table definitions---
>
> > CREATE TABLE "Measurement"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   product_id bigserial NOT NULL,
>
> >   nominal_data_id bigserial NOT NULL,
>
> >   description text,
>
> >   serial text,
>
> >   measurement_time timestamp without time zone,
>
> >   status smallint,
>
> >   system_description text,
>
> >   CONSTRAINT "Measurement_pkey" PRIMARY KEY (id),
>
> >   CONSTRAINT "Measurement_nominal_data_id_fkey" FOREIGN KEY
> (nominal_data_id)
>
> >   REFERENCES "Nominal_data" (id) MATCH SIMPLE
>
> >   ON UPDATE NO ACTION ON DELETE NO ACTION,
>
> >   CONSTRAINT "Measurement_product_id_fkey" FOREIGN KEY (product_id)
>
> >   REFERENCES "Product" (id) MATCH SIMPLE
>
> >   ON UPDATE NO ACTION ON DELETE NO ACTION
>
> > )
>
> > WITH (
>
> >   OIDS=FALSE
>
> > );
>
> >
>
> > CREATE INDEX measurement_time_index
>
> >   ON "Measurement"
>
> >   USING btree
>
> >   (measurement_time);
>
> > ALTER TABLE "Measurement" CLUSTER ON measurement_time_index;
>
> >
>
> > CREATE TABLE "Product"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   name text,
>
> >   description text,
>
> >   info text,
>
> >   system_name text,
>
> >   CONSTRAINT "Product_pkey" PRIMARY KEY (id)
>
> > )
>
> > WITH (
>
> >   OIDS=FALSE
>
> > );
>
> >
>
> >
>
> > CREATE TABLE "Extra_info"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   measurement_id bigserial NOT NULL,
>
> >   name text,
>
> >   description text,
>
> >   info text,
>
> >   type text,
>
> >   value_string text,
>
> >   value_double double precision,
>
> >   value_integer bigint,
>
> >   value_bool boolean,
>
> >   CONSTRAINT "Extra_info_pkey" PRIMARY KEY (id),
>
> >   CONSTRAINT "Extra_info_measurement_id_fkey" FOREIGN KEY
> (measurement_id)
>
> >   REFERENCES "Measurement" (id) MATCH SIMPLE
>
> >   ON UPDATE NO ACTION ON DELETE NO ACTION
>
> > )
>
> > WITH (
>
> >   OIDS=FALSE
>
> > );
>
> >
>
> > CREATE INDEX extra_info_measurement_id_index
>
> >   ON "Extra_info"
>
> >   USING btree
>
> >   (measurement_id);
>
> >
>
> > CREATE TABLE "Feature"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   measurement_id bigserial NOT NULL,
>
> >   name text,
>
> >   description text,
>
> >   info text,
>
> >   CONSTRAINT "Feature_pkey" PRIMARY KEY (id),
>
> >   CONSTRAINT "Feature_measurement_id_fkey" FOREIGN KEY (measurement_id)
>
> >   REFERENCES "Measurement" (id) MATCH SIMPLE
>
> >   ON UPDATE NO ACTION ON DELETE NO ACTION
>
> > )
>
> > WITH (
>
> >   OIDS=FALSE
>
> > );
>
> >
>
> > CREATE INDEX feature_measurement_id_and_name_index
>
> >   ON "Feature"
>
> >   USING btree
>
> >   (measurement_id, name COLLATE pg_catalog."default");
>
> >
>
> > CREATE INDEX feature_measurement_id_index
>
> >   ON "Feature"
>
> >   USING hash
>
> >   (measurement_id);
>
> >
>
> >
>
> > CREATE TABLE "Point"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   feature_id bigserial NOT NULL,
>
> >   x double precision,
>
> >   y double precision,
>
> >   z double precision,
>
> >   status_x smallint,
>
> >   status_y smallint,
>
> >   status_z smallint,
>
> >   difference_x double precision,
>
> >   difference_y double precision,
>
> >   difference_z double precision,
>
> >   CONSTRAINT "Point_pkey" PRIMARY KEY (id),
>
> >   CONSTRAINT "Point_feature_id_fkey" FOREIGN KEY (feature_id)
>
> >   REFERENCES "Feature" (id) MATCH SIMPLE
>
> >   ON UPDATE NO ACTION ON DELETE NO ACTION
>
> > )
>
> > WITH (
>
> >   OIDS=FALSE
>
> > );
>
> >
>
> > CREATE INDEX point_feature_id_index
>
> >   ON "Point"
>
> >   USING btree
>
> >   (feature_id);
>
> >
>
> > CREATE TABLE "Warning"
>
> > (
>
> >   id bigserial NOT NULL,
>
> >   feature_id bigserial NOT NULL,
>
> >   "number" smallint,
>
> >   info text,
>
> >   CONSTRAINT "Warning_pkey" PRIMARY KEY (id),
>
> >   CONSTRAINT "Warning_feature_id_fkey" FOREIGN KEY (feature_id)
>
> >   

Re: [PERFORM] pgsql-performance issue

2016-08-20 Thread Pavel Stehule
2016-08-20 14:17 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:

> On 2016-08-20 12:05, Pavel Stehule wrote:
>
>> 2016-08-20 13:59 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>
>> On 2016-08-20 11:42, Pavel Stehule wrote:
>>>
>>> 2016-08-20 13:31 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>>
>>> On 2016-08-20 08:58, Pavel Stehule wrote:
>>> 2016-08-20 10:27 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>>
>>> On 2016-08-20 08:21, pgsql-performance-ow...@postgresql.org wrote:
>>>
>>> Welcome to the pgsql-performance mailing list!
>>> Your password at PostgreSQL Mailing Lists is
>>>
>>> x8DiA6
>>>
>>> To leave this mailing list, send the following command in the
>>> body
>>> of a message to majord...@postgresql.org:
>>>
>>> approve x8DiA6 unsubscribe pgsql-performance
>>> debasis.mohar...@ipathsolutions.co.in
>>>
>>> This command will work even if your address changes. For that
>>> reason,
>>> among others, it is important that you keep a copy of this
>>> message.
>>>
>>> To post a message to the mailing list, send it to
>>> pgsql-performance@postgresql.org
>>>
>>> If you need help or have questions about the mailing list, please
>>> contact the people who manage the list by sending a message to
>>> pgsql-performance-ow...@postgresql.org
>>>
>>> You can manage your subscription by visiting the following WWW
>>> location:
>>>
>>>
>>> <https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql
>> .org/debasis.moharana%40ipathsolutions.co.in
>>
>>> [1]
>>>
>>> [2]
>>>
>>> [1]>
>>> Dear Sir/Mam,
>>>
>>> I have a PostgreSQL 9.5 instance running on Windows 8 machine with
>>> 4GB of RAM.This server is mainly used for inserting/updating large
>>> amounts of data via copy/insert/update commands, and seldom for
>>> running select queries.
>>>
>>> Here are the relevant configuration parameters I changed:
>>>
>>> max_connections = 100
>>> shared_buffers = 512MB
>>> effective_cache_size = 3GB
>>> work_mem = 12233kB
>>> maintenance_work_mem = 256MB
>>> min_wal_size = 1GB max_wal_size = 2GB
>>> checkpoint_completion_target = 0.7
>>> wal_buffers = 16MB
>>> default_statistics_target = 100
>>>
>>> After setting in postgresql.conf. I run the select query to fetch
>>> large amount of record of 29000 in postgresql but it takes 10.3
>>> seconds but the same query takes 2 seconds for execution in MSSQL.
>>>
>>> So my query is how to improve the perfermance in postgresql.
>>>
>>
>>   hi
>>
>>   please, send execution plan of slow query
>>
>>   https://www.postgresql.org/docs/current/static/sql-explain.html [3]
>> [3]
>>  [3]
>>   https://explain.depesz.com/ [4] [4] [4]
>>
>>   p.s. Did you do VACUUM and ANALYZE on database?
>>
>>   Regards
>>
>>   Pavel
>>
>> Regards,
>>> Debasis Moharana
>>> .NET Software Developer
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance [2] [1] [2]
>>>
>>
>>   Links:
>>   --
>>   [1]
>>
>>
>> https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql.
>> org/debasis.moharana%40ipathsolutions.co.in
>> [1]
>>  [2]
>>   [2] http://www.postgresql.org/mailpref/pgsql-performance [2] [1]
>>   [3] https://www.postgresql.org/docs/current/static/sql-explain.html
>> [3]
>>  [3]
>>   [4] https://explain.depesz.com/ [4] [4]
>>
>>
>>   Hi,
>>
>>   Please check the execution plan details
>>
>>   Execution Query is = EXPLAIN (ANALYZE, BUFFERS) select * from
>>  tblPurchaseOrderstock cross join tblPurchaseOrderInfo;
>>
>>   "Nested Loop  (cost=0.00..507.51 rows=39593 width=224) (actual
>>  time=0.032..13.026 rows=39593 loops=1)"
>>   "  Buffers: shared read=8"
>>   "  I/O Timings: read=0.058"
>>   "  ->  Seq Scan on tblpurchaseorderstock  (cost=0.00..7.89 rows=289
>>  width=95) (actual time=0.014..0.082 rows=289 loops=1)"
>>   "Buffers: shared read=5"
>>   "

Re: [PERFORM] pgsql-performance issue

2016-08-20 Thread Pavel Stehule
2016-08-20 13:59 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:

> On 2016-08-20 11:42, Pavel Stehule wrote:
>
>> 2016-08-20 13:31 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>
>> On 2016-08-20 08:58, Pavel Stehule wrote:
>>> 2016-08-20 10:27 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>>
>>> On 2016-08-20 08:21, pgsql-performance-ow...@postgresql.org wrote:
>>>
>>> Welcome to the pgsql-performance mailing list!
>>> Your password at PostgreSQL Mailing Lists is
>>>
>>> x8DiA6
>>>
>>> To leave this mailing list, send the following command in the
>>> body
>>> of a message to majord...@postgresql.org:
>>>
>>> approve x8DiA6 unsubscribe pgsql-performance
>>> debasis.mohar...@ipathsolutions.co.in
>>>
>>> This command will work even if your address changes. For that
>>> reason,
>>> among others, it is important that you keep a copy of this
>>> message.
>>>
>>> To post a message to the mailing list, send it to
>>> pgsql-performance@postgresql.org
>>>
>>> If you need help or have questions about the mailing list, please
>>> contact the people who manage the list by sending a message to
>>> pgsql-performance-ow...@postgresql.org
>>>
>>> You can manage your subscription by visiting the following WWW
>>> location:
>>>
>>
>> <https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql
>> .org/debasis.moharana%40ipathsolutions.co.in
>> [2]
>>
>> [1]>
>>>>
>>> Dear Sir/Mam,
>>>
>>> I have a PostgreSQL 9.5 instance running on Windows 8 machine with
>>> 4GB of RAM.This server is mainly used for inserting/updating large
>>> amounts of data via copy/insert/update commands, and seldom for
>>> running select queries.
>>>
>>> Here are the relevant configuration parameters I changed:
>>>
>>> max_connections = 100
>>> shared_buffers = 512MB
>>> effective_cache_size = 3GB
>>> work_mem = 12233kB
>>> maintenance_work_mem = 256MB
>>> min_wal_size = 1GB max_wal_size = 2GB
>>> checkpoint_completion_target = 0.7
>>> wal_buffers = 16MB
>>> default_statistics_target = 100
>>>
>>> After setting in postgresql.conf. I run the select query to fetch
>>> large amount of record of 29000 in postgresql but it takes 10.3
>>> seconds but the same query takes 2 seconds for execution in MSSQL.
>>>
>>> So my query is how to improve the perfermance in postgresql.
>>>
>>
>>  hi
>>
>>  please, send execution plan of slow query
>>
>>  https://www.postgresql.org/docs/current/static/sql-explain.html [3]
>> [3]
>>  https://explain.depesz.com/ [4] [4]
>>
>>  p.s. Did you do VACUUM and ANALYZE on database?
>>
>>  Regards
>>
>>  Pavel
>>
>> Regards,
>>> Debasis Moharana
>>> .NET Software Developer
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance [1] [2]
>>>
>>
>>  Links:
>>  --
>>  [1]
>>
>> https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql.
>> org/debasis.moharana%40ipathsolutions.co.in
>> [2]
>>  [2] http://www.postgresql.org/mailpref/pgsql-performance [1]
>>  [3] https://www.postgresql.org/docs/current/static/sql-explain.html
>> [3]
>>  [4] https://explain.depesz.com/ [4]
>>
>>
>>  Hi,
>>
>>  Please check the execution plan details
>>
>>  Execution Query is = EXPLAIN (ANALYZE, BUFFERS) select * from
>> tblPurchaseOrderstock cross join tblPurchaseOrderInfo;
>>
>>  "Nested Loop  (cost=0.00..507.51 rows=39593 width=224) (actual
>> time=0.032..13.026 rows=39593 loops=1)"
>>  "  Buffers: shared read=8"
>>  "  I/O Timings: read=0.058"
>>  "  ->  Seq Scan on tblpurchaseorderstock  (cost=0.00..7.89 rows=289
>> width=95) (actual time=0.014..0.082 rows=289 loops=1)"
>>  "Buffers: shared read=5"
>>  "I/O Timings: read=0.040"
>>  "  ->  Materialize  (cost=0.00..5.05 rows=137 width=129) (actual
>> time=0.000..0.006 rows=137 loops=289)"
>>  "Buffers: shared read=3"
>>  "I/O Timings: read=0.019"
>>  "

Re: [PERFORM] pgsql-performance issue

2016-08-20 Thread Pavel Stehule
2016-08-20 13:31 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:

> On 2016-08-20 08:58, Pavel Stehule wrote:
>
>> 2016-08-20 10:27 GMT+02:00 <debasis.mohar...@ipathsolutions.co.in>:
>>
>> On 2016-08-20 08:21, pgsql-performance-ow...@postgresql.org wrote:
>>>
>>> Welcome to the pgsql-performance mailing list!
>>>> Your password at PostgreSQL Mailing Lists is
>>>>
>>>> x8DiA6
>>>>
>>>> To leave this mailing list, send the following command in the
>>>> body
>>>> of a message to majord...@postgresql.org:
>>>>
>>>> approve x8DiA6 unsubscribe pgsql-performance
>>>> debasis.mohar...@ipathsolutions.co.in
>>>>
>>>> This command will work even if your address changes. For that
>>>> reason,
>>>> among others, it is important that you keep a copy of this
>>>> message.
>>>>
>>>> To post a message to the mailing list, send it to
>>>> pgsql-performance@postgresql.org
>>>>
>>>> If you need help or have questions about the mailing list, please
>>>> contact the people who manage the list by sending a message to
>>>> pgsql-performance-ow...@postgresql.org
>>>>
>>>> You can manage your subscription by visiting the following WWW
>>>> location:
>>>>
>>>>
>>>>
>>> <https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql
>> .org/debasis.moharana%40ipathsolutions.co.in
>>
>>> [1]>
>>>>
>>> Dear Sir/Mam,
>>>
>>> I have a PostgreSQL 9.5 instance running on Windows 8 machine with
>>> 4GB of RAM.This server is mainly used for inserting/updating large
>>> amounts of data via copy/insert/update commands, and seldom for
>>> running select queries.
>>>
>>> Here are the relevant configuration parameters I changed:
>>>
>>> max_connections = 100
>>> shared_buffers = 512MB
>>> effective_cache_size = 3GB
>>> work_mem = 12233kB
>>> maintenance_work_mem = 256MB
>>> min_wal_size = 1GB max_wal_size = 2GB
>>> checkpoint_completion_target = 0.7
>>> wal_buffers = 16MB
>>> default_statistics_target = 100
>>>
>>> After setting in postgresql.conf. I run the select query to fetch
>>> large amount of record of 29000 in postgresql but it takes 10.3
>>> seconds but the same query takes 2 seconds for execution in MSSQL.
>>>
>>> So my query is how to improve the perfermance in postgresql.
>>>
>>
>> hi
>>
>> please, send execution plan of slow query
>>
>> https://www.postgresql.org/docs/current/static/sql-explain.html [3]
>> https://explain.depesz.com/ [4]
>>
>> p.s. Did you do VACUUM and ANALYZE on database?
>>
>> Regards
>>
>> Pavel
>>
>> Regards,
>>> Debasis Moharana
>>> .NET Software Developer
>>>
>>> --
>>> Sent via pgsql-performance mailing list
>>> (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance [2]
>>>
>>
>>
>>
>> Links:
>> --
>> [1]
>> https://lists.postgresql.org/mj/mj_wwwusr/domain=postgresql.
>> org/debasis.moharana%40ipathsolutions.co.in
>> [2] http://www.postgresql.org/mailpref/pgsql-performance
>> [3] https://www.postgresql.org/docs/current/static/sql-explain.html
>> [4] https://explain.depesz.com/
>>
>
>
>
> Hi,
>
> Please check the execution plan details
>
>
> Execution Query is = EXPLAIN (ANALYZE, BUFFERS) select * from
> tblPurchaseOrderstock cross join tblPurchaseOrderInfo;
>
> "Nested Loop  (cost=0.00..507.51 rows=39593 width=224) (actual
> time=0.032..13.026 rows=39593 loops=1)"
> "  Buffers: shared read=8"
> "  I/O Timings: read=0.058"
> "  ->  Seq Scan on tblpurchaseorderstock  (cost=0.00..7.89 rows=289
> width=95) (actual time=0.014..0.082 rows=289 loops=1)"
> "Buffers: shared read=5"
> "I/O Timings: read=0.040"
> "  ->  Materialize  (cost=0.00..5.05 rows=137 width=129) (actual
> time=0.000..0.006 rows=137 loops=289)"
> "Buffers: shared read=3"
> "I/O Timings: read=0.019"
> "->  Seq Scan on tblpurchaseorderinfo  (cost=0.00..4.37 rows=137
> width=129) (actual time=0.011..0.035 rows=137 loops=1)"
> "  Buffers: shared read=3"
> "  I/O Timings: read=0.019"
> "Planning time: 56.052 ms"
> "Execution time: 14.038 ms"
>

Is it the same query? It needs only 14 ms.

Regards

Pavel


>
> Regards,
> Debasis Moharana
>


Re: [PERFORM] pgsql-performance issue

2016-08-20 Thread Pavel Stehule
2016-08-20 10:27 GMT+02:00 :

> On 2016-08-20 08:21, pgsql-performance-ow...@postgresql.org wrote:
>
>> Welcome to the pgsql-performance mailing list!
>> Your password at PostgreSQL Mailing Lists is
>>
>> x8DiA6
>>
>> To leave this mailing list, send the following command in the body
>> of a message to majord...@postgresql.org:
>>
>> approve x8DiA6 unsubscribe pgsql-performance
>> debasis.mohar...@ipathsolutions.co.in
>>
>> This command will work even if your address changes.  For that reason,
>> among others, it is important that you keep a copy of this message.
>>
>> To post a message to the mailing list, send it to
>>   pgsql-performance@postgresql.org
>>
>> If you need help or have questions about the mailing list, please
>> contact the people who manage the list by sending a message to
>>   pgsql-performance-ow...@postgresql.org
>>
>> You can manage your subscription by visiting the following WWW location:
>>
>> > .org/debasis.moharana%40ipathsolutions.co.in>
>>
> Dear Sir/Mam,
>
> I have a PostgreSQL 9.5 instance running on Windows 8 machine with 4GB of
> RAM.This server is mainly used for inserting/updating large amounts of data
> via copy/insert/update commands, and seldom for running select queries.
>
> Here are the relevant configuration parameters I changed:
>
> max_connections = 100
> shared_buffers = 512MB
> effective_cache_size = 3GB
> work_mem = 12233kB
> maintenance_work_mem = 256MB
> min_wal_size = 1GB max_wal_size = 2GB
> checkpoint_completion_target = 0.7
> wal_buffers = 16MB
> default_statistics_target = 100
>
> After setting in postgresql.conf. I run the select query to fetch large
> amount of record of 29000 in postgresql but it takes 10.3 seconds but the
> same query takes 2 seconds for execution in MSSQL.
>
> So my query is how to improve the perfermance in postgresql.
>

Hi,

please send the execution plan of the slow query:

https://www.postgresql.org/docs/current/static/sql-explain.html
https://explain.depesz.com/

P.S. Did you run VACUUM and ANALYZE on the database?

Regards

Pavel

>
> Regards,
> Debasis Moharana
> .NET Software Developer
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Logging queries using sequential scans

2016-08-10 Thread Pavel Stehule
Hi

2016-08-10 13:13 GMT+02:00 Ivan Voras :

> Hello,
>
> Is it possible to log queries using sequential scans? Or possibly every
> query in a way which allows grepping for those with sequential scans?
>
>
>
You can log execution plans with the auto_explain extension:

 https://www.postgresql.org/docs/current/static/auto-explain.html

Then you can grep the log for the queries with sequential scans.
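
A minimal per-session sketch (for cluster-wide logging, the same auto_explain
parameters go into postgresql.conf together with
shared_preload_libraries = 'auto_explain'):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log every statement's plan; raise this in production
SET auto_explain.log_format = 'text';

-- then grep the server log, e.g. for the string 'Seq Scan'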

Regards

Pavel


Re: [PERFORM] Big number of connections

2016-04-04 Thread Pavel Stehule
2016-04-04 16:43 GMT+02:00 Moreno Andreo <moreno.and...@evolu-s.it>:

> On 04/04/2016 15:33, Pavel Stehule wrote:
>
>>
>>
>> PostgreSQL doesn't contain integrated pooler - so any connection to
>> Postgres enforces one PostgreSQL proces. A performance benchmarks is
>> showing maximum performance about 10x cores.  With high number of
>> connections you have to use low size of work_mem, what enforces can have
>> negative impact on performance too. Too high number of active PostgreSQL
>> processes increase a risk of performance problems with spin locks, etc.
>>
>
> :-O
> I wasn't absolutely aware of this thing... is there a way to monitor
> active connections, or at least to report when they grow too much?
> (say, I have an 8-core system and want to track down if, and when, active
> connections grow over 80)
>

100 connections are probably OK, 200 is over the optimum - there is some
tolerance.

We are speaking about the optimum - I have had the opportunity to work with
systems where max_connections was 300 or 600, and they worked. But then
max_connections no longer acts as a safeguard against overloading, and the
system under higher load can be pretty slow.
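
One simple way to watch it (a sketch; the state column exists in
pg_stat_activity since 9.2):

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;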

Regards

Pavel


>
> Thanks
> Moreno.-
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Big number of connections

2016-04-04 Thread Pavel Stehule
Hi

2016-04-04 15:14 GMT+02:00 Mike Sofen :

> From: Jim Nasby Sent: Sunday, April 03, 2016 10:19 AM
>
> >>On 4/1/16 2:54 AM, jarek wrote:
> >> I'll be happy to hear form users of big PostgreSQL installations, how
> >> many users do you have and what kind of problems we may expect.
> >> Is there any risk, that huge number of roles will slowdown overall
> >> performance ?
>
> >Assuming you're on decent sized hardware though, 3000-4000 open
> connections shouldn't be much of an >issue *as long as very few are active
> at once*. If you get into a situation where there's a surge of activity
> >and you suddenly have 2x more active connections than cores, you won't be
> happy. I've seen that push >servers into a state where the only way to
> recover was to disconnect everyone.
> >--
> >Jim Nasby
>
> Jim - I don't quite understand the math here: on a server with 20 cores,
> it can only support 40 active users?
>
> I come from the SQL Server world where a single 20 core server could
> support hundreds/thousands of active users and/or many dozens of
> background/foreground data processes.  Is there something fundamentally
> different between the two platforms relative to active user loads?  How
> would we be able to use Postgres for larger web apps?
>

PostgreSQL doesn't contain an integrated pooler, so any connection to
Postgres requires one PostgreSQL process. Performance benchmarks show maximum
throughput at about 10x the number of cores. With a high number of
connections you have to use a low work_mem, which can have a negative impact
on performance too. Too high a number of active PostgreSQL processes
increases the risk of performance problems with spinlocks, etc.

Usually web frameworks have their own pooling solution - so just use it. If
you need more logical connections than the optimum for your number of cores,
then you should use an external pooler like pgpool-II or pgbouncer.

http://www.pgpool.net/mediawiki/index.php/Main_Page
http://pgbouncer.github.io/

Pgbouncer is lightweight with only the necessary functions; pgpool is a
little heavier, with a lot of functions.

Regards

Pavel


>
> Mike Sofen
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Nested Loop vs Hash Join based on predicate?

2016-03-19 Thread Pavel Stehule
2016-03-16 21:23 GMT+01:00 Doiron, Daniel :

> I have the following queries:
>
> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
> select[…]
> from  f_calc_service   a11,
> d_patient_typea12
> where   a11.d_patient_pop_id in (336)
>  and a11.d_patient_type_id = a12.id
>  and a12.short_name = 'I'
> group by  a11.d_rate_schedule_id,
> a11.d_payer_id,
> a11.d_patient_pop_id,
> a11.d_patient_type_id
> ;
>
> And
>
> EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
> select […]
> from  f_calc_service   a11,
> d_patient_typea12
> where   a11.d_patient_pop_id in (336)
>  and a11.d_patient_type_id = a12.id
>  and a12.short_name = 'O'
> group by  a11.d_rate_schedule_id,
> a11.d_payer_id,
> a11.d_patient_pop_id,
> a11.d_patient_type_id
> ;
>
> Making this one change from short_name = ‘I’ to short_name = ‘O’ changes
> the query execution from 200k ms to 280ms. The first one chooses a Nested
> Loop, the second chooses a hash join. How do I get them both to choose the
> same? There are no values for d_patient_pop_id in (336) and short_name =
> ‘I’.
>

We don't see the plans, so this is a blind shot.

Probably the estimate for the 'I' value is badly underestimated, so the
planner chooses a nested loop. The reasons can vary - possibly a correlation
inside the data, for example.

You can try the following (a concrete sketch follows below):

0) ensure your statistics are current - run ANALYZE

a) increase the statistics target with ALTER TABLE xx ALTER COLUMN yyy SET
STATISTICS <some number>

b) penalize nested loops with SET enable_nestloop TO off;
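
Concretely, something like this (the statistics target of 1000 is only an
example value; short_name is the filtered column here):

ANALYZE f_calc_service;
ANALYZE d_patient_type;

ALTER TABLE d_patient_type ALTER COLUMN short_name SET STATISTICS 1000;
ANALYZE d_patient_type;

SET enable_nestloop TO off;   -- per-session, for testing only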

Regards

Pavel


>
> Thanks!
>
> Dan
>
>
>


Re: [PERFORM] autovacuum disk IO

2016-03-02 Thread Pavel Stehule
Hi

2016-03-02 16:25 GMT+01:00 Artem Tomyuk :

> Hi.
>
> I've noticed that autovac. process worked more than 10 minutes, during
> this zabbix logged more than 90% IO disk utilization on db volume
>
> ===>29237   2016-03-02 15:17:23 EET 0 [24-1]LOG:  automatic 
> vacuum of table "lb_upr.public._reference32": index scans: 1
>   pages: 0 removed, 263307 remain
>   tuples: 298 removed, 1944753 remain, 0 are dead but not yet removable
>   buffer usage: 67814 hits, 265465 misses, 15647 dirtied
>   avg read rate: 3.183 MB/s, avg write rate: 0.188 MB/s
>   *system usage: CPU 5.34s/6.27u sec elapsed 651.57 sec*
>
> Is it possible to log autovac. io impact during it execution?
> Is there any way to limit or "nice" autovac. process?
>
> Thanks to all for any help.
>
>
Maybe off topic - this is a known problem with Zabbix. Any limits on vacuum
are usually the way to hell.

But more often, partitioning helps with Zabbix:

https://www.zabbix.org/wiki/Higher_performant_partitioning_in_PostgreSQL

Regards

Pavel


Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-01-29 Thread Pavel Stehule
Hi



> I ran operf on both backends, and they look quite similar, except that the
> number of samples is different (this is "opreport -c" output):
>
> CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 9
> samples  %image name   symbol name
>
> ---
>   112   0.0019  postgres ExecProcNode
>   3020116  49.9904  postgres ExecScanHashBucket
>   3021162  50.0077  postgres ExecHashJoin
> 3020116  92.8440  postgres ExecScanHashBucket
>   3020116  49.9207  postgres ExecScanHashBucket [self]
>   3020116  49.9207  postgres ExecScanHashBucket
>   8190  0.1354  vmlinux  apic_timer_interrupt
>
> What could be an explanation for this?
> Is this known behaviour?
>

One such issue was fixed in 9.5.

A large hash table can cause a lot of misses in the L1 and L2 caches.

Pavel


>
> Yours,
> Laurenz Albe
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Plan differences

2015-12-31 Thread Pavel Stehule
Hi


> Does anyone have any ideas? All data are loaded into this table via copy
> and no updates are done. Autovacuum settings weren't changed (and is on
> both). Do I need to increase shared_buffers to half of available memory for
> the planner to make certain optimisations? Anything else I'm missing or can
> try? The new server has been running for almost two weeks now so I would
> have thought things would have had a chance to settle down.
>
>
It looks like some optimization is missing - one that was removed from the RC
release.

Regards

Pavel


> Cheers,
> Anton
>
>


Re: [PERFORM] Connections "Startup"

2015-12-22 Thread Pavel Stehule
Hi

2015-12-22 8:59 GMT+01:00 Artem Tomyuk :

> Hi.
>
> I've noticed huge decrease in performance.
> During this in htop i see a lot (200 - 300) of connections in state
> "startup", each of them eats 3-3% of CPU time. This processes are not
> visible in pg_stat_activity so i cant understand what they are doing, and i
> cant kill them. I cant see the bottleneck in Disk IO to. The logs of
> postgres says nothing to. I am confused.
> What can be the cause of  huge amount of "startup" connections
> Maybe its better to start use connection pooler such as pgbouncer?
> Thanks a lot.
>

What is your max_connections? Can you run "perf top"? What does it show?

Too high a number can cause system overload. You cannot see these connections
in pg_stat_activity because a process in this state isn't fully initialized
yet.

There have been a lot of bugfix releases after 9.1.2 - currently there is
PostgreSQL 9.2.19. Try to upgrade first.

Regards

Pavel


>
> PS.
> Server config is:
> 2 * Intel Xeon 2660 CPU with 64 gigs of RAM.
> Hardware RAID10.
> Centos 6.6, PostgreSQL 9.1.2
>
>
>
>


Re: [ADMIN] [PERFORM] Connections "Startup"

2015-12-22 Thread Pavel Stehule
2015-12-23 4:52 GMT+01:00 Om Prakash Jaiswal <op1...@yahoo.co.in>:

>
> *Postgres is designed in this way. It can handle such problem by adopting the 
> following steps: *
>
> 1.Increase the kernal level parameters:
> shmmax and shmall
> example for 2GB RAM size for postgres processing is below
>
> *vi /etc/sysctl.conf*
>
> kernel.shmmax = 2147483648
> kernel.shmall = 2883584
>
> similar way you increase the configuration paramater for half of RAM size of 
> your machine.
>
> 2. Edit your postgresql.conf file following settings:
> a. Increase the number of connection parameter.
>  Connection = 500
> b.Effective_cache_size = 2GB
> c. Shared_memory = 500MB
>
>
Increasing max_connections when you have these strange issues isn't good
advice. Running 500 connections on a 2GB server is highly risky.

Pavel


>
>
>
>
>
>
>
> On Wednesday, 23 December 2015 8:04 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
>
>
> On 12/22/15 2:09 AM, Pavel Stehule wrote:
>
> >
> > There was lot of bugfix releases after 9.1.2 - currently there is
> > PostgreSQL 9.2.19.
>
>
> I'm sure Pavel meant 9.1.19, not 9.2.19.
>
> In any case, be aware that 9.1 goes end of life next year. You should
> start planning on a major version upgrade now if you haven't already.
> 9.5 should release in January so you might want to wait for that version.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> <http://bluetreble.com/>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-ad...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
>
>
>


Re: [PERFORM] Estimation row error

2015-12-18 Thread Pavel Stehule
Hi

2015-12-18 16:21 GMT+01:00 Mathieu VINCENT :

> Hello,
>
> No one to help me to understand this bad estimation rows ?
> It's *NOT* caused by :
>
>- correlation between columns (cross-correlation)
>- bad statistics (i tried with  default_statistics_target to 10 000)
>- bad number of distinct values
>- complexe join conditions
>
> I have no more ideas.
>

PostgreSQL does not have cross-table statistics - so it expects a uniform
distribution of foreign keys. This expectation is broken in your example.

You can find some prototype solutions by Tomas Vondra on the hackers mailing
list.
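
To see what the planner has to work with here (a sketch, using the tables
from your test script):

SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename IN ('t1', 't2', 't3')
  AND attname = 'c1';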

Regards

Pavel


>
> thank you for your help.
> Mathieu VINCENT
>
> 2015-12-17 11:58 GMT+01:00 Mathieu VINCENT 
> :
>
>> Adding foreign key between on t2 and t3, does not change the plan.
>>
>> drop table if exists t1;
>> drop table if exists t2;
>> drop table if exists t3;
>>
>> create table t1 as select generate_Series(1,20) as c1;
>> create table t2 as select generate_Series(1,20)%100+1 as c1;
>> create table t3 as select generate_Series(1,1500)%750+1 as c1;
>>
>> alter table t1 add PRIMARY KEY (c1);
>> create index on t2 (c1);
>> create index on t3 (c1);
>> ALTER TABLE t2  ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
>> ALTER TABLE t3  ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);
>>
>> analyze verbose t1;
>> analyze verbose t2;
>> analyze verbose t3;
>>
>> EXPLAIN (analyze on, buffers on, verbose on)
>> select
>> *
>> from
>> t1 t1
>> inner join t2 on t1.c1=t2.c1
>> inner join t3 on t1.c1=t3.c1
>>
>> Cordialement,
>> PSIH Décisionnel en santé
>> Mathieu VINCENT
>> Data Analyst
>> PMSIpilot - 61 rue Sully - 69006 Lyon - France
>>
>> 2015-12-17 11:37 GMT+01:00 Mathieu VINCENT > >:
>>
>>> Here, another issue with row estimate.
>>> And, in this example, there is not correlation beetween columns in a
>>> same table.
>>>
>>> drop table if exists t1;
>>> drop table if exists t2;
>>> drop table if exists t3;
>>>
>>> create table t1 as select generate_Series(1,20) as c1;
>>> create table t2 as select generate_Series(1,20)%100 as c1;
>>> create table t3 as select generate_Series(1,1500)%750 as c1;
>>>
>>> alter table t1 add PRIMARY KEY (c1);
>>> create index on t2 (c1);
>>> create index on t3 (c1);
>>>
>>> analyze verbose t1;
>>> analyze verbose t2;
>>> analyze verbose t3;
>>>
>>> EXPLAIN (analyze on, buffers on, verbose on)
>>> select
>>> *
>>> from
>>> t1 t1
>>> inner join t2 on t1.c1=t2.c1
>>> inner join t3 on t2.c1=t3.c1
>>> the explain plan : http://explain.depesz.com/s/YVw
>>> Do you understand how postgresql calculate the row estimate ?
>>>
>>> BR
>>> Mathieu VINCENT
>>>
>>> 2015-12-17 10:14 GMT+01:00 Matteo Grolla :
>>>
 Thank you both for the help!
 happy holidays

 2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <
 mathieu.vinc...@pmsipilot.com>:

> thks Gunnar,
>
> I removed the correlation between t3.c1 and t3.c2 in this sql script :
>
> drop table if exists t1;
> drop table if exists t2;
> drop table if exists t3;
> drop table if exists t4;
>
> create table t1 as select generate_Series(1,30) as c1;
> create table t2 as select generate_Series(1,400) as c1;
> create table t3 as select floor(random()*100+1) as c1, c2 from
> generate_Series(1,20) c2;
> create table t4 as select generate_Series(1,20) as c1;
>
> alter table t1 add PRIMARY KEY (c1);
> alter table t2 add PRIMARY KEY (c1);
> alter table t3 add PRIMARY KEY (c1,c2);
> create index on t3 (c1);
> create index on t3 (c2);
> alter table t4 add PRIMARY KEY (c1);
>
> analyze verbose t1;
> analyze verbose t2;
> analyze verbose t3;
> analyze verbose t4;
>
> EXPLAIN (analyze on, buffers on, verbose on)
> select
> *
> from
> t1 t1
> inner join t2 on t1.c1=t2.c1
> inner join t3 on t2.c1=t3.c1
> inner join t4 on t3.c2=t4.c1
>
> Now, the estimate is good : http://explain.depesz.com/s/gCX
>
> Have a good day
>
> Mathieu VINCENT
>
> 2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <
> gunnar.bluth.ext...@elster.de>:
>
>> Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
>> > Gunnar Nick Bluth  wrote:
>> >
>> >> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>> >>> Hello,
>> >>>
>> >>> No one to help me to understand this bad estimation rows ?
>> >>
>> >> Well,
>> >>
>> >> on a rather beefy machine, I'm getting quite a different plan:
>> >> http://explain.depesz.com/s/3y5r
>> >
>> > you are using 9.5, right? Got the same plan with 9.5.
>>
>> Nope...:
>>   version
>>
>>
>> 

Re: [PERFORM] Recursive query performance issue

2015-10-23 Thread Pavel Stehule
 81980 postgres  20   0 65.7g 528m 515m R73  0.0   0:06.02 postgres:
> user1 db 0.0.0.3(44671) SELECT
>  82007 postgres  20   0 65.7g 523m 510m R73  0.0   0:06.27 postgres:
> user1 db 0.0.0.3(44676) SELECT
>  82374 postgres  20   0 65.7g 367m 362m R73  0.0   0:03.48 postgres:
> user1 db 0.0.0.6(60873) SELECT
>  82385 postgres  20   0 65.7g 310m 306m R73  0.0   0:03.03 postgres:
> user1 db 0.0.0.6(60876) SELECT
>  82520 postgres  20   0 65.7g 220m 215m R73  0.0   0:02.00 postgres:
> user1 db 0.0.0.3(44785) SELECT
>  82676 postgres  20   0 65.7g 116m 111m R73  0.0   0:00.90 postgres:
> user1 db 0.0.0.2(52531) SELECT
>  18471 postgres  20   0 65.7g  73m  56m R69  0.0   8:14.08 postgres:
> user1 db 0.0.0.6(46144) SELECT
>  43890 postgres  20   0 65.7g  76m  56m R69  0.0   5:04.46 postgres:
> user1 db 0.0.0.3(36697) SELECT
>  46130 postgres  20   0 65.7g  70m  57m R69  0.0   4:46.56 postgres:
> user1 db 0.0.0.4(41871) SELECT
>  55604 postgres  20   0 65.7g  81m  57m R69  0.0   3:27.67 postgres:
> user1 db 0.0.0.3(39292) SELECT
>  59139 postgres  20   0 65.7g  81m  57m R69  0.0   3:01.18 postgres:
> user1 db 0.0.0.2(47670) SELECT
>  63523 postgres  20   0 65.7g  80m  56m R69  0.0   2:28.04 postgres:
> user1 db 0.0.0.2(48680) SELECT
>  81707 postgres  20   0 65.7g 528m 515m S69  0.0   0:08.44 postgres:
> user1 db 0.0.0.6(60737) SELECT
>  81830 postgres  20   0 65.7g 523m 510m R69  0.0   0:07.60 postgres:
> user1 db 0.0.0.4(49707) SELECT
>  81932 postgres  20   0 65.7g 528m 515m R69  0.0   0:06.65 postgres:
> user1 db 0.0.0.2(52352) SELECT
>  81950 postgres  20   0 65.7g 528m 515m R69  0.0   0:05.92 postgres:
> user1 db 0.0.0.6(60783) SELECT
>  81973 postgres  20   0 65.7g 522m 510m R69  0.0   0:06.18 postgres:
> user1 db 0.0.0.6(60789) SELECT
>  82193 postgres  20   0 65.7g 487m 479m R69  0.0   0:04.61 postgres:
> user1 db 0.0.0.2(52415) SELECT
>  82358 postgres  20   0 65.7g 299m 295m R69  0.0   0:03.11 postgres:
> user1 db 0.0.0.2(52453) SELECT
>  82372 postgres  20   0 65.7g 318m 313m R69  0.0   0:03.22 postgres:
> user1 db 0.0.0.4(49827) SELECT
>  82381 postgres  20   0 65.7g 331m 326m R69  0.0   0:03.30 postgres:
> user1 db 0.0.0.3(44757) SELECT
>  82404 postgres  20   0 65.7g 294m 289m R69  0.0   0:02.86 postgres:
> user1 db 0.0.0.3(44761) SELECT
>  82415 postgres  20   0 65.7g 270m 266m R69  0.0   0:02.80 postgres:
> user1 db 0.0.0.3(44767) SELECT
>  82521 postgres  20   0 65.7g 209m 205m R69  0.0   0:02.00 postgres:
> user1 db 0.0.0.3(44786) SELECT
>  82526 postgres  20   0 65.7g  35m  29m R69  0.0   0:01.20 postgres:
> user1 db 0.0.0.6(60906) SELECT
>  82550 postgres  20   0 65.7g 188m 184m R69  0.0   0:01.72 postgres:
> user1 db 0.0.0.4(49870) SELECT
>  82587 postgres  20   0 65.7g 183m 178m R69  0.0   0:01.64 postgres:
> user1 db 0.0.0.4(49882) SELECT
>  82683 postgres  20   0 65.7g  97m  93m R69  0.0   0:00.77 postgres:
> user1 db 0.0.0.4(49899) SELECT
>  82685 postgres  20   0 65.7g 103m  99m R69  0.0   0:00.84 postgres:
> user1 db 0.0.0.2(52532) SELECT
>  82687 postgres  20   0 65.7g 109m 104m R69  0.0   0:00.85 postgres:
> user1 db 0.0.0.3(44809) SELECT
>  82712 postgres  20   0 65.7g  68m  64m R69  0.0   0:00.55 postgres:
> user1 db 0.0.0.3(44814) SELECT
>  82715 postgres  20   0 65.7g  75m  70m R69  0.0   0:00.58 postgres:
> user1 db 0.0.0.4(49905) SELECT
>  19548 postgres  20   0 65.7g  79m  56m R65  0.0   8:02.44 postgres:
> user1 db 0.0.0.2(37887) SELECT
>  36714 postgres  20   0 65.7g  80m  56m R65  0.0   5:56.08 postgres:
> user1 db 0.0.0.3(35177) SELECT
>  43599 postgres  20   0 65.7g  80m  56m R65  0.0   5:05.03 postgres:
> user1 db 0.0.0.3(36638) SELECT
>
> -Original Message-
> From: Merlin Moncure [mailto:mmonc...@gmail.com]
> Sent: Wednesday, October 21, 2015 12:50 PM
> To: Pavel Stehule
> Cc: Jamie Koceniak; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Recursive query performance issue
>
> On Wed, Oct 21, 2015 at 2:45 PM, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
> > 2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
> >>
> >> adama_prod=# SHOW shared_buffers;
> >>
> >> shared_buffers
> >>
> >> 
> >>
> >> 64GB
> >
> >
> > can you try to increase shared buffers to 200GB and decrease effective
> > cache size to 180GB? If it is possibly - I am not sure, if this
> > setting is good fro production usage, but the result can be
> > interesting for bottleneck identification.
>
> we need to see a snapshot from
> *) top
> *) perf top
>
> merlin
>


Re: [PERFORM] Recursive query performance issue

2015-10-21 Thread Pavel Stehule
2015-10-21 21:32 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:

> adama_prod=# SHOW shared_buffers;
>
> shared_buffers
>
> 
>
> 64GB
>

Can you try to increase shared_buffers to 200GB and decrease
effective_cache_size to 180GB? If it is possible - I am not sure this setting
is good for production use, but the result can be interesting for bottleneck
identification.

>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Wednesday, October 21, 2015 12:26 PM
>
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
>
>
>
>
> 2015-10-21 20:51 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Hi Pavel,
>
>
>
> Or were you referring to SHMMAX?
>
>
>
> value of shared_buffers - run SQL statements SHOW shared_buffers;
>
> Regards
>
> Pavel
>
>
>
> Thanks
>
>
>
> *From:* Jamie Koceniak
> *Sent:* Wednesday, October 21, 2015 11:40 AM
> *To:* 'Pavel Stehule'
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* RE: [PERFORM] Recursive query performance issue
>
>
>
> Ok
>
>
>
> df -h /dev/shm
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> tmpfs   406G 0  406G   0% /run/shm
>
>
>
> Ok I will try lowering it.
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com
> <pavel.steh...@gmail.com>]
> *Sent:* Wednesday, October 21, 2015 11:24 AM
>
>
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
>
>
>
>
> 2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Hi Pavel,
>
>
>
> Thanks for the reply.
>
>
>
> 1. The queries aren’t waiting on any locks.
>
> The query has a recursive join that uses a table with only 80k records and
> that table is not updated often.
>
>
>
> 2. The I/O load was not high. CPU utilization was very high and load was
> very high.
>
> We have a large effective_cache_size = 512GB (25% of total memory)
>
>
>
> so your server has 2TB RAM? It is not usual server - so this issue can be
> pretty strange :(
>
> What is size of shared memory? Probably is significantly lower than
> effective_cache_size? Try to reduce effective cache size to be lower than
> shared buffers
>
> Regards
>
> Pavel
>
>
>
>
>
>
> Thanks,
>
> Jamie
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Wednesday, October 21, 2015 12:04 AM
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
> Hi
>
>
>
> 2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Version:
>
>
> ---
>
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
> 4.7.2-5) 4.7.2, 64-bit
>
>
>
> Query Plan
>
> http://explain.depesz.com/s/4s37
>
>
>
> Normally, this query takes around 200-300 ms to execute.
>
> However when several queries are run concurrently, query performance drops
> to 30-60 seconds.
>
>
>
>
>
> there can be few reasons:
>
> 1. locking - are you sure, so your queries don't wait on locks?
>
> 2. issues with cache stability - is there high IO load? You can try to
> increase effective_cache_size (or decrease if you have not enough memory)
>
> Regards
>
> Pavel
>
>
>
>
>
>
>
>
>
>
>


Re: [PERFORM] Recursive query performance issue

2015-10-21 Thread Pavel Stehule
2015-10-21 20:51 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:

> Hi Pavel,
>
>
>
> Or were you referring to SHMMAX?
>

The value of shared_buffers - run the SQL statement SHOW shared_buffers;

Regards

Pavel

>
>
> Thanks
>
>
>
> *From:* Jamie Koceniak
> *Sent:* Wednesday, October 21, 2015 11:40 AM
> *To:* 'Pavel Stehule'
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* RE: [PERFORM] Recursive query performance issue
>
>
>
> Ok
>
>
>
> df -h /dev/shm
>
> Filesystem  Size  Used Avail Use% Mounted on
>
> tmpfs   406G 0  406G   0% /run/shm
>
>
>
> Ok I will try lowering it.
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com
> <pavel.steh...@gmail.com>]
> *Sent:* Wednesday, October 21, 2015 11:24 AM
>
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
>
>
>
>
> 2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Hi Pavel,
>
>
>
> Thanks for the reply.
>
>
>
> 1. The queries aren’t waiting on any locks.
>
> The query has a recursive join that uses a table with only 80k records and
> that table is not updated often.
>
>
>
> 2. The I/O load was not high. CPU utilization was very high and load was
> very high.
>
> We have a large effective_cache_size = 512GB (25% of total memory)
>
>
>
> so your server has 2TB RAM? It is not usual server - so this issue can be
> pretty strange :(
>
> What is size of shared memory? Probably is significantly lower than
> effective_cache_size? Try to reduce effective cache size to be lower than
> shared buffers
>
> Regards
>
> Pavel
>
>
>
>
>
>
> Thanks,
>
> Jamie
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Wednesday, October 21, 2015 12:04 AM
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
> Hi
>
>
>
> 2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Version:
>
>
> ---
>
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
> 4.7.2-5) 4.7.2, 64-bit
>
>
>
> Query Plan
>
> http://explain.depesz.com/s/4s37
>
>
>
> Normally, this query takes around 200-300 ms to execute.
>
> However when several queries are run concurrently, query performance drops
> to 30-60 seconds.
>
>
>
>
>
> there can be few reasons:
>
> 1. locking - are you sure, so your queries don't wait on locks?
>
> 2. issues with cache stability - is there high IO load? You can try to
> increase effective_cache_size (or decrease if you have not enough memory)
>
> Regards
>
> Pavel
>
>
>
>
>
>
>
>
>


Re: [PERFORM] Recursive query performance issue

2015-10-21 Thread Pavel Stehule
2015-10-21 19:55 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:

> Hi Pavel,
>
>
>
> Thanks for the reply.
>
>
>
> 1. The queries aren’t waiting on any locks.
>
> The query has a recursive join that uses a table with only 80k records and
> that table is not updated often.
>
>
>
> 2. The I/O load was not high. CPU utilization was very high and load was
> very high.
>
> We have a large effective_cache_size = 512GB (25% of total memory)
>

so your server has 2TB RAM? That is not a usual server - so this issue can be
pretty strange :(

What is the size of shared memory? It is probably significantly lower than
effective_cache_size. Try to reduce effective_cache_size to be lower than
shared_buffers.
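
For illustration, you can check both values and then test a lower setting in
one session only (the 8GB value below is just an example, not a recommendation):

SHOW shared_buffers;
SHOW effective_cache_size;
SET effective_cache_size = '8GB';  -- session-local, does not touch postgresql.conf
-- then re-run the recursive query in the same session and compare the plan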

Regards

Pavel



>
>
> Thanks,
>
> Jamie
>
>
>
> *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Sent:* Wednesday, October 21, 2015 12:04 AM
> *To:* Jamie Koceniak
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Recursive query performance issue
>
>
>
> Hi
>
>
>
> 2015-10-20 19:34 GMT+02:00 Jamie Koceniak <jkocen...@mediamath.com>:
>
> Version:
>
>
> ---
>
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
> 4.7.2-5) 4.7.2, 64-bit
>
>
>
> Query Plan
>
> http://explain.depesz.com/s/4s37
>
>
>
> Normally, this query takes around 200-300 ms to execute.
>
> However when several queries are run concurrently, query performance drops
> to 30-60 seconds.
>
>
>
>
>
> there can be few reasons:
>
> 1. locking - are you sure, so your queries don't wait on locks?
>
> 2. issues with cache stability - is there high IO load? You can try to
> increase effective_cache_size (or decrease if you have not enough memory)
>
> Regards
>
> Pavel
>
>
>
>
>
>
>


Re: [PERFORM] Recursive query performance issue

2015-10-21 Thread Pavel Stehule
Hi



2015-10-20 19:34 GMT+02:00 Jamie Koceniak :

> Version:
>
>
> ---
>
> PostgreSQL 9.1.14 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
> 4.7.2-5) 4.7.2, 64-bit
>
>
>
> Query Plan
>
> http://explain.depesz.com/s/4s37
>
>
>
> Normally, this query takes around 200-300 ms to execute.
>
> However when several queries are run concurrently, query performance drops
> to 30-60 seconds.
>
>
>

there can be a few reasons:

1. locking - are you sure your queries don't wait on locks? (a quick check is below)

2. issues with cache stability - is there high I/O load? You can try to
increase effective_cache_size (or decrease it if you do not have enough memory)
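
For point 1, a minimal check you can run while the slowdown is happening
(this uses the 9.1 column names; adjust for newer versions):

SELECT procpid, waiting, now() - query_start AS runtime, current_query
FROM pg_stat_activity
WHERE waiting;

If rows show up there, the queries are blocked on locks rather than burning CPU.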

Regards

Pavel


>
>


Re: [PERFORM] SELECT slows down on sixth execution

2015-10-20 Thread Pavel Stehule
2015-10-20 8:55 GMT+02:00 Thomas Kellerer :

> Jonathan Rogers schrieb am 17.10.2015 um 04:14:
> >>> Yes, I have been looking at both plans and can see where they diverge.
> >>> How could I go about figuring out why Postgres fails to see the large
> >>> difference in plan execution time? I use exactly the same parameters
> >>> every time I execute the prepared statement, so how would Postgres come
> >>> to think that those are not the norm?
> >>
> >> PostgreSQL does not consider the actual query execution time, it only
> >> compares its estimates for there general and the custom plan.
> >> Also, it does not keep track of the parameter values you supply,
> >> only of the average custom plan query cost estimate.
> >
> > OK, that makes more sense then. It's somewhat tedious for the purpose of
> > testing to execute a prepared statement six times to see the plan which
> > needs to be optimized. Unfortunately, there doesn't seem to be any way
> > to force use of a generic plan in SQL based on Pavel Stehule's reply.
>
>
> If you are using JDBC the threshold can be changed:
>
>https://jdbc.postgresql.org/documentation/94/server-prepare.html
>
> https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/PGStatement.html#setPrepareThreshold%28int%29
>
> As I don't think JDBC is using anything "exotic" I would be surprised if
> this
> can't be changed with other programming environments also.
>

This is something different - you can switch between server-side prepared
statements and client-side prepared statements in JDBC. It doesn't change
the behavior of server-side prepared statements in Postgres.

Pavel


>
> Thomas
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] SELECT slows down on sixth execution

2015-10-20 Thread Pavel Stehule
2015-10-20 16:48 GMT+02:00 Jonathan Rogers <jrog...@socialserve.com>:

> On 10/20/2015 03:45 AM, Pavel Stehule wrote:
> >
> >
> > 2015-10-20 8:55 GMT+02:00 Thomas Kellerer <spam_ea...@gmx.net
> > <mailto:spam_ea...@gmx.net>>:
> >
> > Jonathan Rogers schrieb am 17.10.2015 um 04:14:
> > >>> Yes, I have been looking at both plans and can see where they
> > diverge.
> > >>> How could I go about figuring out why Postgres fails to see the
> > large
> > >>> difference in plan execution time? I use exactly the same
> parameters
> > >>> every time I execute the prepared statement, so how would
> > Postgres come
> > >>> to think that those are not the norm?
> > >>
> > >> PostgreSQL does not consider the actual query execution time, it
> only
> > >> compares its estimates for there general and the custom plan.
> > >> Also, it does not keep track of the parameter values you supply,
> > >> only of the average custom plan query cost estimate.
> > >
> > > OK, that makes more sense then. It's somewhat tedious for the
> > purpose of
> > > testing to execute a prepared statement six times to see the plan
> > which
> > > needs to be optimized. Unfortunately, there doesn't seem to be any
> way
> > > to force use of a generic plan in SQL based on Pavel Stehule's
> reply.
> >
> >
> > If you are using JDBC the threshold can be changed:
> >
> >https://jdbc.postgresql.org/documentation/94/server-prepare.html
> >
> >
> https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/PGStatement.html#setPrepareThreshold%28int%29
> >
> > As I don't think JDBC is using anything "exotic" I would be
> > surprised if this
> > can't be changed with other programming environments also.
> >
> >
> > This is some different - you can switch between server side prepared
> > statements and client side prepared statements in JDBC.  It doesn't
> > change the behave of server side prepared statements in Postgres.
>
> I am using psycopg2 with a layer on top which can automatically PREPARE
> statements, so I guess that implements something similar to the JDBC
> interface. I did solve my problem by turning off the automatic preparation.
>

yes, you turned off server-side prepared statements.

Pavel


>
> --
> Jonathan Rogers
> Socialserve.com by Emphasys Software
> jrog...@emphasys-software.com
>


Re: [PERFORM] SELECT slows down on sixth execution

2015-10-17 Thread Pavel Stehule
2015-10-17 15:29 GMT+02:00 Yves Dorfsman :

> On 2015-10-14 03:00, Albe Laurenz wrote:
> >
> > You are encountering "custom plans", introduced in 9.2.
> >
> > When a statement with parameters is executed, PostgreSQL will not only
> generate
> > a generic plan, but for the first 5 executions it will substitute the
> arguments
> > and generate and execute a custom plan for that.
>
> Wow! Thanks. I feel this should be documented a bit better.
>
> Shouldn't this be explained in at least as much details as in your
> explanation, in the sql-prepare document?
>

probably - a section about the benefits and risks could be useful - but it is
a task for somebody with better English than mine :)

Regards

Pavel


>
> Yves.
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] SELECT slows down on sixth execution

2015-10-16 Thread Pavel Stehule
2015-10-17 4:29 GMT+02:00 Jonathan Rogers <jrog...@socialserve.com>:

> On 10/14/2015 05:01 AM, Pavel Stehule wrote:
> > Hi
> >
> > 2015-10-14 9:38 GMT+02:00 Jonathan Rogers <jrog...@socialserve.com
> > <mailto:jrog...@socialserve.com>>:
> >
> > I have a very complex SELECT for which I use PREPARE and then
> EXECUTE.
> > The first five times I run "explain (analyze, buffers) execute ..."
> in
> > psql, it takes about 1s. Starting with the sixth execution, the plan
> > changes and execution time doubles or more. The slower plan is used
> from
> > then on. If I DEALLOCATE the prepared statement and PREPARE again,
> the
> > cycle is reset and I get five good executions again.
> >
> > This behavior is utterly mystifying to me since I can see no reason
> for
> > Postgres to change its plan after an arbitrary number of executions,
> > especially for the worse. When I did the experiment on a development
> > system, Postgres was doing nothing apart from the interactively
> executed
> > statements. No data were inserted, no settings were changed and no
> other
> > clients were active in any way. Is there some threshold for five or
> six
> > executions of the same query?
> >
> >
> > yes, there is. PostgreSQL try to run custom plans five times (optimized
> > for specific parameters) and then compare average cost with cost of
> > generic plan. If generic plan is cheaper, then PostgreSQL will use
> > generic plan (that is optimized for most common value (not for currently
> > used value)).
> >
> > see
> >
> https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c
> > , function choose_custom_plan
> >
> > What I know, this behave isn't possible to change from outside.
> > Shouldn't be hard to write a extension for own PREPARE function, that
> > set CURSOR_OPT_CUSTOM_PLAN option
>
> Thanks for the link. I can see the hard-coded "5" right there. I looked
> in the docs a bit and found the server C function "SPI_prepare_cursor"
> which allows explicit selection of a custom or generic plan. However, if
> I understand you correctly, there is currently no SQL interface to
> explicitly control what type of plan is used.
>
> So, the solution for my particular query is to avoid preparing it,
> ensuring it gets a custom plan every time. The decision to prepare it
> came from a client-side layer which defaults to preparing everything
> rather than any specific reason and we're now reconsidering that policy.
>

I was not 100% correct - you can use parametrized queries via PQexecParams
http://www.postgresql.org/docs/9.4/static/libpq-exec.html

If this function is accessible from your environment, then you should
use it. It protects against SQL injection, and it doesn't use a generic
plan. For your case, using prepared statements is counterproductive.

Another solution is client-side prepared statements - lots of APIs use them by
default.

Regards

Pavel




>
> --
> Jonathan Rogers
> Socialserve.com by Emphasys Software
> jrog...@emphasys-software.com
>


Re: [PERFORM] SELECT slows down on sixth execution

2015-10-14 Thread Pavel Stehule
Hi

2015-10-14 9:38 GMT+02:00 Jonathan Rogers :

> I have a very complex SELECT for which I use PREPARE and then EXECUTE.
> The first five times I run "explain (analyze, buffers) execute ..." in
> psql, it takes about 1s. Starting with the sixth execution, the plan
> changes and execution time doubles or more. The slower plan is used from
> then on. If I DEALLOCATE the prepared statement and PREPARE again, the
> cycle is reset and I get five good executions again.
>
> This behavior is utterly mystifying to me since I can see no reason for
> Postgres to change its plan after an arbitrary number of executions,
> especially for the worse. When I did the experiment on a development
> system, Postgres was doing nothing apart from the interactively executed
> statements. No data were inserted, no settings were changed and no other
> clients were active in any way. Is there some threshold for five or six
> executions of the same query?
>

yes, there is. PostgreSQL tries to run custom plans five times (optimized for
the specific parameters) and then compares their average cost with the cost of
the generic plan. If the generic plan is cheaper, then PostgreSQL will use the
generic plan (which is optimized for the most common value, not for the
currently used value).

see
https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c
, function choose_custom_plan

As far as I know, this behavior isn't possible to change from outside. It
shouldn't be hard to write an extension with its own PREPARE function that sets
the CURSOR_OPT_CUSTOM_PLAN option
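
You can watch the switch directly in psql, for example (the table and value
here are only placeholders for your real query):

PREPARE q(int) AS SELECT count(*) FROM some_table WHERE some_id = $1;
EXPLAIN EXECUTE q(82503);  -- executions 1 to 5 show a custom plan for the value
EXPLAIN EXECUTE q(82503);  -- repeat a few more times
EXPLAIN EXECUTE q(82503);  -- from the sixth execution the generic plan may win
DEALLOCATE q;              -- deallocating resets the counter, as you observed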

Regards

Pavel


>
> Without delving into the plans themselves yet, what could possibly cause
> the prepared statement to be re-planned? I have seen the same behavior
> on Postgres 9.2.10 and 9.4.1.
> --
> Jonathan Rogers
> Socialserve.com by Emphasys Software
> jrog...@emphasys-software.com
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Weird CASE WHEN behaviour causing query to be suddenly very slow

2015-03-31 Thread Pavel Stehule
Hi

a long CASE can be a problem. Why don't you use a dictionary table and a join?
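
Since your patterns are anchored exact matches, a plain equality join would do
the same work, for example (a minimal sketch, all names are only illustrative):

CREATE TABLE channel_map (
    dwh_company_id bigint,
    source         text,
    channel        text,
    PRIMARY KEY (dwh_company_id, source)
);

INSERT INTO channel_map VALUES
    (1, '',  'Not tracked'),
    (1, '1', 'Not tracked1'),
    (1, '2', 'Not tracked2');  -- one row per CASE branch

SELECT COALESCE(m.channel, 'Others') AS channel    -- unmatched rows fall back to 'Others'
FROM web_visits_subquery wv
LEFT JOIN channel_map m
       ON m.dwh_company_id = wv.dwh_company_id
      AND m.source = wv.source;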

Regards

Pavel

2015-03-31 10:53 GMT+02:00 Kevin Viraud kevin.vir...@rocket-internet.de:

 Hi,



 I have an issue with a rather large CASE WHEN and I cannot figure out why
  it is so slow...



 First, here is my test query :



 SELECT CASE WHEN dwh_company_id = 1


 THEN CASE




  WHEN   wv.source ~ '^$' THEN 'Not tracked'



 WHEN  wv.source ~ '^1$' THEN 'Not tracked1'


 WHEN  wv.source ~ '^2$' THEN 'Not tracked2'


 WHEN  wv.source ~ '^3$' THEN 'Not tracked3'


 WHEN  wv.source ~ '^4$' THEN 'Not tracked4'


 WHEN  wv.source ~ '^5$' THEN 'Not tracked5'


 WHEN  wv.source ~ '^6$' THEN 'Not tracked6'


 WHEN  wv.source ~ '^7$' THEN 'Not tracked7'


 WHEN  wv.source ~ '^8$' THEN 'Not tracked8'


 WHEN  wv.source ~ '^9$' THEN 'Not tracked9'


 WHEN  wv.source ~ '^10$' THEN 'Not tracked10'


 WHEN  wv.source ~ '^11$' THEN 'Not tracked11'


 WHEN  wv.source ~ '^12$' THEN 'Not tracked12'


 WHEN  wv.source ~ '^13$' THEN 'Not tracked13'


 WHEN  wv.source ~ '^14$' THEN 'Not tracked14'


 WHEN  wv.source ~ '^15$' THEN 'Not tracked15'


 WHEN  wv.source ~ '^16$' THEN 'Not tracked16'


 WHEN  wv.source ~ '^17$' THEN 'Not tracked17'


 WHEN  wv.source ~ '^18$' THEN 'Not tracked18'


 WHEN  wv.source ~ '^19$' THEN 'Not tracked19'


 WHEN  wv.source ~ '^20$' THEN 'Not tracked20'


 WHEN  wv.source ~ '^21$' THEN 'Not tracked21'


 WHEN  wv.source ~ '^22$' THEN 'Not tracked22'


 WHEN  wv.source ~ '^23$' THEN 'Not tracked23'


 WHEN  wv.source ~ '^24$' THEN 'Not tracked24'


 WHEN  wv.source ~ '^25$' THEN 'Not tracked25'


 WHEN  wv.source ~ '^26$' THEN 'Not tracked26'


 WHEN  wv.source ~ '^27$' THEN 'Not tracked27'


 WHEN  wv.source ~ '^28$' THEN 'Not tracked28'


 --WHEN  wv.source ~ '^29$' THEN 'Not tracked29'


 WHEN  wv.source ~ '^30$' THEN 'Not tracked30'


 WHEN  wv.source ~ '^31$' THEN 'Not tracked31'


 WHEN  wv.source ~ '^32$' THEN 'Not tracked32'


 END

ELSE

'Others'

END as channel

 FROM (

SELECT wv.id,

   wv.ga_id,

   split_part(wv.ga_source_medium, ' /
 ', 1) as source,

   ga.dwh_source_id,

   s.dwh_company_id

FROM marketing.web_visits wv

INNER JOIN dwh_metadata.google_analytics ga
 ON ga.ga_id = wv.ga_id

INNER JOIN dwh_manager.sources s ON
 ga.dwh_source_id =s.dwh_source_id

--WHERE s.dwh_company_id = 1

LIMIT 10

 ) wv





 This is a pretty simple case,  my subquery (or CTE when using WITH
 statement) should return 5 fields with more or less this structure :

 Id : character(32)

 Ga_id : bigint

 Source : character(32)

 Medium : character(32)

 dwh_company_id : bigint



 On top of which I apply a case when statement…



 Now the weird thing is, using this query I notice a significant drop in
 performance as the “case when” is getting bigger. If I run the query as if,
 I get the following exec plain and execution time:

 Subquery Scan on wv  (cost=6.00..29098.17 rows=10 width=36) (actual
 time=0.828..22476.917 rows=10 loops=1)

Buffers: shared hit=3136

-  Limit  (cost=6.00..11598.17 rows=10 width=58) (actual
 time=0.209..133.429 rows=10 loops=1)

  Buffers: shared hit=3136

  -  Hash Join  (cost=6.00..1069811.24 rows=9228690 width=58)
 (actual time=0.208..119.297 rows=10 loops=1)

Hash Cond: (wv_1.ga_id = ga.ga_id)

Buffers: shared hit=3136

-  Seq Scan on web_visits wv_1  (cost=0.00..877005.78
 rows=20587078 width=50) (actual time=0.004..18.412 rows=10 loops=1)

  Buffers: shared hit=3133

-  Hash  (cost=5.50..5.50 rows=40 width=12) (actual
 time=0.184..0.184 rows=111 loops=1)

  Buckets: 1024  Batches: 1  Memory Usage: 5kB

  Buffers: shared hit=3

  -  Hash Join  (cost=1.88..5.50 rows=40 width=12)
 (actual time=0.056..0.148 rows=111 loops=1)

Hash Cond: (ga.dwh_source_id = s.dwh_source_id)

Buffers: shared hit=3

-  Seq Scan on google_analytics ga
 (cost=0.00..2.89 rows=89 width=8) (actual time=0.007..0.028 rows=111
 loops=1)

  Buffers: shared hit=2

-  Hash  (cost=1.39..1.39 rows=39 width=8)
 (actual time=0.042..0.042 rows=56 loops=1)

  Buckets: 1024  Batches: 1  Memory Usage:
 3kB

  Buffers: shared hit=1

  -  Seq Scan on 

Re: [PERFORM] Weird CASE WHEN behaviour causing query to be suddenly very slow

2015-03-31 Thread Pavel Stehule
2015-03-31 11:19 GMT+02:00 Kevin Viraud kevin.vir...@rocket-internet.de:

 Hi Pavel,



 Thanks for your answer.



 Yes sure, I could do that, but like I wrote the purpose is not to find a
 way to rewrite it. But to understand why at a certain point it is totally
 going off.  I’m aware that the longer my case when will be the longest the
 query will run. But 10x slower for adding one condition, something feels
 wrong here.


It is slow due to the evaluation of a lot of expressions. It is CPU expensive.
PostgreSQL uses interpreted expression evaluation - and if you have a lot of
expressions, then you have a problem.

Regards

Pavel




 Plus, the case when is part of a function so basically I use it this way :

 SELECT col1, col2, get_channel(company_id, source_id, …)

 FROM mytable;



 Get_channel is coming from another app. And even though I have, I need to
 assume that I don’t have the control over this one and that I’m using it as
 if.



 This is only my debugging query.



 Best regards,



 Kevin



 *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
 *Sent:* Dienstag, 31. März 2015 11:09
 *To:* Kevin Viraud
 *Cc:* pgsql-performance@postgresql.org
 *Subject:* Re: [PERFORM] Weird CASE WHEN behaviour causing query to be
 suddenly very slow



 Hi

 long CASE can be problem. Why you don't use a dictionary table and join?

 Regards

 Pavel



 2015-03-31 10:53 GMT+02:00 Kevin Viraud kevin.vir...@rocket-internet.de:

 Hi,



 I have an issue with a rather large CASE WHEN and I cannot figure out why
  it is so slow...



 First, here is my test query :



 SELECT CASE WHEN dwh_company_id = 1


 THEN CASE




  WHEN   wv.source ~ '^$' THEN 'Not tracked'



 WHEN  wv.source ~ '^1$' THEN 'Not tracked1'


 WHEN  wv.source ~ '^2$' THEN 'Not tracked2'


 WHEN  wv.source ~ '^3$' THEN 'Not tracked3'


 WHEN  wv.source ~ '^4$' THEN 'Not tracked4'


 WHEN  wv.source ~ '^5$' THEN 'Not tracked5'


 WHEN  wv.source ~ '^6$' THEN 'Not tracked6'


 WHEN  wv.source ~ '^7$' THEN 'Not tracked7'


 WHEN  wv.source ~ '^8$' THEN 'Not tracked8'


 WHEN  wv.source ~ '^9$' THEN 'Not tracked9'


 WHEN  wv.source ~ '^10$' THEN 'Not tracked10'


 WHEN  wv.source ~ '^11$' THEN 'Not tracked11'


 WHEN  wv.source ~ '^12$' THEN 'Not tracked12'


 WHEN  wv.source ~ '^13$' THEN 'Not tracked13'


 WHEN  wv.source ~ '^14$' THEN 'Not tracked14'


 WHEN  wv.source ~ '^15$' THEN 'Not tracked15'


 WHEN  wv.source ~ '^16$' THEN 'Not tracked16'


 WHEN  wv.source ~ '^17$' THEN 'Not tracked17'


 WHEN  wv.source ~ '^18$' THEN 'Not tracked18'


 WHEN  wv.source ~ '^19$' THEN 'Not tracked19'


 WHEN  wv.source ~ '^20$' THEN 'Not tracked20'


 WHEN  wv.source ~ '^21$' THEN 'Not tracked21'


 WHEN  wv.source ~ '^22$' THEN 'Not tracked22'


 WHEN  wv.source ~ '^23$' THEN 'Not tracked23'


 WHEN  wv.source ~ '^24$' THEN 'Not tracked24'


 WHEN  wv.source ~ '^25$' THEN 'Not tracked25'


 WHEN  wv.source ~ '^26$' THEN 'Not tracked26'


 WHEN  wv.source ~ '^27$' THEN 'Not tracked27'


 WHEN  wv.source ~ '^28$' THEN 'Not tracked28'


 --WHEN  wv.source ~ '^29$' THEN 'Not tracked29'


 WHEN  wv.source ~ '^30$' THEN 'Not tracked30'


 WHEN  wv.source ~ '^31$' THEN 'Not tracked31'


 WHEN  wv.source ~ '^32$' THEN 'Not tracked32'


 END

ELSE

'Others'

END as channel

 FROM (

SELECT wv.id,

   wv.ga_id,

   split_part(wv.ga_source_medium, ' /
 ', 1) as source,

   ga.dwh_source_id,

   s.dwh_company_id

FROM marketing.web_visits wv

INNER JOIN dwh_metadata.google_analytics ga
 ON ga.ga_id = wv.ga_id

INNER JOIN dwh_manager.sources s ON
 ga.dwh_source_id =s.dwh_source_id

--WHERE s.dwh_company_id = 1

LIMIT 10

 ) wv





 This is a pretty simple case,  my subquery (or CTE when using WITH
 statement) should return 5 fields with more or less this structure :

 Id : character(32)

 Ga_id : bigint

 Source : character(32)

 Medium : character(32)

 dwh_company_id : bigint



 On top of which I apply a case when statement…



 Now the weird thing is, using this query I notice a significant drop in
 performance as the “case when” is getting bigger. If I run the query as if,
 I get the following exec plain and execution time:

 Subquery Scan on wv  (cost=6.00..29098.17 rows=10 width=36) (actual
 time=0.828..22476.917 rows=10 loops=1)

Buffers: shared hit=3136

-  Limit  (cost=6.00..11598.17 rows=10 width=58) (actual
 time=0.209..133.429 rows=10 loops=1)

  Buffers: shared hit=3136

  -  Hash Join  (cost=6.00..1069811.24 rows=9228690 width=58)
 (actual time=0.208..119.297 rows=10 loops

Re: [PERFORM] Merge Join chooses very slow index scan

2015-03-19 Thread Pavel Stehule
Hi

what are your random_page_cost and seq_page_cost?
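
You can check them quickly with:

SHOW random_page_cost;
SHOW seq_page_cost;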

Regards

Pavel Stehule

2015-03-19 7:23 GMT+01:00 Jake Magner jakemagne...@gmail.com:

 I am having problems with a join where the planner picks a merge join and
 an
 index scan on one of the tables. Manually disabling merge joins and running
 the query both ways shows the merge join takes over 10 seconds while a hash
 join takes less than 100ms. The planner total cost estimate favors the
 merge
 join, but the cost estimate for the index scan part is greater than the
 total cost estimate by a factor of 300x. My understanding of how this can
 occur is that it expects it won't actually have to scan all the rows,
 because using the histogram distribution stats it can know that all the
 relevant rows of the join column will be at the beginning of the scan. But
 in practice it appears to actually be index scanning all the rows, showing
 massive amounts of page hits. What is also confusing is that the planner
 estimate of the number of rows that match the second join condition is
 accurate and very low, so I would expect it to index scan on that column's
 index instead. Pasted at the bottom is the explain plan for the query and
 some other variations I think might be relevant. The table/index names are
 obfuscated. I ran ANALYZE on all the tables in the query first. All  the
 pages are cached in the explain plans but we wouldn't expect that to be
 true
 in the production system. There are btree indexes on all the columns in
 both
 the join conditions and the filters.

 Searching, I found this thread
 http://postgresql.nabble.com/merge-join-killing-performance-td2076433.html
 which sounds kind of similar, but there are no Nulls in this table.

 Thanks for your help.



 Postgres version info: PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu,
 compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit


 ---
 Original Query

 The estimated cost for Index Scan is 898k but the total cost estimate is
 2.6k. The planner has a good estimate of the number of rows, 1335, for the
 index scan, but by the number of page hits (8M) it appears it actually
 scanned the entire table which has about 8M rows.
 ---
 EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicles v LEFT JOIN usagestats ON
 v.id = tid AND type = 'vehicle';

 QUERY PLAN

 
  Merge Right Join  (cost=593.28..2634.10 rows=4155 width=619) (actual
 time=9.150..11464.949 rows=4155 loops=1)
Merge Cond: (usagestats.tid = s.id)
Buffers: shared hit=8063988
-  Index Scan using usagestats_tid_idx on usagestats
 (cost=0.00..898911.91 rows=1335 width=37) (actual time=0.027..11448.789
 rows=2979 loops=1)
  Filter: ((type)::text = 'vehicle'::text)
  Buffers: shared hit=8063686
-  Sort  (cost=593.28..603.67 rows=4155 width=582) (actual
 time=9.108..10.429 rows=4155 loops=1)
  Sort Key: s.id
  Sort Method: quicksort  Memory: 1657kB
  Buffers: shared hit=302
  -  Seq Scan on vehicles v  (cost=0.00..343.55 rows=4155
 width=582)
 (actual time=0.014..2.917 rows=4155 loops=1)
Buffers: shared hit=302
  Total runtime: 11466.122 ms
 (13 rows)

 
 Change the type='vehicle' condition to an always true condition

 If we change the filter from type = 'vehicle' (True for a small fraction
 of the rows) to freq  -1 (True for all rows) then the plan is the same,
 but the actual time and page hits are much less and the query returns is
 fast.
 
 EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vehicle v LEFT JOIN usagestats ON
 (v.id = tid AND freq  -1);


 QUERY PLAN


 -

  Merge Right Join  (cost=593.28..2434.79 rows=7733 width=619) (actual
 time=5.635..59.852 rows=17096 loops=1)

Merge Cond: (usagestats.tid = v.id)

Buffers: shared hit=17653

-  Index Scan using usagestats_tid_idx on usagestats
 (cost=0.00..898914.00 rows=8006976 width=37) (actual time=0.010..34.075
 rows=17225 loops=1)

  Filter: (freq  (-1))

  Buffers: shared hit=17351

-  Sort  (cost=593.28..603.67 rows=4155 width=582) (actual
 time=5.617..9.351 rows=17094 loops=1)

  Sort Key: v.id

  Sort Method: quicksort  Memory: 1657kB

  Buffers: shared hit=302

  -  Seq Scan on vehicle v  (cost=0.00..343.55 rows=4155 width=582)
 (actual time=0.009..1.803 rows=4157 loops=1)

Buffers: shared hit=302

  Total runtime: 62.868 ms

 (13 rows)


 --
 Original Query with merge joins disabled

 If we manually disable merge joins and run the original query we get a hash
 join with what seems

Re: [PERFORM] Hardware Configuration and other Stuff

2015-03-18 Thread Pavel Stehule
Hi

2015-03-18 20:07 GMT+01:00 Vivekanand Joshi vjo...@zetainteractive.com:

 Hi Team,



 I don't know under which section does this question comes, so I am posting
 this question to both Admin and performance mailing list. Apologies in
 advance.



 Objective:



 We are planning to use PostgreSQL instead of Netezza for our data
 warehouse as well as database solutions. Right now, we have all our clients
 in one Netezza box.  What we are thinking of migrating our clients to
 dedicated PostgreSQL for each of them. We will start with one of the
 client. If it works successfully, we will be migrating all the clients one
 by one. The objective is to get a better performance than our existing
 solution. We are hopeful of that mainly because of two reasons. Firstly, we
 will have a dedicated server for each of the client with good hardware
 instead of having one server with all the clients on that. Secondly, we can
 spend on hardware much easily than spending on a proprietary appliance.




It depends terribly on the use case. Netezza is an extremely optimized OLAP
column-store database. PostgreSQL is an optimized OLTP row-store database. You
cannot get the same performance on OLAP queries in Postgres. I don't
think dedicated hardware can help. If you use Netezza well, then it is
10-100x faster than Postgres.

You can try to use Postgres with cstore_fdw, or maybe better MonetDB

Regards

Pavel


 I am hoping this community can help us to know that what would be the good
 infrastructure/hardware that can help us in achieving our goal.



 Here are few of the statistics which might act as a starting point.



 Availability: High (24*7).

 User Data : 700 GB which will increase to 1.5 TB in next 2-3 years.

 Number of User Databases : 2 (One is the main database, other is used only
 for working tables where tables gets deleted in every 48 hours)

 Number of tables : 200 (in the main database), (2000-3000 in working
 database)

 Size of top 5 biggest tables : 20-40 GB

 No of users concurrently accessing the system : 5-6 with write access. 10
 with read access.

 No of User Queries running on the system in a day : ~80K

 Read-only Queries (Select): ~60K

 Write queries: ~20K

 Data Import Queries: ~1K

 Typical Business Day : 18-20 hours.



 I can pass on few complex queries to let you guys know what are we doing.



 Here are few questions:



 1.) I don't need a load balancing solution. It must be high availability
 server and I can work with asynchronous replication. The most important
 thing here would be recovery should be as fast as possible.

 What approach would you recommend?



 2.) Recommendations on indexes, WAL, table spaces. I am not asking about
 on which key I need to make indexes, but an high level approach about how
 to keep them? This might come out as a weird question to many but please
 excuse me for being a novice.



 *Most Important Question:*



 3.) What would be the ideal hardware configuration for this requirement? I
 know there is not a one-stop answer for this, but let's take it is a
 starting point. We can come to a proper conclusion after a discussion.



 What are the best on-line resources/books which can tell us about the
 hardware requirements?



 Warm Regards,


 Vivekanand Joshi
 +919654227927



 [image: Zeta Interactive]

 185 Madison Ave. New York, NY 10016

 www.zetainteractive.com





Re: [PERFORM] How to get explain plan to prefer Hash Join

2015-03-11 Thread Pavel Stehule
2015-03-12 1:35 GMT+01:00 atxcanadian matthew.bo...@gmail.com:

 So I implemented two changes.

 - Moved random_page_cost from 1.1 to 2.0


random_page_cost = 1 can enforce a nested loop - a nested loop looks very cheap with that setting
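
For example, you can set it per session, globally, or per tablespace (the 2.0
value is just what you already chose; tune it to your storage):

SET random_page_cost = 2.0;                               -- current session only
ALTER TABLESPACE pg_default SET (random_page_cost = 2.0); -- for one tablespace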


 - Manually ran analyze on all the tables

 *Here is the new explain analyze:*
 QUERY PLAN
 HashAggregate  (cost=74122.97..74125.53 rows=256 width=24) (actual
 time=45.205..45.211 rows=24 loops=1)
   InitPlan 1 (returns $0)
 -  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.152..0.152
 rows=1 loops=1)
   -  Sort  (cost=8.30..8.78 rows=193 width=9) (actual
 time=0.150..0.150 rows=1 loops=1)
 Sort Key: c.cim
 Sort Method: top-N heapsort  Memory: 25kB
 -  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193
 width=9) (actual time=0.008..0.085 rows=192 loops=1)
   Filter: (nodal_load = '2015-01-01'::date)
   Rows Removed by Filter: 36
   -  Nested Loop  (cost=22623.47..74111.47 rows=256 width=24) (actual
 time=43.798..45.181 rows=24 loops=1)
 -  Bitmap Heap Scan on api_settlement_points sp
 (cost=22622.91..67425.92 rows=12 width=9) (actual time=43.756..43.823
 rows=1
 loops=1)
   Recheck Cond: ((rt_model = $0) AND (start_date =
 '2015-01-01'::date) AND (end_date  '2015-01-01'::date))
   Filter: ((settlement_point_rdfid)::text =
 '#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
   Rows Removed by Filter: 5298
   -  Bitmap Index Scan on api_settlement_points_idx
 (cost=0.00..22622.90 rows=72134 width=0) (actual time=42.998..42.998
 rows=5299 loops=1)
 Index Cond: ((rt_model = $0) AND (start_date =
 '2015-01-01'::date) AND (end_date  '2015-01-01'::date))
 -  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual
 dp  (cost=0.56..556.88 rows=25 width=24) (actual time=0.033..1.333 rows=24
 loops=1)
   Index Cond: ((market_day = '2015-01-01'::date) AND
 (market_day = '2015-01-01'::date) AND (expiry_date IS NULL) AND
 ((settlement_point)::text = (sp.settlement_point)::text))
 Total runtime: 45.278 ms

 I'm a little perplexed why the autovacuum wasn't keeping up. Any
 recommendations for those settings to push it to do a bit more analyzing of
 the tables??



 --
 View this message in context:
 http://postgresql.nabble.com/How-to-get-explain-plan-to-prefer-Hash-Join-tp5841450p5841520.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] empty string Vs NULL

2015-02-09 Thread Pavel Stehule
Hi

2015-02-09 12:22 GMT+01:00 sridhar bamandlapally sridhar@gmail.com:

 Hi All

 We are testing our Oracle compatible business applications on PostgreSQL
 database,

 the issue we are facing is empty string Vs NULL

 In Oracle '' (empty string) and NULL are treated as NULL

 but, in PostgreSQL '' empty string not treated as NULL

 I need some *implicit* way in PostgreSQL where ''empty string can be
 treated as NULL


It is not possible in PostgreSQL.  PostgreSQL respects the ANSI SQL standard -
Oracle does not.
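
For example, PostgreSQL keeps the two values distinct:

SELECT '' IS NULL AS empty_is_null, NULL IS NULL AS null_is_null;
-- empty_is_null = false, null_is_null = true (Oracle would treat both as NULL)
-- an explicit (not implicit) workaround on input is NULLIF(value, '')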

Regards

Pavel

p.s. theoretically you can override a type's operators to support the Oracle
behavior, but you cannot be sure about unexpected negative side effects.




 Please,

 Thanks
 Sridhar BN




Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Pavel Stehule
2015-02-06 9:44 GMT+01:00 Belal Al-Hamed belalha...@gmail.com:

 thanks,

 but isn't copy use the same plan ???


aha - I was wrong,

this slowdown can be caused by a slow client (or a slow network). pgAdmin is
not terribly fast. Try to execute your query from psql.
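
A minimal way to time it from psql without the cost of rendering the rows
(the query itself is yours, elided here):

\timing on
\o /dev/null   -- discard the output, so you measure execution plus transfer only
SELECT ... ;   -- your original select
\o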

Regards

Pavel



 any way this is the query play

 Sort  (cost=15402.76..15511.77 rows=43602 width=184)
   Output: Sessions.SesUser, Vouchers.VouID,
 Journals.JurMuniment, Journals.JurRefID, Journals.JurDate,
 Vouchers.VouJournal, Vouchers.VouMunNumber, Vouchers.VouDate,
 Vouchers.VouNote, Vouchers.VouDoc, Journals.JurM (...)
   Sort Key: VouItems.ItmDate, Vouchers.VouID,
 VouItems.ItmNumber
   -  Hash Join  (cost=4665.21..8164.77 rows=43602 width=184)
 Output: Sessions.SesUser, Vouchers.VouID,
 Journals.JurMuniment, Journals.JurRefID, Journals.JurDate,
 Vouchers.VouJournal, Vouchers.VouMunNumber, Vouchers.VouDate,
 Vouchers.VouNote, Vouchers.VouDoc, Journals (...)
 Hash Cond: (VouItems.ItmMaster = Vouchers.VouID)
 -  Seq Scan on public.VouItems  (cost=0.00..1103.02 rows=43602
 width=89)
   Output: VouItems.ItmMaster, VouItems.ItmNumber,
 VouItems.ItmCurDebit, VouItems.ItmCurCredit,
 VouItems.ItmAccount, VouItems.ItmBranch,
 VouItems.ItmSubAccount,
 VouItems.ItmMuniment, VouItems.ItmDate, VouItem (...)
 -  Hash  (cost=4107.41..4107.41 rows=20544 width=95)
   Output: Vouchers.VouID, Vouchers.VouJournal,
 Vouchers.VouMunNumber, Vouchers.VouDate, Vouchers.VouNote,
 Vouchers.VouDoc, Vouchers.VouIsHold, Vouchers.VouCreateDate,
 Vouchers.VouDebit, Vouchers.VouCredit (...)
   -  Hash Join  (cost=1793.25..4107.41 rows=20544 width=95)
 Output: Vouchers.VouID, Vouchers.VouJournal,
 Vouchers.VouMunNumber, Vouchers.VouDate, Vouchers.VouNote,
 Vouchers.VouDoc, Vouchers.VouIsHold, Vouchers.VouCreateDate,
 Vouchers.VouDebit, Vouchers.VouC (...)
 Hash Cond: (Vouchers.VouJournal =
 Journals.JurID)
 -  Hash Join  (cost=1236.16..3165.12 rows=20544
 width=74)
   Output: Vouchers.VouID,
 Vouchers.VouJournal, Vouchers.VouMunNumber, Vouchers.VouDate,
 Vouchers.VouNote, Vouchers.VouDoc, Vouchers.VouIsHold,
 Vouchers.VouCreateDate, Vouchers.VouDebit, Vouchers (...)
   Hash Cond: (Vouchers.VouSession =
 Sessions.SesID)
   -  Seq Scan on public.Vouchers
 (cost=0.00..883.44 rows=20544 width=78)
 Output: Vouchers.VouID,
 Vouchers.VouJournal, Vouchers.VouMunNumber, Vouchers.VouDate,
 Vouchers.VouNote, Vouchers.VouDoc, Vouchers.VouIsHold,
 Vouchers.VouCreateDate, Vouchers.VouDebit, Vou (...)
   -  Hash  (cost=654.85..654.85 rows=33385
 width=12)
 Output: Sessions.SesUser,
 Sessions.SesID
 -  Seq Scan on public.Sessions
 (cost=0.00..654.85 rows=33385 width=12)
   Output: Sessions.SesUser,
 Sessions.SesID
 -  Hash  (cost=417.04..417.04 rows=11204 width=29)
   Output: Journals.JurMuniment,
 Journals.JurRefID, Journals.JurDate, Journals.JurMunNumber,
 Journals.JurID
   -  Seq Scan on public.Journals
 (cost=0.00..417.04 rows=11204 width=29)
 Output: Journals.JurMuniment,
 Journals.JurRefID, Journals.JurDate, Journals.JurMunNumber,
 Journals.JurID




 --
 View this message in context:
 http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886p5836890.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Pavel Stehule
2015-02-06 10:50 GMT+01:00 Belal Al-Hamed belalha...@gmail.com:

 fast as

 Query returned successfully: 43602 rows affected, 1089 ms execution time.


so the bottleneck has to be somewhere between the client and the server

Pavel






 --
 View this message in context:
 http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886p5836902.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Pavel Stehule
Hi

2015-02-06 9:30 GMT+01:00 belal belalha...@gmail.com:

 I made complex select using PGAdmin III Query Editor, Postgre server 9.3


 select ... from mytable join .. join ... order by 

 I get [Total query runtime: 8841 ms. 43602 rows retrieved.]

 but when I use

 copy ([same above select]) to '/x.txt'
 I get [Query returned successfully: 43602 rows affected, 683 ms execution
 time.]

 these test made on the same machine as the postgresql server.


 can anyone explain huge difference in executing time?


probably a terribly ineffective execution plan

can you send an explain analyze of your slow query?

Regards

Pavel




 best regards all



 --
 View this message in context:
 http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Pavel Stehule
2015-02-06 10:15 GMT+01:00 Belal Al-Hamed belalha...@gmail.com:

 this slowdown can be enforced by slow client (or slow network).
 As I said i made the tow test on the same machine as the server using
 PGAdmin no network involved.

 pgAdmin is not terrible fast
 I also try the same query from my application using libpq I get same
 results


what is the speed of

CREATE TABLE xx AS SELECT /* your query */ ?

regards

Pavel



 regards



 --
 View this message in context:
 http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886p5836893.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Pavel Stehule
2015-02-06 14:39 GMT+01:00 Belal Al-Hamed belalha...@gmail.com:

 Let me change my question to this perhaps it would be clearer

 why writing data result of select statment from PG server to file on disk
 using copy statement is much faster than getting same data through PGAdmin
 via libpg on the same PC on the same system on the same connection
 (localhost) ?


COPY to the filesystem can use more CPU, and on modern computers the data are
stored in the write cache first - the real I/O operation can be processed later.

PgAdmin uses only one CPU and works with an expensive interactive element - the
grid - there is probably some space for optimization - but filling 40K
rows into pgAdmin is usually not a good idea (it is not a good idea for any client).




 --
 View this message in context:
 http://postgresql.nabble.com/Copy-command-Faster-than-original-select-tp5836886p5836933.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Query performance

2015-01-25 Thread Pavel Stehule
2015-01-25 8:20 GMT+01:00 Joe Van Dyk j...@tanga.com:

 On Sat, Jan 24, 2015 at 11:14 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:



 2015-01-25 7:38 GMT+01:00 Joe Van Dyk j...@tanga.com:



 On Sat, Jan 24, 2015 at 10:12 PM, Pavel Stehule pavel.steh...@gmail.com
  wrote:

 Hi

 this plan looks well

 Regards

 Pavel


 Here's one that's not quite as well: http://explain.depesz.com/s/SgT


 I see a possible issue

 (product_id  '81716'::citext) .. this operation is CPU expensive and
 maybe nonsense

 product_id should be integer -- and if it isn't - it should not be on 4M
 rows extremly fast - mainly on citext

 try to force a opposite cast - you will safe a case insensitive text
 comparation

 product_id::int  81716


 It might not always be an integer, just happens to be so here. Should I
 try text instead? I don't have to have the case-insensitive matching.


text can be better

this design is unfortunate, but you probably cannot change it




 Joe



 Regards

 Pavel





 Joe



 2015-01-25 6:45 GMT+01:00 Joe Van Dyk j...@tanga.com:

 Oops, didn't run vacuum analyze after deleting the events. Here is
 another 'explain analyze': http://explain.depesz.com/s/AviN

 On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk j...@tanga.com wrote:

 On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk j...@tanga.com wrote:

 I have an events table that records page views and purchases (type =
 'viewed' or type='purchased'). I have a query that figures out people 
 who
 bought/viewed this also bought/viewed that.

 It worked fine, taking about 0.1 seconds to complete, until a few
 hours ago when it started taking hours to complete. Vacuum/analyze 
 didn't
 help.  Turned out there was one session_id that had 400k rows in the
 system. Deleting that made the query performant again.

 Is there anything I can do to make the query work better in cases
 like that? Missing index, or better query?

 This is on 9.3.5.

 The below is reproduced at the following URL if it's not formatted
 correctly in the email.
 https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt

 explain  select
e1.product_id,
e2.site_id,
e2.product_id,
count(nullif(e2.type='viewed', false)) view_count,
count(nullif(e2.type='purchased', false)) purchase_count
  from events e1
  join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
  where
e1.product_id = '82503' and
e1.product_id != e2.product_id
  group by e1.product_id, e2.product_id, e2.site_id;
  QUERY PLAN
 
  GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)
-  Sort  (cost=828395.67..840117.89 rows=465 width=19)
  Sort Key: e1.product_id, e2.product_id, e2.site_id
  -  Nested Loop  (cost=11.85..20371.14 rows=465 width=19)
-  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 
 rows=369 width=49)
  Recheck Cond: (product_id = '82503'::citext)
  -  Bitmap Index Scan on 
 events_product_id_site_id_idx  (cost=0.00..11.20 rows=369 width=0)
Index Cond: (product_id = '82503'::citext)
-  Index Scan using 
 events_session_id_type_product_id_idx on events e2  (cost=0.56..51.28 
 rows=12 width=51)
  Index Cond: ((session_id = e1.session_id) AND 
 (type = e1.type))
  Filter: (e1.product_id  product_id)
 (11 rows)

 recommender_production= \d events
 Table public.events
Column|   Type   |  Modifiers
 -+--+-
  id  | bigint   | not null default 
 nextval('events_id_seq'::regclass)
  user_id | citext   |
  session_id  | citext   | not null
  product_id  | citext   | not null
  site_id | citext   | not null
  type| text | not null
  happened_at | timestamp with time zone | not null
  created_at  | timestamp with time zone | not null
 Indexes:
 events_pkey PRIMARY KEY, btree (id)
 events_product_id_site_id_idx btree (product_id, site_id)
 events_session_id_type_product_id_idx btree (session_id, type, 
 product_id)
 Check constraints:
 events_session_id_check CHECK (length(session_id::text)  255)
 events_type_check CHECK (type = ANY (ARRAY['purchased'::text, 
 'viewed'::text]))
 events_user_id_check CHECK (length(user_id::text)  255)




 After removing the session with 400k events, I was able to do an
 explain analyze, here is one of them:
 http://explain.depesz.com/s/PFNk









Re: [PERFORM] Query performance

2015-01-24 Thread Pavel Stehule
2015-01-25 7:38 GMT+01:00 Joe Van Dyk j...@tanga.com:



 On Sat, Jan 24, 2015 at 10:12 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 Hi

 this plan looks well

 Regards

 Pavel


 Here's one that's not quite as well: http://explain.depesz.com/s/SgT


I see a possible issue

(product_id <> '81716'::citext) .. this operation is CPU expensive and
maybe nonsense

product_id should be an integer -- and if it isn't, this will not be extremely
fast on 4M rows - mainly on citext

try to force the opposite cast - you will save a case-insensitive text
comparison

product_id::int <> 81716

Regards

Pavel





 Joe



 2015-01-25 6:45 GMT+01:00 Joe Van Dyk j...@tanga.com:

 Oops, didn't run vacuum analyze after deleting the events. Here is
 another 'explain analyze': http://explain.depesz.com/s/AviN

 On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk j...@tanga.com wrote:

 On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk j...@tanga.com wrote:

 I have an events table that records page views and purchases (type =
 'viewed' or type='purchased'). I have a query that figures out people who
 bought/viewed this also bought/viewed that.

 It worked fine, taking about 0.1 seconds to complete, until a few
 hours ago when it started taking hours to complete. Vacuum/analyze didn't
 help.  Turned out there was one session_id that had 400k rows in the
 system. Deleting that made the query performant again.

 Is there anything I can do to make the query work better in cases like
 that? Missing index, or better query?

 This is on 9.3.5.

 The below is reproduced at the following URL if it's not formatted
 correctly in the email.
 https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt

 explain  select
e1.product_id,
e2.site_id,
e2.product_id,
count(nullif(e2.type='viewed', false)) view_count,
count(nullif(e2.type='purchased', false)) purchase_count
  from events e1
  join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
  where
e1.product_id = '82503' and
e1.product_id != e2.product_id
  group by e1.product_id, e2.product_id, e2.site_id;
  QUERY PLAN
 
  GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)
-  Sort  (cost=828395.67..840117.89 rows=465 width=19)
  Sort Key: e1.product_id, e2.product_id, e2.site_id
  -  Nested Loop  (cost=11.85..20371.14 rows=465 width=19)
-  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 
 rows=369 width=49)
  Recheck Cond: (product_id = '82503'::citext)
  -  Bitmap Index Scan on 
 events_product_id_site_id_idx  (cost=0.00..11.20 rows=369 width=0)
Index Cond: (product_id = '82503'::citext)
-  Index Scan using events_session_id_type_product_id_idx 
 on events e2  (cost=0.56..51.28 rows=12 width=51)
  Index Cond: ((session_id = e1.session_id) AND (type 
 = e1.type))
  Filter: (e1.product_id  product_id)
 (11 rows)

 recommender_production= \d events
 Table public.events
Column|   Type   |  Modifiers
 -+--+-
  id  | bigint   | not null default 
 nextval('events_id_seq'::regclass)
  user_id | citext   |
  session_id  | citext   | not null
  product_id  | citext   | not null
  site_id | citext   | not null
  type| text | not null
  happened_at | timestamp with time zone | not null
  created_at  | timestamp with time zone | not null
 Indexes:
 events_pkey PRIMARY KEY, btree (id)
 events_product_id_site_id_idx btree (product_id, site_id)
 events_session_id_type_product_id_idx btree (session_id, type, 
 product_id)
 Check constraints:
 events_session_id_check CHECK (length(session_id::text)  255)
 events_type_check CHECK (type = ANY (ARRAY['purchased'::text, 
 'viewed'::text]))
 events_user_id_check CHECK (length(user_id::text)  255)




 After removing the session with 400k events, I was able to do an
 explain analyze, here is one of them:
 http://explain.depesz.com/s/PFNk







Re: [PERFORM] Query performance

2015-01-24 Thread Pavel Stehule
Hi

this plan looks good

Regards

Pavel

2015-01-25 6:45 GMT+01:00 Joe Van Dyk j...@tanga.com:

 Oops, didn't run vacuum analyze after deleting the events. Here is another
 'explain analyze': http://explain.depesz.com/s/AviN

 On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk j...@tanga.com wrote:

 On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk j...@tanga.com wrote:

 I have an events table that records page views and purchases (type =
 'viewed' or type='purchased'). I have a query that figures out people who
 bought/viewed this also bought/viewed that.

 It worked fine, taking about 0.1 seconds to complete, until a few hours
 ago when it started taking hours to complete. Vacuum/analyze didn't help.
 Turned out there was one session_id that had 400k rows in the system.
 Deleting that made the query performant again.

 Is there anything I can do to make the query work better in cases like
 that? Missing index, or better query?

 This is on 9.3.5.

 The below is reproduced at the following URL if it's not formatted
 correctly in the email.
 https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt

 explain  select
e1.product_id,
e2.site_id,
e2.product_id,
count(nullif(e2.type='viewed', false)) view_count,
count(nullif(e2.type='purchased', false)) purchase_count
  from events e1
  join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
  where
e1.product_id = '82503' and
e1.product_id != e2.product_id
  group by e1.product_id, e2.product_id, e2.site_id;
  QUERY PLAN
 
  GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)
-  Sort  (cost=828395.67..840117.89 rows=465 width=19)
  Sort Key: e1.product_id, e2.product_id, e2.site_id
  -  Nested Loop  (cost=11.85..20371.14 rows=465 width=19)
-  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 
 rows=369 width=49)
  Recheck Cond: (product_id = '82503'::citext)
  -  Bitmap Index Scan on events_product_id_site_id_idx 
  (cost=0.00..11.20 rows=369 width=0)
Index Cond: (product_id = '82503'::citext)
-  Index Scan using events_session_id_type_product_id_idx 
 on events e2  (cost=0.56..51.28 rows=12 width=51)
  Index Cond: ((session_id = e1.session_id) AND (type = 
 e1.type))
  Filter: (e1.product_id  product_id)
 (11 rows)

 recommender_production= \d events
 Table public.events
Column|   Type   |  Modifiers
 -+--+-
  id  | bigint   | not null default 
 nextval('events_id_seq'::regclass)
  user_id | citext   |
  session_id  | citext   | not null
  product_id  | citext   | not null
  site_id | citext   | not null
  type| text | not null
  happened_at | timestamp with time zone | not null
  created_at  | timestamp with time zone | not null
 Indexes:
 events_pkey PRIMARY KEY, btree (id)
 events_product_id_site_id_idx btree (product_id, site_id)
 events_session_id_type_product_id_idx btree (session_id, type, 
 product_id)
 Check constraints:
 events_session_id_check CHECK (length(session_id::text)  255)
 events_type_check CHECK (type = ANY (ARRAY['purchased'::text, 
 'viewed'::text]))
 events_user_id_check CHECK (length(user_id::text)  255)




 After removing the session with 400k events, I was able to do an explain
 analyze, here is one of them:
 http://explain.depesz.com/s/PFNk





Re: [PERFORM] query a table with lots of coulmns

2014-09-19 Thread Pavel Stehule
2014-09-19 13:51 GMT+02:00 Björn Wittich bjoern_witt...@gmx.de:

 Hi mailing list,

 I am relatively new to postgres. I have a table with 500 coulmns and about
 40 mio rows. I call this cache table where one column is a unique key
 (indexed) and the 499 columns (type integer) are some values belonging to
 this key.

 Now I have a second (temporary) table (only 2 columns one is the key of my
 cache table) and I want  do an inner join between my temporary table and
 the large cache table and export all matching rows. I found out, that the
 performance increases when I limit the join to lots of small parts.
 But it seems that the databases needs a lot of disk io to gather all 499
 data columns.
 Is there a possibilty to tell the databases that all these colums are
 always treated as tuples and I always want to get the whole row? Perhaps
 the disk oraganization could then be optimized?


sorry for the off-topic suggestion

array databases may be better for your purpose

http://rasdaman.com/
http://www.scidb.org/




 Thank you for feedback and ideas
 Best
 Neo


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] query performance with hstore vs. non-hstore

2014-09-01 Thread Pavel Stehule
Hi

In this use case hstore should not help .. there is a relatively high overhead
related to unpacking hstore -- so the classic schema is better.

Hstore should not replace a well normalized schema - it should be a
replacement for some semi-normalized structures such as EAV.

Hstore can get some benefit from TOAST .. compression, less per-row system
overhead, but this advantage only starts from some length of data. You should
see this benefit in the table size. When the table with hstore is smaller than
the one without, then there is a benefit to hstore. The last benefit of hstore
is indexes over the (key, value) pairs .. but you don't use them.
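
For completeness, that last point would look like this (a sketch only; the key
value is a placeholder):

CREATE INDEX weekly_hstore_visits_idx ON weekly_hstore USING gin (visits);

SELECT ref_id
FROM weekly_hstore
WHERE visits ? 'some_cha_type';  -- "contains key", can use the GIN index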

Regards

Pavel


2014-09-01 8:10 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:

  Hi ,



 I’m tweaking table layout to get better performance of query. One table
 doesn’t use hstore but expand all metrics of cha_type to different rows.
 The other table has hstore for metrics column as cha_type-metrics so it
 has less records than the first one.



 I would be expecting the query on seconds table has better performance
 than the first one. However, it’s not the case at all. I’m wondering if
 there’s something wrong with my execution plan? With the hstore table, the
 optimizer has totally wrong estimation on row counts at hash aggregate
 stage and it takes 34 seconds on hash-join,25 seconds on hash-aggregate, 10
 seconds on sort. However, with non-hstore table, it takes 17 seconds on
 hash join, 18 seconds on hashaggregate and 2 seconds on sort.



 Can someone help me to explain why this is happening? And is there a way
 to fine-tune the query?



 Table structure



 dev=# \d+ weekly_non_hstore

  Table test.weekly_non_hstore

   Column  |  Type  | Modifiers | Storage  | Stats target |
 Description


 --++---+--+--+-

 date | date   |   | plain|  |

 ref_id| character varying(256) |   | extended |  |

 cha_typel  | text   |   | extended |  |

 visits   | double precision   |   | plain|  |

 pages| double precision   |   | plain|  |

 duration | double precision   |   | plain|  |

 Has OIDs: no

 Tablespace: tbs_data



 dev=# \d+ weekly_hstore

Table test.weekly_hstore

   Column  |  Type  | Modifiers | Storage  | Stats target |
 Description


 --++---+--+--+-

 date | date   |   | plain|  |

 ref_id| character varying(256) |   | extended |  |

 visits   | hstore |   | extended |  |

 pages| hstore |   | extended |  |

 duration | hstore |   | extended |  |

 Has OIDs: no

 Tablespace: tbs_data



 dev=# select count(*) from weekly_non_hstore;

   count

 --

 71818882

 (1 row)





 dev=# select count(*) from weekly_hstore;

   count

 -

 1292314

 (1 row)





 Query

 dev=# explain analyze select cha_type,sum(visits) from weekly_non_hstore
 a  join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits)
 desc;


  QUERY PLAN


 

 Sort  (cost=3674073.37..3674431.16 rows=143115 width=27) (actual
 time=47520.637..47969.658 rows=3639539 loops=1)

Sort Key: (sum(a.visits))

Sort Method: quicksort  Memory: 391723kB

->  HashAggregate  (cost=3660386.70..3661817.85 rows=143115 width=27)
 (actual time=43655.637..44989.202 rows=3639539 loops=1)

  ->  Hash Join  (cost=12029.58..3301286.54 rows=71820032 width=27)
 (actual time=209.789..26477.652 rows=36962761 loops=1)

Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)

->  Seq Scan on weekly_non_hstore a  (cost=0.00..1852856.32
 rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)

->  Hash  (cost=7382.59..7382.59 rows=371759 width=47)
 (actual time=209.189..209.189 rows=371759 loops=1)

  Buckets: 65536  Batches: 1  Memory Usage: 28951kB

  ->  Seq Scan on seg1 b  (cost=0.00..7382.59
 rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)

 Total runtime: 48172.405 ms

 (11 rows)



 Time: 48173.569 ms



 dev=# explain analyze select cha_type, sum(visits) from (select
 (each(visits)).key as cha_type,(each(visits)).value::numeric as visits from
 weekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo  group by cha_type
 order by sum(visits) desc;

QUERY
 PLAN


 

Re: [PERFORM] query performance with hstore vs. non-hstore

2014-09-01 Thread Pavel Stehule
2014-09-01 8:54 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:

  Thank you Pavel.



 The cost of unpacking hstore compared to non-hstore could be calculated
 by:

 Seq scan on hstore table + hash join with seg1 table:

 Hstore: 416.741+ 34619.879 =~34 seconds

 Non-hstore: 8858.594 +26477.652 =~ 34 seconds



 The subsequent hash-aggregate and sort operations should be working on the
 unpacked hstore rows, which have the same row count as the non-hstore table.
 However, the timing of those operations actually makes the big difference.



 I don’t quite get why…


These values can be misleading -- per-node timing in EXPLAIN ANALYZE adds a
relatively big overhead, and the overhead differs between plan node types.

Try to watch the complete time with EXPLAIN (ANALYZE, TIMING OFF).
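
For example (a minimal sketch, using the non-hstore query from your mail):

EXPLAIN (ANALYZE, TIMING OFF)
SELECT cha_type, sum(visits)
  FROM weekly_non_hstore a
  JOIN seg1 b ON a.ref_id = b.ref_id
 GROUP BY cha_type
 ORDER BY sum(visits) DESC;
-- per-node timings are suppressed, so the reported total runtime is much
-- closer to the real execution time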




 Thanks,

 Suya



 *From:* Pavel Stehule [mailto:pavel.steh...@gmail.com]
 *Sent:* Monday, September 01, 2014 4:22 PM
 *To:* Huang, Suya
 *Cc:* pgsql-performance@postgresql.org
 *Subject:* Re: [PERFORM] query performance with hstore vs. non-hstore



 Hi

 In this use case hstore should not help .. there is relative high overhead
 related with unpacking hstore -- so classic schema is better.

 Hstore should not to replace well normalized schema - it should be a
 replace for some semi normalized structures as EAV.

 Hstore can have some profit from TOAST .. comprimation, less system data
 overhead, but this advantage started from some length of data. You should
 to see this benefit on table size. When table with HStore is less than
 without, then there is benefit of Hstore. Last benefit of Hstore are
 indexes over tuple (key, value) .. but you don't use it.

 Regards

 Pavel



 2014-09-01 8:10 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:

 Hi ,



 I’m tweaking table layout to get better performance of query. One table
 doesn’t use hstore but expand all metrics of cha_type to different rows.
 The other table has hstore for metrics column as cha_type-metrics so it
 has less records than the first one.



 I would be expecting the query on seconds table has better performance
 than the first one. However, it’s not the case at all. I’m wondering if
 there’s something wrong with my execution plan? With the hstore table, the
 optimizer has totally wrong estimation on row counts at hash aggregate
 stage and it takes 34 seconds on hash-join,25 seconds on hash-aggregate, 10
 seconds on sort. However, with non-hstore table, it takes 17 seconds on
 hash join, 18 seconds on hashaggregate and 2 seconds on sort.



 Can someone help me to explain why this is happening? And is there a way
 to fine-tune the query?



 Table structure



 dev=# \d+ weekly_non_hstore

  Table test.weekly_non_hstore

   Column  |  Type  | Modifiers | Storage  | Stats target |
 Description


 --++---+--+--+-

 date | date   |   | plain|  |

 ref_id| character varying(256) |   | extended |  |

 cha_typel  | text   |   | extended |  |

 visits   | double precision   |   | plain|  |

 pages| double precision   |   | plain|  |

 duration | double precision   |   | plain|  |

 Has OIDs: no

 Tablespace: tbs_data



 dev=# \d+ weekly_hstore

Table test.weekly_hstore

   Column  |  Type  | Modifiers | Storage  | Stats target |
 Description


 --++---+--+--+-

 date | date   |   | plain|  |

 ref_id| character varying(256) |   | extended |  |

 visits   | hstore |   | extended |  |

 pages| hstore |   | extended |  |

 duration | hstore |   | extended |  |

 Has OIDs: no

 Tablespace: tbs_data



 dev=# select count(*) from weekly_non_hstore;

   count

 --

 71818882

 (1 row)





 dev=# select count(*) from weekly_hstore;

   count

 -

 1292314

 (1 row)





 Query

 dev=# explain analyze select cha_type,sum(visits) from weekly_non_hstore
 a  join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits)
 desc;


  QUERY PLAN


 

 Sort  (cost=3674073.37..3674431.16 rows=143115 width=27) (actual
 time=47520.637..47969.658 rows=3639539 loops=1)

Sort Key: (sum(a.visits))

Sort Method: quicksort  Memory: 391723kB

-  HashAggregate  (cost=3660386.70..3661817.85 rows=143115 width=27)
 (actual time=43655.637..44989.202 rows=3639539 loops=1)

  -  Hash Join  (cost=12029.58..3301286.54 rows=71820032 width=27)
 (actual time

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Pavel Stehule
2014-06-30 20:34 GMT+02:00 Jeff Frost j...@pgexperts.com:

 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:




 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com
 wrote:


 My guess it's a spinlock, probably xlogctl-info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...


For this kind of issue systemtap or dtrace can be useful

http://postgres.cz/wiki/Monitorov%C3%A1n%C3%AD_lwlocku_pomoc%C3%AD_systemtapu

you can identify which lock is the problem - please use Google Translate

Regards

Pavel


  What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.

 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


 It is
   96.62%  postgres [.] StandbyReleaseLocks
   as Jeff said. It runs for quite a long time, more than 5 minutes I think.

 I also use hot standby. We have 4 streaming replicas; some of them have
 active connections, some do not. This issue has lasted more than 4 days. On
 one of the standbys, the above postgres process is the only process that
 consumes high CPU load.


 compiled with -fno-omit-frame-pointer doesn't yield much more info:

  76.24%  postgres   [.] StandbyReleaseLocks
   2.64%  libcrypto.so.1.0.1e[.]
 md5_block_asm_data_order
   2.19%  libcrypto.so.1.0.1e[.] RC4
   2.17%  postgres   [.] RecordIsValid
   1.20%  [kernel]   [k]
 copy_user_generic_unrolled
   1.18%  [kernel]   [k] _spin_unlock_irqrestore
   0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
   0.87%  [kernel]   [k] __do_softirq
   0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
   0.69%  postgres   [.]
 hash_search_with_hash_value
   0.68%  [kernel]   [k] fin

 However, this server started progressing through the WAL files quite a bit
 better before I finished compiling, so we'll leave it running with this
 version and see if there's more info available the next time it starts
 replaying slowly.





Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24

2014-06-19 Thread Pavel Stehule
2014-06-20 1:44 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:



 From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
 Sent: Thursday, June 19, 2014 3:41 PM
 To: Huang, Suya
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24



 2014-06-19 7:35 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:
 From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
 Sent: Thursday, June 19, 2014 3:28 PM
 To: Huang, Suya
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24

 Hello

 The size of statfile is related to size of database objects in database.
 Depends on PostgreSQL version this file can be one per database cluster or
 one per database (from 9.3),
 These statistics should by reset by call pg_stat_reset()
 http://www.postgresql.org/docs/9.2/static/monitoring-stats.html
 Autovacuum on large stat files has significant overhead - it can be
 increasing by using new PostgreSQL (9.3) and by migration stat directory to
 ramdisk - by setting stats_temp_directory to some dir on ramdisk (tmpfs on
 Linux)
 Regards

 Pavel

 2014-06-19 6:38 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:
 Hi group,

 We’ve found huge pgstat.stat file on our production DB boxes, the size is
 over 100MB. autovacuum is enabled. So my question would be:
 1.   What’s a reasonable size of pgstat.stat file, can it be estimated?
 2.   What’s the safest way to reduce the file size to alleviate the IO
 impact on disk?
 3.   If need to drop all statistics, would a “analyze DB” command
 enough to eliminate the performance impact on queries?

 Thanks,
 Suya



 Hi Pavel,

 our version is 8.3.24, not 9.3. I also want to know the impact caused by
 run pg_stat_reset to application, is that able to be mitigated by doing an
 analyze database command?

 your version is too old  - you can try reset statistics. ANALYZE statement
 should not have a significant impact on these runtime statistics.
 Pavel

 Attention: PostgreSQL 8.3 is unsupported now



 Thanks,
 Suya


 Thanks Pavel, to be more clear, what does  pg_stat_reset really reset?
 In the document it says  Reset all statistics counters for the current
 database to zero(requires superuser privileges) .  I thought it would
 reset all statistics of all tables/indexes, thus why I am thinking of
 re-run analyze database to gather statistics. Because if table/indexes
 don't have statistics, the query plan would be affected which is not a good
 thing to a production box... I'm not so sure if I understand run
 statistics you mentioned here.


You are right - anyway, you can clean the content of this directory - but if
your database has a lot of database objects, your stat file will grow back to
its original size very quickly.

Pavel





 Thanks,
 Suya






Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24

2014-06-18 Thread Pavel Stehule
Hello


The size of the stat file is related to the number of database objects in the
database. Depending on the PostgreSQL version this file can be one per database
cluster or one per database (from 9.3).

These statistics can be reset by calling pg_stat_reset()
http://www.postgresql.org/docs/9.2/static/monitoring-stats.html

Autovacuum on large stat files has significant overhead - it can be reduced by
using a newer PostgreSQL (9.3) and by moving the stat directory to a ramdisk -
by setting stats_temp_directory to some directory on a ramdisk (tmpfs on
Linux).

Regards

Pavel



2014-06-19 6:38 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:

  Hi group,



 We’ve found huge pgstat.stat file on our production DB boxes, the size is
 over 100MB. autovacuum is enabled. So my question would be:

 1.   What’s a reasonable size of pgstat.stat file, can it be
 estimated?

 2.   What’s the safest way to reduce the file size to alleviate the
 IO impact on disk?

 3.   If need to drop all statistics, would a “analyze DB” command
 enough to eliminate the performance impact on queries?



 Thanks,

 Suya



Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24

2014-06-18 Thread Pavel Stehule
2014-06-19 7:35 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:

 From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
 Sent: Thursday, June 19, 2014 3:28 PM
 To: Huang, Suya
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] huge pgstat.stat file on PostgreSQL 8.3.24

 Hello

 The size of statfile is related to size of database objects in database.
 Depends on PostgreSQL version this file can be one per database cluster or
 one per database (from 9.3),
 These statistics should by reset by call pg_stat_reset()
 http://www.postgresql.org/docs/9.2/static/monitoring-stats.html
 Autovacuum on large stat files has significant overhead - it can be
 increasing by using new PostgreSQL (9.3) and by migration stat directory to
 ramdisk - by setting stats_temp_directory to some dir on ramdisk (tmpfs on
 Linux)
 Regards

 Pavel

 2014-06-19 6:38 GMT+02:00 Huang, Suya suya.hu...@au.experian.com:
 Hi group,

 We’ve found huge pgstat.stat file on our production DB boxes, the size is
 over 100MB. autovacuum is enabled. So my question would be:
 1.   What’s a reasonable size of pgstat.stat file, can it be estimated?
 2.   What’s the safest way to reduce the file size to alleviate the IO
 impact on disk?
 3.   If need to drop all statistics, would a “analyze DB” command
 enough to eliminate the performance impact on queries?

 Thanks,
 Suya




 Hi Pavel,

 our version is 8.3.24, not 9.3. I also want to know the impact caused by
 run pg_stat_reset to application, is that able to be mitigated by doing an
 analyze database command?


your version is too old - you can try resetting the statistics. The ANALYZE
statement should not have a significant impact on these runtime statistics.

Pavel

Attention: PostgreSQL 8.3 is unsupported now




 Thanks,
 Suya




Re: [PERFORM] Profiling PostgreSQL

2014-05-23 Thread Pavel Stehule
On 23.5.2014 16:41 Dimitris Karampinas dkaram...@gmail.com wrote:

 Thanks for your answers. A script around pstack worked for me.

 (I'm not sure if I should open a new thread, I hope it's OK to ask
another question here)

 For the workload I run it seems that PostgreSQL scales with the number of
concurrent clients up to the point that these reach the number of cores
(more or less).
 Further increase to the number of clients leads to dramatic performance
degradation. pstack and perf show that backends block on LWLockAcquire
calls, so, someone could assume that the reason the system slows down is
because of multiple concurrent transactions that access the same data.
 However I did the two following experiments:
 1) I completely removed the UPDATE transactions from my workload. The
throughput turned out to be better yet the trend was the same. Increasing
the number of clients, has a very negative performance impact.
 2) I deployed PostgreSQL on more cores. The throughput improved a lot. If
the problem was due to concurrecy control, the throughput should remain the
same - no matter the number of hardware contexts.

 Any insight why the system behaves like this ?

Physical limits - there are two possible bottlenecks: CPU or IO. Postgres uses
one CPU per session, and if you have a CPU intensive benchmark, then the
maximum should be around the number of CPU-bound workers. Beyond that the
workers share the CPUs, but total throughput should stay roughly the same up to
circa 10x the CPU count (depends on the test).


 Cheers,
 Dimitris


 On Fri, May 23, 2014 at 1:39 AM, Michael Paquier 
michael.paqu...@gmail.com wrote:

 On Thu, May 22, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Call graph data usually isn't trustworthy unless you built the program
  with -fno-omit-frame-pointer ...
 This page is full of ideas as well:
 https://wiki.postgresql.org/wiki/Profiling_with_perf
 --
 Michael




Re: [PERFORM] Stats collector constant I/O

2014-05-15 Thread Pavel Stehule
Hello

we had a similar issue - you can try to move the stat file to a ramdisk

http://serverfault.com/questions/495057/too-much-i-o-generated-by-postgres-stats-collector-process
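
A minimal sketch of what that looks like (the mount point and size are only
examples; adjust them to your system):

  # as root: create a small ramdisk and hand it to postgres
  mount -t tmpfs -o size=64M tmpfs /var/run/pg_stats_tmp
  chown postgres:postgres /var/run/pg_stats_tmp

  # postgresql.conf - a reload (pg_ctl reload) is enough for this setting
  stats_temp_directory = '/var/run/pg_stats_tmp'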

Regards

Pavel Stehule


2014-05-15 6:18 GMT+02:00 Craig James cja...@emolecules.com:

 Day and night, the postgres stats collector process runs at about 20
 MB/sec output.  vmstat shows this:

 $ vmstat 2
 procs ---memory-- ---swap-- -io -system--
 cpu
  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
 wa
  0  0  55864 135740 123804 1071292846   445  264200  5  1
 92  2
  1  0  55864 134820 123804 1071301200 0 34880  540  338  1  1
 98  0
  0  0  55864 135820 123812 1071289600 0 20980  545  422  1  1
 98  0

 iotop(1) shows that it's the stats collector, running at 20 MB/sec.

 Is this normal?

 Craig



Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
Hello


2014-05-01 21:17 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 20:35:07, skrev Jochem Berndsen 
 joc...@functor.nl:


 Hi Andreas,

 [New to this list, forgive my ignorance.]
 [snip]
 I'm getting better performance with:

 SELECT
 m.id AS message_id,
 1 AS person_id,
 FALSE AS is_read,
 m.subject
 FROM message m
 WHERE 1 = 1
 AND NOT EXISTS(SELECT
  *
  FROM message_property pr
  WHERE pr.message_id = m.id AND pr.person_id = 1 AND pr.is_read);

 You then lose the distinction between message_property with is_read =
 FALSE, and nonexistent message_property for the message row.

 If that is essential, I'm getting a roughly 2x speedup on my non-tuned
 PostgreSQL with:
   SELECT
  m.id  AS message_id,
  prop.person_id,
  coalesce(prop.is_read, FALSE) AS is_read,
  m.subject
 FROM message m
  LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
 prop.person_id = 1
 WHERE not coalesce(prop.is_read, false);



 Hi Jochem,

 Thansk for looking at it. I'm still seing ~500ms being spent and I was
 hoping for a way to do this using index so one could achieve 1-10ms, but
 maybe that's impossible given the schema?

 Is there a way to design an equivalent  schema to achieve 10ms
 execution-time?


I had perfect success on a similar use case with a descending ordered partial
index

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html

Regards

Pavel



  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
 pavel.steh...@gmail.com:

 Hello
 [snip]

 I had a perfect success on similar use case with descent ordered partial
 index

 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html


 I'm not getting good performance. Are you able to craft an example using
 my schema and partial index?


maybe something like

CREATE INDEX ON message_property (person_id, message_id) WHERE NOT is_read;

When I think about your schema, it is designed well, but it is not
index friendly, so for really fast access you should hold a cache (table)
of unread messages.
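
A rough sketch of such a cache, only to illustrate the idea (the table name is
made up, the column types are assumed, and the population/maintenance logic -
application code or triggers on message / message_property - is up to you):

-- one row per (person, message) that is still unread
CREATE TABLE message_unread (
    person_id  bigint NOT NULL,
    message_id bigint NOT NULL,
    PRIMARY KEY (person_id, message_id)
);

-- listing unread messages then becomes a cheap index scan
SELECT m.id, m.subject
  FROM message_unread u
  JOIN message m ON m.id = u.message_id
 WHERE u.person_id = 1;

-- insert a row when a message is delivered, delete it when the message is read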

Regards

Pavel



 Thanks.

  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule 
 pavel.steh...@gmail.com:



 2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
 pavel.steh...@gmail.com:

 Hello
 [snip]

 I had a perfect success on similar use case with descent ordered partial
 index

 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html


 I'm not getting good performance. Are you able to craft an example using
 my schema and partial index?


 maybe something like

 CREATE INDEX ON message_property (person_id, message_id) WHERE NOT is_read;

 When I think about your schema, it is designed well, but it is not
 index friendly, so for really fast access you should hold a cache (table)
 of unread messages


 Ah, that's what I was hoping not to have to do. In my system, messages
 arrive all the time and having to update a cache for all new messages for
 all users seems messy... It seems I could just as well create a
 message_property row for all users when a new message arrives, so I can INNER
 JOIN it and get good performance. But that table will quickly grow *very*
 large...


What you need is a JOIN index, and that is not possible in Postgres.

I am afraid some ugly solution is necessary (when you require extra fast
access). You need an index (a small index), and an index requires an existing
set - you cannot build an index on the difference of two sets.

I expect that a flag on the message relation - something like "has not been
read yet" - can help a little bit, and can be used as a condition in a partial
index. Another possibility is some variant of partitioning - you can divide the
messages and users into distinct sets and thereby decrease the number of
possible combinations.

Regards

Pavel



  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] pl/pgsql performance

2014-04-25 Thread Pavel Stehule
Hello


2014-04-25 10:48 GMT+02:00 Mehdi Ravanbakhsh baba...@gmail.com:

 Hi All

  I have one big PL/pgSQL function (about 1500 lines of code); I can divide
 it into about 5 parts and call each part from a main script. In this case
 I need to know which way is faster.

 And some questions about pgsql:

 1 - does the pgsql engine open one process for each script?


PostgreSQL uses one CPU per session.


 2 - how can I choose the max connection number for the pgsql server, based
 on CPU cores and RAM capacity, for maximum efficiency?


usually maximum performance is at about 10 x CPU connections. But it highly
depends on the load.

Regards

Pavel Stehule



 Thanks and Best ergards.



Re: [PERFORM] [BUGS] Very slow query in PostgreSQL 9.3.3

2014-03-13 Thread Pavel Stehule
 of complete partitions. And it is
slow - it is strange that reading 300K rows needs 5K seconds. Probably your IO
is overloaded.

Regards

Pavel Stehule



 thanks



Re: [PERFORM] increasing query time after analyze

2014-02-12 Thread Pavel Stehule
2014-02-12 9:58 GMT+01:00 Katharina Koobs katharina.ko...@uni-konstanz.de:

 explain.depesz.com/s/HuZ


The fast query is fast due to intensive use of hash joins

but you can see

Hash Left Join  (cost=9343.05..41162.99 rows=6889 width=1350) (actual
time=211.767..23519.296 rows=639137 loops=1)

the estimation is off. It is strange that the estimation is worse after ANALYZE.

Nested Loop Left Join  (cost=1122.98..28246.98 rows=1 width=1335)  (actual
time=33.222..14631581.741 rows=639137 loops=1)

So it looks like something in the data is strange - probably a dependency
between columns: sos_stg_aggr.stichtag = sos_stichtag.tid, sos_stg_aggr.stuart = cifx.apnr

 Hash Join  (cost=1121.04..24144.02 rows=57 width=339)  (actual
time=2.407..11157.151 rows=639221 loops=1)

   - Hash Cond: (sos_stg_aggr.stuart = cifx.apnr)


A nested loop based plan is extremely sensitive to this wrong estimation.

You can try (see the sketch below):

* penalize nested loops - set enable_nestloop to off; -- for this query only
* divide this query into more queries - store the intermediate result in a
temporary table and analyze it (to fix the wrong estimation)
* maybe you can increase work_mem
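
A rough sketch of the first two suggestions (table and column names are taken
from the plan above; the temporary table name is made up):

-- 1) penalize nested loops just for this query
SET enable_nestloop TO off;
-- run the slow query here ...
RESET enable_nestloop;

-- 2) split the query: materialize the badly estimated join first,
--    then let ANALYZE give the planner real numbers
CREATE TEMP TABLE tmp_stg AS
  SELECT a.*                    -- spell out only the columns you really need
    FROM sos_stg_aggr a
    JOIN cifx c ON a.stuart = c.apnr;
ANALYZE tmp_stg;
-- ... then join tmp_stg to the remaining tables in a second query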

Regards

Pavel Stehule


Re: [PERFORM] increasing query time after analyze

2014-02-05 Thread Pavel Stehule
Hello


2014-02-05 Katharina Koobs katharina.ko...@uni-konstanz.de:

 Hi,



 We have a PostgreSQL DB, version 9.3 on a Suse Linux system.

 We ran the update from postgresql 8.4 to 9.3.

 After importing the database the query time of one sql query is about 30
 sec.

 After ANALYZE the DB the query time of this sql query is about 45 minutes.

 We can see that after analyzing the indexes will no longer be used.



 Has anyone an idea why ANALYZE cause this problem?


yes, it is possible - sometimes, for various reasons (a strange dataset or
correlation between columns), the statistics estimations are totally off. And
bad, stale statistics can produce better estimations than fresh statistics.

Please send the EXPLAIN ANALYZE output for the fast and the slow query.

Regards

Pavel Stehule




 Thanks a lot for your help!



 Katharina







Re: [PERFORM] Re: Problem with slow query with WHERE conditions with OR clause on primary keys

2013-12-19 Thread Pavel Stehule
2013/12/18 kolsze...@gmail.com kolsze...@gmail.com

 Thanx for your answer

 My example is trivial because i want to show strange (for me) postgres
 behavior with dealing with primary keys (extreme example), in real
 situation
 user put search condition e.g.  Panas and this generates query
 ...
 where gd.other_code like 'Panas%' OR g.code like 'Panas%'
 ..

 both columns have very good indexes and selectivity for like 'Panas%' ...

 I have experience from Oracle with this type of query, and Oracle has no
 problem with it:
 it executes a select on the index on other_code from gd and joins g,
 in the next step it executes a select on the index on code from g and joins gd,
 and these two results are combined in a last step (like a union) -
 very fast, at minimal cost

 and in my opinion, reading whole huge tables for only 10 rows in the result,
 when the where conditions are very selective ... is strange


Maybe the index is not in a usable form for LIKE.

Try to build the index with the varchar_pattern_ops operator class

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#LIKE_optimalization

CREATE INDEX like_index ON people(surname varchar_pattern_ops);

Regards

Pavel Stehule










 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Problem-with-slow-query-with-WHERE-conditions-with-OR-clause-on-primary-keys-tp5782803p5783927.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] One huge db vs many small dbs

2013-12-05 Thread Pavel Stehule
2013/12/6 Josh Berkus j...@agliodbs.com

 On 12/05/2013 02:42 AM, Max wrote:
  Hello,
 
  We are starting a new project to deploy a solution in cloud with the
 possibility to be used for 2.000+ clients. Each of this clients will use
 several tables to store their information (our model has about 500+ tables
 but there's less than 100 core table with heavy use). Also the projected
 ammout of information per client could be from small (few hundreds
 tuples/MB) to huge (few millions tuples/GB).
 
  One of the many questions we have is about performance of the db if we
 work with only one (using a ClientID to separete de clients info) or
 thousands of separate dbs. The management of the dbs is not a huge concert
 as we have an automated tool.

 In addition to the excellent advice from others, I'll speak from
 experience:

 The best model here, if you can implement it, is to implement shared
 tables for all customers, but have a way you can break out customers
 to their own database(s).  This allows you to start with a single
 database, but to shard out your larger customers as they grow.  The
 small customers will always stay on the same DB.

 That means you'll also treat the different customers as different DB
 connections from day 1.  That way, when you move the large customers out
 to separate servers, you don't have to change the way the app connects
 to the database.

 If you can't implement shared tables, I'm going to say go for separate
 databases.  This will mean lots of additional storage space -- the
 per-DB overhead by itself will be 100GB -- but otherwise you'll be
 grappling with the issues involved in having a million tables, which Joe
 Conway outlined.  But if you don't have shared tables, your huge schema
 is always going to cause you to waste resources on the smaller customers.



+1

Pavel



 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Reseting statistics counters

2013-10-01 Thread Pavel Stehule
Hello


2013/10/1 Xenofon Papadopoulos xpa...@gmail.com

 If we reset the statistics counters using pg_stat_reset() will it affect
 the performance of the database? Eg are these the same statistics used by
 the planner?
 Thanks


these statistics are used only for autovacuum, as far as I know. So you can
impact autovacuum, but not the planner

Regards

Pavel


Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-26 Thread Pavel Stehule
Hello


It is a little bit strange - can you send info about your PostgreSQL
version, the query, and the table description?

In this case, PostgreSQL should use a hash aggregate, but for some
strange reason it didn't do it.

The second strange issue is the speed of the external sort - it is slower than
I would expect.

What I know - the usual advice for MS Windows is to set shared_buffers small
- 512MB can be too much there.
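
One quick experiment worth trying (a sketch only - the table and column names
below are made up, since the real query was not posted; the idea is to see
whether more memory lets the planner pick a HashAggregate instead of the
external merge sort):

-- for this session only
SET work_mem = '512MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT group_a, group_b, sum(amount)     -- hypothetical columns
  FROM big_table                         -- hypothetical table
 GROUP BY group_a, group_b;
-- if the plan switches from GroupAggregate + external merge sort to
-- HashAggregate, the runtime should drop sharply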

Regards

Pavel Stehule


2013/8/26 Adam Ma'ruf adam.ma...@gmail.com

 Hi,

 I wasn't sure whether to mail the novice mailing list or this one.
  Since this is performance related I'm posting it here, but I am definitely
 a novice at postgresql - converting from mssql just now.

 I have a ~2.5gb table with ~5M rows of data.  A query that groups by two
 fields and sums a floating field takes approximately 122 seconds.  The
 equivalent query takes ~ 8seconds in my previous sql server express
 installation.

 I've tried to vary the parameters in postgresql.conf:
 I've tried varying shared_buffers from 512mb to 4000mb
 and work_mem from 64mb to 4000mb (I thought this might be the answer
 since the execution plan (referenced below) indicates that the sort relies
 on an external merge disk method)
 I've increased the default_statistics_target  to 1 and full vacuum
 analyzed
 I realize there are no indexes on this table.  My main concern is why I
 can't get this to run as fast as in sql server express (which also has no
 indexes, and the same query takes about 8 seconds)

 My system:  Windows Professional 64-bit
 8 gb of ram
 Intel i5-220M CPU @ 2.5GHz

 Here is the link to the execution plan:  http://explain.depesz.com/s/Ytx3

 Thanks a lot in advance and do let me know if you require any more
 information to make an informed opinion,
 A



Re: [PERFORM] Function execute slow down in 9.2

2013-08-21 Thread Pavel Stehule
2013/8/16 Александр Белинский avinf...@gmail.com

 On 12.08.2013 18:24, Pavel Stehule wrote:

  Hello

 it looks like a known issue with the sometimes dysfunctional plan cache in
 plpgsql in 9.2.

 similar issue:
 http://postgresql.1045698.n5.nabble.com/Performance-problem-in-PLPgSQL-td5764796.html

 Thanks for the link ) I read about issue, but I can't understand what
 should I do?


You can do nothing. You can check whether the described issue is the same as
your issue, and that is all :(.

It is a bug in the plan cache implementation.

Regards

Pavel


 I changed the value of seq_page_cost
 =1.0
 =10.0
 = 100.0
 = 0.1
 =0.01

  but nothing changed, the function execution time is the same.








Re: [PERFORM] Interesting case of IMMUTABLE significantly hurting performance

2013-08-13 Thread Pavel Stehule
2013/8/14 Craig Ringer cr...@2ndquadrant.com

 Hi folks

 I've run into an interesting Stack Overflow post where the user shows
 that marking a particular function as IMMUTABLE significantly hurts the
 performance of a query.

 http://stackoverflow.com/q/18220761/398670

 CREATE OR REPLACE FUNCTION
   to_datestamp_immutable(time_int double precision) RETURNS date AS $$
   SELECT date_trunc('day', to_timestamp($1))::date;
 $$ LANGUAGE SQL IMMUTABLE;

 With IMMUTABLE:  33060.918
 With STABLE: 6063.498

 The plans are the same for both, though the cost estimate for the
 IMMUTABLE variant is (surprisingly) massively higher.

 The question contains detailed instructions to reproduce the issue, and
 I can confirm the same results on my machine.

 It looks like the difference is created by to_timestamp , in that if
 to_timestamp is replaced with interval maths the difference goes away.

 I'm very curious and am doing a quick profile now, but I wanted to raise
 this on the list for comment/opinions, since it's very
 counter-intuitive. IIRC docs don't suggest that IMMUTABLE can ever be
 more expensive.



If I understand correctly, the IMMUTABLE flag disables inlining here. What you
see is the overhead of evaluating the SQL function without inlining.

My rule is - don't use volatility flags on SQL functions when it is possible to
avoid them.
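
For illustration, a sketch of the same helper without the troublesome flag
(the function name is made up; the body calls date_trunc() on a timestamptz,
which is only STABLE because it depends on the TimeZone setting, so the
function cannot honestly be IMMUTABLE anyway - declared STABLE, or with no
flag at all, it can be inlined into the calling query):

CREATE OR REPLACE FUNCTION
  to_datestamp_stable(time_int double precision) RETURNS date AS $$
  SELECT date_trunc('day', to_timestamp($1))::date;
$$ LANGUAGE SQL STABLE;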

Pavel



 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Function execute slow down in 9.2

2013-08-12 Thread Pavel Stehule
Hello

it looks like a known issue with the sometimes dysfunctional plan cache in
plpgsql in 9.2.

similar issue 
http://postgresql.1045698.n5.nabble.com/Performance-problem-in-PLPgSQL-td5764796.html

Regards

Pavel Stehule

2013/8/12 Александр Белинский avinf...@gmail.com:
 Hi!
 I can't explain why the function slows down on the same data.
 postgresql.conf is the same, the hardware is more powerful.
 The difference is the PostgreSQL version.

 Here it;s my tests

 Server 1 PSQL 9.1

 FIRST RUN
 EXPLAIN (ANALYZE, BUFFERS)  SELECT webclient.prc_ti_cache_alloc_dbl_update(
 21325134
 );

 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1399.586..1399.587
 rows=1 loops=1)'
 '  Buffers: shared hit=40343 read=621'
 'Total runtime: 1399.613 ms'

 SECOND RUN
 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=42.540..42.541 rows=1
 loops=1)'
 '  Buffers: shared hit=37069'
 'Total runtime: 42.558 ms'

 THIRD RUN
 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=198.893..198.894
 rows=1 loops=1)'
 '  Buffers: shared hit=37069'
 'Total runtime: 198.908 ms'


 Server 2 PSQL 9.2

 FIRST RUN
 EXPLAIN (ANALYZE, BUFFERS)  SELECT webclient.prc_ti_cache_alloc_dbl_update(
 21325134
 );

 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1328.103..1328.104
 rows=1 loops=1)'
 '  Buffers: shared hit=43081 read=233 written=36'
 'Total runtime: 1328.129 ms'

 SECOND RUN
 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1699.711..1699.712
 rows=1 loops=1)'
 '  Buffers: shared hit=42919'
 'Total runtime: 1699.737 ms'

 THIRD RUN
 'Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1907.947..1907.948
 rows=1 loops=1)'
 '  Buffers: shared hit=42869'
 'Total runtime: 1907.965 ms'






 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Pavel Stehule
2013/8/7 Igor Neyman iney...@perceptron.com:


 From: pgsql-performance-ow...@postgresql.org 
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
 Sent: Wednesday, August 07, 2013 8:43 AM
 To: Pavel Stehule
 Cc: pgsql-performance@postgresql.org
 Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
 view with another view inside of it.

 Good day,

 I have included a link to the result of EXPLAIN ANALYZE. It's this one:
 https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

 Here's a link to Depesz's explain (if links to the site are okay):
 http://explain.depesz.com/s/gCk

 I have just tried setting geqo_threshold, join_collapse_limit and 
 from_collapse_limit to 16, but it yielded no improvement.
 Changing those three parameters to 32 did speed up the query from about 3.3 
 seconds to about a second (give or take 50 ms), which is a pretty good 
 improvement, but not quite there, as I'm looking to bring it down to about 
 300 ms if possible. Changing those three settings to 48 yielded no 
 improvements over 32.
 Is there possibly something something else to tweak there?

 Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
 http://explain.depesz.com/s/cj2

 Thank you.

 Peter Slapansky

 -

 Your last explain analyze (with 3 settings set to 32)  shows query duration 
 10ms, not 1sec.
 Am I wrong?

I am afraid the 1 sec is planning time :( .. So execution is fast, but
planning is expensive and relatively slow .. maybe prepared statements
can help in this case.
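
A minimal sketch of that idea (the statement name, view name and parameters
are made up - the real paginated query would go inside):

-- parse and plan once ...
PREPARE paginated_list (int, int) AS
  SELECT * FROM v_paginated_view        -- hypothetical view name
   ORDER BY id
   LIMIT $1 OFFSET $2;

-- ... then execute repeatedly; after a few executions the server can switch
-- to a cached generic plan, avoiding the expensive planning step
EXECUTE paginated_list (50, 0);
EXECUTE paginated_list (50, 50);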

Regards

Pavel


 Regards,
 Igor Neyman



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Pavel Stehule
2013/8/7  sl...@centrum.sk:
 You're right, it does... but it's quite odd, because I re-ran the
 explain-analyze statement and got the same results.

 Still, the query now runs for about a second as mentioned before, so it's
 almost like something's missing from the explain, but I'm certain I copied
 it all.

what is the time of EXPLAIN only?

Pavel




 I did this via pgadmin, but that shouldn't matter, should it?



 Thank you,



 Peter Slapansky

 __
 Od: Igor Neyman iney...@perceptron.com
 Komu: sl...@centrum.sk sl...@centrum.sk, Pavel Stehule
 pavel.steh...@gmail.com
 Dátum: 07.08.2013 15:47
 Predmet: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated
 query on a view with another view inside of it.


 CC: pgsql-performance@postgresql.org

 Your last explain analyze (with 3 settings set to 32)  shows query duration
 10ms, not 1sec.
 Am I wrong?

 Regards,
 Igor Neyman



 __


 From: pgsql-performance-ow...@postgresql.org
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of
 sl...@centrum.sk
 Sent: Wednesday, August 07, 2013 8:43 AM
 To: Pavel Stehule
 Cc: pgsql-performance@postgresql.org
 Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a
 view with another view inside of it.

 Good day,

 I have included a link to the result of EXPLAIN ANALYZE. It's this one:
 https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

 Here's a link to Depesz's explain (if links to the site are okay):
 http://explain.depesz.com/s/gCk

 I have just tried setting geqo_threshold, join_collapse_limit and
 from_collapse_limit to 16, but it yielded no improvement.
 Changing those three parameters to 32 did speed up the query from about 3.3
 seconds to about a second (give or take 50 ms), which is a pretty good
 improvement, but not quite there, as I'm looking to bring it down to about
 300 ms if possible. Changing those three settings to 48 yielded no
 improvements over 32.
 Is there possibly something something else to tweak there?

 Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
 http://explain.depesz.com/s/cj2

 Thank you.

 Peter Slapansky



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-06 Thread Pavel Stehule
Hello

please send the result of EXPLAIN ANALYZE

please use http://explain.depesz.com/ for sharing the plan

there are more than 8 joins - so try to set geqo_threshold to 16,
join_collapse_limit to 16, and from_collapse_limit to 16.
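
For example (a sketch - these can be set for the current session before
running the query, and reset afterwards):

SET geqo_threshold = 16;
SET join_collapse_limit = 16;
SET from_collapse_limit = 16;
-- run the paginated query here, then:
RESET geqo_threshold;
RESET join_collapse_limit;
RESET from_collapse_limit;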

Regards

Pavel Stehule

2013/8/2  sl...@centrum.sk:
 Good day,

 I have a performance issue when JOINing a view within another view more than 
 once.
 The query takes over three seconds to execute, which is too long in this 
 case. It's not a problem if the tables are nearly empty, but that isn't the 
 case on the production database.

 I suspect the planner thinks it's better to first put together the v_address 
 view and JOIN it to the parcel table later on, but the function 
 fx_get_user_tree_subordinates_by_id should be JOINed to the parcel table 
 first, as it reduces the number of rows to less than 200 and any following 
 JOINs would be much faster.

 I have also run vacuum, reindex and analyze on the whole database, but it
 seems to have had no effect.

 Is there any way to nudge the planner toward that way of execution?

 This is the query:
 https://app.box.com/s/jzxiuuxoyj28q4q8rzxr

 This is the query plan:
 https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h (plain text)
 https://app.box.com/s/jzxiuuxoyj28q4q8rzxr (graphical output)

 These are the views:
 https://app.box.com/s/uibzidsazwv3eeauovuk (paginated view)
 https://app.box.com/s/v71vyexmdyl97m4f3m6u (used three times in the paginated 
 view).


 Thank you.

 Peter Slapansky


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Fwd: Relatively high planner overhead on partitions?

2013-07-19 Thread Pavel Stehule
Hello



2013/7/19 Skarsol skar...@gmail.com:
 I tried sending this a couple days ago but I wasn't a member of the group so
 I think it's in limbo. Apologies if a 2nd copy shows up at some point.

 We recently migrated a 1.3TB database from 8.4 to 9.2.2 on a new server. As
 part of this migration we added partitions to the largest tables so we could
 start removing old data to an archive database. Large queries perform much
 better due to not hitting the older data as expected. Small queries served
 from records in memory are suffering a much bigger performance hit than
 anticipated due to the partitioning.

 I'm able to duplicate this issue on our server trivially with these
 commands: http://pgsql.privatepaste.com/7223545173

  Running the queries from the command line 10k times (time psql testdb <
  test1.sql > /dev/null) results in a 2x slowdown for the queries not using
  testtable_90 directly. (~4s vs ~2s).

if all data in your test lives in memory, then the bottleneck is the CPU
- and every extra node in the execution plan is significant.

It is no surprise, because OLTP relational databases are not well
optimized for this use case. The designers expected a much more
significant impact from IO operations, and these databases are designed
to minimize the IO bottleneck - with relatively low memory use. This
use case is better served by OLAP databases (read-optimized databases)
designed after the year 2000 - like MonetDB, Vertica, or last year's cool
DB, HANA.

Regards

Pavel



 Running a similar single record select on a non-partitioned table averages
 10k in 2s.

 Running select 1; 10k times in the same method averages 1.8 seconds.

 This matches exactly what I'm seeing in our production database. The numbers
 are different, but the 2x slowdown persists. Doing a similar test on another
 table on production with 7 children and 3 check constraints per child
 results in a 3x slowdown.

 I'm aware that partitioning has an impact on the planner, but doubling the
 time of in memory queries with only 5 partitions with 1 check each is much
 greater than anticipated. Are my expectations off and this is normal
 behavior or is there something I can do to try and speed these in memory
 queries up? I was unable to find any information online as to the expected
 planner impact of X # of partitions.

 Database information follows:

 Red Hat Enterprise Linux Server release 6.4 (Santiago)
 Linux hostname.domainname 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29
 16:51:51 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
 PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
 20120305 (Red Hat 4.4.6-4), 64-bit

 Server info:
 4x Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
 128gb RAM

  DateStyle   | ISO, MDY
 | configuration file
  default_statistics_target   | 5000
 | configuration file
  default_text_search_config  | pg_catalog.english
 | configuration file
  effective_cache_size| 64000MB
 | configuration file
  effective_io_concurrency| 2
 | configuration file
  fsync   | on
 | configuration file
  lc_messages | C
 | configuration file
  lc_monetary | C
 | configuration file
  lc_numeric  | C
 | configuration file
  lc_time | C
 | configuration file
  max_connections | 500
 | configuration file
  max_stack_depth | 2MB
 | environment
  shared_buffers  | 32000MB
 | configuration file
  synchronous_commit  | on
 | configuration file
  TimeZone| CST6CDT
 | configuration file
  wal_buffers | 16MB
 | configuration file
  wal_level   | archive
 | configuration file
  wal_sync_method | fdatasync
 | configuration file





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] effective_cache_size on 32-bits postgres

2013-03-18 Thread Pavel Stehule
Hello

2013/3/18 Rodrigo Barboza rodrigombu...@gmail.com:
 Hi guys, I am worried about the effective_cache_size.
 I run a 32-bits postgres installation on a machine with 64 bits kernel.
 Should I limit effective_cache_size to a maximum of 2.5gb?

sure, and probably a little bit less

Regards

Pavel


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] effective_cache_size on 32-bits postgres

2013-03-18 Thread Pavel Stehule
2013/3/18 Pavel Stehule pavel.steh...@gmail.com:
 Hello

 2013/3/18 Rodrigo Barboza rodrigombu...@gmail.com:
 Hi guys, I am worried about the effective_cache_size.
 I run a 32-bits postgres installation on a machine with 64 bits kernel.
 Should I limit effective_cache_size to a maximum of 2.5gb?

 sure and probably little bit less

wrong reply - Rob is right

Pavel


 Regards

 Pavel


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] effective_cache_size on 32-bits postgres

2013-03-18 Thread Pavel Stehule
2013/3/18 Kevin Grittner kgri...@ymail.com:
 Rodrigo Barboza rodrigombu...@gmail.com wrote:

 So setting this as half of ram, as suggested in postgres tuning
 webpage should be safe?

 Half of RAM is likely to be a very bad setting for any work load.
 It will tend to result in the highest possible number of pages
 duplicated in PostgreSQL and OS caches, reducing the cache hit
 ratio.  More commonly given advice is to start at 25% of RAM,
 limited to 2GB on Windows or 32-bit systems or 8GB otherwise.  Try
 incremental adjustments from that point using your actual workload
 on you actual hardware to find the sweet spot.  Some DW
 environments report better performance assigning over 50% of RAM to
 shared_buffers; OLTP loads often need to reduce this to prevent
 periodic episodes of high latency.

you are speaking about shared_buffers now.

Pavel


 --
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Pavel Stehule
Hello

you can try a procedural solution - use a cursor over the ordered data in
plpgsql and return a table (a sketch follows below)
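
A rough sketch of that approach (the table name vessel_positions is made up
and the column types are assumed; it reproduces the logic of your CTE query: a
single ordered pass that restarts the transit counter for every vessel and
increments it whenever the gap between reports exceeds two hours):

CREATE OR REPLACE FUNCTION assign_transits()
RETURNS TABLE (id bigint, mmsi text, name text, epoch bigint, transit int) AS $$
DECLARE
  r           record;
  prev_mmsi   text;
  prev_name   text;
  prev_epoch  bigint;
  cur_transit int;
BEGIN
  FOR r IN
    SELECT a.id, a.mmsi, a.name, a.epoch
      FROM vessel_positions a        -- hypothetical table name
     ORDER BY a.mmsi, a.name, a.epoch
  LOOP
    IF prev_mmsi IS DISTINCT FROM r.mmsi
       OR prev_name IS DISTINCT FROM r.name THEN
      cur_transit := 1;                            -- new vessel: restart numbering
    ELSIF (r.epoch - prev_epoch) / 60 > 120 THEN
      cur_transit := cur_transit + 1;              -- gap over two hours: new transit
    END IF;
    id := r.id; mmsi := r.mmsi; name := r.name;
    epoch := r.epoch; transit := cur_transit;
    RETURN NEXT;
    prev_mmsi := r.mmsi; prev_name := r.name; prev_epoch := r.epoch;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT * FROM assign_transits();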

Regards

Pavel Stehule

2013/3/11 Jeff Adams - NOAA Affiliate jeff.ad...@noaa.gov:
 Greetings,



 I have a large table (~90 million rows) containing vessel positions. In
 addition to a column that contains the location information (the_geom), the
 table also contains two columns that are used to uniquely identify the
 vessel (mmsi and name) and a column containing the Unix time (epoch) at
 which the position information was logged. I frequently need to assign
 records to vessel transits. To do this, I currently create a CTE that uses a
 Window function (partitioning the data by mmsi and name ordered by epoch) to
 examine the time that has elapsed between successive position reports for
 individual vessels. For every position record for a vessel (as identified
 using mmsi and name), if the time elapsed between the current position
 record and the previous record (using the lag function) is less than or
 equal to 2 hours, I assign the record a value of 0 to a CTE column named
 tr_index. If the time elapsed is greater than 2 hours, I assign the record a
 value of 1 to the tr_index column. I then use the CTE to generate transit
 numbers by summing the values in the tr_index field across a Window that
 also partitions the data by mmsi and name and is ordered by epoch. This
 works, but is very slow (hours). The table is indexed (multi-column index on
 mmsi, name and index on epoch). Does anyone see a way to get what I am after
 in a more efficient manner. What I am after is an assignment of transit
 number to vessels' position records based on whether the records were within
 two hours of each other. The SQL that I used is provided below. Any advice
 would be greatly appreciated...



 WITH

 cte_01 AS

 (

 SELECT

 a.id,

 a.mmsi,

 a.name,

 a.epoch,

  a.the_geom,

  CASE

    WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1

   ELSE 0

 END AS tr_index

 FROM table a

 WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)

 )





 SELECT

 a.id,

 a.mmsi,

 a.name,

 a.epoch,

 a.the_geom,

 1 + sum(a.tr_index) OVER w AS transit,

 a.active

 FROM cte_01 a

 WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)



 --
 Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

