Re: [PERFORM] Slow update on column that is part of exclusion constraint

2016-04-15 Thread Evgeniy Shishkin

> On 14 Apr 2016, at 07:17, Adam Brusselback <adambrusselb...@gmail.com> wrote:
> 
> So fair enough, it does seem to be related to the lookup rather than 
> maintenance on the index. I was misguided in my initial assumption.  
> 
> Spent quite a bit of time trying to come up with a self contained test, and 
> it seems like I can't make it choose the GiST index unless I remove the 
> regular btree index in my test case, though the opposite is true for my table 
> in production.  Not really sure what that means as far as what I need to do 
> though. I've tried a vacuum full, analyze, rebuild index, drop and re-add the 
> constraint... It still uses that GiST index for this query.
> 
> Hell, a sequential scan is a ton faster even.
> 

As I understand it, Postgres needs a way to find the rows to update.
In the EXPLAIN ANALYZE you provided, we see that it chose the GiST index for 
that, and that is a poor choice. I think you need a proper btree index for the 
update query to run properly fast, like an index on (product_id, company_id, 
date_range) WHERE upper(price_generated_test.active_range) IS NULL. 
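
As DDL, the suggested partial index might look like this (a sketch; the table 
and column names are taken from the EXPLAIN output, the index name is my own):

```sql
-- Partial btree index matching the UPDATE's join and filter columns;
-- the WHERE clause keeps it small by covering only "active" rows.
CREATE INDEX price_generated_active_idx
    ON price_generated (product_id, company_id, date_range)
    WHERE upper(active_range) IS NULL;
```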



> On Wed, Apr 13, 2016 at 2:54 PM, Evgeniy Shishkin <itparan...@gmail.com> 
> wrote:
> 
> > On 13 Apr 2016, at 20:14, Adam Brusselback <adambrusselb...@gmail.com> 
> > wrote:
> >
> > Sorry, brain stopped working and I forgot to include the normal info.
> >
> > Postgres version: 9.5.1
> > Hardware: 2 core, 4gb Digital Ocean virtual server
> > OS: Debian
> >
> > explain analyze for an example update:
> > 'Update on price_generated  (cost=32.45..644.83 rows=1 width=157) (actual 
> > time=29329.614..29329.614 rows=0 loops=1)'
> > '  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual 
> > time=29329.608..29329.608 rows=0 loops=1)'
> > '->  HashAggregate  (cost=32.04..34.35 rows=231 width=52) (actual 
> > time=1.137..2.090 rows=231 loops=1)'
> > '  Group Key: pti.product_id, pti.company_id, pti.date_range'
> > '  ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31 
> > rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
> > '->  Index Scan using 
> > price_generated_company_product_date_active_excl on price_generated  
> > (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0 
> > loops=231)'
> > '  Index Cond: (date_range = pti.date_range)'
> > '  Filter: ((upper(active_range) IS NULL) AND (pti.product_id = 
> > product_id) AND (pti.company_id = company_id))'
> > '  Rows Removed by Filter: 29460'
> > 'Planning time: 3.134 ms'
> > 'Execution time: 29406.717 ms'
> 
> Well, you see an execution time of 30 seconds because there are 231 index 
> lookups, each taking 126 ms.
> 
> And each lookup is slow because of
> Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND 
> (pti.company_id = company_id))
> 
> Can you provide a self-contained example of the update?
> I don't see where the (upper(active_range) IS NULL) condition is coming from.
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance





Re: [PERFORM] Slow update on column that is part of exclusion constraint

2016-04-13 Thread Evgeniy Shishkin

> On 13 Apr 2016, at 20:14, Adam Brusselback  wrote:
> 
> Sorry, brain stopped working and I forgot to include the normal info.
> 
> Postgres version: 9.5.1
> Hardware: 2 core, 4gb Digital Ocean virtual server
> OS: Debian 
> 
> explain analyze for an example update:
> 'Update on price_generated  (cost=32.45..644.83 rows=1 width=157) (actual 
> time=29329.614..29329.614 rows=0 loops=1)'
> '  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual 
> time=29329.608..29329.608 rows=0 loops=1)'
> '->  HashAggregate  (cost=32.04..34.35 rows=231 width=52) (actual 
> time=1.137..2.090 rows=231 loops=1)'
> '  Group Key: pti.product_id, pti.company_id, pti.date_range'
> '  ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31 
> rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
> '->  Index Scan using 
> price_generated_company_product_date_active_excl on price_generated  
> (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0 
> loops=231)'
> '  Index Cond: (date_range = pti.date_range)'
> '  Filter: ((upper(active_range) IS NULL) AND (pti.product_id = 
> product_id) AND (pti.company_id = company_id))'
> '  Rows Removed by Filter: 29460'
> 'Planning time: 3.134 ms'
> 'Execution time: 29406.717 ms'

Well, you see an execution time of 30 seconds because there are 231 index 
lookups, each taking 126 ms.

And each lookup is slow because of
Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND 
(pti.company_id = company_id))

Can you provide a self-contained example of the update?
I don't see where the (upper(active_range) IS NULL) condition is coming from.



Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-19 Thread Evgeniy Shishkin

> On 16 Mar 2016, at 16:37, Tom Lane  wrote:
> 
> Andreas Joseph Krogh  writes:
>> 1. Why isnt' folder_id part of the index-cond?
> 
> Because a GIN index is useless for sorting.

I don't see how GIN's inability to return sorted data relates to the index 
condition.
In fact, I tried to reproduce the example,
and if I change folder_id from bigint to int, then the index condition with 
folder_id is used:

 Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
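
For reference, the setup I tested is roughly the following (my reconstruction; 
the table name del is taken from the follow-up message, and the column types 
are assumptions):

```sql
-- btree_gin provides GIN operator classes for scalar types,
-- which lets folder_id be part of the full-text GIN index
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE TABLE del (
    folder_id bigint,
    fts_all   tsvector
);

CREATE INDEX del_fts_all_folder_id_gin
    ON del USING gin (fts_all, folder_id);
```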


> 
>> 2. Is there a way to make it use the (same) index to sort by 
>> received_timestamp?
> 
> No.
> 
>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
> 
> No.
> 
>> 4. It doesn't seem like ts_rank uses the index for sorting either.
> 
> Same reason.
> 
>   regards, tom lane
> 
> 





Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-19 Thread Evgeniy Shishkin

> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparan...@gmail.com> wrote:
> 
> 
>> On 16 Mar 2016, at 16:37, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 
>> Andreas Joseph Krogh <andr...@visena.com> writes:
>>> 1. Why isnt' folder_id part of the index-cond?
>> 
>> Because a GIN index is useless for sorting.
> 
> I don't see how GIN's inability to return sorted data relates to the index 
> condition.
> In fact, I tried to reproduce the example,
> and if I change folder_id from bigint to int, then the index condition with 
> folder_id is used:
> 
> Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
> 

Looks like the documentation at 
http://www.postgresql.org/docs/9.5/static/btree-gin.html
is lying about supporting the int8 type.

> 
>> 
>>> 2. Is there a way to make it use the (same) index to sort by 
>>> received_timestamp?
>> 
>> No.
>> 
>>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
>> 
>> No.
>> 
>>> 4. It doesn't seem like ts_rank uses the index for sorting either.
>> 
>> Same reason.
>> 
>>  regards, tom lane
>> 
>> 
> 





Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Evgeniy Shishkin

> On 16 Mar 2016, at 18:04, Evgeniy Shishkin <itparan...@gmail.com> wrote:
> 
>> 
>> On 16 Mar 2016, at 17:52, Evgeniy Shishkin <itparan...@gmail.com> wrote:
>> 
>> 
>>> On 16 Mar 2016, at 16:37, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> 
>>> Andreas Joseph Krogh <andr...@visena.com> writes:
>>>> 1. Why isnt' folder_id part of the index-cond?
>>> 
>>> Because a GIN index is useless for sorting.
>> 
>> I don't see how gin inability to return sorted data relates to index 
>> condition.
>> In fact i tried to reproduce the example,
>> and if i change folder_id to int from bigint, then index condition with 
>> folder_id is used
>> 
>>Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>> 
> 
> Looks like documentation 
> http://www.postgresql.org/docs/9.5/static/btree-gin.html
> is lying about supporting int8 type
> 

Uh, it works if I cast to bigint explicitly:
  WHERE  del.fts_all @@ to_tsquery('simple', 'hi')
  AND    del.folder_id = 1::bigint;
results in
  Index Cond: ((folder_id = '1'::bigint) AND (fts_all @@ 
'''hi'''::tsquery))
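
A likely explanation (my assumption, not something verified in this thread): 
the bare literal 1 is typed int4, and btree_gin's operator classes, unlike the 
core btree operator families, have no cross-type int8 = int4 member, so the 
planner cannot push the condition into the GIN index scan. With the cast, both 
sides are int8:

```sql
-- 'del' and its columns come from the example above; the cast is the fix
EXPLAIN
SELECT *
FROM   del
WHERE  del.fts_all @@ to_tsquery('simple', 'hi')
AND    del.folder_id = 1::bigint;    -- int8 = int8, usable by btree_gin
```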

>> 
>>> 
>>>> 2. Is there a way to make it use the (same) index to sort by 
>>>> received_timestamp?
>>> 
>>> No.
>>> 
>>>> 3. Using a GIN-index, is there a way to use the index at all for sorting?
>>> 
>>> No.
>>> 
>>>> 4. It doesn't seem like ts_rank uses the index for sorting either.
>>> 
>>> Same reason.
>>> 
>>> regards, tom lane
>>> 
>>> 





Re: [PERFORM] Query that took a lot of time in Postgresql when not using trim in order by

2015-11-25 Thread Evgeniy Shishkin
> What is your Postgres version?
> Do you have correct statistics on these tables?
> Please show your execution plans with buffers, i.e. explain 
> (analyze, buffers) ...
> 


Fast:

 Sort  (cost=193101.41..195369.80 rows=907357 width=129) (actual 
time=3828.176..3831.261 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, 
dim_cliente.a1_nome, dim_vendedor.a3_nome, 
(btrim((dim_cliente.tipocliente)::text))
   Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, 
dim_cliente.a1_cod, dim_cliente.a1_nome
   Sort Method: quicksort  Memory: 13121kB
   ->  HashAggregate  (cost=91970.52..103312.49 rows=907357 width=129) (actual 
time=2462.690..2496.729 rows=43615 loops=1)
 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, 
dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, 
btrim((dim_cliente.tipocliente)::text)
 ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual 
time=29.524..1533.880 rows=907357 loops=1)


Slow:

 Group  (cost=170417.48..184027.84 rows=907357 width=129) (actual 
time=36649.329..37235.158 rows=43615 loops=1)
   Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, 
dim_cliente.a1_nome, dim_vendedor.a3_nome
   ->  Sort  (cost=170417.48..172685.88 rows=907357 width=129) (actual 
time=36649.315..36786.760 rows=907357 loops=1)
 Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, 
dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
 Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, 
dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
 Sort Method: quicksort  Memory: 265592kB
 ->  Hash Join  (cost=856.30..80628.56 rows=907357 width=129) (actual 
time=26.719..1593.693 rows=907357 loops=1)


The difference is at the top of the plans.
As we can see, the hash join time is practically the same. 
But the fast plan runs the HashAggregate first, so only 43k rows require 
sorting; the slow plan is dominated by sorting 907k rows.

I wonder if increasing cpu_tuple_cost will help,
as the cost difference between the two plans is negligible now.
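
If anyone wants to test that hypothesis, a session-local experiment might look 
like this (the value 0.05 is only an illustration; the default is 0.01):

```sql
-- raise the per-tuple CPU cost so the 907k-row sort looks more
-- expensive relative to the HashAggregate plan
SET cpu_tuple_cost = 0.05;   -- affects this session only

-- EXPLAIN (ANALYZE, BUFFERS) <the query above>;

RESET cpu_tuple_cost;
```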




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2015-11-05 Thread Evgeniy Shishkin
Sorry for disrupting the thread,

I am wondering: would it be possible to use BRIN indexes to better estimate 
the distribution?

I mean: create both a btree index and a BRIN index,
probe the BRIN index during planning, and estimate whether an abort-early plan 
using the btree index will be better.



Re: [PERFORM] Index Scan Backward Slow

2015-05-01 Thread Evgeniy Shishkin

 On 01 May 2015, at 13:54, David Osborne da...@qcode.co.uk wrote:
 
 Hi,
 
 We have a query which finds the latest row_id for a particular code.
 
 We've found a backwards index scan is much slower than a forward one, to the 
 extent that disabling indexscan altogether actually improves the query time.
 
 Can anyone suggest why this might be, and what's best to do to improve the 
 query time?
 
 
 
 dev= \d table
Table public.table
 Column|  Type  | Modifiers 
 --++---
  row_id   | integer| 
  code | character(2)   | 
 Indexes:
 table_code_idx btree (code)
 table_row_idx btree (row_id)
 
 dev= select count(*) from table;
   count  
 -
  6090254
 (1 row)
 
 dev= select count(distinct(row_id)) from table;
   count  
 -
  5421022
 (1 row)
 
 dev= select n_distinct from pg_stats where tablename='table' and 
 attname='row_id';
  n_distinct 
 
   -0.762951
 (1 row)
 
 dev= show work_mem;
  work_mem  
 ---
  1249105kB
 (1 row)
 
 dev= select version();
version
 
 --
  PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 
 20120306 (Red Hat 4.6.3-2), 64-bit
 (1 row)
 
 
 The query in question:
 
 dev= explain (analyse,buffers)  select row_id as last_row_id from table 
 where code='XX' order by row_id desc limit 1;
   
 QUERY PLAN
   
 --
  Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=835.281..835.282 rows=1 
 loops=1)
Buffers: shared hit=187961
 ->  Index Scan Backward using table_row_idx on table  
 (cost=0.43..343741.98 rows=278731 width=4) (actual time=835.278..835.278 
 rows=1 loops=1)
  Filter: (code = 'XX'::bpchar)
  Rows Removed by Filter: 4050971
  Buffers: shared hit=187961
  Total runtime: 835.315 ms
 (7 rows)
 
 http://explain.depesz.com/s/uGC
 
 
 So we can see it's doing a backwards index scan. Out of curiosity I tried a 
 forward scan and it was MUCH quicker:
 
 dev= explain (analyse,buffers)  select row_id as first_row_id from table 
 where code='XX' order by row_id asc limit 1;
 QUERY 
 PLAN 
 ---
  Limit  (cost=0.43..1.67 rows=1 width=4) (actual time=19.473..19.474 rows=1 
 loops=1)
Buffers: shared hit=26730
 ->  Index Scan using table_row_idx on table  (cost=0.43..343741.98 
 rows=278731 width=4) (actual time=19.470..19.470 rows=1 loops=1)
  Filter: (code = 'XX'::bpchar)
  Rows Removed by Filter: 62786
  Buffers: shared hit=26730
  Total runtime: 19.509 ms
 (7 rows)
 
 http://explain.depesz.com/s/ASxD
 
 
 I thought adding a index on row_id desc might be the answer but it has little 
 effect:
 
 dev= create index row_id_desc_idx on table(row_id desc);
 CREATE INDEX
 Time: 5293.812 ms
 
 dev= explain (analyse,buffers)  select row_id as last_row_id from table 
 where code='XX' order by row_id desc limit 1;
  
 QUERY PLAN
  
 
  Limit  (cost=0.43..1.66 rows=1 width=4) (actual time=944.666..944.667 rows=1 
 loops=1)
Buffers: shared hit=176711 read=11071
 ->  Index Scan using row_id_desc_idx on table  (cost=0.43..342101.98 
 rows=278731 width=4) (actual time=944.663..944.663 rows=1 loops=1)
  Filter: (code = 'XX'::bpchar)
  Rows Removed by Filter: 4050971
  Buffers: shared hit=176711 read=11071
  Total runtime: 944.699 ms
 (7 rows)
 
 http://explain.depesz.com/s/JStM
 
 In fact, disabling the index scan completely improves matters considerably:
 
 dev= drop index row_id_desc_idx;
 DROP INDEX
 dev= set enable_indexscan to off;
 SET   
 
 dev= explain (analyse,buffers)  select row_id as last_row_id from table 
 where code='XX' order by row_id desc limit 1;
QUERY 
 PLAN   
 

Re: [PERFORM] Tuning the configuration

2014-12-16 Thread Evgeniy Shishkin

 On 16 Dec 2014, at 14:51, Graeme B. Bell g...@skogoglandskap.no wrote:
 
 
 I don't understand the logic behind using drives, 
 which are best for random IO, for sequential IO workloads.
 
 Because they are also best for sequential IO. I get 1.3-1.4GB/second from 4 
 SSDs in RAID or 500MB/s for single disk systems, even with cheap models. 
 Are you getting more than that from high-end spinning rust?


I'd rather use SSDs for random IOPS when the database doesn't fit in RAM.
For WAL logs I use a RAID with BBU cache and a couple of SAS drives.





Re: [PERFORM] Tuning the configuration

2014-12-11 Thread Evgeniy Shishkin

 On 11 Dec 2014, at 15:02, Andrea Suisani sick...@opinioni.net wrote:
 
 On 12/10/2014 11:44 AM, Maila Fatticcioni wrote:
 2- I would like to use the two SSDs to store the WAL files. Do you think
 it is useful, or how should I use them?
 
 I definitely would give it a try.
 


I don't understand the logic behind using drives, 
which are best for random IO, for sequential IO workloads.

Money-wise, it's better to use 10k SAS drives behind a BBU RAID for WAL.





Re: [PERFORM] two table join with order by on both tables attributes

2014-08-08 Thread Evgeniy Shishkin

 On 08 Aug 2014, at 16:29, Marti Raudsepp ma...@juffo.org wrote:
 
 On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin itparan...@gmail.com wrote:
 select * from users join  notifications on 
 users.id=notifications.user_id ORDER BY users.priority desc 
 ,notifications.priority desc limit 10;
 
 In my understanding, i need to have two indexes
 on users(priority desc, id)
 and notifications(user_id, priority desc)
 
 And actually with this kind of query we really want the most wanted 
 notifications, by the user.
 So we really can rewrite to order by users.priority desc, id asc, 
 notifications.priority desc according to business logic.
 
 You can rewrite it with LATERAL to trick the planner into sorting each
 user's notifications separately. This should give you the nestloop
 plan you expect:
 
 SELECT *
 FROM users,
 LATERAL (
  SELECT * FROM notifications WHERE notifications.user_id=users.id
  ORDER BY notifications.priority DESC
 ) AS notifications
 ORDER BY users.priority DESC, users.id
 

Thank you very much.


 It would be great if Postgres could do this transformation automatically.
 
 There's a partial sort patch in the current CommitFest, which would
 solve the problem partially (it could use the index on users, but the
 notifications sort would have to be done in memory still).
 https://commitfest.postgresql.org/action/patch_view?id=1368
 
 Regards,
 Marti





[PERFORM] two table join with order by on both tables attributes

2014-08-07 Thread Evgeniy Shishkin
Hello,

suppose you have two very simple tables with an FK dependency, by which we join 
them, and another attribute on each for sorting,

like this:
select * from users join notifications on users.id=notifications.user_id ORDER 
BY users.priority desc, notifications.priority desc limit 10;

A very typical web query.

No matter which composite indexes I try, PostgreSQL cannot make an efficient 
nested-loop plan using the indexes.
It chooses all sorts of seq scans and hash joins, or a merge join, and always a 
sort node followed by a limit 10.

Neither plan provides acceptable performance. And the tables tend to grow =\

Can anybody suggest something or explain this behavior?



Re: [PERFORM] two table join with order by on both tables attributes

2014-08-07 Thread Evgeniy Shishkin
My question was about the fact that you cannot get fast execution of this kind 
of query in PostgreSQL.
With any runtime configuration you just switch from a seq scan and hash join to 
a merge join, and then you still have a sort node.

In my understanding, I need to have two indexes:
on users(priority desc, id)
and notifications(user_id, priority desc)

then PostgreSQL would choose a nested loop and get sorted data from the indexes.
But it won't. 

I don't understand why.
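
For clarity, the two indexes described above as DDL (a sketch; the index names 
are my own):

```sql
-- matches ORDER BY users.priority DESC, with id for the join
CREATE INDEX users_priority_id_idx
    ON users (priority DESC, id);

-- per-user notifications, pre-sorted by priority
CREATE INDEX notifications_user_priority_idx
    ON notifications (user_id, priority DESC);
```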

Do you have any schema and GUCs which perform this kind of query well?

Sorry for top posting.  

 Can you explain why a nested loop is best for your data?  Given my
 understanding of an expected priority cardinality I would expect your ORDER
 BY to be extremely inefficient and not all that compatible with a nested
 loop approach.
 
 You can use the various parameters listed on this page to force the desired
 plan and then provide EXPLAIN ANALYZE results for the various executed plans
 and compare them.
 
 http://www.postgresql.org/docs/9.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE
 
 And now for the obligatory read this link:
 
 https://wiki.postgresql.org/wiki/SlowQueryQuestions
 
 If you can show that in fact the nested loop (or some other plan) performs
 better than the one chosen by the planner - and can provide data that the
 developers can use to replicate the experiment - then improvements can be
 made.  At worse you will come to understand why the planner is right and can
 then explore alternative models.
 
 David J.
 
 
 
 
 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/two-table-join-with-order-by-on-both-tables-attributes-tp5814135p5814137.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
 
 





Re: [PERFORM] two table join with order by on both tables attributes

2014-08-07 Thread Evgeniy Shishkin
 select * from users join  notifications on users.id=notifications.user_id 
 ORDER BY users.priority desc ,notifications.priority desc limit 10;
 
 In my understanding, i need to have two indexes
 on users(priority desc, id)
 and notifications(user_id, priority desc)
 then postgresql would choose nested loop and get sorted data from indexes.
 But it wont. 
 
 Indeed.  If you think a bit harder, you'll realize that the plan you
 suggest would *not* produce the sort order requested by this query.
 It would (if I'm not confused myself) produce an ordering like
   users.priority desc, id asc, notifications.priority desc
 which would only match what the query asks for if there's just a single
 value of id per users.priority value.
 
 Offhand I think that the planner will not recognize a nestloop as
 producing a sort ordering of this kind even if the query did request the
 right ordering.  That could perhaps be improved, but I've not seen many
 if any cases where it would be worth the trouble.


Thanks Tom, you are right.

But maybe some sort of skip index scan, à la loose index scan, would help, with 
an index on notifications(priority desc, user_id)?

I know that this is currently not handled by the native executor.
Maybe I can work around it using a WITH RECURSIVE query?

Also, are there any plans to handle loose index scans in an upcoming release?
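
For what it's worth, the classic recursive-CTE emulation of a loose index scan 
looks like this (a sketch, assuming an index whose leading column is 
notifications.user_id; it walks the distinct user_id values one index probe at 
a time):

```sql
WITH RECURSIVE distinct_users AS (
    -- seed: the smallest user_id, via one index probe
    (SELECT user_id FROM notifications ORDER BY user_id LIMIT 1)
    UNION ALL
    -- step: the next user_id greater than the previous one;
    -- the scalar subquery yields NULL when none is left
    SELECT (SELECT n.user_id
            FROM   notifications n
            WHERE  n.user_id > d.user_id
            ORDER  BY n.user_id
            LIMIT  1)
    FROM   distinct_users d
    WHERE  d.user_id IS NOT NULL      -- stop after the NULL terminator
)
SELECT user_id
FROM   distinct_users
WHERE  user_id IS NOT NULL;
```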



Re: [PERFORM] two table join with order by on both tables attributes

2014-08-07 Thread Evgeniy Shishkin

 On 08 Aug 2014, at 03:43, Evgeniy Shishkin itparan...@gmail.com wrote:
 
 select * from users join  notifications on users.id=notifications.user_id 
 ORDER BY users.priority desc ,notifications.priority desc limit 10;
 
 In my understanding, i need to have two indexes
 on users(priority desc, id)
 and notifications(user_id, priority desc)
 then postgresql would choose nested loop and get sorted data from indexes.
 But it wont. 
 
 Indeed.  If you think a bit harder, you'll realize that the plan you
 suggest would *not* produce the sort order requested by this query.
 It would (if I'm not confused myself) produce an ordering like
  users.priority desc, id asc, notifications.priority desc
 which would only match what the query asks for if there's just a single
 value of id per users.priority value.
 
 Offhand I think that the planner will not recognize a nestloop as
 producing a sort ordering of this kind even if the query did request the
 right ordering.  That could perhaps be improved, but I've not seen many
 if any cases where it would be worth the trouble.
 

And actually, with this kind of query we really want the most wanted 
notifications per user.
So according to the business logic we really can rewrite it to order by 
users.priority desc, id asc, notifications.priority desc.
And we would benefit if this case were improved.



[PERFORM] slave wal is ahead of master

2014-03-19 Thread Evgeniy Shishkin
Hello, 

we have 3 servers with PostgreSQL 9.3.3: one master and two slaves.
We run synchronous replication, and fsync, synchronous_commit and 
full_page_writes are on.

Suddenly the master hung with a hardware failure; it is a strange bug in iLO 
which we are investigating with HP.

Before the master was rebooted, I ran ps aux on the slave:
postgres: wal receiver process   streaming 12/F1031DF8

The last message in the slaves' logs was
2014-03-19 02:41:29.005 GMT,,,7389,,53108c69.1cdd,16029,,2014-02-28 13:17:29 
GMT,,0,LOG,0,recovery restart point at 12/DFFBB3E8,last completed 
transaction was at log time 2014-03-19 02:41:28.886869+00

and then there was silence, because the master hung.

Then the master was rebooted, and the slave wrote to the log:
2014-03-19 15:36:39.176 GMT,,,7392,,53108c69.1ce0,2,,2014-02-28 13:17:29 
GMT,,0,FATAL,XX000,terminating walreceiver due to timeout,
2014-03-19 15:36:39.177 GMT,,,7388,,53108c69.1cdc,6,,2014-02-28 13:17:29 
GMT,1/0,0,LOG,0,record with zero length at 12/F1031DF8,
2014-03-19 15:36:57.181 GMT,,,12100,,5329b996.2f44,1,,2014-03-19 15:36:54 
GMT,,0,FATAL,XX000,could not connect to the primary server: could not connect 
to server: No route to host
Is the server running on host 10.162.2.50 and accepting
TCP/IP connections on port 5432?
,

Then the master finally came back, and the slave wrote:
2014-03-19 15:40:09.389 GMT,,,13121,,5329ba59.3341,1,,2014-03-19 15:40:09 
GMT,,0,FATAL,XX000,could not connect to the primary server: FATAL:  the 
database system is starting up
,
2014-03-19 15:40:16.468 GMT,,,13136,,5329ba5e.3350,1,,2014-03-19 15:40:14 
GMT,,0,LOG,0,started streaming WAL from primary at 12/F100 on timeline 
1,
2014-03-19 15:40:16.468 GMT,,,13136,,5329ba5e.3350,2,,2014-03-19 15:40:14 
GMT,,0,FATAL,XX000,could not receive data from WAL stream: ERROR:  requested 
starting point 12/F100 is ahead of the WAL flush position of this server 
12/F0FFFCE8
,

The last message was repeated several times,
and then this happened:

2014-03-19 15:42:04.623 GMT,,,13722,,5329bacc.359a,1,,2014-03-19 15:42:04 
GMT,,0,LOG,0,started streaming WAL from primary at 12/F100 on timeline 
1,
2014-03-19 15:42:04.628 GMT,,,7388,,53108c69.1cdc,7,,2014-02-28 13:17:29 
GMT,1/0,0,LOG,0,invalid record length at 12/F1031DF8,
2014-03-19 15:42:04.628 GMT,,,13722,,5329bacc.359a,2,,2014-03-19 15:42:04 
GMT,,0,FATAL,57P01,terminating walreceiver process due to administrator 
command,
2014-03-19 15:42:09.628 GMT,,,7388,,53108c69.1cdc,8,,2014-02-28 13:17:29 
GMT,1/0,0,LOG,0,invalid record length at 12/F1031DF8,
2014-03-19 15:42:14.628 GMT,,,7388,,53108c69.1cdc,9,,2014-02-28 13:17:29 
GMT,1/0,0,LOG,0,invalid record length at 12/F1031DF8,
2014-03-19 15:42:19.628 GMT,,,7388,,53108c69.1cdc,10,,2014-02-28 13:17:29 
GMT,1/0,0,LOG,0,invalid record length at 12/F1031DF8,” 
and it just repeats forever.


Meanwhile, on the master:
2014-03-19 15:39:30.957 GMT,,,7115,,5329ba32.1bcb,2,,2014-03-19 15:39:30 
GMT,,0,LOG,0,database system was not properly shut down; automatic 
recovery in progress,
2014-03-19 15:39:30.989 GMT,,,7115,,5329ba32.1bcb,3,,2014-03-19 15:39:30 
GMT,,0,LOG,0,redo starts at 12/DFFBB3E8,
2014-03-19 15:39:47.114 GMT,,,7115,,5329ba32.1bcb,4,,2014-03-19 15:39:30 
GMT,,0,LOG,0,redo done at 12/F0FFFC38,
2014-03-19 15:39:47.114 GMT,,,7115,,5329ba32.1bcb,5,,2014-03-19 15:39:30 
GMT,,0,LOG,0,last completed transaction was at log time 2014-03-19 
05:02:29.273138+00,
2014-03-19 15:39:47.115 GMT,,,7115,,5329ba32.1bcb,6,,2014-03-19 15:39:30 
GMT,,0,LOG,0,checkpoint starting: end-of-recovery immediate,
2014-03-19 15:40:16.466 
GMT,replicator,,7986,10.162.2.52:44336,5329ba5e.1f32,1,idle,2014-03-19 
15:40:14 GMT,2/0,0,ERROR,XX000,requested starting point 12/F100 is ahead 
of the WAL flush position of this server 12/F0FFFCE8,walreceiver

So both slaves are disconnected from the master, which is somehow behind its 
slaves.

I decided to promote one of the slaves, so we can have some snapshot of the 
data.
The relevant logs from this are:
2014-03-19 16:50:43.118 GMT,,,,,5329cae3.115c,3,,2014-03-19 16:50:43 
GMT,,0,LOG,0,redo starts at 12/DFFBB3E8,
2014-03-19 16:50:50.028 
GMT,dboperator,postgres,4452,[local],5329caea.1164,1,,2014-03-19 
16:50:50 GMT,,0,FATAL,57P03,the database system is starting up,
2014-03-19 16:50:51.128 GMT,,,,,5329cae3.115c,4,,2014-03-19 16:50:43 
GMT,,0,LOG,0,invalid contrecord length 5736 at 12/F0FFFC80,
2014-03-19 16:50:51.128 GMT,,,,,5329cae3.115c,5,,2014-03-19 16:50:43 
GMT,,0,LOG,0,redo done at 12/F0FFFC38,””

It is interesting that redo is done at 12/F0FFFC38 on both the master and the 
promoted slave.

The main question is where the actual latest data is, and how it is possible 
that the master is behind its slaves under synchronous replication.

Thanks for the help.


Re: [PERFORM] Query taking long time

2014-03-10 Thread Evgeniy Shishkin

On 07 Mar 2014, at 13:18, acanada acan...@cnio.es wrote:

 The table entity2document2 has 30GB. In consecutive runs it gets much 
 better... approx. 30 ms.

So you are just benchmarking your hard drives' random IOPS.

You need more RAM and faster disks.



Re: [PERFORM] Query taking long time

2014-03-07 Thread Evgeniy Shishkin

 Hello Mat,
 
 Setting enable_bitmapscan to off doesn't really helps. It gets worse...
 
 x= SET enable_bitmapscan=off; 
 SET
 x= explain analyze select * from (select * from entity2document2  where 
 name='ranitidine' ) as a  order by a.hepval;

 QUERY PLAN

 
  Sort  (cost=18789.21..18800.70 rows=4595 width=131) (actual 
 time=79965.282..79966.657 rows=13512 loops=1)
Sort Key: entity2document2.hepval
Sort Method:  quicksort  Memory: 2301kB
 ->  Index Scan using entity2document2_name on entity2document2  
 (cost=0.00..18509.70 rows=4595 width=131) (actual time=67.507..79945.362 
 rows=13512 loops=1)
  Index Cond: ((name)::text = 'ranitidine'::text)
  Total runtime: 79967.705 ms
 (6 rows)
 
 Any other idea? 
 

Please post your hardware configuration. I think your database does not fit in 
RAM and your disks are slow.
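A quick sanity check (a sketch; relation name taken from the thread) is to 
compare the database size with the available RAM:

```sql
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Also check the relation actually touched by the query,
-- including its indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('entity2document2'));
```

If the working set is much larger than RAM, every cold lookup on a 
non-clustered value will pay one random read per matching row.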



 Thank you very much for your help. Regards,
 Andrés
 
 On Mar 6, 2014, at 2:11 PM, desmodemone wrote:
 
 
 On 05 Mar 2014 00:36, Venkata Balaji Nagothi vbn...@gmail.com wrote:
 
  After looking at the distinct values, yes, the composite index on name 
  and hepval is not recommended. That would make things worse - it's expected.
 
  We need to look for other possible work around. Please drop off the above 
  Index. Let me see if i can drill further into this.
 
  Meanwhile - can you help us know the memory parameters (work_mem, 
  temp_buffers etc) set ?
 
  Do you have any other processes affecting this query's performance ?
 
  Any info about your Disk, RAM, CPU would also help.
 
  Regards,
  Venkata Balaji N
 
  Fujitsu Australia
 
 
 
 
  Venkata Balaji N
 
  Sr. Database Administrator
  Fujitsu Australia
 
 
  On Tue, Mar 4, 2014 at 10:23 PM, acanada acan...@cnio.es wrote:
 
  Hello,
 
  I don't know if this helps to figure out what is the problem but after 
  adding the multicolumn index on name and hepval, the performance is even 
  worse (¿?).  Ten times worse...
 
  explain analyze select * from (select * from entity_compounddict2document 
   where name='progesterone') as a order by a.hepval;
   
  QUERY PLAN

  -
   Sort  (cost=422746.18..423143.94 rows=159104 width=133) (actual 
  time=95769.674..95797.943 rows=138165 loops=1)
 Sort Key: entity_compounddict2document.hepval
 Sort Method:  quicksort  Memory: 25622kB
     ->  Bitmap Heap Scan on entity_compounddict2document  
  (cost=3501.01..408999.90 rows=159104 width=133) (actual 
  time=70.789..95519.258 rows=138165 loops=1)
   Recheck Cond: ((name)::text = 'progesterone'::text)
       ->  Bitmap Index Scan on entity_compound2document_name  
  (cost=0.00..3461.23 rows=159104 width=0) (actual time=35.174..35.174 
  rows=138165 loops=1)
 Index Cond: ((name)::text = 'progesterone'::text)
   Total runtime: 95811.838 ms
  (8 rows)
 
  Any ideas please?
 
  Thank you 
  Andrés.
 
 
 
  On Mar 4, 2014, at 12:28 AM, Venkata Balaji Nagothi wrote:
 
  On Mon, Mar 3, 2014 at 9:17 PM, acanada acan...@cnio.es wrote:
 
  Hello,
 
  Thankyou for your answer.
  I have made more changes than a simple re-indexing recently. I have 
  moved the sorting field to the table in order to avoid the join clause. 
  Now the schema is very simple. The query only implies one table:
 
  x=> \d+ entity_compounddict2document;
             Table "public.entity_compounddict2document"
     Column    |              Type              | Modifiers | Storage  | Description
  -------------+--------------------------------+-----------+----------+-------------
   id          | integer                        | not null  | plain    |
   document_id | integer                        |           | plain    |
   name        | character varying(255)         |           | extended |
   qualifier   | character varying(255)         |           | extended |
   tagMethod   | character varying(255)         |           | extended |
   created     | timestamp(0) without time zone |           | plain    |
   updated     | timestamp(0) without time zone |           | plain    |
   curation    | integer                        |           | plain    |
   hepval      | double precision               |           | plain    |
   cardval     | double precision               |           | plain    |
   nephval     | double precision

Re: [PERFORM] Query taking long time

2014-03-07 Thread Evgeniy Shishkin

On 07 Mar 2014, at 12:46, acanada acan...@cnio.es wrote:

 
 On Mar 7, 2014, at 10:39 AM, Evgeniy Shishkin wrote:
 
 
 Hello Mat,
 
 Setting enable_bitmapscan to off doesn't really help. It gets worse...
 
 x= SET enable_bitmapscan=off; 
 SET
 x= explain analyze select * from (select * from entity2document2  where 
 name='ranitidine' ) as a  order by a.hepval;
  
 QUERY PLAN  
  
 
 Sort  (cost=18789.21..18800.70 rows=4595 width=131) (actual 
 time=79965.282..79966.657 rows=13512 loops=1)
  Sort Key: entity2document2.hepval
  Sort Method:  quicksort  Memory: 2301kB
  ->  Index Scan using entity2document2_name on entity2document2  
 (cost=0.00..18509.70 rows=4595 width=131) (actual time=67.507..79945.362 
 rows=13512 loops=1)
Index Cond: ((name)::text = 'ranitidine'::text)
 Total runtime: 79967.705 ms
 (6 rows)
 
 Any other idea? 
 
 
 Please post your hardware configuration. I think your database does not fit 
 in RAM and your disks are slow.
 
 The server has 2 quad-core processors, 10GB of RAM, and the data is located 
 on a 2TB fibre-attached disk. It doesn't seem to be the problem… 

And what is your database size?

Also, do these timings get better on consecutive runs? 


 

 Thank you
 
 Andrés
 
 
 
 

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Evgeniy Shishkin




On 28.05.2013, at 2:17, John Mudd johnbm...@gmail.com wrote:

 Thanks again.
 
 Well, I have two problems with using the CLUSTER option. It's only temporary, 
 since any updates, depending on how much free space is reserved per page, 
 require re-running CLUSTER. And my primary concern is that it 
 arbitrarily gives an unfair advantage to the primary key SELECT. Still, it's 
 easy to test, so here are the results. The primary key still loses, even with 
 the CLUSTER. Granted, it is close, but considering this is now an unfair 
 comparison it still doesn't make sense to me. How can a search for a specific 
 row that should be fairly straightforward take longer than a search that 
 includes an ORDER BY clause?
 

Well, you get just a regular index scan because of LIMIT 1.

And now it is just a matter of index size and table organization.

I also don't understand why you consider CLUSTER unfair - the way you populated 
the table was already a natural cluster over my_key.

But it bothers me why my_key is always better. Can you please test it with 
different values but the same rows? Right now the two queries fetch two 
different tuples, and you are counting every I/O.
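A minimal way to run that same-row test (a sketch; the literal values are 
hypothetical and must be taken from one actual row of the table):

```sql
-- Step 1: pick one existing row and note its key columns, e.g.:
--   SELECT id, key1, key2, key3 FROM test_select LIMIT 1;

-- Step 2: after each cold-cache restart, time both lookups
-- against that SAME row, so both paths pay for the same tuple:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select
WHERE key1 = 50                     -- key1 of the chosen row
ORDER BY key1, key2, key3, id
LIMIT 1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select
WHERE id = 50;                      -- id of the same chosen row
```

With BUFFERS enabled, the shared hit/read counts show how many index and heap 
pages each path actually touched, instead of comparing two unrelated tuples.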

 
 test=# CLUSTER test_select USING test_select_pkey ;
 CLUSTER
 test=# VACUUM ANALYZE test_select ;
 VACUUM
 
 (stopped postgres; reset O/S cache; started postgres)
 
 
 test=# explain analyze SELECT * FROM test_select WHERE key1 = 50 ORDER 
 BY key1, key2, key3, id LIMIT 1;
   QUERY PLAN
 --
  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 
 loops=1)
     ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 
 rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
  Index Cond: (key1 = 50)
  Total runtime: 19.526 ms
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 
 test=# explain analyze SELECT * FROM test_select WHERE id = 50;
QUERY PLAN 

 -
  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
 width=21) (actual time=21.070..21.072 rows=1 loops=1)
Index Cond: (id = 50)
  Total runtime: 21.178 ms
 
 
 
 
 On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin itparan...@gmail.com 
 wrote:
 
 On May 27, 2013, at 6:35 PM, John Mudd johnbm...@gmail.com wrote:
 
 Thanks, that's easy enough to test. Didn't seem to help though.
 
  OK. And what if you CLUSTER the table USING the PK?
 
 
 test=# REINDEX index test_select_pkey;
 REINDEX
 test=# VACUUM ANALYZE test_select ;
 VACUUM
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 test=# explain analyze SELECT * FROM test_select WHERE key1 = 50 ORDER 
 BY key1, key2, key3, id LIMIT 1;
   QUERY PLAN
   
 --
  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 
 rows=1 loops=1)
     ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 
 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
  Index Cond: (key1 = 50)
  Total runtime: 16.444 ms
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 test=# explain analyze SELECT * FROM test_select WHERE id = 50;
QUERY PLAN
 -
  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
 width=21) (actual time=23.072..23.074 rows=1 loops=1)
Index Cond: (id = 50)
  Total runtime: 23.192 ms
 
 
 
 
 On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin itparan...@gmail.com 
 wrote:
 
 On May 27, 2013, at 6:02 PM, John Mudd johnbm...@gmail.com wrote:
 
  Postgres 9.1.2 on Ubuntu 12.04
 
  Any reason why a select by primary key would be slower than a select 
  that includes an ORDER BY? I was really hoping using the primary key 
  would give me a boost.
 
 
  You created my_key after data loading, but the PK was there all the time.
  If you REINDEX the PK, I bet it will be as fast.
 
   I stopped the server and cleared the O/S cache using sync; echo 3 > 
   /proc/sys/vm/drop_caches between the runs.
 
 
 
  test=# VACUUM ANALYZE test_select;
  VACUUM
 
  (stopped postgres; reset O/S cache; started postgres)
 
  test=# explain analyze SELECT * FROM test_select WHERE key1 = 50 
  ORDER BY key1, key2, key3, id LIMIT 1;
  

Re: [PERFORM] autovacuum fringe case?

2013-01-23 Thread Evgeniy Shishkin




On 23.01.2013, at 20:53, AJ Weber awe...@comcast.net wrote:

 I have a server that is IO-bound right now (it's 4 cores, and top indicates 
 that user CPU rarely hits 25%, but I/O wait spikes above 25-40% regularly).  The 
 server is running postgresql 9.0 and tomcat 6.  As I have mentioned in a 
 previous thread, I can't alter the hardware to add disks unfortunately, so 
 I'm going to try and move postgresql off this application server to its own 
 host, but this is a production environment, so in the meantime...
 
 Is it possible that some spikes in IO could be attributable to the autovacuum 
 process?  Is there a way to check this theory?
 

Try iotop; it shows per-process disk I/O, so you can see whether the autovacuum 
workers are the ones generating the reads and writes during the spikes.
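You can also check from inside the database whether an autovacuum worker is 
running while an I/O spike happens (a sketch; note that on 9.0 the column is 
current_query, renamed to query in 9.2):

```sql
-- Autovacuum workers report themselves in pg_stat_activity:
SELECT datname, procpid, current_query, query_start
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';
```

If the spikes line up with rows returned here, the theory holds; throttling 
autovacuum (autovacuum_vacuum_cost_delay/limit) is usually safer than disabling 
it outright.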

 Would it be advisable (or even permissible to try/test) to disable 
 autovacuum, and schedule a manual vacuumdb in the middle of the night, when 
 this server is mostly-idle?
 
 Thanks for any tips.  I'm in a bit of a jam with my limited hardware.
 
 -AJ
 
 
 

