Re: [PERFORM] Plan differences

2016-01-04 Thread Adam Pearson
Hello Anton,
Changing the locale to anything other than C or POSIX will have 
a performance overhead.  I’m pretty sure that declaring the locale only on the 
indexes is just plastering over the cracks.
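
(For reference, the locale a database was initialised with can be checked with 
standard catalog queries along these lines - nothing from the original mail:)

```sql
-- Per-database locale settings for every database in the cluster
SELECT datname, datcollate, datctype FROM pg_database;

-- Or, for the current database only
SHOW lc_collate;
SHOW lc_ctype;
```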

Is it possible to reload the database with the same locale as the original 
database server?

Regards,
Adam


Re: [PERFORM] Plan differences

2016-01-04 Thread Anton Melser
Hi,


> Changing the locale to anything other than C or POSIX will
> have a performance overhead.  I’m pretty sure that declaring the
> locale only on the indexes is just plastering over the cracks.
>
>
>
> Is it possible to reload the database with the same locale as the original
> database server?
>

Sorry, I wasn't clear - I did end up recreating the DB with lc_collate =
"C" and lc_ctype = "C" and loading all the data, and the plans are for this
situation (i.e., both are now the same, "C" everywhere). Maybe it is just a
case of optimisations being removed in the RC?
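
(For anyone following along, recreating a database with C locale looks
something like this - the database name is illustrative, not the actual
command used:)

```sql
-- template0 is required: template1 may carry a different collation,
-- and CREATE DATABASE refuses to change collation from a non-pristine template.
CREATE DATABASE mydb
    TEMPLATE template0
    LC_COLLATE 'C'
    LC_CTYPE   'C';
```

The pg_dump output can then be restored into the new database as usual.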

Cheers,
Anton


Re: [PERFORM] Plan differences

2016-01-01 Thread Anton Melser
>
> Declaring new indexes with COLLATE "C" and removing the old indexes fixed
>> the LIKE problem but it created another - the > and < queries need a sort
>> before passing off to the new index. Having two indexes seems to give me
>> the best of both worlds, though obviously it's taking up (much) more space.
>> As space isn't ever likely to be a problem, and there are no updates (only
>> copy) to these tables, I'll keep it like this to avoid having to reload the
>> entire DB.
>
>
I spoke a little soon - while many of the simple queries are now hitting
the indexes, some of the more complicated ones are still producing
substantially inferior plans, even after reloading the whole DB with an
identical lc_collate and lc_ctype. Here are the plans on the original
server and the new server (identical collations, lc_ctypes and index types -
btree, "C"). I have been experimenting (accepted = accepted2,
idx_accepted2_mid = idx_accepted_mid, etc.) and the tables no longer have
exactly the same data, but there is nothing substantially different (a few
more days of data out of about a year total). The old-server query is
actually working on about 3x the amount of data - I tried reducing the
amount on the new server so it would get done in memory, but it didn't seem
to help the plan.

 HashAggregate  (cost=3488512.43..3496556.16 rows=536249 width=143) (actual time=228467.924..229026.799 rows=1426351 loops=1)
   Group Key: to_char(timezone('UTC'::text, a.tstamp), '-MM-DD'::text), a.column1, CASE WHEN (d.column1 IS NOT NULL) THEN d.column1 ELSE fff.column1 END, a.column2
   ->  Merge Left Join  (cost=110018.15..3072358.66 rows=23780215 width=143) (actual time=3281.993..200563.177 rows=23554638 loops=1)
         Merge Cond: ((a.message_id)::text = (fff.message_id)::text)
         ->  Merge Left Join  (cost=110017.58..2781199.04 rows=23780215 width=136) (actual time=3281.942..157385.338 rows=23554636 loops=1)
               Merge Cond: ((a.message_id)::text = (d.message_id)::text)
               ->  Index Scan using idx_accepted2_mid on accepted a  (cost=0.70..2226690.13 rows=23780215 width=83) (actual time=3.690..73048.662 rows=23554632 loops=1)
                     Index Cond: ((message_id)::text > '20151130'::text)
                     Filter: (((mrid)::text <> '----'::text) AND ((mrid)::text <> 'BAT'::text) AND ((column2)::text <> 'text1'::text) AND ((column2)::text !~~ 'text2.%'::text))
                     Rows Removed by Filter: 342947
               ->  Index Scan using idx_delivered2_mid on delivered d  (cost=110016.89..482842.01 rows=3459461 width=53) (actual time=3278.245..64031.033 rows=23666434 loops=1)
                     Index Cond: ((message_id)::text > '20151130'::text)
                     Filter: (NOT (hashed SubPlan 1))
                     Rows Removed by Filter: 443
                     SubPlan 1
                       ->  Index Scan using idx_failed2_mid on failed ff  (cost=0.57..109953.48 rows=25083 width=46) (actual time=0.041..3124.642 rows=237026 loops=1)
                             Index Cond: ((message_id)::text > '20151130'::text)
                             Filter: ((severity)::text = 'permanent'::text)
                             Rows Removed by Filter: 5080519
         ->  Index Scan using idx_failed2_mid on failed fff  (cost=0.57..112718.27 rows=25083 width=53) (actual time=0.034..4861.762 rows=236676 loops=1)
               Index Cond: ((message_id)::text > '20151130'::text)
               Filter: ((severity)::text = 'permanent'::text)
               Rows Removed by Filter: 5080519
 Planning time: 2.039 ms
 Execution time: 229076.361 ms


 HashAggregate  (cost=7636055.05..7640148.23 rows=272879 width=143) (actual time=488739.376..488915.545 rows=403741 loops=1)
   Group Key: to_char(timezone('UTC'::text, a.tstamp), '-MM-DD'::text), a.column1, CASE WHEN (d.column1 IS NOT NULL) THEN d.column1 ELSE fff.column1 END, a.column2
   ->  Hash Right Join  (cost=5119277.32..7528101.45 rows=6168777 width=143) (actual time=271256.212..480958.460 rows=6516196 loops=1)
         Hash Cond: ((d.message_id)::text = (a.message_id)::text)
         ->  Bitmap Heap Scan on delivered2 d  (cost=808012.86..3063311.98 rows=3117499 width=53) (actual time=7012.487..194557.307 rows=6604970 loops=1)
               Recheck Cond: ((message_id)::text > '20151225'::text)
               Rows Removed by Index Recheck: 113028616
               Filter: (NOT (hashed SubPlan 1))
               Rows Removed by Filter: 88
               Heap Blocks: exact=1146550 lossy=2543948
               ->  Bitmap Index Scan on idx_delivered_mid  (cost=0.00..100075.17 rows=6234997 width=0) (actual time=4414.860..4414.860 rows=6605058 loops=1)
                     Index Cond: ((message_id)::text > '20151225'::text)
               SubPlan 1
                 ->  Bitmap Heap Scan on failed2 ff  (cost=19778.06..707046.73 rows=44634 width=46) (actual time=828.164..1949.687 rows=71500 loops=1)

Re: [PERFORM] Plan differences

2015-12-31 Thread Pavel Stehule
Hi


> Does anyone have any ideas? All data are loaded into this table via copy
> and no updates are done. Autovacuum settings weren't changed (and is on
> both). Do I need to increase shared_buffers to half of available memory for
> the planner to make certain optimisations? Anything else I'm missing or can
> try? The new server has been running for almost two weeks now so I would
> have thought things would have had a chance to settle down.
>
>
It looks like some optimization was removed after the RC
release.

Regards

Pavel


> Cheers,
> Anton
>
>


Re: [PERFORM] Plan differences

2015-12-31 Thread Anton Melser
Hi,

> It looks like some optimization was removed after the RC
> release.
>

Thanks. Is there some discussion of why these optimisations were removed? I
started looking at some of the more complicated queries I do and there are
many occasions where there are 10-30x performance degradations compared
with the RC. Not what I was hoping for with a much more powerful machine!
Were these optimisations really dangerous? Is there any (easy and safe) way
to get them back or would I need to reinstall an RC version?

Thanks again,
Anton


Re: [PERFORM] Plan differences

2015-12-31 Thread Tom Lane
Anton Melser  writes:
> I moved a DB between two "somewhat" similar Postgres installs and am
> getting much worse plans on the second. The DB was dumped via pg_dump
> (keeping indexes, etc.) and loaded to the new server.

> [ "like 'foo%'" is not getting converted into index bounds ]

I'd bet your old database is in C locale and the new one is not.

The LIKE optimization requires an index that's sorted according to plain
C (strcmp) rules.  A regular text index will be that way only if the
database's LC_COLLATE is C.

If you don't want to rebuild the whole database, you can create indexes to
support this by declaring them with COLLATE "C", or the older way is to
declare them with text_pattern_ops as the index opclass.
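
Concretely, the two forms look like this (index names here are made up for
illustration; the table and column come from the thread):

```sql
-- Option 1: a "C"-collated index, usable for LIKE 'prefix%' bounds
CREATE INDEX accepted_mid_c ON accepted (message_id COLLATE "C");

-- Option 2 (older way): the text_pattern_ops operator class
CREATE INDEX accepted_mid_tpo ON accepted (message_id text_pattern_ops);
```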

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Plan differences

2015-12-31 Thread Tom Lane
Jim Nasby  writes:
> On 12/31/15 9:02 AM, Tom Lane wrote:
>> If you don't want to rebuild the whole database, you can create indexes to
>> support this by declaring them with COLLATE "C", or the older way is to
>> declare them with text_pattern_ops as the index opclass.

> Do you have to do anything special in the query itself for COLLATE "C" 
> to work?

No.

> I didn't realize the two methods were equivalent.

Well, they're not equivalent exactly, but indxpath.c knows that either
way produces an index that will work for LIKE.

regards, tom lane




Re: [PERFORM] Plan differences

2015-12-31 Thread Anton Melser
>
> I'd bet your old database is in C locale and the new one is not.
>

Remind me never to never bet against you :-).


> The LIKE optimization requires an index that's sorted according to plain
> C (strcmp) rules.  A regular text index will be that way only if the
> database's LC_COLLATE is C.
>
> If you don't want to rebuild the whole database, you can create indexes to
> support this by declaring them with COLLATE "C", or the older way is to
> declare them with text_pattern_ops as the index opclass.
>

Declaring new indexes with COLLATE "C" and removing the old indexes fixed
the LIKE problem but it created another - the > and < queries need a sort
before passing off to the new index. Having two indexes seems to give me
the best of both worlds, though obviously it's taking up (much) more space.
As space isn't ever likely to be a problem, and there are no updates (only
copy) to these tables, I'll keep it like this to avoid having to reload the
entire DB.
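
(The "best of both worlds" setup sketched, with illustrative index names:
in a non-C database the default-collation index serves the range queries,
while the "C" one serves the LIKE prefix lookups:)

```sql
-- Default-collation index: serves >, <, BETWEEN without an extra sort
CREATE INDEX idx_accepted_mid ON accepted (message_id);

-- "C"-collated index: serves LIKE 'prefix%' lookups
CREATE INDEX idx_accepted_mid_c ON accepted (message_id COLLATE "C");
```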

Thanks very much for your help.
Cheers,
Anton


Re: [PERFORM] Plan differences

2015-12-31 Thread Jim Nasby

On 12/31/15 9:02 AM, Tom Lane wrote:

If you don't want to rebuild the whole database, you can create indexes to
support this by declaring them with COLLATE "C", or the older way is to
declare them with text_pattern_ops as the index opclass.


Do you have to do anything special in the query itself for COLLATE "C" 
to work?


I didn't realize the two methods were equivalent.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

