Installing PostgreSQL 9.5 in centos 6 using YUM

2018-04-17 Thread Dinesh Chandra 12108
Hi Team,

Could anyone help me to solve the below issue. I am installing PostgreSQL 9.5 
in centos 6 using YUM


[root@VM-02 PostgreSQL]# yum install 
https://download.postgresql.org/pub/repos/yum/9.5/redhat/rhel-6-x86_64/pgdg-centos95-9.5-3.noarch.rpm

Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
* base: mirror.nbrc.ac.in
* extras: mirror.nbrc.ac.in
* updates: mirror.nbrc.ac.in
base/primary_db 
  |  41 kB 00:00
http://mirror.nbrc.ac.in/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  43 kB 00:00
http://centos.excellmedia.net/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  45 kB 00:00
http://mirror.dhakacom.com/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  47 kB 00:00
http://mirror.xeonbd.com/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  49 kB 00:00
http://mirror.vbctv.in/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  51 kB 00:00
http://mirror.digistar.vn/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  53 kB 00:00
http://centos.myfahim.com/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  54 kB 00:00
http://ftp.iitm.ac.in/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  56 kB 00:00
http://centos.mirror.net.in/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
base/primary_db 
  |  58 kB 00:00
http://del-mirrors.extreme-ix.org/centos/6.9/os/x86_64/repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2:
 [Errno -1] Metadata file does not match checksum
Trying other mirror.
Error: failure: 
repodata/5d14ebd60604f4433dcc8a3a17cd3bbc7b80ec5dff74cbcc50dab6e711959265-primary.sqlite.bz2
 from base: [Errno 256] No more mirrors to try.

Regards,
Dinesh Chandra



DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may 
contain confidential and privileged information. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply email and destroy all copies of 
the original message. Check all attachments for viruses before opening them. 
All views or opinions presented in this e-mail are those of the author and may 
not reflect the opinion of Cyient or those of our affiliates.


Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Tomas Vondra



On 04/16/2018 10:42 PM, Hackety Man wrote:

...
The first thing I did was to run some baseline tests using the basic
queries inside of the IF() checks found in each of the functions to
see how the query planner handled them.  I ran the following two
queries.

EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501');
EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501');



Those are not the interesting plans, though. The EXISTS only cares about 
the first row, so you should be looking at


EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501') LIMIT 1;


I moved on to test the other function with the following query...

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
zz_spx_ifcount_noidx('Test501');

and I got the following "auto_explain" results...

2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms 
plan:

  Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
  Result  (cost=4.08..4.09 rows=1 width=1) (actual
time=426.274..426.274 rows=1 loops=1)
    Buffers: shared read=5406
    InitPlan 1 (returns $0)
   ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
width=0) (actual time=426.273..426.273 rows=0 loops=1)
  Filter: (lower(text_distinct) = 'test501'::text)
  Rows Removed by Filter: 100
  Buffers: shared read=5406
2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
"SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
  PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms 
plan:

  Query Text: explain (analyze, buffers) select * from
zz_spx_ifexists_noidx('Test501')
  Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
    Buffers: shared hit=30 read=5438

Definitely not the execution time, or query plan, results I was
expecting.  As we can see, no workers were employed here and my
guess was that this was the reason or the large execution time
difference between these 2 tests?  199 milliseconds versus 428
milliseconds, which is a big difference.  Why are workers not being
employed here like they were when I tested the query found inside of
the IF() check in a standalone manner?  But then I ran another test
and the results made even less sense to me.



The plan difference is due to not realizing the EXISTS essentially 
implies LIMIT 1. Secondly, it expects about 5000 rows matching the 
condition,  uniformly spread through the table. But it apparently takes 
much longer to find the first one, hence the increased duration.


How did you generate the data?


When I ran the above query the first 5 times after starting my
Postgres service, I got the same results each time (around 428
milliseconds), but when running the query 6 or more times, the
execution time jumps up to almost double that.  Here are the
"auto_explain" results running this query a 6th time...



This is likely due to generating a generic plan after the fifth 
execution. There seems to be only small difference in costs, though.



--"auto_explain" results after running the same query 6 or more
times.
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms 
plan:

  Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
  Result  (cost=4.58..4.59 rows=1 width=1) (actual
time=761.843..761.843 rows=1 loops=1)
    Buffers: shared hit=160 read=5246
    InitPlan 1 (returns $0)
   ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000
width=0) (actual time=761.841..761.841 rows=0 loops=1)
  Filter: (lower(text_distinct) = lower($1))
  Rows Removed by Filter: 100
  Buffers: shared hit=160 read=5246
2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
"SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
  PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms 
plan:

  Query Text: explain (analyze, buffers) select * from
zz_spx_ifexists_noidx('Test501')
  Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
    Buffers: shared hit=160 read=524

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Tomas Vondra



On 04/17/2018 07:17 AM, Pavel Stehule wrote:

Hi

2018-04-16 22:42 GMT+02:00 Hackety Man >:


...

>
A support of parallel query execution is not complete -  it doesn't work 
in PostgreSQL 11 too. So although EXISTS variant can be faster (but can 
be - the worst case of EXISTS is same like COUNT), then due disabled 
parallel execution the COUNT(*) is faster now. It is unfortunate, 
because I believe so this issue will be fixed in few years.




None of the issues seems to be particularly related to parallel query. 
It's much more likely a general issue with planning EXISTS / LIMIT and 
non-uniform data distribution.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Data migration from postgres 8.4 to 9.4

2018-04-17 Thread Akshay Ballarpure
Thank you for detailed info. much appreciated. May i know how to install 
pg_upgrade ?


With Best Regards
Akshay




From:   "Albin, Lloyd P" 
To: Akshay Ballarpure 
Date:   04/16/2018 08:38 PM
Subject:RE: Data migration from postgres 8.4 to 9.4



Akshay ,

There are several Official ways to upgrade PostgreSQL.

1) Use pg_upgrade (Faster) Postgres 8.4 to Postgres 9.4. Use the Postgres 
9.4 version of pg_upgrade.
https://www.postgresql.org/docs/9.4/static/pgupgrade.html

2) Dump and Restore your database into a new server (Slower) Postgres 8.4 
to Postgres 9.4 Use pg_dump with pg_restore or pg_dumpall with psql from 
Postgres 9.4 against your Postgres 8.4 Server. You need to use this method 
if you wish to change your initdb settings, such as the default encoding, 
turn on checksums, etc.
https://www.postgresql.org/docs/9.4/static/app-pgdump.html
https://www.postgresql.org/docs/9.4/static/app-pgrestore.html
https://www.postgresql.org/docs/9.4/static/app-pg-dumpall.html

3) Swap out the binaries. This can only be done using the same Postgres 
version (8.4.x or 9.4.x or 10.x) This means that you can upgrade from 
9.4.9 to 9.4.12 by just swapping out the binaries.

4) Unofficially you can use things like slony, etc to do a live migration 
without downtime.

Lloyd



From: Akshay Ballarpure [akshay.ballarp...@tcs.com]
Sent: Monday, April 16, 2018 12:03 AM
Subject: Data migration from postgres 8.4 to 9.4

Hello, 
I need help in using postgresql 8.4 data in postgres 9.4 version. Do I 
need to run any tool to achieve the same? 

Steps i followed is ran postgresql 8.4 and 9.4, copied data from 8.4 
instance to 9.4 and try to start postgresql 9.4 but no luck, getting below 
error. 

[root@ms-esmon esm-data]# su - postgres -c 
"/opt/rh/rh-postgresql94/root/usr/bin/postgres -D 
/var/ericsson/esm-data/postgresql-data/ 2>&1 &" 
[root@ms-esmon esm-data]# LOG:  skipping missing configuration file 
"/var/ericsson/esm-data/postgresql-data/postgresql.auto.conf" 
2018-04-16 06:52:01.546 GMT  FATAL:  database files are incompatible with 
server 
2018-04-16 06:52:01.546 GMT  DETAIL:  The data directory was initialized 
by PostgreSQL version 8.4, which is not compatible with this version 
9.4.9. 


With Best Regards
Akshay
Ericsson OSS MON
Tata Consultancy Services
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com

Experience certainty.IT Services
   Business Solutions
   Consulting

=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Tomas Vondra



On 04/17/2018 04:01 PM, Hackety Man wrote:



On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra 
mailto:tomas.von...@2ndquadrant.com>> wrote:




On 04/16/2018 10:42 PM, Hackety Man wrote:

...
     The first thing I did was to run some baseline tests using
the basic
     queries inside of the IF() checks found in each of the
functions to
     see how the query planner handled them.  I ran the
following two
     queries.

         EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM
zz_noidx1 WHERE
         LOWER(text_distinct) = LOWER('Test501');
         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
         LOWER(text_distinct) = LOWER('Test501');


Those are not the interesting plans, though. The EXISTS only cares
about the first row, so you should be looking at

     EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
     LOWER(text_distinct) = LOWER('Test501') LIMIT 1;



Okay.  I tested this query and it did return an execution time on par 
with my tests of the "zz_spx_ifexists_noidx" function.

*
*



     I moved on to test the other function with the following
query...

         EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
         zz_spx_ifcount_noidx('Test501');

     and I got the following "auto_explain" results...

         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
426.279 ms         plan:
           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
           Result  (cost=4.08..4.09 rows=1 width=1) (actual
         time=426.274..426.274 rows=1 loops=1)
             Buffers: shared read=5406
             InitPlan 1 (returns $0)
            ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00
rows=5000
         width=0) (actual time=426.273..426.273 rows=0 loops=1)
           Filter: (lower(text_distinct) = 'test501'::text)
           Rows Removed by Filter: 100
           Buffers: shared read=5406
         2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
at IF
         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
428.077 ms         plan:
           Query Text: explain (analyze, buffers) select * from
         zz_spx_ifexists_noidx('Test501')
           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
         rows=1 width=32) (actual time=428.076..428.076 rows=1
loops=1)
             Buffers: shared hit=30 read=5438

     Definitely not the execution time, or query plan, results I was
     expecting.  As we can see, no workers were employed here and my
     guess was that this was the reason or the large execution time
     difference between these 2 tests?  199 milliseconds versus 428
     milliseconds, which is a big difference.  Why are workers
not being
     employed here like they were when I tested the query found
inside of
     the IF() check in a standalone manner?  But then I ran
another test
     and the results made even less sense to me.


The plan difference is due to not realizing the EXISTS essentially
implies LIMIT 1. Secondly, it expects about 5000 rows matching the
condition,  uniformly spread through the table. But it apparently
takes much longer to find the first one, hence the increased duration.



Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even 
though it doesn't show in the query plan, correct? Is it not possible 
for parallel scans to be implemented while applying an implicit, or 
explicit, LIMIT 1?

**//___^



It doesn't add a limit node to the plan, but it behaves similarly to 
that. The database only needs to fetch the first row to answer the 
EXISTS predicate.


I don't think this is inherently incompatible with parallel plans, but 
the planner simply thinks it's going to bee very cheap - cheaper than 
setting up parallel workers etc. So it does not do that.




How did you generate the data?



I used generate_series() to create 1 million records in sequence at the 
same time that I created the table using the following script...


CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
int_distinct, 'Test'::text || generate_series(0, 99)::text AS
text_distinct;



Which means that there are actually no matching rows for 'Test501'. 
So the database will scan the whole table anyway, in order to answer the 
EXISTS co

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Hackety Man
On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra 
wrote:

>
>
> On 04/16/2018 10:42 PM, Hackety Man wrote:
>
>> ...
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to
>> see how the query planner handled them.  I ran the following two
>> queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>>
>>
> Those are not the interesting plans, though. The EXISTS only cares about
> the first row, so you should be looking at
>
> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
> LOWER(text_distinct) = LOWER('Test501') LIMIT 1;



Okay.  I tested this query and it did return an execution time on par with
my tests of the "zz_spx_ifexists_noidx" function.



>
>
> I moved on to test the other function with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>> zz_spx_ifcount_noidx('Test501');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms
>>  plan:
>>   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>   Result  (cost=4.08..4.09 rows=1 width=1) (actual
>> time=426.274..426.274 rows=1 loops=1)
>> Buffers: shared read=5406
>> InitPlan 1 (returns $0)
>>->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
>> width=0) (actual time=426.273..426.273 rows=0 loops=1)
>>   Filter: (lower(text_distinct) = 'test501'::text)
>>   Rows Removed by Filter: 100
>>   Buffers: shared read=5406
>> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
>> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>>   PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms
>>  plan:
>>   Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test501')
>>   Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>> rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>> Buffers: shared hit=30 read=5438
>>
>> Definitely not the execution time, or query plan, results I was
>> expecting.  As we can see, no workers were employed here and my
>> guess was that this was the reason or the large execution time
>> difference between these 2 tests?  199 milliseconds versus 428
>> milliseconds, which is a big difference.  Why are workers not being
>> employed here like they were when I tested the query found inside of
>> the IF() check in a standalone manner?  But then I ran another test
>> and the results made even less sense to me.
>>
>>
> The plan difference is due to not realizing the EXISTS essentially implies
> LIMIT 1. Secondly, it expects about 5000 rows matching the condition,
> uniformly spread through the table. But it apparently takes much longer to
> find the first one, hence the increased duration.
>


Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even
though it doesn't show in the query plan, correct?  Is it not possible for
parallel scans to be implemented while applying an implicit, or explicit,
LIMIT 1?



>
> How did you generate the data?



I used generate_series() to create 1 million records in sequence at the
same time that I created the table using the following script...

CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
int_distinct, 'Test'::text || generate_series(0, 99)::text AS
text_distinct;


>
> When I ran the above query the first 5 times after starting my
>> Postgres service, I got the same results each time (around 428
>> milliseconds), but when running the query 6 or more times, the
>> execution time jumps up to almost double that.  Here are the
>> "auto_explain" results running this query a 6th time...
>>
>>
> This is likely due to generating a generic plan after the fifth execution.
> There seems to be only small difference in costs, though.
>
>
> --"auto_explain" results after running the same query 6 or more
>> times.
>> 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms
>>  plan:
>>   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>   Result  (cost=4.58..4.59 rows=1 width=1) (actual
>> time=761.843..761.843 rows=1 loops=1)
>> Buffers: shared hit=160 read=5246
>> InitPlan 1 (returns 

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Hackety Man
Hi Pavel,

Thanks for sharing that information.  I was not aware that the parallel
query functionality was not yet fully implemented.

Thanks,
Ryan

On Tue, Apr 17, 2018 at 1:17 AM, Pavel Stehule 
wrote:

> Hi
>
> 2018-04-16 22:42 GMT+02:00 Hackety Man :
>
>> *A description of what you are trying to achieve and what results you
>> expect.:*
>>
>> My end goal was to test the execution time difference between using an
>> IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and
>> when a string match was not found.  My expectation was that my 2 functions
>> would behave fairly similarly, but they most certainly did not.  Here are
>> the table, functions, test queries, and test query results I received, as
>> well as comments as I present the pieces and talk about the results from my
>> perspective.
>>
>> This is the table and data that I used for my tests.  A table with 1
>> million sequenced records.  No indexing on any columns.  I ran ANALYZE on
>> this table and a VACUUM on the entire database, just to be sure.
>>
>> CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 99) AS
>> int_distinct, 'Test'::text || generate_series(0, 99)::text AS
>> text_distinct;
>>
>> These are the 2 functions that I ran my final tests with.  My goal was to
>> determine which function would perform the fastest and my expectation was
>> that they would still be somewhat close in execution time comparison.
>>
>> --Test Function #1
>> CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
>>  RETURNS text
>>  LANGUAGE 'plpgsql'
>>  STABLE
>> AS $$
>>
>> BEGIN
>>  IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) > 0 THEN
>>   RETURN 'Found';
>>  ELSE
>>   RETURN 'Not Found';
>>  END IF;
>> END;
>> $$;
>>
>> --Test Function #2
>> CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
>>  RETURNS text
>>  LANGUAGE 'plpgsql'
>>  STABLE
>> AS $$
>>
>> BEGIN
>>  IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) THEN
>>   RETURN 'Found';
>>  ELSE
>>   RETURN 'Not Found';
>>  END IF;
>> END;
>> $$;
>>
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to see how
>> the query planner handled them.  I ran the following two queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test501');
>>
>> The execution time results and query plans for these two were very
>> similar, as expected.  In the results I can see that 2 workers were
>> employed for each query plan.
>>
>> --Results for the SELECT COUNT(*) query.
>> QUERY PLAN
>>
>> 
>> 
>> 
>> Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual
>> time=172.105..172.105 rows=1 loops=1)
>>   Buffers: shared read=1912
>>
>>
>>   ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual
>> time=172.020..172.099 rows=3 loops=1)
>>   Workers Planned: 2
>>
>>
>>   Workers Launched: 2
>>
>>
>>   Buffers: shared read=1912
>>
>>
>>   ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual
>> time=155.123..155.123 rows=1 loops=3)
>>  Buffers: shared read=5406
>>
>>
>>  ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=0) (actual time=155.103..155.103 rows=0 loops=3)
>>  Filter: (lower(text_distinct) = 'test501'::text)
>>
>>  Rows Removed by Filter: 33
>>
>>  Buffers: shared read=5406
>>
>> Planning time: 0.718 ms
>>
>>
>> Execution time: 187.601 ms
>>
>> --Results for the SELECT 1 query.
>> QUERY PLAN
>>
>> 
>> 
>> Gather  (cost=1000.00..13156.00 rows=5000 width=4) (actual
>> time=175.682..175.682 rows=0 loops=1)
>>   Workers Planned: 2
>>
>>
>>   Workers Launched: 2
>>
>>
>>   Buffers: shared read=2021
>>
>>
>>   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=4) (actual time=159.769..159.769 rows=0 loops=3)
>>   Filter: (lower(text_distinct) = 'test501'::text)
>>
>>   Rows Removed by Filter: 33
>>
>>   Buffers: shared read=5406
>>
>> Planning time: 0.874 ms
>>
>> Execution time: 192.045 ms
>>
>> After running these baseline tests and viewing the fairly similar
>> results, right or wrong, I expected my queries that tested the functions to
>> behave similarly.  I started with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000
>> 001');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:57:22.624 EDT [17812] LOG:  duration: 155.239 ms  plan:
>>  Query Text: SELECT (SELECT COUN

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Pavel Stehule
2018-04-17 12:52 GMT+02:00 Tomas Vondra :

>
>
> On 04/17/2018 07:17 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> 2018-04-16 22:42 GMT+02:00 Hackety Man > hackety...@gmail.com>>:
>>
>> ...
>>
> >
>
>> A support of parallel query execution is not complete -  it doesn't work
>> in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be
>> - the worst case of EXISTS is same like COUNT), then due disabled parallel
>> execution the COUNT(*) is faster now. It is unfortunate, because I believe
>> so this issue will be fixed in few years.
>>
>>
> None of the issues seems to be particularly related to parallel query.
> It's much more likely a general issue with planning EXISTS / LIMIT and
> non-uniform data distribution.


I was wrong EXISTS are not supported. It looks like new dimension of
performance issues related to parallelism. I understand so this example is
worst case.

postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1
WHERE LOWER(text_distinct) = LOWER('Test501'));
  QUERY
PLAN
--
 Result  (cost=4.08..4.09 rows=1 width=1) (actual time=423.600..423.600
rows=1 loops=1)
   Buffers: shared hit=3296 read=2110
   InitPlan 1 (returns $0)
 ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0)
(actual time=423.595..423.595 rows=0 loops=1)
   Filter: (lower(text_distinct) = 'test501'::text)
   Rows Removed by Filter: 100
   Buffers: shared hit=3296 read=2110
 Planning Time: 0.133 ms
 Execution Time: 423.633 ms

postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
LOWER(text_distinct) = LOWER('Test501');
   QUERY
PLAN
-
 Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual
time=246.662..246.662 rows=1 loops=1)
   Buffers: shared hit=817 read=549
   ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual
time=246.642..246.656 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 Buffers: shared hit=817 read=549
 ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8)
(actual time=242.168..242.169 rows=1 loops=3)
   Buffers: shared hit=3360 read=2046
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00
rows=2083 width=0) (actual time=242.165..242.165 rows=0 loops=3)
 Filter: (lower(text_distinct) = 'test501'::text)
 Rows Removed by Filter: 33
 Buffers: shared hit=3360 read=2046
 Planning Time: 0.222 ms
 Execution Time: 247.927 ms

The cost of EXISTS is too low to use parallelism, and value is found too
late.

When I decrease startup cost to 0 of parallel exec I got similar plan,
similar time

postgres=# EXPLAIN (ANALYZE, BUFFERS) select exists(SELECT * FROM zz_noidx1
WHERE LOWER(text_distinct) = LOWER('Test501'));
 QUERY
PLAN
-
 Result  (cost=2.43..2.44 rows=1 width=1) (actual time=246.398..246.402
rows=1 loops=1)
   Buffers: shared hit=885 read=489
   InitPlan 1 (returns $1)
 ->  Gather  (cost=0.00..12156.00 rows=5000 width=0) (actual
time=246.393..246.393 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=885 read=489
   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00
rows=2083 width=0) (actual time=241.067..241.067 rows=0 loops=3)
 Filter: (lower(text_distinct) = 'test501'::text)
 Rows Removed by Filter: 33
 Buffers: shared hit=3552 read=1854
 Planning Time: 0.138 ms
 Execution Time: 247.623 ms
(13 rows)

>From this perspective it looks so cost of EXISTS(subselect) is maybe too
low.

Regards

Pavel






>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Hackety Man
On Tue, Apr 17, 2018 at 10:23 AM, Tomas Vondra  wrote:

>
>
> On 04/17/2018 04:01 PM, Hackety Man wrote:
>
>>
>>
>> On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <
>> tomas.von...@2ndquadrant.com >
>> wrote:
>>
>>
>>
>> On 04/16/2018 10:42 PM, Hackety Man wrote:
>>
>> ...
>>  The first thing I did was to run some baseline tests using
>> the basic
>>  queries inside of the IF() checks found in each of the
>> functions to
>>  see how the query planner handled them.  I ran the
>> following two
>>  queries.
>>
>>  EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM
>> zz_noidx1 WHERE
>>  LOWER(text_distinct) = LOWER('Test501');
>>  EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>>  LOWER(text_distinct) = LOWER('Test501');
>>
>>
>> Those are not the interesting plans, though. The EXISTS only cares
>> about the first row, so you should be looking at
>>
>>  EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>>  LOWER(text_distinct) = LOWER('Test501') LIMIT 1;
>>
>>
>>
>> Okay.  I tested this query and it did return an execution time on par
>> with my tests of the "zz_spx_ifexists_noidx" function.
>> *
>> *
>>
>>
>>
>>
>>  I moved on to test the other function with the following
>> query...
>>
>>  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>>  zz_spx_ifcount_noidx('Test501');
>>
>>  and I got the following "auto_explain" results...
>>
>>  2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
>> 426.279 ms plan:
>>Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1
>> WHERE
>>  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>Result  (cost=4.08..4.09 rows=1 width=1) (actual
>>  time=426.274..426.274 rows=1 loops=1)
>>  Buffers: shared read=5406
>>  InitPlan 1 (returns $0)
>> ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00
>> rows=5000
>>  width=0) (actual time=426.273..426.273 rows=0 loops=1)
>>Filter: (lower(text_distinct) =
>> 'test501'::text)
>>Rows Removed by Filter: 100
>>Buffers: shared read=5406
>>  2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL
>> statement
>>  "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>>  LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>>PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
>> at IF
>>  2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
>> 428.077 ms plan:
>>Query Text: explain (analyze, buffers) select * from
>>  zz_spx_ifexists_noidx('Test501')
>>Function Scan on zz_spx_ifexists_noidx
>> (cost=0.25..0.26
>>  rows=1 width=32) (actual time=428.076..428.076 rows=1
>> loops=1)
>>  Buffers: shared hit=30 read=5438
>>
>>  Definitely not the execution time, or query plan, results I
>> was
>>  expecting.  As we can see, no workers were employed here and
>> my
>>  guess was that this was the reason or the large execution
>> time
>>  difference between these 2 tests?  199 milliseconds versus
>> 428
>>  milliseconds, which is a big difference.  Why are workers
>> not being
>>  employed here like they were when I tested the query found
>> inside of
>>  the IF() check in a standalone manner?  But then I ran
>> another test
>>  and the results made even less sense to me.
>>
>>
>> The plan difference is due to not realizing the EXISTS essentially
>> implies LIMIT 1. Secondly, it expects about 5000 rows matching the
>> condition,  uniformly spread through the table. But it apparently
>> takes much longer to find the first one, hence the increased duration.
>>
>>
>>
>> Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even
>> though it doesn't show in the query plan, correct? Is it not possible for
>> parallel scans to be implemented while applying an implicit, or explicit,
>> LIMIT 1?
>> **//___^
>>
>>
> It doesn't add a limit node to the plan, but it behaves similarly to that.
> The database only needs to fetch the first row to answer the EXISTS
> predicate.
>
> I don't think this is inherently incompatible with parallel plans, but the
> planner simply thinks it's going to bee very cheap - cheaper than setting
> up parallel workers etc. So it does not do that.



Understood.  Any chance of the planner possibly being enhanced in the
future to come to a be

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Pavel Stehule
>>>
>>>
>>> Right.  I was more wondering why it switched over to a generic plan, as
>>> you've stated, like clockwork starting with the 6th execution run.
>>>
>>>
>> That's a hard-coded value. The first 5 executions are re-planned using
>> the actual parameter values, and then we try generating a generic plan and
>> see if it's cheaper than the non-generic one. You can disable that, though.
>
>
>
> So on that note, in the planner's eyes, starting with the 6th execution,
> it looks like the planner still thinks that the generic plan will perform
> better than the non-generic one, which is why it keeps using the generic
> plan from that point forward?
>
> Similar to the parallel scans, any chance of the planner possibly being
> enhanced in the future to come to a better conclusion as to whether, or
> not, the generic plan will perform better than the non-generic plan?  :-)
>

all is based on estimations, and when estimations are not correct, then ..
The current solution is fart to perfect, but nobody goes with better ideas
:( Statistic based planners is best available technology, unfortunately
with lot of gaps.

There are not any statistic where any tuple is in database, so a precious
estimation of EXISTS is hard (impossible). Similar issue is with LIMIT. It
can be nice, but I don't expect any significant changes in this area -
maybe some tuning step by step of some parameters.

Regards

Pavel


>
>
>
>>
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>
> Thanks for all the help!  I really appreciate it!
>
> Ryan
>
>


Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Tomas Vondra
On 04/17/2018 05:43 PM, Hackety Man wrote:
> 
> 
> On Tue, Apr 17, 2018 at 10:23 AM, Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
> 
> 
> On 04/17/2018 04:01 PM, Hackety Man wrote:
> 
>  ...
> Right.  I was more wondering why it switched over to a generic
> plan, as you've stated, like clockwork starting with the 6th
> execution run.
> 
> 
> That's a hard-coded value. The first 5 executions are re-planned
> using the actual parameter values, and then we try generating a
> generic plan and see if it's cheaper than the non-generic one. You
> can disable that, though.
> 
> 
> 
> So on that note, in the planner's eyes, starting with the 6th execution,
> it looks like the planner still thinks that the generic plan will
> perform better than the non-generic one, which is why it keeps using the
> generic plan from that point forward?
> 

Yes. The point of prepared statements (which also applies to plpgsql, as
it uses prepared statements automatically) is to eliminate the planning
overhead. So we try planning it with actual parameter values for the
first 5 plans, and then compare it to the generic plan.

> Similar to the parallel scans, any chance of the planner possibly being
> enhanced in the future to come to a better conclusion as to whether, or
> not, the generic plan will perform better than the non-generic plan?  :-)

There's always hope, but it's hard to say if/when an enhancement will
happen, unfortunately.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

2018-04-17 Thread Tomas Vondra


On 04/17/2018 04:05 PM, Hackety Man wrote:
> Hi Pavel,
> 
> Thanks for sharing that information.  I was not aware that the parallel
> query functionality was not yet fully implemented.
> 

Nothing is ever "fully implemented". There are always gaps and possible
improvements ;-)

That being said, parallelism opens an entirely new dimension of possible
plans and planning issues.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Citext Performance

2018-04-17 Thread Deepak Somaiya
 Apology for sending you emails directly but I do see you guys responding on 
email related to performance so thought of copying you folks.
Folks, I read following (PostgreSQL: Documentation: 9.6: citext) and it does 
not hold true in my testing.. i.e citext is not performing better than lower.Am 
I missing something? help is appreciated.

| 
| 
| 
|  |  |

 |

 |
| 
|  | 
PostgreSQL: Documentation: 9.6: citext


 |

 |

 |




"citext is not as efficient as text because the operator functions and the 
B-tree comparison functions must make copies of the data and convert it to 
lower case for comparisons. It is, however, slightly more efficient than using 
lower to get case-insensitive matching."


Here is what I have done 
drop table test;drop table testci;
CREATE TABLE test (id INTEGER PRIMARY KEY,name character varying(254));CREATE 
TABLE testci (id INTEGER PRIMARY KEY,name citext
);
INSERT INTO test(id, name)SELECT generate_series(101,200), 
(md5(random()::text));
INSERT INTO testci(id, name)SELECT generate_series(1,100), 
(md5(random()::text));

Now, I have done sequential search
explain (analyze on, format yaml) select * from test where 
lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:     Node Type: "Seq Scan"    Parallel Aware: false    Relation Name: 
"test"    Alias: "test"    Startup Cost: 0.00    Total Cost: 23334.00    Plan 
Rows: 5000    Plan Width: 37    Actual Startup Time: 0.016    Actual Total 
Time: 680.199    Actual Rows: 1    Actual Loops: 1    Filter: 
"(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"    Rows 
Removed by Filter: 99  Planning Time: 0.045  Triggers:   Execution Time: 
680.213

explain (analyze on, format yaml) select * from testci where 
name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan:     Node Type: "Seq Scan"    Parallel Aware: false    Relation Name: 
"testci"    Alias: "testci"    Startup Cost: 0.00    Total Cost: 20834.00    
Plan Rows: 1    Plan Width: 37    Actual Startup Time: 0.017    Actual Total 
Time: 1184.485    Actual Rows: 1    Actual Loops: 1    Filter: "(name = 
'956d692092f0b9f85f36bf2b2501f3ad'::citext)"    Rows Removed by Filter: 99  
Planning Time: 0.029  Triggers:   Execution Time: 1184.496


You can see sequential searches with lower working twice as fast as citext.
Now I added index on citext and equivalent functional index (lower) on text.

CREATE INDEX textlowerindex ON test (lower(name));
create index textindex on test(name);


Index creation took longer with citext v/s creating lower functional index.

Now here comes execution with indexes
explain (analyze on, format yaml) select * from test where 
lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan:     Node Type: "Bitmap Heap Scan"    Parallel Aware: false    Relation 
Name: "test"    Alias: "test"    Startup Cost: 187.18    Total Cost: 7809.06    
Plan Rows: 5000    Plan Width: 37    Actual Startup Time: 0.020    Actual Total 
Time: 0.020    Actual Rows: 1    Actual Loops: 1    Recheck Cond: 
"(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"    Rows 
Removed by Index Recheck: 0    Exact Heap Blocks: 1    Lossy Heap Blocks: 0    
Plans:       - Node Type: "Bitmap Index Scan"        Parent Relationship: 
"Outer"        Parallel Aware: false        Index Name: "textlowerindex"        
Startup Cost: 0.00        Total Cost: 185.93        Plan Rows: 5000        Plan 
Width: 0        Actual Startup Time: 0.016        Actual Total Time: 0.016      
  Actual Rows: 1        Actual Loops: 1        Index Cond: 
"(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"  Planning 
Time: 0.051  Triggers:   Execution Time: 0.035



explain (analyze on, format yaml) select * from testci where 
name='956d692092f0b9f85f36bf2b2501f3ad'; 

- Plan:     Node Type: "Index Scan"    Parallel Aware: false    Scan Direction: 
"Forward"    Index Name: "citextindex"    Relation Name: "testci"    Alias: 
"testci"    Startup Cost: 0.42    Total Cost: 8.44    Plan Rows: 1    Plan 
Width: 37    Actual Startup Time: 0.049    Actual Total Time: 0.050    Actual 
Rows: 1    Actual Loops: 1    Index Cond: "(name = 
'956d692092f0b9f85f36bf2b2501f3ad'::citext)"    Rows Removed by Index Recheck: 
0  Planning Time: 0.051  Triggers:   Execution Time: 0.064