Re: [PERFORM] Explain Analyze - actual time in loops

2017-09-08 Thread Igor Neyman
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Neto pr
Sent: Thursday, September 07, 2017 11:17 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Explain Analyze - actual time in loops

…
###
  ->  Index Scan using idx_l_partkeylineitem000x on lineitem
        (cost=0.57..97.65 rows=26 width=36)
        (actual time=23.615..419.113 rows=30 loops=26469)
        Index Cond: (l_partkey = part.p_partkey)
###
According to the documentation, one should multiply the actual time by the 
number of loops.
That is: 419113 ms -> 419113/1000/60 = 6.9 minutes, * 26469 (loops) = 182.6 
minutes.

But how can this step take 182.6 minutes, if the entire query ran in 66 
minutes?

…….
Thank you and best regards,
[]'s Neto

Neto,

The time you see there is in ms, and the point ('.') you see is the decimal 
point.
So it is 419.113 ms, or a little less than half a second (0.419 sec).
Igor Neyman


Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mariel Cherkassky
Sent: Monday, August 21, 2017 10:20 AM
To: MichaelDBA <michael...@sqlexec.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance problem on big tables

I had a system that consisted of many objects (procedures, functions, ...) on an 
Oracle database. We decided to migrate that system to PostgreSQL. The system 
copied a lot of big tables from a different, read-only Oracle database and 
performed a lot of queries on them to produce reports. Getting the data is part 
of some procedures, so I can't change it freely. I'm searching for a way to 
improve the performance of the database, because I'm sure I didn't configure 
something well. Moreover, when I run complicated queries (joins between 4 big 
tables plus filtering) they take a lot of time, and I see that the server is 
caching all my RAM.


Probably your joins are done on the Postgres side.

Maybe, instead of Postgres pulling data from Oracle, you should try pushing data 
from Oracle to Postgres using Oracle's Heterogeneous Services and the Postgres 
ODBC driver. That way you do your joins and filtering on the Oracle side and 
just push the result set to Postgres.
That’s how I did migration from Oracle to Postgres.
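
A minimal sketch of that push approach, with hypothetical names throughout: the 
ODBC DSN PGDSN, the link pg_link, the credentials, and the table/column names 
are placeholders, not from this thread.

-- On the Oracle side: a database link whose TNS alias resolves to a
-- DG4ODBC (Heterogeneous Services) agent configured with the Postgres
-- ODBC driver:
CREATE DATABASE LINK pg_link
  CONNECT TO "pg_user" IDENTIFIED BY "pg_password"
  USING 'PGDSN';

-- Join and filter on the Oracle side; only the finished result set
-- crosses the wire into the Postgres table:
INSERT INTO report_data@pg_link (id, total)
SELECT t1.id, SUM(t2.amount)
FROM   big_table1 t1
JOIN   big_table2 t2 ON t2.t1_id = t1.id
GROUP  BY t1.id;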

Regards,
Igor Neyman


Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Igor Neyman
From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffers was left at its default value (8MB). Watching the content 
of the directory that stores the temporary files, I found that I need at most 
21GB of temporary file space. Should I set temp_buffers to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=728795193 
read=82974833
   ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229 
read=14991777
 ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
   Buffers: shared hit=860990 
read=14991777
 ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
   Heap Fetches: 0
   Buffers: shared hit=589831203
   ->  Hash  (cost=30059641.47..30059641.47 
rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
 Buckets: 4194304 (originally 
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
 Buffers: shared hit=138102964 
read=67983056
 -

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Wednesday, July 12, 2017 6:05 AM
To: Jeff Janes 
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent


flows=# explain (analyze, buffers) SELECT DISTINCT
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport,
flows-#COUNT(*) AS conversation,
flows-#SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#flowscompact,
flows-#mynetworks
flows-# WHERE
flows-#mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#AND dstaddr IN
flows-#(
flows(#   SELECT
flows(#  dstaddr
flows(#   FROM
flows(#  dstexterne
flows(#)
flows-# GROUP BY
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport
flows-# ORDER BY
flows-#mbytes DESC LIMIT 50;
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896
LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers) SELECT 
DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

  QUERY PLAN
--
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual 
time=2764548.863..2764548.891 rows=50 loops=1)
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual 
time=2764548.861..2764548.882 rows=50 loops=1)
 Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) 
(actual time=2764548.859..2764548.872 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 563150kB
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
   I/O Timings: read=5323746.860
   ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 
width=52) (actual time=2696721.610..2752109.551 rows=4691734 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 
width=20) (actual time=2696711.428..2732781.705 rows=81896988 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 2721856kB
   Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Gather  (cost=19463936.00..37650810.19 
rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988 
loops=1)
 Workers Planned: 9
 Workers Launched: 9
 Buffers: shared hit=1116590559 read=15851133
 I/O Timings: read=5323746.860
 ->  Hash Semi Join  
(cost=19462936.00..37622883.23 rows=249324 width=20) (actual 
time=1847579.360..2602039.780 rows=8189699 loops=10)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=1116588309 
read=15851133
   I/O Timings: read=5323746.860
   ->  Nested Loop  (cost=0.03..18159012.30 
rows=249324 width=20) (actual time=1.562..736556.583 rows=45499045 loops=10)
 Buffers: shared hit=996551813 
read=15851133
 I/O Timings: read=5323746.860
 ->  

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Wednesday, July 12, 2017 3:21 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I set shared_buffers to 24 GB and effective_cache_size to 64GB and I can see 
that the queries are faster due to the fact that the index are used more often. 
Knowing I have 72GB of RAM and the server is exclusively dedicated to 
Postgresql, what could be the maximum value for effective_cache?
Thanks!

Charles

64GB for effective_cache_size should be good enough; adding a couple more GB 
wouldn't change much.

Igor


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 11:25 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent


Attention: This email was sent from someone outside of Perceptron. Always 
exercise caution when opening attachments or clicking links from unknown 
senders or when receiving unexpected emails.

Igor,

The sum of effective_cache_size and shared_buffers will be higher than the 
physical memory I have. Is that OK?
Thanks!

Charles

Yes, that's normal.

shared_buffers is the maximum amount of shared memory Postgres is allowed to 
allocate, while effective_cache_size is just a number the optimizer takes into 
account when creating an execution plan.
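
Because effective_cache_size is estimation-only, its effect on the planner can 
be observed per session; a minimal sketch (the query here is a placeholder):

-- Changing effective_cache_size allocates nothing and is instant;
-- only the plan choice moves:
SET effective_cache_size = '64GB';
EXPLAIN SELECT * FROM big_table WHERE indexed_col = 42;  -- index scan more likely

SET effective_cache_size = '1GB';
EXPLAIN SELECT * FROM big_table WHERE indexed_col = 42;  -- seq scan more likely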

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Tuesday, July 11, 2017 10:34 AM
To: Charles Nadeau <charles.nad...@gmail.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 6:43 AM
To: Igor Neyman <iney...@perceptron.com<mailto:iney...@perceptron.com>>
Cc: Andreas Kretschmer 
<andr...@a-kretschmer.de<mailto:andr...@a-kretschmer.de>>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I reduced the value of random_page_cost to 4 but the read speed remains low.
Regarding effective_cache_size and shared_buffer, do you mean they should be 
both equal to 64GB?
Thanks for suggestions!

Charles

No, they should not be equal.
From the docs:

effective_cache_size (integer)
Sets the planner's assumption about the effective size of the disk cache that 
is available to a single query. This is factored into estimates of the cost of 
using an index; a higher value makes it more likely index scans will be used, a 
lower value makes it more likely sequential scans will be used. When setting 
this parameter you should consider both PostgreSQL's shared buffers and the 
portion of the kernel's disk cache that will be used for PostgreSQL data files. 
Also, take into account the expected number of concurrent queries on different 
tables, since they will have to share the available space. This parameter has 
no effect on the size of shared memory allocated by PostgreSQL, nor does it 
reserve kernel disk cache; it is used only for estimation purposes. The system 
also does not assume data remains in the disk cache between queries. The 
default is 4 gigabytes (4GB).
So, I’d set shared_buffers at 24GB and effective_cache_size at 64GB.
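
As a sketch, those two suggestions in postgresql.conf form for the 72GB 
dedicated server discussed here (changing shared_buffers requires a restart; 
the comments are illustrative, not from the thread):

# postgresql.conf
shared_buffers = 24GB         # ~1/3 of RAM; real shared-memory allocation
effective_cache_size = 64GB   # planner estimate only; allocates nothing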

Regards,
Igor

Also, maybe it’s time to look at execution plans (explain analyze) of specific 
slow queries, instead of trying to solve the problem “in general”.

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Monday, July 10, 2017 11:48 AM
To: Andreas Kretschmer <andr...@a-kretschmer.de>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Andreas,

Because the ratio between the Sequential IOPS and Random IOPS is about 29. 
Taking into account that part of the data is in RAM, I obtained an "effective" 
ratio of about 22.
Thanks!

Charles

On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer 
<andr...@a-kretschmer.de<mailto:andr...@a-kretschmer.de>> wrote:


On 10.07.2017 at 16:03, Charles Nadeau wrote:
random_page_cost | 22


why such a high value for random_page_cost?

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com<http://www.2ndQuadrant.com>


--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Considering your RAM size of 72GB and your database size of ~225GB, and also the 
fact that Postgres is the only app running on the server, probably 1/3 of your 
database resides in memory. So random_page_cost = 22 looks extremely high; it 
probably precludes index usage in your queries completely.

You should try this setting at least at its default value, random_page_cost = 4, 
and probably go even lower.
Also, make effective_cache_size at least as big as your shared_buffers. With 
72GB of RAM, effective_cache_size should be set around 64GB (again considering 
that Postgres is the only app running on the server).
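
A sketch of one standard way to try that, assuming PostgreSQL 9.4+ for ALTER 
SYSTEM (the slow query itself is a placeholder):

-- A/B-test in one session before touching the cluster:
SET random_page_cost = 4;
EXPLAIN ANALYZE SELECT ...;   -- re-run a known-slow query, compare the plan

-- If plans improve, persist and reload:
ALTER SYSTEM SET random_page_cost = 4;
SELECT pg_reload_conf();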

Regards,
Igor Neyman






Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Thursday, September 22, 2016 10:36 AM
To: Sven R. Kunze <srku...@mail.de>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause


-Original Message-
From: Igor Neyman 
Sent: Thursday, September 22, 2016 10:33 AM
To: 'Sven R. Kunze' <srku...@mail.de>; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sven R. Kunze
Sent: Thursday, September 22, 2016 9:25 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Hi pgsql-performance list,


what is the recommended way of doing **multiple-table-spanning joins with ORs 
in the WHERE-clause**?


Until now, we've used the LEFT OUTER JOIN to filter big_table like so:


SELECT DISTINCT 
FROM
 "big_table"
 LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
 LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
"table_b"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
 OR
 "table_b"."item_id" IN ();


However, this results in an awfully slow plan (requiring a scan of the complete 
big_table, which obviously isn't optimal).
So, we decided (at least for now) to split up the query into two separate ones 
and merge/de-duplicate the result with application logic:


SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ();


SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


As you can imagine we would be very glad to solve this issue with a 
single query and without having to re-code existing logic of PostgreSQL. 
But how?


Best,
Sven


PS: if you require EXPLAIN ANALYZE, I can post them as well.

______

Another option to try:


SELECT DISTINCT 
FROM
"big_table"
LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id" 
AND  "table_a"."item_id" IN ())
LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id" 
AND "table_b"."item_id" IN ());

Regards,
Igor Neyman

___

Please disregard this last suggestion; it will not produce the required 
results. With the item_id filters moved into the ON clauses, the LEFT JOINs 
still preserve every big_table row (just with NULLs on the non-matching side), 
so the DISTINCT would return essentially the whole table.

The solution using UNION should work.

Regards,
Igor Neyman



Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Igor Neyman

-Original Message-
From: Igor Neyman 
Sent: Thursday, September 22, 2016 10:33 AM
To: 'Sven R. Kunze' <srku...@mail.de>; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sven R. Kunze
Sent: Thursday, September 22, 2016 9:25 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Hi pgsql-performance list,


what is the recommended way of doing **multiple-table-spanning joins with ORs 
in the WHERE-clause**?


Until now, we've used the LEFT OUTER JOIN to filter big_table like so:


SELECT DISTINCT 
FROM
 "big_table"
 LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
 LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
"table_b"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
 OR
 "table_b"."item_id" IN ();


However, this results in an awfully slow plan (requiring a scan of the complete 
big_table, which obviously isn't optimal).
So, we decided (at least for now) to split up the query into two separate ones 
and merge/de-duplicate the result with application logic:


SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" =
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ();


SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


As you can imagine we would be very glad to solve this issue with a 
single query and without having to re-code existing logic of PostgreSQL. 
But how?


Best,
Sven


PS: if you require EXPLAIN ANALYZE, I can post them as well.

______

Another option to try:


SELECT DISTINCT 
FROM
"big_table"
LEFT OUTER JOIN "table_a" ON ("big_table"."id" = "table_a"."big_table_id" 
AND  "table_a"."item_id" IN ())
LEFT OUTER JOIN "table_b" ON ("big_table"."id" = "table_b"."big_table_id" 
AND "table_b"."item_id" IN ());

Regards,
Igor Neyman




Re: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sven R. Kunze
Sent: Thursday, September 22, 2016 9:25 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Multiple-Table-Spanning Joins with ORs in WHERE Clause

Hi pgsql-performance list,


what is the recommended way of doing **multiple-table-spanning joins with ORs 
in the WHERE-clause**?


Until now, we've used the LEFT OUTER JOIN to filter big_table like so:


SELECT DISTINCT 
FROM
 "big_table"
 LEFT OUTER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
 LEFT OUTER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
 OR
 "table_b"."item_id" IN ();


However, this results in an awfully slow plan (requiring a scan of the 
complete big_table, which obviously isn't optimal).
So, we decided (at least for now) to split up the query into two 
separate ones and merge/de-duplicate the result with application logic:


SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ();


SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


As you can imagine we would be very glad to solve this issue with a 
single query and without having to re-code existing logic of PostgreSQL. 
But how?


Best,
Sven


PS: if you require EXPLAIN ANALYZE, I can post them as well.

__

What about:

SELECT 
FROM
 "big_table" INNER JOIN "table_a" ON ("big_table"."id" = 
"table_a"."big_table_id")
WHERE
 "table_a"."item_id" IN ()
UNION
SELECT 
FROM
 "big_table" INNER JOIN "table_b" ON ("big_table"."id" = 
"table_b"."big_table_id")
WHERE
 "table_b"."item_id" IN ();


Regards,
Igor Neyman




Re: [PERFORM] Problem with performance using query with unnest after migrating from V9.1 to V9.2 and higher

2016-09-22 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Knels, Udo
Sent: Thursday, September 22, 2016 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Problem with performance using query with unnest after 
migrating from V9.1 to V9.2 and higher

Hi,

I tried the following on the upgraded database:
analyze schema_test.table_a;

But the result is the same. 

https://explain.depesz.com/s/hsx5

"Sort  (cost=5.94..6.01 rows=26 width=6) (actual time=0.199..0.200 rows=3 
loops=1)"
"  Sort Key: table_a.col0002"
"  Sort Method: quicksort  Memory: 25kB"
"  Buffers: shared hit=1"
"  ->  HashAggregate  (cost=5.07..5.33 rows=26 width=6) (actual 
time=0.161..0.163 rows=3 loops=1)"
"Group Key: table_a.col0002"
"Buffers: shared hit=1"
"->  Hash Semi Join  (cost=2.76..4.95 rows=50 width=6) (actual 
time=0.070..0.133 rows=26 loops=1)"
"  Hash Cond: ((table_a.col0001)::text = 
(unnest('{501001}'::text[])))"
"  Buffers: shared hit=1"
"  ->  Seq Scan on table_a  (cost=0.00..1.50 rows=50 width=17) 
(actual time=0.015..0.034 rows=50 loops=1)"
"Buffers: shared hit=1"
"  ->  Hash  (cost=1.51..1.51 rows=100 width=32) (actual 
time=0.028..0.028 rows=1 loops=1)"
"Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"->  Result  (cost=0.00..0.51 rows=100 width=0) (actual 
time=0.015..0.017 rows=1 loops=1)"
"Planning time: 0.653 ms"
"Execution time: 0.326 ms"

Greetings

Udo Knels
treubuch IT GmbH
_________

table_a is too small: just 50 records.
The optimizer decided (correctly) that a Seq Scan is cheaper than using an index.
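
If one wants to confirm that this is a cost-based choice rather than a missing 
index, a standard test (session-only, never as a production setting) is:

SET enable_seqscan = off;     -- heavily penalizes seq scans for this session
EXPLAIN ANALYZE SELECT ... ;  -- the original query, elided here
RESET enable_seqscan;

If the forced index plan comes out no faster, the optimizer was right.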

Regards,
Igor Neyman





Re: [PERFORM] Random slow queries

2016-06-29 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of 
devel.brai...@xoxy.net
Sent: Tuesday, June 28, 2016 9:24 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Random slow queries

Hi,

I have a weird slow query issue I can't seem to find the cause of, so I'm 
hoping someone here can shed some light on this.





Given the reasonably small dataset (a pg_dump of the full database containing 
about 2500 jobs is less than 1MB) I would think that the whole database fits in 
memory anyway, making this issue all the more puzzling. Have I missed something 
obvious?

Best regards,
Roel

__

Did you try the AUTO_EXPLAIN extension 
(https://www.postgresql.org/docs/9.3/static/auto-explain.html) for diagnostic 
purposes?
With auto_explain.log_analyze = true it will automatically log EXPLAIN ANALYZE 
output, rather than just EXPLAIN output. Turning this parameter on permanently 
could have a negative impact on overall performance, so use it judiciously.
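
A minimal setup sketch using auto_explain's standard parameters (the threshold 
value is illustrative):

# postgresql.conf -- changing shared_preload_libraries needs a restart
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'   # log only statements slower than this
auto_explain.log_analyze = true        # log EXPLAIN ANALYZE, not plain EXPLAIN

For a one-off diagnostic session it can also be loaded ad hoc:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log every statement in this session
SET auto_explain.log_analyze = true;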

Regards,
Igor




Re: [PERFORM] Big number of connections

2016-03-31 Thread Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jarek
Sent: Thursday, March 31, 2016 3:08 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Big number of connections

Hello!

We are going to build a system based on a PostgreSQL database for a huge 
number of individual users (a few thousand). Each user will have his own 
account; for authorization we will use Kerberos (MIT or Windows). 
Most users will have low activity, but for various reasons the connection 
should stay open all the time.
I'd like to know what potential problems and limitations we can expect with 
such a deployment.
During preliminary testing we found that each connection needs ~1MB of RAM. 
Is there any way to decrease this? Is there any risk that such a number of 
users will degrade performance?
I'll be happy to hear any remarks and suggestions related to design, 
administration, and handling of such an installation.

best regards
Jarek

___

Take a look at PgBouncer.
It should solve your problems.
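
A minimal pgbouncer.ini sketch for this kind of workload (thousands of mostly 
idle clients funneled into a small server pool); all names and sizes are 
illustrative, and the auth setup for Kerberos clients is deployment-specific 
and omitted here:

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction     ; idle clients hold no server connection
default_pool_size = 20      ; server connections per database/user pair
max_client_conn = 5000      ; a few thousand client sockets are cheap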

Regards,
Igor Neyman



Re: [PERFORM] Primary key index partially used

2016-01-26 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Florian Gossin
Sent: Tuesday, January 26, 2016 10:52 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Primary key index partially used

Hi all,
I'm using PostgreSQL 9.4.5 and I have a weird issue.
I have the following three tables:

visit
( nb bigint NOT NULL,
  CONSTRAINT visit_pkey PRIMARY KEY (nb)
)
with ~ 750'000 rows

invoice
( id bigint NOT NULL,
  CONSTRAINT invoice_pkey PRIMARY KEY (id)
)
with ~ 3'000'000 rows

visit_invoice
( invoice_id bigint NOT NULL,
  visit_nb bigint NOT NULL,
  CONSTRAINT visit_invoice_pkey PRIMARY KEY (visit_nb, invoice_id),
  CONSTRAINT fk_vis_inv_inv FOREIGN KEY (invoice_id)
  REFERENCES invoice (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT fk_vis_inv_vis FOREIGN KEY (visit_nb)
  REFERENCES visit (nb) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE CASCADE
)
with ~ 3'000'000 rows

When I delete a row in visit table, it runs the trigger for constraint 
fk_vis_inv_vis and it seems to use the primary key index on visit_invoice:
explain analyze DELETE FROM visit WHERE nb = 213;

 Delete on visit  (cost=0.42..8.44 rows=1 width=6) (actual time=2.225..2.225 
rows=0 loops=1)
   ->  Index Scan using visit_pkey on visit  (cost=0.42..8.44 rows=1 width=6) 
(actual time=2.084..2.088 rows=1 loops=1)
 Index Cond: (nb = 213)
 Planning time: 0.201 ms
 Trigger for constraint fk_vis_inv_vis: time=0.673 calls=1

But when I delete a record in the invoice table, it runs the trigger for 
constraint fk_vis_inv_inv and it doesn't seem to use the primary key index on 
visit_invoice:

explain analyze DELETE FROM invoice WHERE id = 30140470;

 Delete on invoice  (cost=0.43..8.45 rows=1 width=6) (actual time=0.109..0.109 
rows=0 loops=1)
   ->  Index Scan using invoice_pkey on invoice  (cost=0.43..8.45 rows=1 
width=6) (actual time=0.060..0.060 rows=1 loops=1)
 Index Cond: (id = 30140470)
 Planning time: 0.156 ms
 Trigger for constraint fk_vis_inv_inv: time=219.122 calls=1
So, if I create explicitly an index for the second column (which is already 
part of the primary key), it seems to use it because the trigger execution is 
really faster:

CREATE INDEX fki_vis_inv_inv
  ON visit_invoice
  USING btree
  (invoice_id);

explain analyze DELETE FROM invoice WHERE id = 30140470;

 Delete on invoice  (cost=0.43..8.45 rows=1 width=6) (actual time=0.057..0.057 
rows=0 loops=1)
   ->  Index Scan using invoice_pkey on invoice  (cost=0.43..8.45 rows=1 
width=6) (actual time=0.039..0.040 rows=1 loops=1)
 Index Cond: (id = 120043571)
 Planning time: 0.074 ms
 Trigger for constraint fk_vis_inv_inv: time=0.349 calls=1
So I have tried to create the primary key differently, like PRIMARY KEY 
(invoice_id, visit_nb), and in that case it is the opposite, the deletion of 
the invoice is very fast and the deletion of the visit is really slower, unless 
I create a specific index as above.
So my question is: why is my index on the primary key not used by both triggers 
and why should I always create an explicit index on the second column ?
Thanks.
Florian

First, it's a good (for performance) practice to create indexes on FK columns in 
the "child" table.
Second, PG uses an index only if the first column of a concatenated 
(multi-column) index is used in the WHERE clause. That is exactly what you 
observe.
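
A sketch of that rule applied to this schema: with PRIMARY KEY 
(visit_nb, invoice_id), deletes cascading through visit_nb already use the 
PK's leading column, so only the invoice_id FK needs its own index (the same 
index Florian created above):

-- Covers the fk_vis_inv_inv cascade; visit_nb is already served by
-- the leading column of visit_invoice_pkey:
CREATE INDEX fki_vis_inv_inv
  ON visit_invoice
  USING btree
  (invoice_id);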

Regards,
Igor Neyman






Re: [PERFORM] Slow 3 Table Join with v bad row estimate

2015-11-10 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of David Osborne
Sent: Tuesday, November 10, 2015 12:32 PM
To: Tom Lane <t...@sss.pgh.pa.us>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow 3 Table Join with v bad row estimate

Ok - wow.

Adding that index, I get the same estimate of 1 row, but a runtime of ~450ms.
A 23000ms improvement.

http://explain.depesz.com/s/TzF8h

This is great. So, as a general rule of thumb: if I see a Join Filter removing 
an excessive number of rows, I can check whether that condition can be added to 
an index on the same table that is already being scanned.

Thanks for this!

David,
I believe the plan you are posting is the old plan.
Could you please post explain analyze with the index that Tom suggested?

Regards,
Igor Neyman


Re: [PERFORM] One long transaction or multiple short transactions?

2015-10-06 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Carlo
Sent: Monday, October 05, 2015 11:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] One long transaction or multiple short transactions?

We have a system which is constantly importing flat file data feeds into 
normalized tables in a DB warehouse over 10-20 connections. Each data feed row 
results in a single transaction of multiple single row writes to multiple 
normalized tables.

The more columns in the feed row, the more write operations, longer the 
transaction.

Operators are noticing that splitting a single feed of say - 100 columns - into 
two consecutive feeds of 50 columns improves performance dramatically. I am 
wondering whether the multi-threaded and very busy import environment causes 
non-linear performance degradation for longer transactions. Would the operators 
be advised to rewrite the feeds to result in more smaller transactions rather 
than fewer, longer ones?

Carlo



> "over 10-20 connections"

How many cores do you have on that machine?
Test whether limiting the number of simultaneous feeds, e.g. bringing it down 
to half of your normal connection count, has the same positive effect.

Regards,
Igor Neyman


Re: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB average

2015-10-05 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Monday, October 05, 2015 2:25 PM
To: FattahRozzaq <ssoor...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB 
average



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of FattahRozzaq
Sent: Monday, October 05, 2015 10:51 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] shared-buffers set to 24GB but the RAM only use 4-5 GB 
average

I have configured postgresql.conf with parameters as below:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_rotation_age = 1d
log_rotation_size = 1024MB
listen_addresses = '*'
checkpoint_segments = 64
wal_keep_segments = 128
max_connections = 
max_prepared_transactions = 
checkpoint_completion_target = 0.9
default_statistics_target = 10
maintenance_work_mem = 1GB
effective_cache_size = 64GB
shared_buffers = 24GB
work_mem = 5MB
wal_buffers = 8MB
port = 40003
pooler_port = 40053
gtm_host = 'node03'
gtm_port = 10053

As you can see, I have set shared_buffers to 24GB, but my server still only 
uses 4-5 GB on average.
I have 128GB RAM in a single server.
My database has 2 tables:
- room (3GB size if pg_dump'ed)
- message (17GB if pg_dump'ed)

The backend application is a messaging server; on average there will be 40-180 
connections to the Postgres server.
The traffic is fairly heavy.

How can I make Postgres-XL effectively utilize the available RAM for 
max_connections?


Thanks,
FattahRozzaq


Why are you looking at memory consumption?
Are you experiencing performance problems?

Regards,
Igor Neyman

___

Also, Postgres-XL has its own mailing lists:
http://sourceforge.net/p/postgres-xl/mailman/

Regards,
Igor Neyman




Re: [PERFORM] Server slowing down over time

2015-09-03 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jean Cavallo
Sent: Thursday, August 27, 2015 1:21 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Server slowing down over time

Hi,

I am currently working on a data migration for a client.
The general plan is :
  - Read data from a postgresql database
  - Convert them to the new application
  - Insert in another database (same postgresql instance).

The source database is rather big (~40GB, wo indexes), and the
conversion process takes some time. It is done by multiple workers
on a separate Linux environnement, piece by piece.

When we start the migration, at first it looks good.
Performances are good, and it ran smoothly. After a few hours,
we noticed that things started to slow down. Some queries seemed
to be stuck, so we waited for them to end, and restarted the server.

After that it went well for some time (~10 minutes), then it slowed
down again. We tried again (a few times), and the pattern repeats.

My postgresql specific problem is that it looks like the server gets
stuck. CPU usage is <10%, RAM usage is under 50% max, there is
no noticeable disk usage. But, there are some (<10) active queries,
some of which may take several hours to complete. Those queries
work properly (i.e < 1min) right after the server restarts.

So my question is : What could slow the queries from ~1min to 2hours
which does not involve CPU, Memory, or disk usage, and which would
"reset" when restarting the server ?

For information, the number of processes does not seem to be the
problem, there are ~20 connections with max_connection set to 100.
We noticed at some point that the hard drive holding the target
database was heavily fragmented (100%...), but defrag did not
seem to change anything.

Also, the queries that appear to get stuck are "heavy" queries,
though after a fresh restart they execute in a reasonable time.

Finally, whatever causes the database to wait also causes the
Windows instance to slow down. But restarting Postgresql fixes
this as well.

Configuration :

The Postgresql server runs on a Windows Virtual Machine under
VMWare. The VM has dedicated resources, and the only other
VM on the host is the applicative server (which runs idle while
waiting for the database). There is nothing else running on the
server except postgresql (well, there were other things, but we
stopped everything to no avail).

PostgreSQL 9.3.5, compiled by Visual C++ build 1600, 64-bit
Windows 2008R2 (64 bits)
10 GB RAM
4 vCPU

Host : VMWare ESXi 5.5.0 build-2068190
CPU Intel XEON X5690 3.97GHz
HDD 3x Nearline SAS 15K RAID0

Please let me know if any other information may be useful.

Jean Cavallo


Having 4 vCPUs, I'd try decreasing the number of connections from ~20 to 8 and 
see if the "slowing down" still happens.

Regards,
Igor Neyman



Re: [PERFORM] Index creation running now for 14 hours

2015-08-26 Thread Igor Neyman


From: Tory M Blue [mailto:tmb...@gmail.com]
Sent: Wednesday, August 26, 2015 3:26 PM
To: Igor Neyman <iney...@perceptron.com>
Cc: pgsql-performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Index creation running now for 14 hours


On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <iney...@perceptron.com> wrote:

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

I'm running 9.3.4 with slon 2.2.3. I did a drop/add last night at 9pm; it 
started this particular table's index creation at 10:16pm and it's still 
running. A single core is at 100% (32-core box) and there is almost zero I/O 
activity.

CentOS 6.6


 16398 | clsdb | 25765 |   10 | postgres | slon.remoteWorkerThread_1 | 
10.13.200.232 | |   45712 | 2015-08-25 21:12:01.6
19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 
2015-08-25 22:16:03.100992-07 | f   | active | select _cls.fini
shTableAfterCopy(143); analyze torque.impressions;
I was wondering if there are underlying tools to see how it's progressing, or 
if there is anything I can do to bump the performance mid-creation. There's 
nothing I can really do without stopping postgres or slon, but that would put 
me back at square one.

Thanks
Tory



Check pg_locks in regards to the table in question.

Regards,
Igor Neyman

Thanks Igor. I did, but it's not clear what that is telling me: there are 249 
rows in there, nothing has a table name, and they are all for the PID in the 
"analyze torque.impressions" line that I listed above, pid 25765.

Here is one for an exclusive lock, but what should I be looking for? There are 
no other processes on this box other than slon and this index creation.


 transactionid |  ||  |   ||  
93588453 | |   |  | 4/25823460 | 25765 | ExclusiveL
ock   | t   | f

Thanks
Tory

pg_locks contains object OIDs, not names.
Find the OID of the table you are creating your index on, and search pg_locks 
for the records referencing your table.
It cannot be that all records in pg_locks are for the pid running "analyze"; 
there should be records with the pid running your "create index".
What's the size of the table you are indexing?
Also, take a look at pg_stat_activity for long-running transactions/queries.
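
A sketch of those lookups using the standard catalogs (the table name is taken 
from the pg_stat_activity output above):

-- Locks referencing the table; the regclass cast resolves name -> OID:
SELECT l.locktype, l.relation::regclass AS rel, l.pid, l.mode, l.granted
FROM   pg_locks l
WHERE  l.relation = 'torque.impressions'::regclass;

-- Long-running transactions/queries:
SELECT pid, now() - xact_start AS xact_age, state, query
FROM   pg_stat_activity
ORDER  BY xact_age DESC NULLS LAST;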

Igor Neyman



Re: [PERFORM] Index creation running now for 14 hours

2015-08-26 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance pgsql-performance@postgresql.org
Subject: [PERFORM] Index creation running now for 14 hours

I'm running 9.3.4 with slon 2.2.3. I did a drop/add last night at 9pm; it 
started this particular table's index creation at 10:16pm and it's still 
running. A single core is at 100% (32-core box) and there is almost zero I/O 
activity.

CentOS 6.6


 16398 | clsdb | 25765 |   10 | postgres | slon.remoteWorkerThread_1 | 
10.13.200.232 | |   45712 | 2015-08-25 21:12:01.6
19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 
2015-08-25 22:16:03.100992-07 | f   | active | select _cls.fini
shTableAfterCopy(143); analyze torque.impressions;
I was wondering if there are underlying tools to see how it's progressing, or 
if there is anything I can do to bump the performance mid-creation. There's 
nothing I can really do without stopping postgres or slon, but that would put 
me back at square one.

Thanks
Tory


iostat: sdb is the db directory

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.550.000.230.000.00   96.22

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   1.00 0.0012.00  0 24
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.570.000.060.000.00   96.37

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb  21.50 0.00 15484.00  0  30968

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.720.000.060.000.00   96.22

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   2.00 0.0020.00  0 40
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   4.060.000.050.020.00   95.87

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   4.00 0.0064.00  0128
sdb   3.50 0.00   108.00  0216

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.360.000.030.000.00   96.61

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.410.000.060.000.00   96.53

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.450.000.270.000.00   96.28

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   0.00 0.00 0.00  0  0
sdb   1.00 0.0024.00  0 48

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.500.000.300.000.00   96.20

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   1.50 0.00   344.00  0688
sdb   0.00 0.00 0.00  0  0

Check pg_locks in regards to the table in question.

Regards,
Igor Neyman


Re: [PERFORM] Performance bottleneck due to array manipulation

2015-08-21 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

Hey,

I have a very long-running stored procedure, due to array manipulation inside it. 
The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
  from
public.ims_object_header h
  WHERE
h.last_point_id = ANY(point_ids_older_than_one_hour)
 )
into object_ids_to_be_invalidated; -- this array has a size of 100

--current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
  from
public.ims_object_header h
 );
-- START OF PERFORMANCE BOTTLENECK
IF (array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure 
that there is a better way of doing this; however, I couldn't find one.
What I have is two tables, let's call them ims_point and ims_object_header. 
ims_object_header references some entries of ims_point in the column 
last_point_id.
Now I want to delete all entries from ims_point where the timestamp is older 
than one hour. The ids currently being referenced by the table 
ims_object_header should be excluded from this deletion. Therefore I stored the 
ids in arrays and iterate over those arrays to exclude the referenced values 
from being deleted.

However, I am not sure if using an array for an operation like this is the best 
approach.

Can anyone give me some advice how this could be enhanced.

Thanks in advance.


I think in this case (as in many other cases) pure SQL does the job much 
better than a procedural language:

DELETE FROM public.ims_point AS P
WHERE  P.timestamp < m_before_one_hour
 AND NOT EXISTS (SELECT 1 FROM public.ims_object_header OH
                 WHERE OH.last_point_id = P.point_id);
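If the delete is still slow, a minimal sketch of an index the NOT EXISTS check can use (assuming last_point_id is not already indexed):

CREATE INDEX ims_object_header_last_point_id_idx
    ON public.ims_object_header (last_point_id);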

Is that what you are trying to accomplish?

Regards,
Igor Neyman






Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, June 19, 2015 11:07 AM
To: Ian Pushee; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



On 6/19/2015 10:46 AM, Igor Neyman wrote:

 Probably the events_confidnce index is not very selective; that's why the optimizer 
 prefers a seq scan.
 I'd try to create an index on (name, eventspy_id, camera_id, type, status).

 Also, the most recent 9.2 is 9.2.13; you should upgrade.

 Regards,
 Igor Neyman

Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, status)... 
that is the index being used (apparently silently) when I set 
random_page_cost=1.0.

Thanks,
-Ian


--

Well, having 8GB of RAM on the machine, you probably should not be using default 
config parameters.
Depending on what else this machine is being used for, and depending on the 
queries you are running, you should definitely modify the Postgres config.
If this machine is a designated database server, I'd start with the following 
parameters modified from their default values:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB 
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman

---

Oops, should be at least:

effective_cache_size = 5120MB

on a dedicated server.

Regards,
Igor Neyman






Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you can 
see at the end of the e-mail.

Sometimes, in simple joins with other tables that share the same id_camada (but 
are not the owner of the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I do the join using the table that owns the foreign key as well, then the index 
is used.

In the first case, queries with a seq scan take about 30 seconds; with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My question is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
123;10056782
83;311471
42;11316
367;5564
163;3362
257;2100
89;1725
452;1092
157;904
84;883
233;853
271;638
272;620
269;548
270;485
455;437
255;427
32;371
39;320
31;309
411;291
91;260
240;251
162;250
444;247
165;227
36;215
236;193
54;185
53;175
76;170
412;153
159;140
160;139
105;130
59;117
60;117
267;115
238;112
279;111
465;111
5;107
74;103
243;98
35;96
68;82
400;78
391;75
49;74
124;68
73;66
260;64
66;62
168;60
172;56
4;54
44;54
384;53
237;53
390;52
234;52
387;51
378;51
148;50
64;50
379;47
56;46
52;46
377;46
443;46
253;45
97;45
280;43
77;43
2;40
376;39
45;38
235;36
231;36
413;36
241;36
232;34
388;32
101;32
249;32
99;32
100;32
69;32
125;31
166;30
65;29
433;29
149;28
96;27
71;27
98;26
67;26
386;25
50;24
21;24
122;24
47;24
291;22
287;22
404;22
70;22
48;21
63;21
153;18
13;18
46;18
262;18
43;17
72;17
161;17
344;15
29;15
439;14
104;14
119;13
456;12
434;12
55;10
3;10
345;10
286;10
15;10
141;9
169;9
258;9
18;9
158;9
14;8
94;8
463;8
218;8
92;8
170;8
58;7
17;7
19;7
6;7
414;7
10;7
7;7
22;7
90;6
430;6
27;6
195;6
16;6
223;6
11;6
242;6
9;6
26;5
57;5
82;5
451;5
61;5
8;5
445;5
140;5
431;5
197;5
20;5
362;5
24;5
385;4
23;4
25;4
62;4
134;4
150;4
215;4
217;4
219;4
220;4
222;4
224;4
244;4
284;4
318;4
389;4
415;4
449;4
461;4
93;3
209;3
136;3
299;3
188;3
319;3
264;3
95;3
337;3
1;3
221;3
310;3
143;2
320;2
321;2
322;2
324;2
210;2
302;2
438;2
303;2
239;2
330;2
196;2
447;2
332;2
333;2
334;2
307;2
308;2
309;2
340;2
341;2
171;2
190;2
313;2
193;2
154;2
294;2
295;2
250;2
144;2
311;1
312;1
314;1
315;1
316;1
317;1
51;1
323;1
325;1
326;1
327;1
328;1
329;1
331;1
335;1
336;1
338;1
339;1
342;1
343;1
186;1
185;1
354;1
355;1
356;1
357;1
359;1
360;1
361;1
184;1
363;1
364;1
366;1
183;1
369;1
370;1
182;1
181;1
180;1
179;1
380;1
381;1
382;1
383;1
178;1
177;1
176;1
174;1
30;1
173;1
392;1
393;1
155;1
405;1
407;1
409;1
151;1
145;1
12;1
425;1
138;1
135;1
103;1
435;1
437;1
102;1
440;1
441;1
442;1
80;1
448;1
28;1
226;1
227;1
228;1
230;1
225;1
214;1
216;1
213;1
212;1
211;1
208;1
207;1
206;1
78;1
245;1
205;1
204;1
254;1
203;1
202;1
201;1
200;1
199;1
265;1
198;1
268;1
194;1
192;1
273;1
274;1
275;1
278;1
191;1
282;1
75;1
285;1
189;1
288;1
289;1
290;1
187;1
293;1
296;1
297;1
300;1
304;1
305;1
306;1

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: Irineu Ruiz [mailto:iri...@rassystem.com.br]
Sent: Thursday, June 18, 2015 2:18 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to calculate statistics for one column

SELECT COUNT(DISTINCT id_camada) FROM … equals
349

And it doesn't change significantly over time.

[]'s

2015-06-18 15:16 GMT-03:00 Igor Neyman iney...@perceptron.com:


From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you can 
see at the end of the e-mail.

Sometimes, in simple joins with other tables that share the same id_camada (but 
are not the owner of the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I do the join using the table that owns the foreign key as well, then the index 
is used.

In the first case, queries with a seq scan take about 30 seconds; with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My question is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
[...snip: same distribution as quoted in full above...]

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



Then, I’d think that’s approximately your statistics target.
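For example, a minimal sketch (the table name here is hypothetical):

ALTER TABLE my_table ALTER COLUMN id_camada SET STATISTICS 349;
ANALYZE my_table;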

Regards,
Igor Neyman


Re: [PERFORM] How to calculate statistics for one column

2015-06-18 Thread Igor Neyman


From: Irineu Ruiz [mailto:iri...@rassystem.com.br]
Sent: Thursday, June 18, 2015 3:10 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to calculate statistics for one column

I didn't understand.

In this case, my statistics target should be approximately 349?
I already tried that range but it didn't work.

It only works when I put 900 in my statistics.

Is there some kind of formula to calculate a good statistics target for a column 
like this?



2015-06-18 15:24 GMT-03:00 Igor Neyman iney...@perceptron.com:


From: Irineu Ruiz [mailto:iri...@rassystem.com.br]
Sent: Thursday, June 18, 2015 2:18 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to calculate statistics for one column

SELECT COUNT(DISTINCT id_camada) FROM … equals
349

And it doesn't change significantly over time.

[]'s

2015-06-18 15:16 GMT-03:00 Igor Neyman iney...@perceptron.com:


From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Irineu Ruiz
Sent: Thursday, June 18, 2015 1:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to calculate statistics for one column

Hi,

I have a table with an irregular distribution based on a foreign key, as you can 
see at the end of the e-mail.

Sometimes, in simple joins with other tables that share the same id_camada (but 
are not the owner of the foreign key), the planner chooses a seq scan instead 
of using the index on id_camada.
If I do the join using the table that owns the foreign key as well, then the index 
is used.

In the first case, queries with a seq scan take about 30 seconds; with the 
index they take about 40 ms.

When I increase the statistics of the column id_camada to 900, then everything 
works using the index in both cases.
My question is: is there a way to discover the best statistics number for this 
column, or is it a process of trial and error?

id_camada;count(*)
[...snip: same distribution as quoted in full above...]

--

So what’s the result of:

SELECT COUNT(DISTINCT id_camada) FROM …

Does it change significantly over time?

Regards,
Igor Neyman



Then, I’d think that’s approximately your statistics target.

Regards,
Igor Neyman

Well, check if the information in pg_stats for your table is correct:

http://www.postgresql.org/docs/9.4/static/view-pg-stats.html
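For example, a minimal sketch (the table name here is hypothetical):

SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'my_table' AND attname = 'id_camada';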

Regards,
Igor Neyman


Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

2015-06-18 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Thursday, June 18, 2015 3:19 PM
To: k...@rice.edu; Will Platnick
Cc: pgsql-performance@postgresql.org; pgsql-gene...@postgresql.org
Subject: Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

Hi Ken/Will,

  I have checked the ulimit value and we are nowhere near hitting the max of 4096 
that we have currently set. Is there any other explanation why we should be 
thinking of bumping it to something like ulimit -n 5 (add ulimit -n 5 to the 
start of whatever you use to start pgbouncer: init script, etc.), even though we 
are not reaching the 4096 max value?

Regards
Prabhjot Singh



-Original Message-
From: k...@rice.edu [mailto:k...@rice.edu] 
Sent: Thursday, June 18, 2015 11:10 AM
To: Sheena, Prabhjot
Cc: pgsql-performance@postgresql.org; pgsql-gene...@postgresql.org
Subject: Re: [PERFORM] PGBOUNCER ISSUE PLEASE HELP(Slowing down the site)

On Thu, Jun 18, 2015 at 05:41:01PM +, Sheena, Prabhjot wrote:
 Here is the output of OS limits
 
 postgres@symds-pg:~ $ ulimit -a
 
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 790527
 max locked memory   (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files  (-n) 4096
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 16384
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 
 
 Thanks
 Prabhjot
 

I would bump your open files as was suggested in your pgbouncer start script.

Regards,
Ken

---

Why are you so sure that it is PgBouncer causing slowness?

You said bouncer pool_size is set to 250.  How many cores do you have on your 
db server?

Also, why are you running bouncer on a separate machine?  It is very 
light-weight, so running it on the db server wouldn't require many additional 
resources, but it would eliminate some network traffic that you have with the 
current configuration.

Regards,
Igor Neyman






[PERFORM] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Friday, June 05, 2015 1:55 PM
To: pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) 
version

Postgresql 9.3 Version

Guys
  Here is the issue that I've been facing for a couple of weeks now. I have a 
table (size > 7GB).

If I run this query with this specific registration id, it uses the wrong 
execution plan and takes more than a minute to complete. The total number of rows 
for this registration_id is only 414 in this table.

explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8718704208 AND response != 4;

QUERY PLAN
--
Result  (cost=2902.98..2903.01 rows=1 width=0) (actual time=86910.730..86910.731 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual time=86910.725..86910.725 rows=1 loops=1)
           ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  (cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 rows=1 loops=1)
                 Index Cond: (last_update_date IS NOT NULL)
                 Filter: ((response <> 4) AND (registration_id = 8718704208::bigint))
                 Rows Removed by Filter: 52145434
Total runtime: 86910.766 ms


Same query with any other registration id comes back in milliseconds



explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8688546267 AND response != 4;
  QUERY PLAN
--
Aggregate  (cost=529.75..529.78 rows=1 width=8) (actual time=19.723..19.723 rows=1 loops=1)
   ->  Index Scan using btdt_responses_u2 on btdt_responses  (cost=0.57..529.45 rows=119 width=8) (actual time=0.097..19.689 rows=72 loops=1)
         Index Cond: (registration_id = 8688546267::bigint)
         Filter: (response <> 4)
         Rows Removed by Filter: 22
Total runtime: 19.769 ms


Please let me know what I can do to fix this issue.


Thanks


Not enough info.
Table structure? Is registration_id the PK?  If not, what is the distribution of 
its values in this table?
When was it analyzed last time?  Maybe you need to increase the statistics target 
for this table:
Index Scan Backward using btdt_responses_n5 on btdt_responses  
(cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 
rows=1 loops=1)

It expects 2214 records while really getting only 1.
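A minimal sketch of raising the per-column target (the value is just a starting point, not a recommendation; re-analyze afterwards):

ALTER TABLE btdt_responses ALTER COLUMN registration_id SET STATISTICS 1000;
ANALYZE btdt_responses;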

Regards,
Igor Neyman





[PERFORM] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Igor Neyman


From: Sheena, Prabhjot [mailto:prabhjot.si...@classmates.com]
Sent: Friday, June 05, 2015 2:38 PM
To: Igor Neyman; pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: RE: Query running slow for only one specific id. (Postgres 9.3) version

When I run vacuum analyze it fixes the problem but after 1 or 2 days the 
problem comes back

Here is the table structure

      Column      |            Type             |                               Modifiers
------------------+-----------------------------+-----------------------------------------------------------------------
 response_id      | integer                     | not null default nextval('btdt_responses_response_id_seq'::regclass)
 registration_id  | bigint                      | not null
 btdt_id          | integer                     | not null
 response         | integer                     | not null
 creation_date    | timestamp without time zone | not null default now()
 last_update_date | timestamp without time zone | not null default now()
Indexes:
btdt_responses_pkey PRIMARY KEY, btree (response_id)
btdt_responses_u2 UNIQUE, btree (registration_id, btdt_id)
btdt_responses_n1 btree (btdt_id)
btdt_responses_n2 btree (btdt_id, response)
btdt_responses_n4 btree (creation_date)
btdt_responses_n5 btree (last_update_date)
btdt_responses_n6 btree (btdt_id, last_update_date)
Foreign-key constraints:
btdt_responses_btdt_id_fkey FOREIGN KEY (btdt_id) REFERENCES 
btdt_items(btdt_id)
btdt_responses_fk1 FOREIGN KEY (btdt_id) REFERENCES btdt_items(btdt_id)
Has OIDs: no
Options: autovacuum_enabled=true, autovacuum_vacuum_scale_factor=0.02, 
autovacuum_analyze_scale_factor=0.02

Thanks

From: Igor Neyman [mailto:iney...@perceptron.com]
Sent: Friday, June 5, 2015 11:06 AM
To: Sheena, Prabhjot; pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: RE: Query running slow for only one specific id. (Postgres 9.3) version



From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Sheena, Prabhjot
Sent: Friday, June 05, 2015 1:55 PM
To: pgsql-gene...@postgresql.org; pgsql-performance@postgresql.org
Subject: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) 
version

Postgresql 9.3 Version

Guys
  Here is the issue that I've been facing for a couple of weeks now. I have a 
table (size > 7GB).

If I run this query with this specific registration id, it uses the wrong 
execution plan and takes more than a minute to complete. The total number of rows 
for this registration_id is only 414 in this table

explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8718704208 AND response != 4;

QUERY PLAN
--
Result  (cost=2902.98..2903.01 rows=1 width=0) (actual time=86910.730..86910.731 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual time=86910.725..86910.725 rows=1 loops=1)
           ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  (cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 rows=1 loops=1)
                 Index Cond: (last_update_date IS NOT NULL)
                 Filter: ((response <> 4) AND (registration_id = 8718704208::bigint))
                 Rows Removed by Filter: 52145434
Total runtime: 86910.766 ms


Same query with any other registration id comes back in milliseconds



explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8688546267 AND response != 4;
  QUERY PLAN
--
Aggregate  (cost=529.75..529.78 rows=1 width=8) (actual time=19.723..19.723 rows=1 loops=1)
   ->  Index Scan using btdt_responses_u2 on btdt_responses  (cost=0.57..529.45

Re: [PERFORM] Poor performance when deleting from entity-attribute-value type master-table

2015-02-09 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andreas Joseph 
Krogh
Sent: Monday, February 09, 2015 4:13 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Poor performance when deleting from entity-attribute-value 
type master-table

Hi all.

Using PG-9.4.0 I'm seeing this trying to delete from an entity-master table:

*# explain analyze delete from onp_crm_entity where entity_id IN (select 
tmp.delivery_id from temp_delete_delivery_id tmp);
QUERY PLAN
---
 Delete on onp_crm_entity  (cost=0.43..5673.40 rows=1770 width=12) (actual time=7.370..7.370 rows=0 loops=1)
   ->  Nested Loop  (cost=0.43..5673.40 rows=1770 width=12) (actual time=0.050..1.374 rows=108 loops=1)
         ->  Seq Scan on temp_delete_delivery_id tmp  (cost=0.00..27.70 rows=1770 width=14) (actual time=0.014..0.080 rows=108 loops=1)
         ->  Index Scan using onp_crm_entity_pkey on onp_crm_entity  (cost=0.43..3.18 rows=1 width=14) (actual time=0.010..0.011 rows=1 loops=108)
   Index Cond: (entity_id = tmp.delivery_id)
 Planning time: 0.314 ms
 Trigger for constraint onp_crm_activity_entity_id_fkey: time=4.141 calls=108
 Trigger for constraint ...
 Trigger for constraint ...
 Trigger for constraint ...


I have lots of tables referencing onp_crm_entity(entity_id) so I expect the 
poor performance of deleting from it is caused by all the triggers firing to 
check FKI-constraints.


Andreas, do you have indexes on FK columns in child tables?
If not – there is your problem.
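A minimal sketch for the first child table visible in the trigger list (assuming its FK column is named entity_id, as the constraint name suggests; repeat for every table referencing onp_crm_entity):

CREATE INDEX ON onp_crm_activity (entity_id);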

Regards,
Igor Neyman



Re: [PERFORM] Copy command Faster than original select

2015-02-06 Thread Igor Neyman
I think it is the difference between writing 43602 records into the file and 
displaying 43602 records on screen.
If you wrap your select in a select count(a.*), e.g.:

Select count(a.*) from (select ... from mytable join .. join ... order by ...) 
as a;

this will exclude the time to display all these rows, so you'll get the same (or 
better) performance as with copy into a text file, which will prove this theory.
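Another minimal sketch of the same check: EXPLAIN ANALYZE executes the query but discards the result rows, so the reported runtime is server-side only:

EXPLAIN ANALYZE
select ... from mytable join .. join ... order by ...;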

Regards,
Igor Neyman

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of belal
Sent: Friday, February 06, 2015 3:31 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Copy command Faster than original select

I made a complex select using the PgAdmin III Query Editor, PostgreSQL server 9.3


select ... from mytable join .. join ... order by 

I get [Total query runtime: 8841 ms. 43602 rows retrieved.]

but when I use 

copy ([same above select]) to '/x.txt' 
I get [Query returned successfully: 43602 rows affected, 683 ms execution time.]

these tests were made on the same machine as the postgresql server.


can anyone explain the huge difference in execution time?

best regards all 







Re: [PERFORM] extremly bad select performance on huge table

2014-10-22 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn Wittich
Sent: Wednesday, October 22, 2014 1:06 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] extremly bad select performance on huge table

Hi Igor,

that was also my assumption, but unfortunately this isn't true.
I am using the explain analyze.

Example which is fast: explain analyze select value from smallertable inner 
join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn

130 - 140 sec

Example which is slow: explain analyze select value,c1 from smallertable inner 
join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn


does not complete after several hours, although the c1 column should only be 
relevant for retrieval.

Comparing the explain output of both statements gave me a hint:

adding the c1 column makes the query planner choose a sequential scan on 
myhugetable as well as on smallertable. This is much slower.

When I set enable_seqscan=false the query planner shows the same query plan for 
both statements, but the statement including the c1 column does not complete 
after several hours.

How can this be explained?

I do not want the db server to prepare the whole query result at once, my 
intention is that the asynchronous retrieval starts as fast as possible.

Thanks
Björn





 -Original Message-
 From: pgsql-performance-ow...@postgresql.org 
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Björn 
 Wittich
 Sent: Tuesday, October 21, 2014 3:32 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] extremly bad select performance on huge table

 Hi Tom and Igor,

 thanks for your help. With the reindex the select query running time was 
 reduced from 5200 sec to 130 sec. Impressive!

 Even a join on this table is now fast.

 Unfortunately, there is now another problem: The table in my example 
 has
 500 columns which I want to retrieve with my join command.

 Example which is fast: select value from smallertable inner join myhugetable 
 ON smallertable.mycolumn = myhugetable.mycolumn

 Example which is slow: select value,c1,c2,c3,...,c10 from smallertable inner 
 join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn


 Why is the number of columns to fetch so bad? Which action is done 
 in the db system when querying this via pgAdmin? I think that there is 
 no real retrieval included, so why is the number of additional columns so 
 bad for the join performance?

 Björn Wittich bjoern_witt...@gmx.de writes:
 Here is the explain (analyze,buffers) select mycolumn from myhugetable:
 Index Only Scan using myprimkey on myhugetable  (cost=0.00..822.82 rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999 loops=1)
   Heap Fetches: 356861
   Buffers: shared hit=71799472 read=613813
 Total runtime: 2503009.611 ms
 So that works out to about 4 msec per page fetched considering only 
 I/O costs, which is about as good as you're likely to get if the data 
 is sitting on spinning rust.

 You could potentially make it faster with a VACUUM (to mark all pages 
 all-visible and eliminate the heap fetches costs), or a REINDEX (so 
 that the index scan becomes more nearly sequential instead of random 
 access).  However, unless the data is nearly static those will just 
 be temporary fixes: the time will degrade again as you update the table.

 Note: This select is just for testing. My final statement will be a 
 join on this table via the mycolumn column.
 In that case it's probably a waste of time to worry about the 
 performance of this query as such.  In the first place, a join is not 
 likely to use the index at all unless it's fetching a relatively 
 small number of rows, and in the second place it seems unlikely that 
 the join query can use an IndexOnlyScan on this index --- I imagine 
 that the purpose of the join will require fetching additional columns.

  regards, tom lane


 Björn,

 I think, the timing difference you see between 2 queries is caused by 
 delivering to the front-end (PgAdmin) and displaying all additional columns 
 that you include in the second query (much bigger amount of data to pass from 
 the db to the client).
 Pretty sure, if you do explain analyze on both queries, you'll see the same 
 timing, because it'll reflect only db time without what's spent on delivering 
 data to the client.

 Regards,
 Igor Neyman





Okay,

So, REINDEX helped with the original query, whose execution plan used an Index 
Only Scan, if I remember correctly, since you asked only for the column in the 
PK index.
Now, when you add some other column that is not in the index, it switches to a 
Sequential Scan.
So, check the bloat on the table.  Maybe performance could be improved if you 
VACUUM the bloated table.
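A minimal sketch for a rough check via the statistics views (pgstattuple gives exact numbers, if installed):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'myhugetable';

VACUUM VERBOSE ANALYZE myhugetable;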

Regards,
Igor Neyman







Re: [PERFORM] Query with large number of joins

2014-10-21 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Marco Di Cesare
Sent: Tuesday, October 21, 2014 4:03 PM
To: Andrew Dunstan; Merlin Moncure
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query with large number of joins


On 10/21/2014 12:31 PM, Andrew Dunstan wrote:
 Please don't top-post on the PostgreSQL lists. See 
 http://idallen.com/topposting.html

Oops, sorry.

Have you tried a) either turning off geqo or setting geqo_threshold 
fairly high, b) setting join_collapse_limit fairly high (assuming all the above 
join targets are tables and not views, setting it to something like 25 should 
do the trick).

I did try various combinations of these settings but none yielded any 
significant query run time improvements. 

 You also haven't told us what settings you have for things like 
 effective_cache_size, which can dramatically affect query plans.

effective_cache_size = 4096MB

I tried bumping this up as well but again no significant query run time 
improvements.  



Marco,

Didn't you mention that you have something like 48GB of RAM?
In this case (if that's a dedicated db server), you should try setting 
effective_cache_size to around 40GB (not 4GB).
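You can test it per-session before editing postgresql.conf (a sketch; effective_cache_size is settable at the session level):

SET effective_cache_size = '40GB';
-- then re-run EXPLAIN ANALYZE on the problem query in the same session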

Regards,
Igor Neyman



Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus j...@agliodbs.com wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
 I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
 work_mem and partitions interact.

 https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
 The above wiki states that if a query involves doing merge sorts of 8
 tables, that requires 8 times work_mem. If I have a table that is
 partitioned does each partition count as a table and get its own work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Igor Neyman


Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: Dave Johansen [mailto:davejohan...@gmail.com]
Sent: Wednesday, October 15, 2014 4:20 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:08 PM, Igor Neyman iney...@perceptron.com wrote:


From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus j...@agliodbs.com wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
 I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
 work_mem and partitions interact.

 https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
 The above wiki states that if a query involves doing merge sorts of 8
 tables, that requires 8 times work_mem. If I have a table that is
 partitioned does each partition count as a table and get its own work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Sorry I don't understand what you mean by that. My understanding is that RedHat 
maintains fixes for security and other major issues for packages that have been 
EOLed. Are you implying that that's not the case? Or something else?

I don’t think that RedHat can maintain a Postgres version which was EOLed.
Postgres 8.4 is not supported by the PostgreSQL community.

Igor Neyman




Re: [PERFORM] Partitions and work_mem?

2014-10-15 Thread Igor Neyman


From: Dave Johansen [mailto:davejohan...@gmail.com]
Sent: Wednesday, October 15, 2014 4:49 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:36 PM, Igor Neyman iney...@perceptron.com wrote:


From: Dave Johansen [mailto:davejohan...@gmail.com]
Sent: Wednesday, October 15, 2014 4:20 PM
To: Igor Neyman
Cc: Josh Berkus; pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 1:08 PM, Igor Neyman iney...@perceptron.com wrote:


From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Johansen
Sent: Wednesday, October 15, 2014 4:05 PM
To: Josh Berkus
Cc: pgsql-performance
Subject: Re: [PERFORM] Partitions and work_mem?

On Wed, Oct 15, 2014 at 10:10 AM, Josh Berkus j...@agliodbs.com wrote:
On 10/14/2014 10:08 AM, Dave Johansen wrote:
 I'm running Postgres 8.4 on RHEL 6 64-bit and I had a question about how
 work_mem and partitions interact.

 https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem
 The above wiki states that if a query involves doing merge sorts of 8
 tables, that requires 8 times work_mem. If I have a table that is
 partitioned does each partition count as a table and get its own work_mem?

In theory, this could happen.  In practice, based on tests I did at Sun
with DBT3 and 8.3, no backend ever used more than 3X work_mem.  This is
partly because the level of parallelism in postgres is extremely
limited, so we can't actually sort 8 partitions at the same time.

Thanks for the feedback. That's very helpful.

BTW, 8.4 is EOL.  Maybe time to upgrade?

RHEL 6 isn't EOLed and we're working on moving to RHEL 7 but it's a slow 
process that will probably take quite a bit of time, if it ever happens.


Postgres 8.4 is EOL (RHEL).

Sorry I don't understand what you mean by that. My understanding is that RedHat 
maintains fixes for security and other major issues for packages that have been 
EOLed. Are you implying that that's not the case? Or something else?

I don’t think that RedHat can maintain Postgres version which was EOLed.
Postgres 8.4 is not supported by PostgreSQL community.

This conversation has probably become a bit off topic, but my understanding is 
that what you're paying RedHat for is a stable platform for a long period of 
time. That means creating/backporting of fixes for security and other critical 
issues for packages that have been EOLed.
Assuming the above is true (which I believe to be the case: 
https://access.redhat.com/support/policy/updates/errata ), I don't see what 
would prevent RedHat from making a patch and applying it to the latest 8.4 
release to resolve any newly discovered issues. Isn't that the whole point of 
open source and RedHat being able to do with the code what it wishes as long as 
it meets the requirements of the license? So are you claiming that RedHat 
doesn't/won't do this? Is incapable of doing this? Or am I missing something?


Tom Lane is probably a better authority on this issue.
Let’s wait and see what he says.




Re: [PERFORM] Possible performance regression in PostgreSQL 9.2/9.3?

2014-06-05 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Linos
 Sent: Wednesday, June 04, 2014 6:10 PM
 To: Merlin Moncure
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Possible performance regression in PostgreSQL
 9.2/9.3?
 
 On 04/06/14 22:57, Merlin Moncure wrote:
  On Wed, Jun 4, 2014 at 2:58 PM, Linos i...@linos.es wrote:
  On 04/06/14 21:36, Merlin Moncure wrote:
  On Wed, Jun 4, 2014 at 8:56 AM, Linos i...@linos.es wrote:
  Hello,
 
  Some days ago I upgraded from 8.4 to 9.3, after the upgrade some
 queries started performing a lot slower, the query I am using in this example
 is pasted here:
 
  http://pastebin.com/71DjEC21
 
 
  Considering it is a production database users are complaining because
 queries are much slower than before, so I tried to downgrade to 9.2 with the
 same result as 9.3, I finally restored the database on 8.4 and the query is as
 fast as before.
 
  All this tests are done on Debian Squeeze with 2.6.32-5-amd64 kernel
 version, the hardware is Intel Xeon E5520, 32Gb ECC RAM, the storage is
 software RAID 10 with 4 SEAGATE ST3146356SS SAS drives.
 
  postgresql.conf:
  max_connections = 250
  shared_buffers = 6144MB
  temp_buffers = 8MB
  max_prepared_transactions = 0
  work_mem = 24MB
  maintenance_work_mem = 384MB
  max_stack_depth = 7MB
  default_statistics_target = 150
  effective_cache_size = 24576MB
 
 
  9.3 explain:
  http://explain.depesz.com/s/jP7o
 
  9.3 explain analyze:
  http://explain.depesz.com/s/6UQT
 
  9.2 explain:
  http://explain.depesz.com/s/EW1g
 
  8.4 explain:
  http://explain.depesz.com/s/iAba
 
  8.4 explain analyze:
  http://explain.depesz.com/s/MPt
 
  It seems to me that the total estimated cost went too high in 9.2 and
 9.3 but I am not sure why, I tried commenting out part of the query and
 disabling indexonlyscan but still I have very bad timings and estimates.
 
  The dump file is the same for all versions and after the restore process
 ended I did vacuum analyze on the restored database in all versions.
  The rowcount estimates are garbage on all versions so a good
  execution plan can be chalked up to chance.  That being said, it
  seems like we're getting an awful lot of regressions of this type
  with recent versions.
 
  Can you try re-running this query with enable_nestloop and/or
  enable_material disabled? (you can disable them for a particular
  session via: set enable_material = false;) .   This is a ghetto fix
  but worth trying.  If it was me, I'd be simplifying and optimizing
  the query.
 
  merlin
 
 
  Much better with this options set to false, thank you Merlin, even
  better than 8.4
 
  9.3 explain analyze with enable_nestloop and enable_material set to
 false.
  http://explain.depesz.com/s/94D
 
  The thing is I have plenty of queries that are now a lot slower than 
  before,
 this is only one example. I would like to find a fix or workaround.
 
  I can downgrade to 9.1, I didn't try on 9.1 but it's the first version that
 supports exceptions inside plpython and I would like to use them. Do you
 think this situation would be better on 9.1?
 
  Or maybe can I disable material and nestloop on postgresql.conf? I
 thought was bad to trick the planner but given this strange behavior I am not
 sure anymore.
 
  I would against advise adjusting postgresql.conf.  nestloops often
  give worse plans than other choices but can often give the best plan,
  sometimes by an order of magnitude or more.  planner directives should
  be considered a 'last resort' fix and should generally not be changed
  in postgresql.conf.  If i were in your shoes, I'd be breaking the
  query down and figuring out where it goes off the rails.   Best case
  scenario, you have a simplified, test case reproducible reduction of
  the problem that can help direct changes to the planner.  In lieu of
  that, I'd look at this as a special case optimization of problem
  queries.
 
  There is something else to try.  Can you (temporarily) raise
  join_collapse_limit higher (to, say 20), and see if you get a better
  plan (with and without other planner adjustments)?
 
  merlin
 
 
 
 This is the plan with join_collapse_limit=20, enable_nestloop=false,
 enable_material=false:
 http://explain.depesz.com/s/PpL
 
 The plan with join_collapse_limit=20 but nestloops and enable_material true
 is taking too much time, seems to have the same problem as with
 join_collapse_limit=8.
 
 I will try to create a simpler reproducible example, thank you.
 
 Regards,
 Miguel Angel.
 
 
 

Usually, when I increase join_collapse_limit, I also increase 
from_collapse_limit (to the same value).
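For example, for the current session:

SET from_collapse_limit = 20;
SET join_collapse_limit = 20;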

Regards,
Igor Neyman



Re: [PERFORM] Seqscan on big table, when an Index-Usage should be possible

2014-06-05 Thread Igor Neyman
 Default 0
  Indexes:
  measurementstype_pkey PRIMARY KEY, btree (measurementstypeid)
  measurementstype_datatype btree (datatype)
  measurementstype_name btree (name)
  measurementstype_step btree (step)
  measurementstype_stepdescription btree (stepdescription)
 
 

You didn't tell us: 
- what kind of hardware (specifically, how much RAM) you are using
- what your config settings are: shared_buffers, work_mem, 
effective_cache_size

All of this affects planner decisions when choosing one execution 
path/plan over another.
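One quick way to gather those settings (a sketch):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');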

Regards,
Igor Neyman





Re: [PERFORM] Issue with query scanning through all data even with indexes

2014-01-16 Thread Igor Neyman
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kai Sellgren
Sent: Thursday, January 09, 2014 4:37 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Issue with query scanning through all data even with indexes

Hi,

I have a query that has each field used in conditions + sort indexed, but it 
scans through all data.

The query in question looks like:

http://pastie.org/8618562

I have each of those condition fields indexed:

NewsArticle.groupId
NewsArticle.sharedToCommunityIds
NewsArticle.sourceFilterIds
CommunityGroupLink.communityId
CommunityGroupLink.groupId
SourceFilter.groupId
SourceFilter.communityId

This is the data output for explain http://d.pr/i/VGT3

And in visual http://d.pr/i/mqiN

Line 7 says rows=99173, which makes it really slow (it can take up to a minute to 
run).

Do you have any ideas? All of them are appreciated!

Cheers,

--
Yours sincerely,
Kai Sellgren


Could you try to move WHERE clause conditions into JOIN conditions, something 
like this:

SELECT NewsArticle.id
FROM NewsArticle
LEFT JOIN CommunityGroupLink ON CommunityGroupLink.communityId = 1538 AND 
(CommunityGroupLink.groupId = NewsArticle.groupId)
  AND((1538 = ANY (NewsArticle.sharedToCommunityIds) OR 
(CommunityGroupLink.id IS NOT NULL)))
LEFT JOIN SourceFilter ON SourceFilter.communityId = 1538 AND 
SourceFilter.groupId = NewsArticle.groupId
  AND((SourceFilter.id IS NULL OR SourceFilter.id = 
ANY(NewsArticle.sourceFilterIds)));


Not sure what you do with LIMIT 35 - it's not shown in the explain plan.

Regards,
Igor Neyman




Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-05 Thread Igor Neyman


From: Caio Casimiro [mailto:casimiro.lis...@gmail.com]
Sent: Monday, November 04, 2013 4:33 PM
To: Igor Neyman
Cc: Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

These are the parameters I have set in postgresql.conf:

work_mem = 128MB
shared_buffers = 1GB
maintenance_work_mem = 1536MB
fsync = off
synchronous_commit = off
effective_cache_size = 2GB

The hardware is a modest one:
CPU: Intel(R) Atom(TM) CPU  230   @ 1.60GHz
RAM: 2GB
HD: 1TV 7200 RPM (WDC WD10EZEX-00RKKA0)

This machine runs a slackware 14.0 dedicated to the Postgresql.

Thank you,
Caio
With just 2GB RAM, this:

shared_buffers = 1GB

and this:

effective_cache_size = 2GB

are too high.

You should lower those:

shared_buffers = 256MB
effective_cache_size = 1GB

and see how your execution plan changes.
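
Note that shared_buffers requires a restart, but effective_cache_size can be 
tried at the session level first (a sketch):

SET effective_cache_size = '1GB';
EXPLAIN ANALYZE your_query;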

Oh, and this:
maintenance_work_mem = 1536MB

is also too high.
Turning off fsync and synchronous_commit is not a very good idea: with fsync = off, a crash can leave the database corrupted.

Regards,
Igor Neyman



Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-04 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Caio Casimiro
Sent: Monday, November 04, 2013 3:44 PM
To: Jeff Janes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Thank you very much for your answers guys!

On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro casimiro.lis...@gmail.com wrote:
Hello all,

I have one query running at ~ 7 seconds and I would like to know if it's 
possible to make it run faster, once this query runs lots of time in my 
experiment.


Do you mean you want it to be fast because it runs many times, or that you want 
it to become fast after it runs many times (i.e. once the data is fully 
cached)?  The plan you show takes 24 seconds, not 7 seconds.

I want it to be fast because it runs many times. I have an experiment that 
evaluates recommendation algorithms  for a set of twitter users. This query 
returns recommendation candidates so it is called a lot of times for different 
users and time intervals.
 
 

Basically the query return the topics of tweets published by users that the 
user N follows and that are published between D1 and D2.

Query:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id IN
  (SELECT followed_id FROM relationship WHERE follower_id = N)
ORDER BY tt.tweet_id;


I don't know if this affects the plan at all, but it is silly to do a left join 
to tweet when the WHERE clause has conditions that can't be satisfied with a 
null row.  Also, you could try changing the IN-list to an EXISTS subquery.

I'm sorry for the ignorance, but I don't understand the issue with the left 
join; could you explain more?
...
Thank you very much again!
Caio


Just try the following:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt JOIN tweet AS t ON (tt.tweet_id = t.id
     AND t.creation_time BETWEEN 'D1' AND 'D2'
     AND t.user_id IN (SELECT followed_id FROM relationship WHERE follower_id = N))
ORDER BY tt.tweet_id;

And see if it helps with performance.
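
Jeff's EXISTS suggestion from upthread would look roughly like this (a sketch, 
with the same tables and the D1/D2/N placeholders as above):

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt
JOIN tweet AS t ON tt.tweet_id = t.id
WHERE t.creation_time BETWEEN 'D1' AND 'D2'
  AND EXISTS (SELECT 1
              FROM relationship r
              WHERE r.follower_id = N
                AND r.followed_id = t.user_id)
ORDER BY tt.tweet_id;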

Regards,
Igor Neyman





Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

2013-11-04 Thread Igor Neyman
From: Caio Casimiro [mailto:casimiro.lis...@gmail.com] 
Sent: Monday, November 04, 2013 4:10 PM
To: Igor Neyman
Cc: Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Hi Neyman, thank you for your answer.
Unfortunately this query runs almost at the same time:

Sort  (cost=4877693.98..4877702.60 rows=3449 width=20) (actual time=25820.291..25821.845 rows=1640 loops=1)
  Sort Key: tt.tweet_id
  Sort Method: quicksort  Memory: 97kB
  Buffers: shared hit=1849 read=32788
  ->  Nested Loop  (cost=247.58..4877491.32 rows=3449 width=20) (actual time=486.839..25814.120 rows=1640 loops=1)
        Buffers: shared hit=1849 read=32788
        ->  Hash Semi Join  (cost=229.62..88553.23 rows=1681 width=8) (actual time=431.654..13209.159 rows=597 loops=1)
              Hash Cond: (t.user_id = relationship.followed_id)
              Buffers: shared hit=3 read=31870
              ->  Index Scan using tweet_creation_time_index on tweet t  (cost=0.57..83308.25 rows=1781234 width=16) (actual time=130.144..10037.764 rows=1759645 loops=1)
                    Index Cond: ((creation_time >= '2013-05-05 00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06 00:00:00-03'::timestamp with time zone))
                    Buffers: shared hit=1 read=31867
              ->  Hash  (cost=227.12..227.12 rows=154 width=8) (actual time=94.365..94.365 rows=106 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 3kB
                    Buffers: shared hit=2 read=3
                    ->  Index Only Scan using relationship_id on relationship  (cost=0.42..227.12 rows=154 width=8) (actual time=74.540..94.101 rows=106 loops=1)
                          Index Cond: (follower_id = 335093362)
                          Heap Fetches: 0
                          Buffers: shared hit=2 read=3
        ->  Bitmap Heap Scan on tweet_topic tt  (cost=17.96..2841.63 rows=723 width=20) (actual time=21.014..21.085 rows=3 loops=597)
              Recheck Cond: (tweet_id = t.id)
              Buffers: shared hit=1846 read=918
              ->  Bitmap Index Scan on tweet_topic_pk  (cost=0.00..17.78 rows=723 width=0) (actual time=15.012..15.012 rows=3 loops=597)
                    Index Cond: (tweet_id = t.id)
                    Buffers: shared hit=1763 read=632
Total runtime: 25823.386 ms

I have noticed that in both queries the index scan on tweet_creation_time_index 
is very expensive. Is there anything I can do to make the planner choose an 
index-only scan?

Thank you,
Caio

Just try the following:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt JOIN tweet AS t ON (tt.tweet_id = t.id
     AND t.creation_time BETWEEN 'D1' AND 'D2'
     AND t.user_id IN (SELECT followed_id FROM relationship WHERE follower_id = N))
ORDER BY tt.tweet_id;

And see if it helps with performance.

Regards,
Igor Neyman

What is your hardware configuration, and which Postgres config parameters were 
modified from their default values?

Regards,
Igor Neyman
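
On the index-only-scan question: the planner can only choose one if the index 
contains every tweet column the query reads (creation_time, user_id, and id 
here) and the table's visibility map is largely set. A hypothetical covering 
index (name illustrative):

CREATE INDEX tweet_time_user_id ON tweet (creation_time, user_id, id);
VACUUM ANALYZE tweet;  -- refreshes the visibility map that index-only scans rely on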




Re: [PERFORM] How clustering for scale out works in PostgreSQL

2013-08-29 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of bsreejithin
 Sent: Thursday, August 29, 2013 12:42 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] How clustering for scale out works in PostgreSQL
 
 The performance test that was conducted was for 1 Hour.
 
 There are 6 transactions. 2 DB inserts and 4 SELECTs.
 Every 2 minutes there will be 4 SELECTs. And every 3 minutes there will be 2
 DB inserts.
 
 
 
 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/How-clustering-for-scale-out-
 works-in-PostgreSQL-tp5768917p5768957.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
 
 

With that kind of activity, you don't need clustering for your 1000 users.
What you need is PgBouncer; it should solve your problem.  Please read some docs 
on PgBouncer, it's light-weight and very easy to set up.
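
A minimal pgbouncer.ini sketch (hostnames, paths, and pool sizes are illustrative):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20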

Regards,
Igor Neyman




Re: [PERFORM] Efficient Correlated Update

2013-08-09 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Kevin Grittner
 Sent: Friday, August 09, 2013 11:44 AM
 To: Robert DiFalco; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Efficient Correlated Update
 
 Robert DiFalco robert.difa...@gmail.com wrote:
 
  In my system a user can have external contacts. When I am bringing in
  external contacts I want to correlate any other existing users in the
  system with those external contacts. A users external contacts may or
  may not be users in my system. I have a user_id field in contacts
  that is NULL if that contact is not a user in my system
 
  Currently I do something like this after reading in external
  contacts:
 
    UPDATE contacts SET user_id = u.id
    FROM my_users u
    JOIN phone_numbers pn ON u.phone_significant = pn.significant
    WHERE contacts.owner_id = 7
    AND contacts.user_id IS NULL
    AND contacts.id = pn.ref_contact_id;
 
  If any of the fields are not self explanatory let me know.
  Significant is just the right 7 most digits of a raw phone number.
 
  I'm more interested in possible improvements to my relational logic
  than the details of the significant condition. IOW, I'm start enough
  to optimize the significant query but not smart enough to know if
  this is the best approach for the overall correlated UPDATE query. :)
 
  So yeah, is this the best way to update a contact's user_id reference
  based on a contacts phone number matching the phone number of a user?
 
  One detail from the schema -- A contact can have many phone numbers
  but a user in my system will only ever have just one phone number.
  Hence the JOIN to phone_numbers versus the column in my_users.
 
 In looking it over, nothing jumped out at me as a problem.  Are you having
 some problem with it, like poor performance or getting results different from
 what you expected?
 
 --
 Kevin Grittner
 EDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 

There is an illness that sometimes strikes database developers/administrators.
It is called CTD - Compulsive Tuning Disorder :)

Igor Neyman






Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
view with another view inside of it.

Good day,
 
I have included a link to the result of EXPLAIN ANALYZE. It's this one:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h
 
Here's a link to Depesz's explain (if links to the site are okay):
http://explain.depesz.com/s/gCk
 
I have just tried setting geqo_threshold, join_collapse_limit and 
from_collapse_limit to 16, but it yielded no improvement.
Changing those three parameters to 32 did speed up the query from about 3.3 
seconds to about a second (give or take 50 ms), which is a pretty good 
improvement, but not quite there, as I'm looking to bring it down to about 300 
ms if possible. Changing those three settings to 48 yielded no improvements 
over 32.
Is there possibly something something else to tweak there?
 
Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
http://explain.depesz.com/s/cj2
 
Thank you.
 
Peter Slapansky

-

Your last explain analyze (with the 3 settings set to 32) shows a query 
duration of 10 ms, not 1 sec.
Am I wrong?

Regards,
Igor Neyman




Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 10:43 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the 
explain-analyze statement and got the same results.
Still, the query now runs for about a second as mentioned before, so it's 
almost like something's missing from the explain, but I'm certain I copied it 
all.
 
I did this via pgadmin, but that shouldn't matter, should it?
 
Thank you,
 
Peter Slapansky
__
_

At the very end of the EXPLAIN ANALYZE output there should be a line:

Total runtime: 

What do you get there?

Regards,
Igor Neyman



Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 11:34 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

I got:
Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.
But timing after the query finished was 912.842 ms in psql.
 
Cheers,
 
Peter Slapansky
__

That proves what Pavel suggested regarding planning time: EXPLAIN ANALYZE's 
Total runtime covers execution only, while psql's timing also includes planning 
and the client round trip.

Regards,
Igor Neyman



Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Claudio Freire
 Sent: Wednesday, August 07, 2013 2:20 PM
 To: Robert DiFalco
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Efficiently query for the most recent record for a
 given user
 
 On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:
  Let's say I have a table something like this:
 
 create table call_activity (
  id int8 not null,
  called timestamp,
  user_id int8 not null,
  primary key (id),
  foreign key (user_id) references my_users
  )
 
 
  I want to get the last call_activity record for a single user.  Is
  there ANY way to efficiently retrieve the last record for a specified
  user_id, or do I need to de-normalize and update a table with a single
  row for each user each time a new call_activity record is inserted? I
  know I how to do the query without the summary table (subquery or
  GROUP BY with MAX) but that seems like it will never perform well for
  large data sets. Or am I full of beans and it should perform just fine
  for a huge data set as long as I have an index on called?
 
 
 Create an index over (user_id, called desc), and do
 
 select * from call_activity where user_id = blarg order by called desc limit 1
 

And most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;
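
Note the window-function version tags every row with its user's latest call 
time rather than returning one row per user; if one row per user is the goal, 
DISTINCT ON is the usual Postgres idiom (a sketch against the same table):

SELECT DISTINCT ON (user_id) id, user_id, called
FROM call_activity
ORDER BY user_id, called DESC;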

Regards,
Igor Neyman





Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Shaun Thomas
 Sent: Thursday, June 27, 2013 12:16 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Partitions not Working as Expected
 
 Hey guys,
 
 I suspect I'll get an answer equivalent to the planner treats that like a
 variable, but I really hope not because it renders partitions essentially
 useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled.
 
 What I have is this test case:
 
 CREATE TABLE part_test (
fake INT,
part_col TIMESTAMP WITHOUT TIME ZONE
 );
 
 CREATE TABLE part_test_1 (
    CHECK (part_col >= '2013-05-01' AND
           part_col < '2013-06-01')
 ) INHERITS (part_test);
 
 CREATE TABLE part_test_2 (
    CHECK (part_col >= '2013-04-01' AND
           part_col < '2013-05-01')
 ) INHERITS (part_test);
 
 And this query performs a sequence scan across all partitions:
 
 EXPLAIN ANALYZE
 SELECT * FROM part_test
   WHERE part_col > CURRENT_DATE;
 
 The CURRENT_DATE value is clearly more recent than any of the partitions,
 yet it checks them anyway. The only way to get it to properly constrain
 partitions is to use a static value:
 
 EXPLAIN ANALYZE
 SELECT * FROM part_test
   WHERE part_col > '2013-06-27';
 
 But developers never do this. Nor should they. I feel like an idiot even 
 asking
 this, because it seems so wrong, and I can't seem to come up with a
 workaround other than, Ok devs, hard code dates into all of your queries
 from now on.
 
 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
 312-676-8870
 stho...@optionshouse.com
 

Doesn't have to be hardcoded.
If executed as dynamic SQL, it will be re-planned properly, e.g.:

lQueryString := 'SELECT MAX(cycle_date_time) AS MaxDT
                 FROM gp_cycle_' || partition_extension::varchar ||
                ' WHERE cell_id = ' || i_n_Cell_id::varchar ||
                ' AND part_type_id = ' || i_n_PartType_id::varchar ||
                ' AND cycle_date_time <= TIMESTAMP ' || quote_literal(cast(i_t_EndDate AS VARCHAR));
IF (lQueryString IS NOT NULL) THEN
    EXECUTE lQueryString INTO lEndDate;


Regards,
Igor Neyman




Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman
Hi,
postgres does a seqscan, even though there is an index present and it should be 
much more efficient to use it.
I tried to synthetically reproduce it, but it won't make the same choice when I 
do.
I can reproduce it with a simplified set of the data itself though.
here's the query, and the analyzed plan:
select count(*) 
from d2
join g2 on g2.gid=d2.gid
where g2.k=1942

Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual time=481.526..481.526 rows=1 loops=1)
  ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual time=317.403..481.513 rows=17 loops=1)
        Hash Cond: (d2.gid = g2.gid)
        ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8) (actual time=0.013..231.707 rows=3107454 loops=1)
        ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual time=0.207..0.207 rows=121 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 5kB
              ->  Index Scan using g_blok on g2  (cost=0.00..1290.24 rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
                    Index Cond: (k = 1942)
Total runtime: 481.600 ms
Here's the DDL:
create table g2 (gid bigint primary key, k integer);
create table d2 (id bigint primary key, gid bigint);
--insert into g2 (...)
--insert into d2 (...)
create index g_blok on g2(k);
create index d_gid on d2(gid);
alter table d2 add constraint d_g_fk foreign key (gid) references g2 (gid);
analyze d2;
analyze g2;

Any advice?

Cheers,
Willy-Bas Loos
-- 

So, did you try to set:

enable_seqscan = off

and see if different execution plan is more efficient?

Igor Neyman




Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman


From: Willy-Bas Loos [mailto:willy...@gmail.com] 
Sent: Wednesday, June 26, 2013 3:04 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

nope
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory = '/var/lib/postgresql/9.1/main'        # use data in another 
directory
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'    # host-based 
authentication file
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'    # ident configuration 
file
external_pid_file = '/var/run/postgresql/9.1-main.pid'        # write an extra 
PID file
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
ssl = true                # (change requires restart)
shared_buffers = 2GB            # min 128kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
synchronous_commit = off        # synchronization level; on, off, or local
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
log_line_prefix = '%t '            # special values:
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'

--

You could change this setting at the session level, and prove yourself or the 
query optimizer right (or wrong :)

Igor Neyman





Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Igor Neyman


From: Willy-Bas Loos [mailto:willy...@gmail.com] 
Sent: Wednesday, June 26, 2013 3:19 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

plan with enable_seqscan off:

Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual time=208.681..208.681 rows=1 loops=1)
  ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual time=69.403..208.647 rows=17 loops=1)
        ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43 rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
              Index Cond: (blok = 1942)
        ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179 width=8) (actual time=1.340..1.341 rows=0 loops=121)
              Recheck Cond: (geo_id = g.geo_id)
              ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82 rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
                    Index Cond: (geo_id = g.geo_id)
Total runtime: 208.850 ms


On Wed, Jun 26, 2013 at 9:08 PM, Igor Neyman iney...@perceptron.com wrote:


From: Willy-Bas Loos [mailto:willy...@gmail.com]
Sent: Wednesday, June 26, 2013 3:04 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

nope
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory = '/var/lib/postgresql/9.1/main'        # use data in another 
directory
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'    # host-based 
authentication file
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'    # ident configuration 
file
external_pid_file = '/var/run/postgresql/9.1-main.pid'        # write an extra 
PID file
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
ssl = true                # (change requires restart)
shared_buffers = 2GB            # min 128kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
synchronous_commit = off        # synchronization level; on, off, or local
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
log_line_prefix = '%t '            # special values:
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'
--

How much RAM do you have on this machine?
What else is this machine being used for (besides being a db server)?
And, what is your setting for effective_cache_size?  It looks like you didn't 
change it from the default (128MB).
You need to adjust effective_cache_size to somewhere between 60% and 75% of 
RAM, if the database is the main process running on this machine.

Again, effective_cache_size can be set at the session level, so you can try it 
before changing the GUC in postgresql.conf.
When trying it, don't forget to change enable_seqscan back to on (if it's 
still off).
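
A sketch of the session-level test, using the simplified tables from upthread 
(the 6GB figure is purely illustrative; size it to your RAM):

SET enable_seqscan = on;
SET effective_cache_size = '6GB';
EXPLAIN ANALYZE
SELECT count(*)
FROM d2
JOIN g2 ON g2.gid = d2.gid
WHERE g2.k = 1942;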

Igor Neyman









Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Merlin Moncure
 Sent: Thursday, May 30, 2013 11:14 AM
 To: John Mudd
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
 
 On Mon, May 27, 2013 at 9:02 AM, John Mudd johnbm...@gmail.com wrote:
  Postgres 9.1.2 on Ubuntu 12.04
 
  Any reason why a select by primary key would be slower than a select
  that includes an ORDER BY? I was really hoping using the primary key
  would give me a boost.
 
  I stopped the server and cleared the O/S cache using sync; echo 3 > 
  /proc/sys/vm/drop_caches between the runs.
 
 
 
  test=# VACUUM ANALYZE test_select;
  VACUUM
 
  (stopped postgres; reset O/S cache; started postgres)
 
  test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
  ORDER BY key1, key2, key3, id LIMIT 1;
                                   QUERY PLAN
  ---------------------------------------------------------------------------
   Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
     ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
           Index Cond: (key1 >= 50)
   Total runtime: 12.678 ms
 
  (stopped postgres; reset O/S cache; started postgres)
 
 
 why are you flushing postgres/os cache?  when you do that, you are
 measuring raw read time from disks.  Typical disk seek time is measured
 in milliseconds so the timings are completely appropriate once you
 remove caching effects. Hard drives (at least, the spinning
 kind) are slow and one of the major challenges of database and hardware
 engineering is working around their limitations.  Fortunately it looks
 like faster storage will soon be commonplace for reasonable prices.
 
 merlin
 

True.
But, on the other hand (back to the original question):
the execution plans John got before and after the suggested change in 
configuration parameters are exactly the same; the timing differs only because 
of buffer-cache effects.

Igor Neyman




Re: [PERFORM] Best practice when reindexing in production

2013-05-29 Thread Igor Neyman


You could do something like this (which assumes you use simple names for your 
indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 
INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error prone*, because it does not 
consider at least two things: indexes that are constraints, and indexes that 
have FKs depending on them. For the first case, you only need to change the 
constraint to use the new index, and adjust the DROP command. As for the second 
case, you would need to remove the FKs, drop the old index and recreate the FKs 
(inside a transaction, of course), but this could be really slow; a reindex for 
this case would be simpler and perhaps faster.


Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


I must be missing something here.
But how is it that an FK depends on the index?
I understand an FK lookup works much faster with an index supporting the FK 
than without it, but you can have an FK without an index (on the child table).
So, what gives?

Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice when reindexing in production

2013-05-29 Thread Igor Neyman


From: Matheus de Oliveira [mailto:matioli.math...@gmail.com] 
Sent: Wednesday, May 29, 2013 10:19 AM
To: Igor Neyman
Cc: Niels Kristian Schjødt; Magnus Hagander; pgsql-performance@postgresql.org 
list
Subject: Re: [PERFORM] Best practice when reindexing in production



On Wed, May 29, 2013 at 10:55 AM, Igor Neyman iney...@perceptron.com wrote:


You could do something like this (which assumes you use simple names for your 
indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 
INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error prone*, because it does not 
consider at least two things: indexes that are constraints, and indexes that 
have FKs depending on them. For the first case, you only need to change the 
constraint to use the new index, and adjust the DROP command. As for the second 
case, you would need to remove the FKs, drop the old index and recreate the FKs 
(inside a transaction, of course), but this could be really slow; a reindex for 
this case would be simpler and perhaps faster.

=
I must be missing something here.
But, how is that FK depends on the index?
I understand FK lookup works much faster with the index supporting FK than 
without it, but you could have FK without index (on the child table).
So, what gives?

AFAIK, when you create an FK, PostgreSQL associates it with a UNIQUE INDEX on 
the target table. It creates an entry in pg_depends (I don't know if anywhere 
else), and when you try to drop the index, even if there is an identical one 
that PostgreSQL could use, it will throw an error.

You can easily check this:

postgres=# CREATE TABLE parent(id int);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx1 ON parent (id);
CREATE INDEX
postgres=# CREATE TABLE child(idparent int REFERENCES parent (id));
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx2 ON parent (id);
CREATE INDEX
postgres=# DROP INDEX parent_idx1;
ERROR:  cannot drop index parent_idx1 because other objects depend on it
DETAIL:  constraint child_idparent_fkey on table child depends on index 
parent_idx1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

BTW, I do think PostgreSQL could verify if there is another candidate to this 
FK. Is it in TODO list? Should it be?

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


So, it's about the index on the parent table that backs the unique (or PK) 
constraint and is referenced by the FK on the child table.
From your previous email I thought the index on the child table supporting the 
FK (which is mostly created for performance purposes) could not be dropped 
without disabling the FK. My bad.

Igor Neyman




Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: Anne Rosset [mailto:aros...@collab.net]
 Sent: Friday, May 03, 2013 4:52 PM
 To: Igor Neyman; k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: RE: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 We saw a little bit improvement by increasing the min_pool_size but
 again I see a bigvariation in the time the query is executed. Here is
 the query:
 
 explain analyze SELECT
     artifact.id AS id,
     artifact.priority AS priority,
     project.path AS projectPathString,
     project.title AS projectTitle,
     folder.project_id AS projectId,
     folder.title AS folderTitle,
     item.folder_id AS folderId,
     item.title AS title,
     item.name AS name,
     field_value2.value AS status,
     field_value3.value AS category,
     sfuser.username AS submittedByUsername,
     sfuser.full_name AS submittedByFullname,
     sfuser2.username AS assignedToUsername,
     sfuser2.full_name AS assignedToFullname,
     item.version AS version,
     CASE WHEN ((SELECT mntr_subscription.user_id AS userId
                 FROM mntr_subscription mntr_subscription
                 WHERE artifact.id = mntr_subscription.object_key
                   AND mntr_subscription.user_id = 'user1439') = 'user1439')
          THEN 'user1439' ELSE null END AS monitoringUserId,
     tracker.icon AS trackerIcon,
     tracker.remaining_effort_disabled AS remainingEffortDisabled,
     tracker.actual_effort_disabled AS actualEffortDisabled,
     tracker.estimated_effort_disabled AS estimatedEffortDisabled
 FROM
     field_value field_value2,
     field_value field_value,
     sfuser sfuser2,
     field_value field_value3,
     field_value field_value4,
     item item,
     project project,
     relationship relationship,
     tracker tracker,
     artifact artifact,
     sfuser sfuser,
     folder folder
 WHERE
     artifact.id = item.id
     AND item.folder_id = folder.id
     AND folder.project_id = project.id
     AND artifact.group_fv = field_value.id
     AND artifact.status_fv = field_value2.id
     AND artifact.category_fv = field_value3.id
     AND artifact.customer_fv = field_value4.id
     AND item.created_by_id = sfuser.id
     AND relationship.is_deleted = false
     AND relationship.relationship_type_name = 'ArtifactAssignment'
     AND relationship.origin_id = sfuser2.id
     AND artifact.id = relationship.target_id
     AND item.is_deleted = false
     AND ((artifact.priority = 3))
     AND (project.path = 'projects.psr-pub-13')
     AND item.folder_id = tracker.id;
 
                                   QUERY PLAN
 --------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..272.62 rows=1 width=181) (actual time=805.934..1792.596 rows=177 loops=1)
    ->  Nested Loop  (cost=0.00..263.87 rows=1 width=167) (actual time=707.739..1553.348 rows=177 loops=1)
          ->  Nested Loop  (cost=0.00..263.58 rows=1 width=153) (actual time=653.053..1496.839 rows=177 loops=1)
                ->  Nested Loop  (cost=0.00..262.50 rows=1 width=154) (actual time=565.627..1385.667 rows=177 loops=1)
                      ->  Nested Loop  (cost=0.00..262.08 rows=1 width=163) (actual time=565.605..1383.686 rows=177 loops=1)
                            ->  Nested Loop  (cost=0.00..261.67 rows=1 width=166) (actual time=530.928..1347.053 rows=177 loops=1)
                                  ->  Nested Loop  (cost=0.00..261.26 rows=1 width=175) (actual time=530.866..1345.032 rows=177 loops=1)
                                        ->  Nested Loop  (cost=0.00..260.84 rows=1 width=178) (actual time=372.825..1184.668 rows=177 loops=1)
                                              ->  Nested Loop  (cost=0.00..250.33 rows=29 width=128) (actual time=317.897..534.645 rows=1011 loops=1)
                                                    ->  Nested Loop  (cost=0.00..207.56 rows=3 width=92) (actual time=251.014..408.868 rows=10 loops=1)
                                                          ->  Nested Loop  (cost=0.00..163.54 rows=155 width=65) (actual time=146.176..382.023 rows=615 loops=1)
                                                                ->  Index Scan using project_path on project  (cost=0.00..8.27 rows=1 width=42) (actual time=76.581..76.583 rows=1 loops=1)
                                                                      Index Cond: ((path)::text

Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: Anne Rosset [mailto:aros...@collab.net]
 Sent: Monday, May 06, 2013 1:01 PM
 To: Igor Neyman; k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: RE: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 Hi Igor,
 The explain analyze is from when there was no load.
 
 Artifact table: 251831 rows
 Field_value table: 77378 rows
 Mntr_subscription: 929071 rows
 Relationship: 270478 row
 Folder: 280356 rows
 Item: 716465 rows
 Sfuser: 5733 rows
 Project: 1817 rows
 
 8CPUs
 RAM: 8GB
 
 Postgres version: 9.0.13
 
  And no we haven't switched or tested yet  with pgbouncer. We would
 like to do a bit more analysis before trying this.
 
 Thanks for your help,
 Anne
 
 


Anne,

Just as a quick test, try changing the enable_nestloop setting locally, in your 
psql session/connection, and run your query:

set enable_nestloop = off;
explain analyze your_query;

just to see if a different execution plan would be better and the optimizer 
needs to be convinced to use it.
Please post what you get with the modified setting.

Also, what is the setting for effective_cache_size in postgresql.conf?

Regards,
Igor Neyman





Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Thomas Kellerer
 Sent: Monday, May 06, 2013 1:12 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 Anne Rosset, 06.05.2013 19:00:
  Postgres version: 9.0.13
 
  Work_mem is set to 64MB
  Shared_buffer to 240MB
  Segment_size is 1GB
  Wal_buffer is 10MB
 
  Artifact table: 251831 rows
  Field_value table: 77378 rows
  Mntr_subscription: 929071 rows
  Relationship: 270478 row
  Folder: 280356 rows
  Item: 716465 rows
  Sfuser: 5733 rows
  Project: 1817 rows
 
  8CPUs
  RAM: 8GB
 
 
 With 8GB RAM you should be able to increase shared_buffer to 1GB or
 maybe even higher especially if this is a dedicated server.
 240MB is pretty conservative for a server with that amount of RAM
 (unless you have many other applications running on that box)
 
 Also what are the values for
 
 cpu_tuple_cost
 seq_page_cost
 random_page_cost
 effective_cache_size
 
 What kind of harddisk is in the server? SSD? Regular ones (spinning
 disks)?
 
 
 


Also, with 8 CPUs, your max connection_pool size shouldn't be much bigger than 20.

Igor Neyman



Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-06 Thread Igor Neyman


From: Anne Rosset [aros...@collab.net]
Sent: Monday, May 06, 2013 5:51 PM
To: Igor Neyman; k...@rice.edu
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Deterioration in performance when query executed in  
multi threads

Hi Igor,
Result with enable_nestloop off:




---
 Hash Join  (cost=49946.49..58830.02 rows=1 width=181) (actual time=2189.474..2664.888 rows=180 loops=1)
   Hash Cond: ((item.created_by_id)::text = (sfuser.id)::text)
   ->  Hash Join  (cost=49470.50..58345.53 rows=1 width=167) (actual time=1931.870..2404.745 rows=180 loops=1)
         Hash Cond: ((relationship.origin_id)::text = (sfuser2.id)::text)
         ->  Hash Join  (cost=48994.51..57869.52 rows=1 width=153) (actual time=1927.603..2400.334 rows=180 loops=1)
               Hash Cond: ((relationship.target_id)::text = (artifact.id)::text)
               ->  Seq Scan on relationship  (cost=0.00..7973.38 rows=240435 width=19) (actual time=0.036..492.442 rows=241285 loops=1)
                     Filter: ((NOT is_deleted) AND ((relationship_type_name)::text = 'ArtifactAssignment'::text))
               ->  Hash  (cost=48994.49..48994.49 rows=1 width=154) (actual time=1858.350..1858.350 rows=180 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     ->  Hash Join  (cost=47260.54..48994.49 rows=1 width=154) (actual time=1836.495..1858.151 rows=180 loops=1)
                           Hash Cond: ((field_value4.id)::text = (artifact.customer_fv)::text)
                           ->  Seq Scan on field_value field_value4  (cost=0.00..1443.78 rows=77378 width=9) (actual time=22.104..30.694 rows=77378 loops=1)
                           ->  Hash  (cost=47260.52..47260.52 rows=1 width=163) (actual time=1814.005..1814.005 rows=180 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 35kB
                                 ->  Hash Join  (cost=45526.57..47260.52 rows=1 width=163) (actual time=1790.908..1813.780 rows=180 loops=1)
                                       Hash Cond: ((field_value3.id)::text = (artifact.category_fv)::text)
                                       ->  Seq Scan on field_value field_value3  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..9.262 rows=77378 loops=1)
                                       ->  Hash  (cost=45526.55..45526.55 rows=1 width=166) (actual time=1790.505..1790.505 rows=180 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                             ->  Hash Join  (cost=43792.60..45526.55 rows=1 width=166) (actual time=1768.362..1790.304 rows=180 loops=1)
                                                   Hash Cond: ((field_value.id)::text = (artifact.group_fv)::text)
                                                   ->  Seq Scan on field_value  (cost=0.00..1443.78 rows=77378 width=9) (actual time=0.002..8.687 rows=77378 loops=1)
                                                   ->  Hash  (cost=43792.58..43792.58 rows=1 width=175) (actual time=1767.928..1767.928 rows=180 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                         ->  Hash Join  (cost=42058.63..43792.58 rows=1 width=175) (actual time=1499.822..1767.734 rows=180 loops=1)
                                                               Hash Cond: ((field_value2.id)::text = (artifact.status_fv)::text)
                                                               ->  Seq Scan on field_value field_value2  (cost=0.00..1443.78 rows=77378 width=15) (actual time=0.002..261.082 rows=77378 loops=1)
                                                               ->  Hash  (cost=42058.61..42058.61 rows=1 width=178) (actual time=1492.707..1492.707 rows=180 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 38kB
                                                                     ->  Hash Join  (cost=18039.59..42058.61 rows=1 width=178) (actual time=1175.659..1492.482 rows=180 loops=1)
                                                                           Hash Cond: ((item.id)::text = (artifact.id)::text)
                                                                           ->  Hash Join  (cost=12112.31..36130.95 rows=30 width=128) (actual time=304.035..702.745 rows=1015 loops=1)
                                                                                 Hash Cond: ((item.folder_id)::text = (folder.id)::text)
                                                                                 ->  Seq Scan on item  (cost=0.00

Re: [PERFORM] Deterioration in performance when query executed in multi threads

2013-05-01 Thread Igor Neyman


 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Anne Rosset
 Sent: Wednesday, May 01, 2013 1:10 PM
 To: k...@rice.edu
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 Thanks Ken. I am going to test with different pool sizes and see if I
 see any improvements.
 Are there other configuration options I should look like? I was
 thinking of playing with shared_buffer.
 
 Thanks,
 Anne
 
 -Original Message-
 From: k...@rice.edu [mailto:k...@rice.edu]
 Sent: Wednesday, May 01, 2013 9:27 AM
 To: Anne Rosset
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Deterioration in performance when query executed
 in multi threads
 
 On Wed, May 01, 2013 at 04:07:55PM +, Anne Rosset wrote:
  Hi Ken,
  Thanks for your answer. My test is actually running with jboss 7/jdbc
 and the connection pool is defined  with min-pool-size =10 and max-
 pool-size=400.
 
  Why would you think it is an issue with the connection pool?
 
  Thanks,
  Anne
 
 
 Hi Anne,
 
 You want to be able to run as many jobs productively at once as your
 hardware is capable of supporting. Usually something starting a 2 x
 number of CPUs is best.
 If you make several runs increasing the size of the pool each time, you
 will see a maximum throughput somewhere near there and then the
 performance will decrease as you add more and more connections. You can
 then use that sweet spot.
 Your test harness should make that pretty easy to find.
 
 Regards,
 Ken
 
 

Anne,

Before expecting advice on specific changes to Postgres configuration 
parameters, you should provide this list with your hardware configuration, 
Postgres version, and your current Postgres configuration parameters (at least 
those changed from the defaults).
And, if you are testing with a specific query, it would be nice if you provided 
the results of:

Explain analyze your_select;

along with the definition of database objects (tables, indexes) involved in 
this select.

Also, you mention client-side connection pooler.  In my experience, server-side 
poolers, such as PgBouncer mentioned earlier, are much more effective.

Regards,
Igor Neyman





Re: [PERFORM] [BUGS] BUG #8130: Hashjoin still gives issues

2013-05-01 Thread Igor Neyman


 -Original Message-
 

 
 The original query:
 
 select * from ambit_privateevent_calendars as a, ambit_privateevent as
 b, ambit_calendarsubscription as c, ambit_calendar as d where
 c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
 and  c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
 12, 20) and not b.main_recurrence = true;
 
 select b.id from ambit_privateevent_calendars as a, ambit_privateevent
 as b, ambit_calendarsubscription as c, ambit_calendar as d where
 c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270
 and  c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4,
 12, 20) and not b.main_recurrence = true;
 
 (select * = select b.id, the star query is *fastest*)
 
 We compare:
 http://explain.depesz.com/s/jRx
 http://explain.depesz.com/s/eKE
 
 
 By setting set enable_hashjoin = off; performance in our entire
 application increased 30 fold in throughput, which was a bit unexpected
 but highly appreciated. The result of the last query switch the
 mergejoin:
 
 http://explain.depesz.com/s/AWB
 
 It is also visible that after hashjoin is off, the b.id query is faster
 than the * query (what would be expected).
 
 
 Our test machine is overbudgetted, 4x the memory of the entire database
 ~4GB, and uses the PostgreSQL stock settings.
 
 
 Stefan
 

I'd suggest that you adjust the Postgres configuration, specifically the memory 
settings (shared_buffers, work_mem, effective_cache_size), to reflect your 
hardware config, and see how it affects your query.
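
With ~4GB RAM and the whole database fitting in memory, a starting point might 
look like this (illustrative values only, not a tuned recommendation):

shared_buffers = 1GB
work_mem = 32MB
effective_cache_size = 3GB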

Regards,
Igor Neyman




[PERFORM] Re: Join between 2 tables always executes a sequential scan on the larger table

2013-04-02 Thread Igor Neyman
From: Dieter Rehbein [mailto:dieter.rehb...@skiline.cc] 
Sent: Tuesday, April 02, 2013 4:52 AM
To: pgsql-performance@postgresql.org
Subject: Join between 2 tables always executes a sequential scan on the larger 
table

Hi everybody,

in a project I have a performance problem, which I (and my colleagues) don't 
understand. It's a simple join between 2 of 3 tables:

table-1:   user   (id, user_name, ...).   This table has about 1 million rows 
(999673 rows)
table-2:   competition (57 rows)
table-3:   user_2_competition.  A relation between user and competition. This 
table has about 100.000 rows

The query is a join between table user_2_competition and user and looks like 
this:

select u.id, u.user_name
from user_2_competition uc 
          left join user u on u.id = uc.user_id 
where uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001'

The query returns the ID and user_name of all users participating in a 
competition.

What I don't understand: This query executes a sequential scan on user!


The tables have the following indexes:

user_2_competition:  there is an index on user_id and an index on 
competition_id (competition_id is a VARCHAR(32) containing UUIDs)
user:  id is the primary key and has therefore a unique index (the ID is a 
VARCHAR(32), which contains UUIDs).

The database has just been restored from a backup, I've executed ANALYZE for 
both tables.

The output of explain analyze (Postgres 9.2.3):

Hash Left Join  (cost=111357.64..126222.29 rows=41396 width=42) (actual time=1982.543..2737.331 rows=41333 loops=1)
  Hash Cond: ((uc.user_id)::text = (u.id)::text)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..4705.21 rows=41396 width=33) (actual time=0.019..89.691 rows=41333 loops=1)
        Filter: ((competition_id)::text = '3cc1cb9b3ac132ad013ad01316040001'::text)
        Rows Removed by Filter: 80684
  ->  Hash  (cost=90074.73..90074.73 rows=999673 width=42) (actual time=1977.604..1977.604 rows=999673 loops=1)
        Buckets: 2048  Batches: 128  Memory Usage: 589kB
        ->  Seq Scan on user u  (cost=0.00..90074.73 rows=999673 width=42) (actual time=0.004..1178.827 rows=999673 loops=1)
Total runtime: 2740.723 ms


I expected to see an index-scan on user_2_competition with a hash join to user, 
not a sequential scan on user.  I've tried this with Postgres 9.1 and 9.2.3).

Any ideas, what's going on here?

With EXPLAIN ANALYZE I can see, which query plan Postgres is using.  Is there 
any way to find out, WHY postgres uses this query plan? 

best regards
Dieter

---

Dieter, 
why do you think an index scan on user_2_competition would be better?

Based on the huge number of rows returned (41333 out of roughly 122000 total in 
the table, per the plan's row counts), the optimizer decided that a Seq Scan is 
better than an index scan.
You don't show the QUERY TUNING parameters from postgresql.conf; are they at 
the defaults?
Playing with optimizer parameters (lowering random_page_cost, lowering 
cpu_index_tuple_cost, increasing effective_cache_size, or just setting 
enable_seqscan = off), you could try to force the optimizer to use the index, 
and see if you are getting better results.
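
A session-level sketch using the query from the message (note that user is a 
reserved word and needs quoting if the table really is named that):

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT u.id, u.user_name
FROM user_2_competition uc
LEFT JOIN "user" u ON u.id = uc.user_id
WHERE uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001';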

Regards,
Igor Neyman




Re: [PERFORM] how to help the planner

2013-03-29 Thread Igor Neyman
Marty,

When you change from_collapse_limit/join_collapse_limit, pay attention to the 
Genetic Query Optimizer settings; I believe by default it's on (geqo = on).
Specifically, look at the geqo_threshold parameter (default is 12).
AFAIK, if you don't intend to use the Genetic Query Optimizer, 
geqo_threshold should be higher than your collapse limits, e.g. if you 
want to set the collapse limits to 50, and you think you may join 50 tables, 
then also increase geqo_threshold to at least 51.
Otherwise GEQO will come into play unexpectedly.
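
A session-level sketch of that combination, with the numbers from the example above:

SET from_collapse_limit = 50;
SET join_collapse_limit = 50;
SET geqo_threshold = 51;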

Besides this, try to play with these parameters (according to your original 
message you keep them at default):

#seq_page_cost = 1.0# measured on an arbitrary scale
random_page_cost = 2.0  # same scale as above  (default 4.0)
cpu_tuple_cost = 0.05   # same scale as above (default 0.01)
cpu_index_tuple_cost = 0.05 # same scale as above  (default 0.005)
cpu_operator_cost = 0.0075  # same scale as above  (default 0.0025)

Start with cpu_tuple_cost, increasing it from default 0.01 to 0.03-0.05.

Regards,
Igor Neyman


From: Marty Frasier [mailto:m.fras...@escmatrix.com] 
Sent: Thursday, March 28, 2013 4:45 PM
To: Tom Lane
Cc: pgsql-performance@postgresql.org; James Quinn
Subject: Re: how to help the planner

Tom,
I cranked (join|from)_collapse_limit up to 50, then 500 just to exclude the 
limits completely, and attempted the query both times.  The planner came up with 
an estimate close to the other estimates (1,944,276) and I stopped actual 
execution after some length of time.
The t12 subquery is grouped differently because that particular test can be 
valid at mutliple schools per student.

I had set the session default_statistics_target to 10000 and analyzed prior to 
the earlier runs to allow it to have the best stats it could.  I've looked at 
it a little more closely, setting default_statistics_target back to its default 
of 100 and re-ran ANALYZE on that database.

The value 'cahsee_ela' occurs 75,000 times in column 
analysis.iteration__student__test__year.test which totals 11M rows.  It's 
ranked about 60 of 91 values in frequency.
By setting statistics=1000 on the column 'test' the MCV from pg_stats contains 
all 91 distinct values (there are no nulls) and there is no histogram_bounds 
value for the column.  From MCV: cahsee_ela = 0.00658 which is accurate.
I think that should give the planner good info on the selectivity of the where 
clause.  It appears from the var_eq_const function that it will use that exact 
value when found.  It doesn't seem to help the outcome though, as it had good 
stats before.  I just understand it a little better now - which is good.

Do you have any suggestions where to probe next?
I see some statistics hooks mentioned in the source code but don't 
know how to take advantage of them or whether they would be of use.
I suppose the answer could eventually be we have to reorganize our queries?
 
Thanks,
Marty


On Thu, Mar 28, 2013 at 12:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Marty Frasier m.fras...@escmatrix.com writes:
 We've been using postgreSQL for a few years.  This is my first post here
 and first real dive into query plans.
One quick thought is that it's probably worth cranking up
join_collapse_limit and/or from_collapse_limit, since the number of
relations in the query is considerably more than the default values of
those limits.  This will make planning take longer but possibly find
better plans.  I'm not sure it will help a lot, since most of the
problem is evidently bad rowcount estimates, but it might help.

Also it seems like the major rowcount failing is in the estimate for the
t12 subquery.  I can't tell why that particular combination of WHERE
clauses is giving it such a hard time --- is there something odd about
the distribution of 'cahsee_ela' tests?  Why is that particular subquery
grouped over school/student when all the others are grouped over just
student?

                        regards, tom lane







Re: [PERFORM] High CPU usage after partitioning

2013-01-22 Thread Igor Neyman
In PG 9.2 I'm getting "Index Only Scan Backward" for every partition in the 
first part of the execution plan, when looking for MAX in a partitioned table 
with a similar query:

->  Index Only Scan Backward using pk_cycle_200610 on gp_cycle_200610 gp_cycle  (cost=0.00..8.34 rows=5 width=8) (actual time=0.021..0.021 rows=1 loops=1)
      Index Cond: (cycle_date_time IS NOT NULL)
      Heap Fetches: 0

Maybe you should upgrade to 9.2.

Regards,
Igor Neyman


From: rudi [mailto:rudol...@gmail.com]
Sent: Tuesday, January 22, 2013 10:08 AM
To: pgsql-performance@postgresql.org
Subject: Re: High CPU usage after partitioning

On Tue, Jan 22, 2013 at 3:46 PM, Andrew Dunstan and...@dunslane.net wrote:
The query is pretty simple and standard, the behaviour (and the plan) is 
totally different when it comes to a partitioned table.

Partitioned table query = explain analyze SELECT  sb_logs.* FROM sb_logs  
WHERE (device_id = 901 AND date_taken = (SELECT MAX(date_taken) FROM sb_logs 
WHERE device_id = 901));

And there you have it. Constraint exclusion does not work in cases like this. 
It only works with static expressions (such as a literal date in this case).

Ok, but I would have expected the same plan repeated 4 times. When the table is 
not partitioned, the plan is definitely smarter: it knows that the index is reversed 
and looks for the max with an index scan backward. When the table is partitioned, 
it scans forward and I guess it will always do a full index scan.
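
To illustrate the point about static expressions, a hedged sketch (the date 
literal is made up): with a literal bound on the partitioning column, constraint 
exclusion can prune partitions, which it cannot do for the MAX() subquery above:

EXPLAIN ANALYZE
SELECT sb_logs.*
  FROM sb_logs
 WHERE device_id = 901
   AND date_taken >= DATE '2013-01-01';   -- static, so non-matching partitions are excluded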



--
rd

This is the way the world ends.
Not with a bang, but a whimper.


Re: [PERFORM] How can i find out top high load sql queries in PostgreSQL.

2012-12-19 Thread Igor Neyman
 -Original Message-
 From: Vibhor Kumar [mailto:vibhor.ku...@enterprisedb.com]
 Sent: Monday, December 17, 2012 11:34 AM
 To: suhas.basavaraj12
 Cc: pgsql-performance@postgresql.org
 Subject: Re: How can i find out top high load sql queries in
 PostgreSQL.
 
 
 On Dec 17, 2012, at 3:21 AM, suhas.basavaraj12 suha...@verse.in
 wrote:
 
  There is a tool called pgFouine. I am sure this will help you:
 
  http://pgfouine.projects.pgfoundry.org/tutorial.html
 
 +1
 
 You can also use pgbadger, which seemed more flexible than pgFouine.
 http://dalibo.github.com/pgbadger/
 
  Thanks & Regards,
 Vibhor Kumar
 EnterpriseDB Corporation
 The Enterprise PostgreSQL Company
 Blog:http://vibhork.blogspot.com
 

The pg_stat_statements extension tracks SQL statement execution statistics.
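
A minimal sketch of using it (this assumes the module is already listed in 
shared_preload_libraries):

CREATE EXTENSION pg_stat_statements;

-- top statements by cumulative execution time:
SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;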

Regards,
Igor Neyman




Re: [PERFORM] how to avoid deadlock on massive update with multiple deletes

2012-10-05 Thread Igor Neyman
From: Anibal David Acosta [mailto:a...@devshock.com] 
Sent: Thursday, October 04, 2012 10:01 AM
To: pgsql-performance@postgresql.org
Subject: how to avoid deadlock on massive update with multiple deletes

.
. 
.

The other situation could be that the update process, while locking rows, escalates 
to a page lock and then tries to escalate to a table lock while the delete process 
holds some locked rows.

Thanks!


This (lock escalation from row to page to table) is an MS SQL Server 
feature; pretty sure Postgres does not do it.
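
As an aside, a hedged sketch of one common deadlock-avoidance pattern on the 
Postgres side (not from this thread; the table and column names are made up): 
have every session lock the rows it will touch in a consistent order before 
modifying them:

BEGIN;
-- locking in primary-key order prevents two sessions from acquiring
-- the same rows in conflicting orders:
SELECT id FROM jobs WHERE batch = 42 ORDER BY id FOR UPDATE;
UPDATE jobs SET processed = true WHERE batch = 42;
COMMIT;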

Regards,
Igor Neyman




Re: [PERFORM] Postgres becoming slow, only full vacuum fixes it

2012-10-01 Thread Igor Neyman
 -Original Message-
 From: Thomas Kellerer [mailto:spam_ea...@gmx.net]
 Sent: Tuesday, September 25, 2012 7:24 AM
 To: pgsql-performance@postgresql.org
 Subject: Re: Postgres becoming slow, only full vacuum fixes it
 
 Kiriakos Tsourapas, 25.09.2012 13:01:
  Thank you,
 
  I will take this into consideration, since upgrading to 9 will be
 much harder I assume...
 
 
 I think an upgrade from 8.3 to 8.4 was harder due to the removal of a
 lot of implicit type casts.
 8.4 to 9.x shouldn't be that problematic after all (but will take
 longer due to the required dump/reload)
 

Actually, 8.3 to 8.4 required a db dump/restore.
When upgrading from 8.4 to 9.x, pg_upgrade could be used without dump/restore.

Regards,
Igor Neyman



Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-07 Thread Igor Neyman
 your covering index - is it clustered?

Regards,
Igor Neyman




Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-03 Thread Igor Neyman
 -Original Message-
 From: Craig Ringer [mailto:ring...@ringerc.id.au]
 Sent: Thursday, November 03, 2011 5:07 AM
 To: Igor Neyman
 Cc: Robert Haas; Tom Lane; Jay Levitt;
pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Guide to PG's capabilities for inlining,
 predicate hoisting, flattening, etc?
 
 On 11/03/2011 04:22 AM, Igor Neyman wrote:
 
 That said, I'm not actually against performance hints if done
sensibly.
 
 --
 Craig Ringer
 


 ...sensibly
As it is with any other feature...

Igor Neyman



Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

2011-11-02 Thread Igor Neyman


 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Wednesday, November 02, 2011 11:13 AM
 To: Tom Lane
 Cc: Jay Levitt; pgsql-performance@postgresql.org
 Subject: Re: Guide to PG's capabilities for inlining, predicate
 hoisting, flattening, etc?
 ...
 ...
 Perhaps we could let people say
 something like WITH x AS FENCE (...) when they want the fencing
 behavior, and otherwise assume they don't (but give it to them anyway
 if there's a data-modifying operation in there).
 
 
  
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


Hints here we come :)



Re: [PERFORM] How to see memory usage using explain analyze ?

2011-08-17 Thread Igor Neyman

 -Original Message-
 From: hyelluas [mailto:helen_yell...@mcafee.com]
 Sent: Monday, August 15, 2011 2:33 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: How to see memory usage using explain analyze ?
 
 Igor,
 
  thank you, my tests showed better performance against the larger summary
  tables when I split the index for datasource_id & datex; I used to have a
  composite index.
  
  Regarding the index statistics - should I analyze the tables? I thought
  autovacuum takes care of it.
 
 helen
 


But, having different sets of indexes, you can't compare execution
plans.
In regards to statistics, you could try to ANALYZE the table manually, maybe
increasing default_statistics_target.
From the docs:
From the docs:

default_statistics_target (integer)

Sets the default statistics target for table columns that have not
had a column-specific target set via ALTER TABLE SET STATISTICS. Larger
values increase the time needed to do ANALYZE, but might improve the
quality of the planner's estimates. The default is 10. For more
information on the use of statistics by the PostgreSQL query planner,
refer to Section 14.2.
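
A minimal sketch of trying this for one session (the table name is hypothetical):

SET default_statistics_target = 100;   -- session-level; the old default was 10
ANALYZE summary_table;                 -- hypothetical table name
-- then re-run EXPLAIN ANALYZE and compare estimated vs. actual rows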

HTH,
Igor



Re: [PERFORM] How to see memory usage using explain analyze ?

2011-08-15 Thread Igor Neyman

Helen,

I'm probably a bit late answering your question.
But, just in case...

It looks like one table has a combined index summ_app_fw_datex_15191 on
both datasource_id and datex, which works better than the 2 separate
indexes ind_datex_15191(datex) and ind_fw_15191(datasource_id) that you
have on the other table.
Besides, this:

->  Bitmap Index Scan on ind_datex_15191
(cost=0.00..985.83 rows=46855 width=0) (actual time=1020.834..1020.834
rows=9370944 loops=1)

Shows that the statistics on ind_datex_15191 are completely out of whack
(expected rows=46855, actual rows=9370944).

HTH,
Igor Neyman




Re: [PERFORM] Real vs Int performance

2011-01-27 Thread Igor Neyman
 

 -Original Message-
 From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
 Sent: Wednesday, January 26, 2011 5:12 PM
 To: David Greco
 Cc: pgsql-performance@postgresql.org
 Subject: Re: Real vs Int performance 
 
 David Greco david_gr...@harte-hanks.com writes:
  Came across a problem I find perplexing. I recreated the 
 dimensional 
  tables from Oracle and the fields that are integers in Oracle became 
  integers in Postgres. Was experiencing terrible performance 
 during the 
  load and narrowed down to a particular dimensional lookup 
 problem. 
 ...
 ...
 ...
 ...
 In real life, this query is actually bound and parameterized,
 
 In that case, an EXPLAIN using literal constants is next door 
 to useless in terms of telling you what will happen in real 
 life.  You need to pay attention to exactly how the 
 parameterization is done.  Again, I'm suspecting a wrong 
 datatype indication.
 
   regards, tom lane
 

To see what happens with a parameterized query in real life you could try
the auto_explain contrib module.
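
A minimal sketch of enabling it for a single session (these are the module's
standard settings):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log the plan of every statement
SET auto_explain.log_analyze = true;     -- include actual row counts and timings
-- run the parameterized workload; plans show up in the server log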

Regards,
Igor Neyman



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

 -Original Message-
 From: Mladen Gogala [mailto:mladen.gog...@vmsinfo.com] 
 Sent: Thursday, January 27, 2011 12:00 PM
 To: Tom Lane
 Cc: David Wilson; Kenneth Marshall; pgsql-performance@postgresql.org
 Subject: Re: Postgres 9.0 has a bias against indexes
 
 On 1/27/2011 11:40 AM, Tom Lane wrote:
  It is worth noting that EXPLAIN results should not be extrapolated
   to situations other than the one you are actually testing; for
   example, results on a toy-sized table cannot be 
 assumed to apply to
   large tables.
 Well, that's precisely what I tried. Bummer, I will have to 
 copy a large table over.
 
 --
 Mladen Gogala
 Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 www.vmsinfo.com
 
 

Mladen,

I don't think this is an exclusively Postgres behavior.
I'm pretty sure the Oracle optimizer will also do TABLE ACCESS (FULL) instead
of using an index on a 14-row table.

Regards,
Igor Neyman



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

 -Original Message-
 From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
 Sent: Thursday, January 27, 2011 3:59 PM
 To: Mladen Gogala
 Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
 pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
 
 On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
 mladen.gog...@vmsinfo.com wrote:
  On 1/27/2011 3:37 PM, Scott Marlowe wrote:
 
  On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
  mladen.gog...@vmsinfo.com  wrote:
 
  There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
 
  That's because Oracle has covering indexes.
 
  I am not sure what you mean by covering indexes but I 
 hope that for 
  the larger table I have in mind,  indexes will be used.  
 For a small 
  table like
 
 In Oracle you can hit JUST the index to get the data you need 
 (and maybe rollback logs, which are generally pretty small)
 
 In Pgsql, once you hit the index you must then hit the actual 
 data store to get the right version of your tuple.  So, index 
 access in pg is more expensive than in Oracle.  However, 
 updates are cheaper.
 Always a trade off
 
 

Scott,
What you describe here isn't about covering indexes - it's about the different 
ways of implementing MVCC in Oracle and PG.

Mladen, 
you were right.
For a recursive query like yours Oracle uses an index even on a small table.
I made an assumption without testing it.
However, some other (non-recursive) queries against the same small table that 
also require reading all 14 rows do a table scan.

Regards,
Igor Neyman



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

 -Original Message-
 From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
 Sent: Thursday, January 27, 2011 4:16 PM
 To: Igor Neyman
 Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
 pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
 
 On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman 
 iney...@perceptron.com wrote:
 
 
  -Original Message-
  From: Scott Marlowe [mailto:scott.marl...@gmail.com]
  Sent: Thursday, January 27, 2011 3:59 PM
  To: Mladen Gogala
  Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
  pgsql-performance@postgresql.org
  Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
 
  On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
  mladen.gog...@vmsinfo.com wrote:
   On 1/27/2011 3:37 PM, Scott Marlowe wrote:
  
   On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
   mladen.gog...@vmsinfo.com  wrote:
  
   There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
  
   That's because Oracle has covering indexes.
  
   I am not sure what you mean by covering indexes but I
  hope that for
   the larger table I have in mind,  indexes will be used.
  For a small
   table like
 
  In Oracle you can hit JUST the index to get the data you need (and 
  maybe rollback logs, which are generally pretty small)
 
  In Pgsql, once you hit the index you must then hit the actual data 
  store to get the right version of your tuple.  So, index 
 access in pg 
  is more expensive than in Oracle.  However, updates are cheaper.
  Always a trade off
 
 
 
  Scott,
  What you describe here isn't about covering indexes - 
 it's about different ways implementing MVCC in Oracle and PG.
 
 It is about covering indexes AND it's about the difference in 
 how MVCC is implemented in both databases.
 
 

Well, Mladen's query doesn't involve covering indexes.



Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Igor Neyman
 

 -Original Message-
 From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
 Sent: Thursday, January 27, 2011 4:25 PM
 To: Igor Neyman
 Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
 pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
 
 On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman 
 iney...@perceptron.com wrote:
 
  -Original Message-
  From: Scott Marlowe [mailto:scott.marl...@gmail.com]
  Sent: Thursday, January 27, 2011 4:16 PM
  To: Igor Neyman
  Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; 
  pgsql-performance@postgresql.org
  Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
 
  On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman 
 iney...@perceptron.com 
  wrote:
  
  
   -Original Message-
   From: Scott Marlowe [mailto:scott.marl...@gmail.com]
   Sent: Thursday, January 27, 2011 3:59 PM
   To: Mladen Gogala
   Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; 
   pgsql-performance@postgresql.org
   Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
  
   On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala 
   mladen.gog...@vmsinfo.com wrote:
On 1/27/2011 3:37 PM, Scott Marlowe wrote:
   
On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala 
mladen.gog...@vmsinfo.com  wrote:
   
There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use 
 an index.
   
That's because Oracle has covering indexes.
   
I am not sure what you mean by covering indexes but I
   hope that for
the larger table I have in mind,  indexes will be used.
   For a small
table like
  
   In Oracle you can hit JUST the index to get the data 
 you need (and 
   maybe rollback logs, which are generally pretty small)
  
   In Pgsql, once you hit the index you must then hit the 
 actual data 
   store to get the right version of your tuple.  So, index
  access in pg
   is more expensive than in Oracle.  However, updates are cheaper.
   Always a trade off
  
  
  
   Scott,
   What you describe here isn't about covering indexes -
  it's about different ways implementing MVCC in Oracle and PG.
 
  It is about covering indexes AND it's about the difference in how 
  MVCC is implemented in both databases.
 
 
 
  Well, Mladen's query doesn't involve covering indexes.
 
 On Oracle?  Then how can it get the values it needs without 
 having to hit the data store?
 
 

It doesn't.
It does INDEX UNIQUE SCAN and then TABLE ACCESS BY INDEX ROWID.



Re: [PERFORM] partitioning question 1

2010-10-29 Thread Igor Neyman
 -Original Message-
 From: Ben [mailto:midfi...@gmail.com] 
 Sent: Thursday, October 28, 2010 12:37 PM
 To: pgsql-performance@postgresql.org
 Subject: partitioning question 1
 
 hello --
 
 my last email was apparently too long to respond to so i'll 
 split it up into shorter pieces.  my first question :
 
 my understanding of how range partitioning and constraint 
 exclusion works leads me to believe that it does not buy any 
 query performance that a clustered index doesn't already give 
 you -- the advantages are all in maintainability.  an index 
 is able to eliminate pages just as well as constraint 
 exclusion is able to eliminate table partitions.  the I/O 
 advantages of having queries target small subtables are the 
 same as the I/O advantages of clustering the index : result 
 pages in a small range are very close to each other on disk.
 
 finally, since constraint exclusion isn't as flexible as 
 indexing (i've seen old mailing list posts that say that 
 constraint exclusion only works with static constants in 
 where clauses, and only works with simple operators like < and >, 
 which basically forces btree indexes when i want to use gist) 
 it is indeed likely that partitioning can be slower than one 
 big table with a clustered index.
 
 is my intuition completely off on this?
 
 best regards, ben
 

If your SELECT retrieves a substantial amount of records, a table scan could
be more efficient than index access.

Now, if while retrieving a large amount of records the WHERE clause of this
SELECT still satisfies constraints on some partition(s), then obviously
one (or a few) partition scans will be more efficient than a full table scan
of a non-partitioned table.

So, yes, partitioning provides performance improvements, not only
maintenance convenience.
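
A minimal sketch of the inheritance-style range partitioning in use at the
time (all names are illustrative):

CREATE TABLE measurements (logdate date, value int);
CREATE TABLE measurements_2010_10 (
    CHECK (logdate >= DATE '2010-10-01' AND logdate < DATE '2010-11-01')
) INHERITS (measurements);

SET constraint_exclusion = on;
-- a WHERE clause the planner can compare against the CHECK constraint
-- lets it skip every other partition:
EXPLAIN SELECT * FROM measurements WHERE logdate >= DATE '2010-10-15';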

Regards,
Igor Neyman



Re: [PERFORM] partitioning question 1

2010-10-29 Thread Igor Neyman
 

 -Original Message-
 From: Ben [mailto:midfi...@gmail.com] 
 Sent: Friday, October 29, 2010 12:16 PM
 To: Igor Neyman
 Cc: pgsql-performance@postgresql.org
 Subject: Re: partitioning question 1
 
 On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:
 
  is my intuition completely off on this?
  
  best regards, ben
  
  
  If your SELECT retrieves substantial amount of records, table scan 
  could be more efficient than index access.
  
  Now, if while retrieving large amount of records WHERE clause of 
  this SELECT still satisfies constraints on some partition(s), then 
  obviously one (or few) partition scans will be more efficient than 
  full table scan of non-partitioned table.
  
  So, yes partitioning provides performance improvements, not only 
  maintenance convenience.
 
 my impression was that a *clustered* index would give a lot 
 of the same I/O benefits, in a more flexible way.  if you're 
 clustered on the column in question, then an index scan for a 
 range is much like a sequential scan over a partition (as far 
 as i understand.)
 
 b
 

Even with a clustered index you still read index+table, which is more
expensive than just a table scan (in the situation I described above).
A PG clustered index is not the same as a SQL Server clustered index (which
includes the actual table pages at the leaf level).
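
For what it's worth, a minimal sketch of PG-style clustering (names are made
up); unlike a SQL Server clustered index, the physical order is a one-time
reorder and is not maintained by later writes:

CLUSTER my_table USING my_index;   -- hypothetical names
ANALYZE my_table;                  -- refresh statistics, including correlation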

Igor Neyman



Re: [PERFORM] Index scan is not working, why??

2010-10-21 Thread Igor Neyman
 

 -Original Message-
 From: AI Rumman [mailto:rumman...@gmail.com] 
 Sent: Thursday, October 21, 2010 1:25 AM
 To: pgsql-performance@postgresql.org
 Subject: Index scan is not working, why??
 
 I don't know why seq scan is running on the following query 
 where the same query is giving index scan on other servers:
 explain analyze
 
 select *
 from act
 where act.acttype in ( 'Meeting','Call','Task');
   QUERY PLAN
 --
 --
 
  Seq Scan on act (cost=0.00..13386.78 rows=259671 width=142) 
 (actual time=0.013..484.572 rows=263639 loops=1)
   Filter: (((acttype)::text = 'Meeting'::text) OR 
 ((acttype)::text = 'Call'::text) OR ((acttype)::text = 
 'Task'::text))  Total runtime: 732.956 ms
 (3 rows)
 
 
 The above query is giving index scan on other servers and 
 even if I rewrite the query as follows I got index scan:
 explain analyze
 
 select *
 from act
 where act.acttype = 'Meeting'
 or act.acttype = 'Call';
   QUERY PLAN
 --
 --
 --
  Bitmap Heap Scan on act (cost=17.98..1083.80 rows=2277 
 width=142) (actual time=1.901..9.722 rows=4808 loops=1)
   Recheck Cond: (((acttype)::text = 'Meeting'::text) OR 
 ((acttype)::text = 'Call'::text))
    -> BitmapOr (cost=17.98..17.98 rows=2281 width=0) (actual 
 time=1.262..1.262 rows=0 loops=1)
    -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 
 rows=1141 width=0) (actual time=0.790..0.790 rows=3181 loops=1)
   Index Cond: ((acttype)::text = 'Meeting'::text)
    -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 
 rows=1141 width=0) (actual time=0.469..0.469 rows=1630 loops=1)
   Index Cond: ((acttype)::text = 'Call'::text)  Total 
 runtime: 14.227 ms
 (8 rows)
 
 

An Index Scan is not always preferable to a Seq Scan; it depends on the
selectivity of your query.
When retrieving a substantial portion of a big table a seq scan is usually
faster, that's why the optimizer chooses it.

Your queries (and possibly the data sets in the tables on different servers)
are not the same.
Your first query (which uses a seq scan) returns 259671 rows, which is
probably a substantial part of the whole table.
Your second query (which uses an index scan) returns only 4808 rows, which
makes index access less costly in this case.
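
A purely diagnostic sketch (session-only; not something to leave on in
production) to see what the planner thinks the index plan would cost:

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM act WHERE act.acttype IN ('Meeting', 'Call', 'Task');
RESET enable_seqscan;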

Regards,
Igor Neyman



Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman
 
 


Tony,
For your query:

 select 
 emailaddress, websiteid
   from members
  where emailok = 1
and emailbounced = 0;

your table doesn't have any indexes where emailok or emailbounced
are leading columns.
That's why the existing indexes cannot be used.

If you specified websiteid in the where clause then (most probably)
the index members_sorted_idx_001 would be used (based on selectivity and
statistics known to the optimizer).

If this query (as is - without websiteid) is important for your app,
create another index on (emailok, emailbounced), which should help, of
course, if the selectivity of your where clause is good enough (so as not
to perform a full table scan).
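
A minimal sketch of that suggestion, plus a partial-index variant that fits
this exact query (the index names are made up):

CREATE INDEX members_emailok_bounced_idx
    ON members (emailok, emailbounced);

-- alternative: a partial index, since the query always filters
-- on the same two constants:
CREATE INDEX members_mailable_idx
    ON members (websiteid)
 WHERE emailok = 1 AND emailbounced = 0;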

Regards,
Igor Neyman



Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman

 -Original Message-
 From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
 Sent: Friday, October 15, 2010 2:14 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: oracle to psql migration - slow query in postgres
 
 Thanks for all your responses. What's interesting is that an 
 index is used when this query is executed in Oracle.  It 
 appears to do some parallel processing:
 
 SQL set line 200
 delete from plan_table;
 explain plan for
 select websiteid, emailaddress
   from members
  where emailok = 1
and emailbounced = 0;
 
 SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
 SQL 
 3 rows deleted.
 
 SQL   2345  
 Explained.
 
 SQL SQL 
 PLAN_TABLE_OUTPUT
  ------------------------------------------------------------------------------------------------------------------------------
  Plan hash value: 4247959398
  
  ------------------------------------------------------------------------------------------------------------------------------
  | Id  | Operation                | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
  ------------------------------------------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT         |                        |   237M| 7248M |   469K  (2)| 01:49:33 |        |      |            |
  |   1 |  PX COORDINATOR          |                        |       |       |            |          |        |      |            |
  |   2 |   PX SEND QC (RANDOM)    | :TQ1                   |   237M| 7248M |   469K  (2)| 01:49:33 |  Q1,00 | P->S | QC (RAND)  |
  |   3 |    PX BLOCK ITERATOR     |                        |   237M| 7248M |   469K  (2)| 01:49:33 |  Q1,00 | PCWC |            |
  |*  4 |     INDEX FAST FULL SCAN | MEMBERS_SORTED_IDX_001 |   237M| 7248M |   469K  (2)| 01:49:33 |  Q1,00 | PCWP |            |
  ------------------------------------------------------------------------------------------------------------------------------
  
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  
     4 - filter(EMAILBOUNCED=0 AND EMAILOK=1)
  
  16 rows selected.
 
 

1. Postgres doesn't have FAST FULL SCAN, because even if all the info
is in the index, it needs to visit the row in the table (visibility
issue).

2. Postgres doesn't have parallel execution.

BUT, it's free and has great community support, as you already saw.

Regards,
Igor Neyman



Re: [PERFORM] Identical query slower on 8.4 vs 8.3

2010-07-16 Thread Igor Neyman
 

 -Original Message-
 From: Patrick Donlin [mailto:pdon...@oaisd.org] 
 Sent: Thursday, July 15, 2010 11:13 AM
 To: Kevin Grittner; pgsql-performance@postgresql.org
 Subject: Re: Identical query slower on 8.4 vs 8.3
 
 I'll read over that wiki entry, but for now here is the 
 EXPLAIN ANALYZE output assuming I did it correctly. I have 
 run vacuumdb --full --analyze; it actually runs as a nightly 
 cron job.
 
 8.4.4 Sever:
 Unique  (cost=202950.82..227521.59 rows=702022 width=86) 
 (actual time=21273.371..22429.511 rows=700536 loops=1)
   -  Sort  (cost=202950.82..204705.87 rows=702022 width=86) 
 (actual time=21273.368..22015.948 rows=700536 loops=1)
 Sort Key: test.tid, testresult.trscore, 
 testresult.trpossiblescore, testresult.trstart, 
 testresult.trfinish, testresult.trscorebreakdown, 
 testresult.fk_sid, test.tname, qr.qrscore, qr.qrtotalscore, 
 testresult.trid, qr.qrid
 Sort Method:  external merge  Disk: 71768kB
 -  Hash Join  (cost=2300.82..34001.42 rows=702022 
 width=86) (actual time=64.388..1177.468 rows=700536 loops=1)
   Hash Cond: (qr.fk_trid = testresult.trid)
   -  Seq Scan on questionresult qr  
 (cost=0.00..12182.22 rows=702022 width=16) (actual 
 time=0.090..275.518 rows=702022 loops=1)
   -  Hash  (cost=1552.97..1552.97 rows=29668 
 width=74) (actual time=63.042..63.042 rows=29515 loops=1)
 -  Hash Join  (cost=3.35..1552.97 
 rows=29668 width=74) (actual time=0.227..39.111 rows=29515 loops=1)
   Hash Cond: (testresult.fk_tid = test.tid)
   -  Seq Scan on testresult  
 (cost=0.00..1141.68 rows=29668 width=53) (actual 
 time=0.019..15.622 rows=29668 loops=1)
   -  Hash  (cost=2.60..2.60 rows=60 
 width=21) (actual time=0.088..0.088 rows=60 loops=1)
 -  Seq Scan on test  
 (cost=0.00..2.60 rows=60 width=21) (actual time=0.015..0.044 
 rows=60 loops=1)
 Total runtime: 22528.820 ms
 
 8.3.7 Server:
 Unique  (cost=202950.82..227521.59 rows=702022 width=86) 
 (actual time=22157.714..23343.461 rows=700536 loops=1)
   -  Sort  (cost=202950.82..204705.87 rows=702022 width=86) 
 (actual time=22157.706..22942.018 rows=700536 loops=1)
 Sort Key: test.tid, testresult.trscore, 
 testresult.trpossiblescore, testresult.trstart, 
 testresult.trfinish, testresult.trscorebreakdown, 
 testresult.fk_sid, test.tname, qr.qrscore, qr.qrtotalscore, 
 testresult.trid, qr.qrid
 Sort Method:  external merge  Disk: 75864kB
 -  Hash Join  (cost=2300.82..34001.42 rows=702022 
 width=86) (actual time=72.842..1276.634 rows=700536 loops=1)
   Hash Cond: (qr.fk_trid = testresult.trid)
   -  Seq Scan on questionresult qr  
 (cost=0.00..12182.22 rows=702022 width=16) (actual 
 time=0.112..229.987 rows=702022 loops=1)
   -  Hash  (cost=1552.97..1552.97 rows=29668 
 width=74) (actual time=71.421..71.421 rows=29515 loops=1)
 -  Hash Join  (cost=3.35..1552.97 
 rows=29668 width=74) (actual time=0.398..44.524 rows=29515 loops=1)
   Hash Cond: (testresult.fk_tid = test.tid)
   -  Seq Scan on testresult  
 (cost=0.00..1141.68 rows=29668 width=53) (actual 
 time=0.117..20.890 rows=29668 loops=1)
   -  Hash  (cost=2.60..2.60 rows=60 
 width=21) (actual time=0.112..0.112 rows=60 loops=1)
 -  Seq Scan on test  
 (cost=0.00..2.60 rows=60 width=21) (actual time=0.035..0.069 
 rows=60 loops=1)
 Total runtime: 23462.639 ms
 
 
 Thanks for the quick responses and being patient with me not 
 providing enough information.
 -Patrick
 

Well, now that you've got similar runtimes on both 8.4.4 and 8.3.7, here
is a suggestion to improve performance of this query based on the EXPLAIN
ANALYZE you provided (you should have done that in your first e-mail).

EXPLAIN ANALYZE shows that most of the time (22015 ms on 8.4.4) is spent on
sorting your result set.
And according to this: Sort Method:  external merge  Disk: 71768kB -
the sorting is done on disk, meaning your work_mem setting is not
sufficient to do this sort in memory (I didn't go back through this
thread far enough to see if you provided info on how it is set).

I'd suggest increasing the value up to ~80MB, if not for the whole system,
maybe just for the session running this query.
Then see if performance improves.
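
A minimal sketch of the session-level change:

SET work_mem = '80MB';   -- enough to fit the ~72MB on-disk sort in memory
-- re-run the query; EXPLAIN ANALYZE should then report an in-memory sort
RESET work_mem;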

And, with query performance issues always start with EXPLAIN ANALYZE.

Regards,
Igor Neyman 



Re: [PERFORM] Internal operations when the planner makes a hash join.

2010-02-24 Thread Igor Neyman
 

 -Original Message-
 From: negora [mailto:neg...@negora.com] 
 Sent: Tuesday, February 23, 2010 4:33 PM
 To: Scott Carey
 Cc: Alvaro Herrera; pgsql-performance@postgresql.org
 Subject: Re: Internal operations when the planner makes a hash join.
 
 Thank you for explaining to me the internal behaviour of the 
 PostgreSQL engine. I'll try to look for more information 
 about those hash tables. It sounds really really interesting. 
 Your information was very useful.
 
 The origin of my doubt resides in the fact that I need to do 
 a join between 3 HUGE tables (millions of records) and do 
 certain operations with the retrieved information. I was 
 deciding whether to use one SELECT with 3 JOINs, as I've been 
 doing since the beginning, or build a PL/PgSQL function based 
 on 3 nested FOR ... IN SELECT ... LOOP 
 structures which tried to minimize the subsequent table 
 searches by storing intermediate useful data in arrays 
 (curiously, these would act as the hash tables which you 
 mention, but in a very very rudimentary way). In a case like 
 this one (possibly unable to fit in RAM), is JOIN also the 
 best solution?
 
 Since I have to retrieve such a big number of columns and 
 crossed records I had started to think that using 1 SELECT 
 with 3 JOINs would increase the number of table searches a 
 LOT and also duplicate the information too much. I mean 
 duplicate as in this case, where Factor 1 appears 
 millions of times for every Element:
 
 Element 1 | Sub-factor 1 | Factor 1
 Element 2 | Sub-factor 1 | Factor 1
 ...
 Element 12639747465586 | Sub-factor 1 | Factor 1
 Element 1 | Sub-factor 2 | Factor 1
 
 I hope not to rob you of much time but... What do you think 
 about it? Is it better to use 1 SELECT with 3 JOINs or to build 
 nested FOR ... IN SELECT ... LOOP structures? Could it be 
 one of those cases in which I have to choose between either 
 higher speed but higher memory consumption (3 
 JOINs) or lower speed but less memory expense (3 FORs)?
 
 Thanks again and apologies for extending this topic too much.
 
 
 Scott Carey wrote:
  On Feb 23, 2010, at 8:53 AM, Alvaro Herrera wrote:
 

  negora wrote:
 
  
  According to how I understood the process, the engine 
 would get the 
  name from the student with ID 1 and would look for the 
 name of the 
  father with ID 1 in the hashed table. It'd do exactly the 
 same with 
  the student #2 and father #2. But my big doubt is about 
 the 3rd one 
  (Anthony). Would the engine know that it already had 
 retrieved the 
  father's name for the student 1 and would avoid searching for it 
  into the hashed table (using some kind of internal 
 mechanism which 
  allows to re-utilize the name)? Or would it search into 
 the hashed 
  table again?

  The hash table is searched again.  But that's fast, because it's a 
  hash table.
 
  
 
  To answer the question another way, remembering that it 
 has already seen father A once and tracking that would use a 
 hash table to remember that fact.  
 
  The hash table created by the first scan IS the remember 
 you have seen this father data structure, optimized for fast 
 lookup.  So before even looking at the first student, the 
 hash table is built so that it is fast to find out if a 
 father has been seen before, and if so where that father's 
 data is located.  Looking this data up is often referred to 
 as a probe and not a scan because it takes just as long 
 to do if the hash table has 100 entries or 1 entries.  
 The drawback is that the whole thing has to fit in RAM.
 
 

  -- 
  Alvaro Herrera
 http://www.CommandPrompt.com/
  The PostgreSQL Company - Command Prompt, Inc.
 
  --
  Sent via pgsql-performance mailing list 
  (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
  
 
 

 

So, you are trying to do a nested loop in PL/PgSQL.
Why not let the optimizer decide between a nested loop and a hash join based
on your memory settings and the statistics collected for the objects involved?
I'm pretty sure it'll be faster than 3 nested loops in PL/PgSQL.
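
A hedged sketch of the single-statement form (table and column names are
illustrative, not from this thread):

SELECT e.name AS element, s.name AS sub_factor, f.name AS factor
  FROM elements e
  JOIN sub_factors s ON s.id = e.sub_factor_id
  JOIN factors     f ON f.id = s.factor_id;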

Igor Neyman
