[PERFORM] Row estimates off by two orders of magnitude with hstore

2015-06-10 Thread Patrick Krecker
Hi everyone --

I had an issue the other day where a relatively simple query went from
taking about 1 minute to execute to taking 19 hours. It seems that the
planner chooses to use a Materialize node sometimes [1] and not other
times [2]. I think the issue is that the row count estimate for the
result of the condition type_id = 23 AND ref.attributes ? 'reference' is
about 10k rows, but the actual result is 4624280 rows. It seems the
estimate varies slightly over time, and if it drops low enough then
the planner decides to materialize the result of the bitmap heap scan
and the query takes forever.

As an exercise, I tried removing the clause ref.attributes ?
'reference' and the estimates are very accurate [3].

It seems to me that improving the row estimates would be prudent, but
I can't figure out how Postgres arrives at the estimate for the
hstore ? operator. I suspect it is making a wild guess, since it has
no statistics on the contents of the hstore to draw on.
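
For reference, the catalog shows which selectivity function an operator is
wired to; something along these lines should work on any database with
hstore installed (a sketch, nothing specific to our setup):

SELECT oprname,
       oprleft::regtype   AS left_type,
       oprright::regtype  AS right_type,
       oprrest            AS restriction_estimator
FROM pg_operator
WHERE oprname = '?'
  AND oprleft = 'hstore'::regtype;

If I'm reading it right, hstore's ? is wired to contsel, which returns a
hardwired selectivity constant (0.001 in the sources I've looked at) rather
than consulting any statistics -- which matches the rows=9010 estimate in
the plan below, i.e. 0.001 of the ~9M rows matching type_id = 23.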

[1] Query plan with materialize:
# explain (analyze, buffers) declare foo_cursor cursor for  SELECT
  ref.case_id, array_agg(ref.attributes -> 'reference') FROM
component ref JOIN document c ON c.id = ref.case_id WHERE ref.type_id
= 23 AND ref.attributes ? 'reference' AND NOT 0 = ANY(c.types) GROUP
BY ref.case_id;

 QUERY PLAN
---
 GroupAggregate  (cost=390241.39..16973874.43 rows=10 width=35)
(actual time=81253.653..69928443.721 rows=90969 loops=1)
   Buffers: shared hit=54539 read=1551654, temp read=11959529274 written=126377
   ->  Nested Loop  (cost=390241.39..16973835.54 rows=5169 width=35)
(actual time=16157.261..69925063.268 rows=2488142 loops=1)
 Join Filter: (ref.case_id = c.id)
 Rows Removed by Join Filter: 437611625378
 Buffers: shared hit=54539 read=1551654, temp read=11959529274
written=126377
 ->  Index Scan using document_pkey on document c
(cost=0.42..314999.89 rows=99659 width=4) (actual
time=0.016..59255.527 rows=94634 loops=1)
   Filter: (0 <> ALL (types))
   Rows Removed by Filter: 70829
   Buffers: shared hit=54539 read=113549
 ->  Materialize  (cost=390240.97..3189944.33 rows=9010
width=35) (actual time=0.088..450.898 rows=4624280 loops=94634)
   Buffers: shared read=1438105, temp read=11959529274
written=126377
   ->  Bitmap Heap Scan on component ref
(cost=390240.97..3189899.28 rows=9010 width=35) (actual
time=8107.625..79508.136 rows=4624280 loops=1)
 Recheck Cond: (type_id = 23)
 Rows Removed by Index Recheck: 49237707
 Filter: (attributes ? 'reference'::text)
 Rows Removed by Filter: 4496624
 Buffers: shared read=1438105
     ->  Bitmap Index Scan on component_type_id
(cost=0.00..390238.72 rows=9009887 width=0) (actual
time=8105.963..8105.963 rows=959 loops=1)
   Index Cond: (type_id = 23)
   Buffers: shared read=156948
 Total runtime: 69928552.870 ms

[2] Query plan with simple bitmap heap scan:
# explain (analyze, buffers) declare foo_cursor cursor for  SELECT
  ref.case_id, array_agg(ref.attributes -> 'reference') FROM
component ref JOIN document c ON c.id = ref.case_id WHERE ref.type_id
= 23 AND ref.attributes ? 'reference' AND NOT 0 = ANY(c.types) GROUP
BY ref.case_id;

QUERY PLAN
-
 HashAggregate  (cost=3724614.33..3724614.46 rows=10 width=34) (actual
time=248900.627..249030.530 rows=90969 loops=1)
   Buffers: shared hit=16962106 read=5551197 written=5806
   ->  Nested Loop  (cost=488092.46..3724570.36 rows=5863 width=34)
(actual time=30839.635..246416.327 rows=2488142 loops=1)
 Buffers: shared hit=16962106 read=5551197 written=5806
 ->  Bitmap Heap Scan on component ref
(cost=488088.03..3638070.85 rows=10220 width=34) (actual
time=30833.215..196239.109 rows=4624280 loops=1)
   Recheck Cond: (type_id = 23)
   Rows Removed by Index Recheck: 57730489
   Filter: (attributes ? 'reference'::text)
   Rows Removed by Filter: 4496624
   Buffers: shared hit=6922 read=1901840 written=2252
   ->  Bitmap Index Scan on component_type_id
(cost=0.00..488085.48 rows=10220388 width=0) (actual
time=30811.185..30811.185 rows=13292968 loops=1)
 Index Cond: (type_id = 23)
 Buffers: shared hit=6922 read=162906 written=1529
 ->  Bitmap Heap Scan on document c  (cost=4.43..8.45 rows=1
width=4) (actual time=0.010..0.010 rows=1 loops=4624280)
   Recheck Cond: (id = ref.case_id)
   

Re: [PERFORM] Row estimates off by two orders of magnitude with hstore

2015-06-10 Thread Patrick Krecker
On Wed, Jun 10, 2015 at 2:08 PM, Merlin Moncure mmonc...@gmail.com wrote:
> On Wed, Jun 10, 2015 at 3:55 PM, Patrick Krecker patr...@judicata.com wrote:
>> OK. Well, fortunately for us, we have a lot of possible solutions to this
>> problem, and it sounds like actually getting statistics for attributes
>> ? 'reference' is not realistic. I just wanted to make sure it wasn't
>> some configuration error on our part.
>>
>> Can anyone explain where exactly the estimate for that clause comes from?
>>
>> I tried adding an index and I don't think it improved the estimation;
>> the planner still thinks there will be 9k rows as a result of type_id
>> = 23 and attributes ? 'reference' [1]. It might make the pathological
>> plan less likely, though. It's not clear to me that it reduces the risk
>> of a pathological plan to zero.
>
> no, but done in conjunction with disabling nestloop and materialize
> query plans (say, via SET LOCAL) it will probably be fast and future
> proof.
>
> merlin

Wouldn't wrapping it in an optimization fence (e.g. SELECT * FROM
(...) AS t WHERE t.attributes ? 'reference') have the same effect as
disabling materialize, but allow the planner to optimize the inner
query however it wants?
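
In the meantime, my reading of the SET LOCAL suggestion is roughly this
(a sketch; the overrides only last until the end of the transaction):

BEGIN;
SET LOCAL enable_nestloop = off;   -- discourage the nested-loop shape
SET LOCAL enable_material = off;   -- and the Materialize node feeding it
SELECT ref.case_id, array_agg(ref.attributes -> 'reference')
FROM component ref
JOIN document c ON c.id = ref.case_id
WHERE ref.type_id = 23
  AND ref.attributes ? 'reference'
  AND NOT 0 = ANY(c.types)
GROUP BY ref.case_id;
COMMIT;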




Re: [PERFORM] Row estimates off by two orders of magnitude with hstore

2015-06-10 Thread Patrick Krecker
OK. Well, fortunately for us, we have a lot of possible solutions to this
problem, and it sounds like actually getting statistics for attributes
? 'reference' is not realistic. I just wanted to make sure it wasn't
some configuration error on our part.

Can anyone explain where exactly the estimate for that clause comes from?

I tried adding an index and I don't think it improved the estimation;
the planner still thinks there will be 9k rows as a result of type_id
= 23 and attributes ? 'reference' [1]. It might make the pathological
plan less likely, though. It's not clear to me that it reduces the risk
of a pathological plan to zero.

I also tried wrapping it in a subquery [2]. The estimate is, of
course, still awful, but it doesn't matter anymore because it can't
pick a plan that leverages its low estimate. Its only choice is a
simple filter on the results.

[1]
# CREATE INDEX foobarbaz ON component((attributes -> 'reference'))
WHERE ( attributes ? 'reference' );

CREATE INDEX

judicata=# explain (analyze, buffers) declare foo_cursor cursor for
SELECT ref.case_id, array_agg(ref.attributes -> 'reference')
FROM component ref JOIN document c ON c.id = ref.case_id WHERE
ref.type_id = 23 AND ref.attributes ? 'reference' AND NOT 0 =
ANY(c.types) GROUP BY ref.case_id;

 QUERY PLAN
---
 HashAggregate  (cost=419667.86..419667.99 rows=10 width=34) (actual
time=97074.773..97197.487 rows=90969 loops=1)
   Buffers: shared hit=16954389 read=4533956 dirtied=2963 written=4759
   ->  Nested Loop  (cost=5472.44..419628.76 rows=5213 width=34)
(actual time=537.202..94710.844 rows=2488142 loops=1)
 Buffers: shared hit=16954389 read=4533956 dirtied=2963 written=4759
 ->  Bitmap Heap Scan on component ref
(cost=5468.01..342716.88 rows=9087 width=34) (actual
time=534.862..49617.945 rows=4624280 loops=1)
   Recheck Cond: (attributes ? 'reference'::text)
   Rows Removed by Index Recheck: 28739170
   Filter: (type_id = 23)
   Rows Removed by Filter: 165268
   Buffers: shared hit=25 read=921758 dirtied=2963 written=906
   ->  Bitmap Index Scan on foobarbaz  (cost=0.00..5465.74
rows=98636 width=0) (actual time=532.215..532.215 rows=4789548
loops=1)
 Buffers: shared read=59300 written=57
 ->  Bitmap Heap Scan on document c  (cost=4.43..8.45 rows=1
width=4) (actual time=0.009..0.009 rows=1 loops=4624280)
   Recheck Cond: (id = ref.case_id)
   Filter: (0 <> ALL (types))
   Rows Removed by Filter: 0
   Buffers: shared hit=16954364 read=3612198 written=3853
   ->  Bitmap Index Scan on document_pkey
(cost=0.00..4.43 rows=1 width=0) (actual time=0.003..0.003 rows=1
loops=4624280)
 Index Cond: (id = ref.case_id)
 Buffers: shared hit=14082540 read=1859742 written=1974
 Total runtime: 97217.718 ms

[2]
# explain (analyze, buffers) declare foo_cursor cursor for  SELECT *
FROM (SELECT ref.case_id as case_id, array_agg(ref.attributes
-> 'reference') as reference FROM component ref JOIN document c ON
c.id = ref.case_id WHERE ref.type_id = 23 AND NOT 0 = ANY(c.types)
GROUP BY ref.case_id) as t WHERE reference IS NOT NULL;

  QUERY PLAN

 GroupAggregate  (cost=5636347.52..12524155.45 rows=9817 width=34)
(actual time=165466.502..195035.433 rows=93580 loops=1)
   Filter: (array_agg((ref.attributes -> 'reference'::text)) IS NOT NULL)
   Buffers: shared hit=13884 read=2085572 written=2952, temp
read=902337 written=902337
   ->  Merge Join  (cost=5636347.52..12458841.11 rows=5213367
width=34) (actual time=165383.814..193813.490 rows=5115136 loops=1)
 Merge Cond: (c.id = ref.case_id)
 Buffers: shared hit=13884 read=2085572 written=2952, temp
read=902337 written=902337
 ->  Index Scan using document_pkey on document c
(cost=0.43..6696889.20 rows=2128590 width=4) (actual
time=0.009..24720.726 rows=94634 loops=1)
   Filter: (0 <> ALL (types))
   Rows Removed by Filter: 70829
   Buffers: shared hit=13852 read=195821
 ->  Materialize  (cost=5636345.76..5681782.42 rows=9087332
width=34) (actual time=165383.798..168027.149 rows=9120904 loops=1)
   Buffers: shared hit=32 read=1889751 written=2952, temp
read=902337 written=902337
   ->  Sort  (cost=5636345.76..5659064.09 rows=9087332
width=34) (actual time=165383.793..167173.325 rows=9120904 loops=1)
 Sort Key: ref.case_id
 Sort Method: external merge  Disk: 1392648kB
 Buffers: shared hit=32 read=1889751 written=2952,

Re: [PERFORM] Tuning the configuration

2014-12-10 Thread Patrick Krecker
On Wed, Dec 10, 2014 at 2:44 AM, Maila Fatticcioni
mfatticci...@mbigroup.it wrote:

> Hello.
> I need to tune a Postgres installation I've just made to get better
> performance. I use two identical servers with a hot replication
> configuration. The two servers have the following hardware:
>
> Dual Processor Intel Xeon E5-2640V2 20Mb cache 2.00Ghz,
> Ram Mem. 32Gb DDR-3 Ecc Registered,
> Controller MegaRaid 8-ports 1Gb cache,
> 4 Enterprise Hdd NL Sas 600 4Tb Sata,
> 2 Samsung SSD 840 Pro Series 512Gb,
> 2 Hdd 500 Gb
>
> I made a software RAID with the last two hard disks with ext4 and I
> installed Ubuntu 14.04.1 LTS (I have to use this OS) on it. I made a
> hardware RAID with the four SAS hard disks, mounted the partition
> on it with ext4 without journaling, and put the database on it.
>
> Now I have two more steps to do.
>
> 1- Could you please help tuning the configuration? What are the best
> values I should use for wal_buffers and shared_buffers?
> 2- I would like to use the two SSDs to store the WAL files. Do you think
> this is useful, or how should I use them?
>
> Thank you for your answers.
>
> Best Regards,
> Maila Fatticcioni



We used [1] to great effect when setting our server up. We have not
had to diverge much from the recommendations in that document.

Generally, the specifics of tuning depend on the workload of your
specific instance.

[1] https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
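
As a very rough illustration, the kind of starting points that document gives
for a 32GB box look something like the following (values are illustrative
only -- the right numbers depend on your workload; ALTER SYSTEM needs 9.4+,
otherwise set the same values in postgresql.conf):

ALTER SYSTEM SET shared_buffers = '8GB';          -- ~25% of RAM is the usual rule of thumb
ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint: RAM likely available for caching
ALTER SYSTEM SET work_mem = '64MB';               -- per sort/hash node, so keep it modest
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET wal_buffers = '16MB';
SELECT pg_reload_conf();                          -- shared_buffers still needs a full restart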




[PERFORM] CTE query plan ignores selective index

2014-12-02 Thread Patrick Krecker
Hi all -- I am trying to do a better job of understanding why the
planner chooses some plans over others, and I ran into this issue this
morning where the planner ends up choosing a plan that's 15000x
slower. I have a table that represents nodes (here called
"components") in a tree. Each node has a parent_id; the root node has
a NULL parent_id. I wanted to find the route from each node to its
root. Here is my query:

# explain analyze  WITH RECURSIVE path(start, id, internal_id,
parent_id, document_id, depth) AS (
SELECT t.id, t.id, t.internal_id, t.parent_id, t.document_id, 1
FROM component t
WHERE id < 6361197
UNION ALL
SELECT path.start, t.id, t.internal_id, t.parent_id,
t.document_id, path.depth+1
FROM component t, path
WHERE t.internal_id = path.parent_id AND t.document_id=path.document_id
)
SELECT * FROM path ;

  QUERY PLAN

 CTE Scan on path  (cost=61484650.85..61484654.39 rows=177 width=24)
(actual time=0.007..36.958 rows=1007 loops=1)
   CTE path
 ->  Recursive Union  (cost=0.57..61484650.85 rows=177 width=24)
(actual time=0.007..36.755 rows=1007 loops=1)
   ->  Index Scan using component_pkey on component t
(cost=0.57..644.56 rows=167 width=16) (actual time=0.006..0.076
rows=218 loops=1)
 Index Cond: (id < 6361197)
   ->  Nested Loop  (cost=0.57..6148400.28 rows=1 width=24)
(actual time=0.063..4.054 rows=88 loops=9)
     ->  WorkTable Scan on path path_1  (cost=0.00..33.40
rows=1670 width=16) (actual time=0.000..0.006 rows=112 loops=9)
     ->  Index Scan using component_document_id on
component t_1  (cost=0.57..3681.65 rows=1 width=16) (actual
time=0.023..0.036 rows=1 loops=1007)
   Index Cond: (document_id = path_1.document_id)
   Filter: (path_1.parent_id = internal_id)
   Rows Removed by Filter: 237
 Total runtime: 37.039 ms


However, when I add one more row to the seed query of the CTE, it
changes the plan to this:

# explain analyze  WITH RECURSIVE path(start, id, internal_id,
parent_id, document_id, depth) AS (
SELECT t.id, t.id, t.internal_id, t.parent_id, t.document_id, 1
FROM component t
WHERE id < 6361198
UNION ALL
SELECT path.start, t.id, t.internal_id, t.parent_id,
t.document_id, path.depth+1
FROM component t, path
WHERE t.internal_id = path.parent_id AND t.document_id=path.document_id
)
SELECT * FROM path ;

 QUERY PLAN
--
 CTE Scan on path  (cost=61122640.05..61122643.61 rows=178 width=24)
(actual time=0.008..587814.729 rows=1008 loops=1)
   CTE path
 ->  Recursive Union  (cost=0.57..61122640.05 rows=178 width=24)
(actual time=0.006..587814.346 rows=1008 loops=1)
   ->  Index Scan using component_pkey on component t
(cost=0.57..648.36 rows=168 width=16) (actual time=0.006..0.076
rows=219 loops=1)
 Index Cond: (id < 6361198)
   ->  Hash Join  (cost=5543292.40..6112198.81 rows=1
width=24) (actual time=64743.932..65312.625 rows=88 loops=9)
 Hash Cond: ((path_1.parent_id = t_1.internal_id) AND
(path_1.document_id = t_1.document_id))
     ->  WorkTable Scan on path path_1  (cost=0.00..33.60
rows=1680 width=16) (actual time=0.001..0.015 rows=112 loops=9)
     ->  Hash  (cost=3627866.96..3627866.96 rows=96335696
width=16) (actual time=64572.641..64572.641 rows=94613537 loops=9)
   Buckets: 4096  Batches: 8192  Memory Usage: 537kB
       ->  Seq Scan on component t_1
(cost=0.00..3627866.96 rows=96335696 width=16) (actual
time=0.005..43364.346 rows=94613537 loops=9)
 Total runtime: 587814.885 ms

I would have thought it decided that the document_id index is not
very selective for the given mix of rows; however, I checked the
statistics for the table and found that n_distinct for document_id
is 101559 (the true value is 162545). The value of pg_class.reltuples
for the table is 96055600, which is very close to the true value of
94613537.
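
For reference, the statistics check was just pg_stats; and if the sampled
n_distinct ever turns out to be the real problem, it can be pinned manually
(the ALTER below is only an illustration, not something I've applied):

SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'component' AND attname = 'document_id';

-- illustrative only: force the planner's notion of distinct document_ids
ALTER TABLE component ALTER COLUMN document_id SET (n_distinct = 162545);
ANALYZE component;   -- the override takes effect at the next ANALYZE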

In the first query, it appears to me that postgres thinks the index
scan is much more expensive than it really is. However, given the
accurate statistics, I can't see how.

BTW I tried playing with random_page_cost. If I lower it to 2.0 then
it chooses the fast plan. At 3.0 it chooses the slow plan.
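
For anyone who wants to reproduce the comparison, this is roughly how I
flipped the setting for a single run (a sketch; SET LOCAL reverts at the end
of the transaction):

BEGIN;
SET LOCAL random_page_cost = 2.0;   -- at 3.0 the planner flips back to the seq-scan/hash-join plan
EXPLAIN ANALYZE
WITH RECURSIVE path(start, id, internal_id, parent_id, document_id, depth) AS (
    SELECT t.id, t.id, t.internal_id, t.parent_id, t.document_id, 1
    FROM component t
    WHERE id < 6361198
  UNION ALL
    SELECT path.start, t.id, t.internal_id, t.parent_id, t.document_id, path.depth + 1
    FROM component t, path
    WHERE t.internal_id = path.parent_id AND t.document_id = path.document_id
)
SELECT * FROM path;
ROLLBACK;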

Thanks!
Patrick




[PERFORM] General slowness when retrieving a relatively small number of rows

2013-11-15 Thread Patrick Krecker
Hey everyone --

I am debugging an issue with our Postgres machine running on EC2. We are
experiencing slowness when retrieving about 14k rows from a larger table of
140MM rows. Initially I thought it was an indexing problem (doing VACUUM
FULL reduced the index size from 12gb to 8gb), but the slowness persisted.

I created another table with only the subset of data we are interested in,
and simply doing SELECT * on the table takes 21ms, as opposed to 2ms on my
MBP. I examined the relationship between the length of the table scan on
this table with 14032 rows and the query time, and I got these results:

Table "public.patrick_component"
  Column  |  Type   | Modifiers
--+-+---
 id   | integer |
 case_id  | integer |
 type_id  | integer |
 offset   | integer |
 length   | integer |
 internal_id  | integer |
 parent_id| integer |
 right_sibling_id | integer |

# Rows   MBP       EC2
1        0.035 ms  0.076 ms
10       0.017 ms  0.048 ms
100      0.033 ms  0.316 ms
1000     0.279 ms  3.166 ms
10000    2.477 ms  31.006 ms
100000   4.375 ms  42.634 ms  # there are fewer than 100k rows in the table;
                              # for some reason LIMIT is slower than without LIMIT

As such, I have decided that it's not an issue with the index. To me this
looks disk-caching related; however, the entire table is only 832 kB, which
should be plenty small to fit entirely into memory (I also ran this
multiple times and in reverse order, and the results are the same).
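
One way to check how much of the table is actually resident in
shared_buffers (a sketch; it assumes the pg_buffercache contrib module is
available, and it says nothing about the OS page cache):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- count shared_buffers pages belonging to patrick_component in this database
SELECT count(*) AS buffers,
       pg_size_pretty(count(*) * 8192::bigint) AS cached
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid::regclass)
WHERE c.relname = 'patrick_component'
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());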

The machine has 30 GB of memory for a 45 GB database. The machine's only
purpose is to run Postgres.

Here are the relevant performance tweaks I have made:
shared_buffers = 8448MB
work_mem = 100MB
maintenance_work_mem = 1024MB
wal_buffers = 8MB
effective_cache_size = 22303MB

I have been struggling to make these types of queries fast because they are
very common (basically fetching all of the metadata for a document, and we
have a lot of metadata and a lot of documents). Any help is appreciated!

Thanks,
Patrick