Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-16 Thread Nicolas Charles

On 15/02/2013 19:04, Jeff Janes wrote:

On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
nicolas.char...@normation.com wrote:

On 14/02/2013 20:27, Jeff Janes wrote:

I'm surprised that the executiontimestamp index is not used, since it
seems
to be where most of the query time is spent.

I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?

The index scans are on nodeid_idx and configurationruleid_idx, not on
executiontimestamp.
Or am I misreading the output ?

You are correct about the use of nodeid_idx and
configurationruleid_idx.  But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan.  The
information given by the explain plan is not sufficient to decide that
one way or the other.  Some more recent improvements in explain
(analyze, buffers) would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion.  The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.

I added this index
composite_idx btree (ruleid, serial, executiontimestamp)
and lowered the shared_buffer to 54 MB.

For this specific query, it is indeed a bit better. (23s against 28s)

 Sort  (cost=43449.44..43459.07 rows=3854 width=252) (actual time=23375.247..23394.704 rows=62403 loops=1)
   Sort Key: executiontimestamp
   Sort Method:  external merge  Disk: 17480kB
   ->  Bitmap Heap Scan on ruddersysevents  (cost=28884.44..43219.89 rows=3854 width=252) (actual time=1528.704..23155.991 rows=62403 loops=1)
         Recheck Cond: ((ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text))
         ->  BitmapAnd  (cost=28884.44..28884.44 rows=3854 width=0) (actual time=1517.310..1517.310 rows=0 loops=1)
               ->  Bitmap Index Scan on composite_idx  (cost=0.00..3062.16 rows=59325 width=0) (actual time=775.418..775.418 rows=811614 loops=1)
                     Index Cond: ((ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text) AND (serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
               ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25820.11 rows=717428 width=0) (actual time=714.254..714.254 rows=690503 loops=1)
                     Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)

 Total runtime: 23419.411 ms
(11 rows)

But since there were a lot of "Sort Method: external merge" entries, and a lot
of object instantiations in our batch, we tried to split the job into 5-day
intervals, so the average number of lines returned is close to 5,000
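Incidentally, the "Sort Method: external merge  Disk: 17480kB" line means the sort is spilling roughly 17 MB to disk because it does not fit in work_mem (8MB in this configuration). A per-session bump is a cheap thing to try; a sketch, assuming the batch runs in its own session (the value is a guess, not from the thread):

```sql
-- Hypothetical per-session setting: large enough to hold the ~17MB spill.
SET work_mem = '32MB';
-- ... run the batch queries here ...
RESET work_mem;
```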


Load/IO of the system was *significantly* lower during the batch 
treatment with this index than with the previous attempt (load around .7 
instead of 1.8)


And batch execution time is now 1h08 rather than 2h20


If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.

Oh !
So I completely misunderstood the meaning of shared_buffers; I figured that
it was somehow the place where the data would be stored by postgres (like
indexes)

That is correct, it is the space used by postgres to cache data.  But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres.  On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway.  If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try.  The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.
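For reference, the common rule of thumb that follows from this reasoning can be sketched as below; the values are illustrative guesses for a dedicated 2GB machine, not a recommendation from the thread:

```ini
# postgresql.conf sketch -- keep shared_buffers well under RAM/2 so the
# OS cache and shared buffers do not just mirror each other.
shared_buffers = 512MB          # ~25% of RAM
effective_cache_size = 1536MB   # rough estimate of what the OS can cache
```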




Thank you very much for your explanations !

Nicolas


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-15 Thread Nicolas Charles

On 14/02/2013 20:27, Jeff Janes wrote:

On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
nicolas.char...@normation.com  wrote:

It contains 11018592 entries, with the following patterns:
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial

How many entries fall within a typical query interval of executiontimestamp?


Around 65,000 entries
...

I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.

I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?
The index scans are on nodeid_idx and configurationruleid_idx, not on 
executiontimestamp

Or am I misreading the output ?


For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes

I'm using Postgres 8.4, on a quite smallish VM, with some processes running,

A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out.  What is smallish?


A VM with 1 core, 2 GB RAM, a single hard drive


with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB

effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.


For the sake of the test, the VM got its memory increased, with no 
significant changes



checkpoint_segments = 6

Increasing the shared_buffers to 384MB, 1GB or 1500MB didn't improve the
performance (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM

If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.

Oh !
So I completely misunderstood the meaning of shared_buffer; I figured 
that it was somehow the place where the data would be stored by postgres 
(like indexes)






create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);

I wouldn't expect this to work well for this particular query.  Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure.  You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.
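A sketch of the ordering Jeff describes, using the column names from this thread (the index name is made up):

```sql
-- Equality-tested columns first, the range-tested column last, so the
-- btree can descend on the equalities and then scan the timestamp range.
CREATE INDEX composite3_idx
    ON ruddersysevents (nodeid, ruleid, serial, executiontimestamp);
```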



2/ Removing nodeid from the index lowered the performance again
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);


I doubt that 84888.349 vs 83717.901 is really a meaningful difference.


3/ Removing executiontimestamp from the composite index makes the query
perform better at the beginning of its use (around 17 seconds), but over
time it degrades (I'm logging queries longer than 20 seconds; they are
very rare in the first half of the batch, and get more and more common
at the end) to what is below

If the batch processing adds data, it is not surprising the query
slows down.  It looks like it is still faster at the end than the
previous two cases, right?


Actually, the batch reads data from this table, and writes into another. 
So the content of the table doesn't change at all

And yes, it is faster than the two previous cases


So my question is :
Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?

I don't know what column you are referring to here.  But it sounds
like you think that dropping the leading column from an index is a
minor change.  It is not.  It makes a fundamentally different index.


I was referring to the executionTimeStamp column. I know it is a HUGE 
change, but it's clearly not behaving the way I thought.
With your remark I understand a little better what is going on, and I 
can test better what I'm doing.


Thank you !
Nicolas




Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-15 Thread Jeff Janes
On Fri, Feb 15, 2013 at 1:00 AM, Nicolas Charles
nicolas.char...@normation.com wrote:
 On 14/02/2013 20:27, Jeff Janes wrote:

 On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
 nicolas.char...@normation.com  wrote:

 It contains 11018592 entries, with the following patterns:
 108492 distinct executiontimestamp
 14 distinct nodeid
 59 distinct directiveid
 26 distinct ruleid
 35 distinct serial

 How many entries fall within a typical query interval of
 executiontimestamp?


 Around 65,000 entries

 ...

 I'm surprised that the executiontimestamp index is not used, since it
 seems
 to be where most of the query time is spent.

 I do not draw that conclusion from your posted information.  Can you
 highlight the parts of it that lead you to this conclusion?

 The index scans are on nodeid_idx and configurationruleid_idx, not on
 executiontimestamp
 Or am I misreading the output ?

You are correct about the use of nodeid_idx and
configurationruleid_idx.  But the part about most of the time going
into filtering out rows that fail to match executiontimestamp is the
part that I don't think is supported by the explained plan.  The
information given by the explain plan is not sufficient to decide that
one way or the other.  Some more recent improvements in explain
(analyze, buffers) would make it easier (especially with
track_io_timing on) but it would still be hard to draw a definitive
conclusion.  The most definitive thing would be to do the experiment
of adding executiontimestamp as a *trailing* column to the end of one
of the existing indexes and see what happens.


 If the indexes fit in RAM, they fit in RAM.  If anything, increasing
 shared_buffers could make it harder to fit them entirely in RAM.  If
 your shared buffers undergo a lot of churn, then the OS cache and the
 shared buffers tend to uselessly mirror each other, meaning there is
 less space for non-redundant pages.

 Oh !
 So I completely misunderstood the meaning of shared_buffers; I figured that
 it was somehow the place where the data would be stored by postgres (like
 indexes)

That is correct, it is the space used by postgres to cache data.  But,
the rest of RAM (beyond shared_buffers) will also be used to cache
data; but by the OS rather than by postgres.  On a dedicated server,
the OS will generally choose to (or at least attempt to) use this
space for the benefit of postgres anyway.  If shared_buffers > RAM/2,
it won't be very successful at this, but it will still try.  The
kernel and postgres do not have intimate knowledge of each other, so
it is hard to arrange that all pages show up in just one place or the
other but not both.

Cheers,

Jeff




Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-15 Thread Merlin Moncure
On Thu, Feb 14, 2013 at 9:35 AM, Nicolas Charles
nicolas.char...@normation.com wrote:
 Hello,

 I've been struggling to understand what's happening on my databases/query
 for several days, and I'm turning to higher minds for a logical answer.

 I'm dealing with a fairly large database, containing log information, which
 I crunch to get data out of, with several indexes on it that I hoped were
 logical

 \d ruddersysevents
                 Table "public.ruddersysevents"
        Column        |           Type           |                 Modifiers
  --------------------+--------------------------+----------------------------------------------
   id                 | integer                  | not null default nextval('serial'::regclass)
   executiondate      | timestamp with time zone | not null
   nodeid             | text                     | not null
   directiveid        | text                     | not null
   ruleid             | text                     | not null
   serial             | integer                  | not null
   component          | text                     | not null
   keyvalue           | text                     |
   executiontimestamp | timestamp with time zone | not null
   eventtype          | character varying(64)    |
   policy             | text                     |
   msg                | text                     |
  Indexes:
      ruddersysevents_pkey PRIMARY KEY, btree (id)
      component_idx btree (component)
      configurationruleid_idx btree (ruleid)
      executiontimestamp_idx btree (executiontimestamp)
      keyvalue_idx btree (keyvalue)
      nodeid_idx btree (nodeid)
  Check constraints:
      ruddersysevents_component_check CHECK (component <> ''::text)
      ruddersysevents_configurationruleid_check CHECK (ruleid <> ''::text)
      ruddersysevents_nodeid_check CHECK (nodeid <> ''::text)
      ruddersysevents_policyinstanceid_check CHECK (directiveid <> ''::text)


 It contains 11018592 entries, with the following patterns:
 108492 distinct executiontimestamp
 14 distinct nodeid
 59 distinct directiveid
 26 distinct ruleid
 35 distinct serial

 Related table/index size are
 relation|  size
 +-
  public.ruddersysevents | 3190 MB
  public.nodeid_idx  | 614 MB
  public.configurationruleid_idx | 592 MB
  public.ruddersysevents_pkey| 236 MB
  public.executiontimestamp_idx  | 236 MB


 I'm crunching the data by looking for each nodeid/ruleid/directiveid/serial
 with an executiontimestamp in an interval:

 explain analyze select executiondate, nodeid, ruleid, directiveid, serial,
 component, keyValue, executionTimeStamp, eventtype, policy, msg from
 RudderSysEvents where 1=1  and nodeId =
 '31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId =
 '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and
 executiontimestamp between to_timestamp('2012-11-22 16:00:16.005',
 'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467',
 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
  Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual time=28628.922..28647.952 rows=62403 loops=1)
    Sort Key: executiontimestamp
    Sort Method:  external merge  Disk: 17480kB
    ->  Bitmap Heap Scan on ruddersysevents  (cost=74359.66..292896.27 rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
          Recheck Cond: ((nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
          Filter: ((serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
          ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0) (actual time=1228.610..1228.610 rows=0 loops=1)
                ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365 rows=690503 loops=1)
                      Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
                ->  Bitmap Index Scan on configurationruleid_idx  (cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490 rows=1381391 loops=1)
                      Index Cond: (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
  Total runtime: 28657.352 ms



 I'm surprised that the executiontimestamp index is not used, since it seems
 to be where most of the query time is spent.

 For all my tests, I removed all the incoming logs, so that this table has
 only selects and no writes

 I'm using Postgres 8.4, on a quite smallish VM, with some processes running,
 with the following non default configuration
 shared_buffers = 112MB
 work_mem = 8MB
 maintenance_work_mem = 48MB
 max_stack_depth = 3MB
 wal_buffers = 1MB
 effective_cache_size = 128MB
 

[PERFORM] Surprising no use of indexes - low performance

2013-02-14 Thread Nicolas Charles

Hello,

I've been struggling to understand what's happening on my 
databases/query for several days, and I'm turning to higher minds for a 
logical answer.


I'm dealing with a fairly large database, containing log information, 
which I crunch to get data out of, with several indexes on it that I 
hoped were logical


\d ruddersysevents
                Table "public.ruddersysevents"
       Column        |           Type           |                 Modifiers
 --------------------+--------------------------+----------------------------------------------
  id                 | integer                  | not null default nextval('serial'::regclass)
  executiondate      | timestamp with time zone | not null
  nodeid             | text                     | not null
  directiveid        | text                     | not null
  ruleid             | text                     | not null
  serial             | integer                  | not null
  component          | text                     | not null
  keyvalue           | text                     |
  executiontimestamp | timestamp with time zone | not null
  eventtype          | character varying(64)    |
  policy             | text                     |
  msg                | text                     |
 Indexes:
     ruddersysevents_pkey PRIMARY KEY, btree (id)
     component_idx btree (component)
     configurationruleid_idx btree (ruleid)
     executiontimestamp_idx btree (executiontimestamp)
     keyvalue_idx btree (keyvalue)
     nodeid_idx btree (nodeid)
 Check constraints:
     ruddersysevents_component_check CHECK (component <> ''::text)
     ruddersysevents_configurationruleid_check CHECK (ruleid <> ''::text)
     ruddersysevents_nodeid_check CHECK (nodeid <> ''::text)
     ruddersysevents_policyinstanceid_check CHECK (directiveid <> ''::text)



It contains 11018592 entries, with the following patterns:
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial

Related table/index size are
relation|  size
+-
 public.ruddersysevents | 3190 MB
 public.nodeid_idx  | 614 MB
 public.configurationruleid_idx | 592 MB
 public.ruddersysevents_pkey| 236 MB
 public.executiontimestamp_idx  | 236 MB


I'm crunching the data by looking for each 
nodeid/ruleid/directiveid/serial with an executiontimestamp in an interval:


explain analyze select executiondate, nodeid, ruleid, directiveid, 
serial, component, keyValue, executionTimeStamp, eventtype, policy, msg 
from RudderSysEvents where 1=1  and nodeId = 
'31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId = 
'61713ff1-aa6f-4c86-b3cb-7012bee707dd' and serial = 10 and 
executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', 
'YYYY-MM-DD HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', 
'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY executionTimeStamp asc;
 Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual time=28628.922..28647.952 rows=62403 loops=1)
   Sort Key: executiontimestamp
   Sort Method:  external merge  Disk: 17480kB
   ->  Bitmap Heap Scan on ruddersysevents  (cost=74359.66..292896.27 rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
         Recheck Cond: ((nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
         Filter: ((serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
         ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0) (actual time=1228.610..1228.610 rows=0 loops=1)
               ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365 rows=690503 loops=1)
                     Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
               ->  Bitmap Index Scan on configurationruleid_idx  (cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490 rows=1381391 loops=1)
                     Index Cond: (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)

 Total runtime: 28657.352 ms



I'm surprised that the executiontimestamp index is not used, since it 
seems to be where most of the query time is spent.


For all my tests, I removed all the incoming logs, so that this table 
has only selects and no writes


I'm using Postgres 8.4, on a quite smallish VM, with some processes 
running, with the following non default configuration

shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
checkpoint_segments = 6

Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the 
performances (less than 

Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-14 Thread Ireneusz Pluta


On 2013-02-14 16:35, Nicolas Charles wrote:
I'm crunching the data by looking for each nodeid/ruleid/directiveid/serial with an 
executiontimestamp in an interval:


explain analyze select executiondate, nodeid, ruleid, directiveid, serial, component, keyValue, 
executionTimeStamp, eventtype, policy, msg from RudderSysEvents where 1=1  and nodeId = 
'31264061-5ecb-4891-9aa4-83824178f43d'  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd' and 
serial = 10 and executiontimestamp between to_timestamp('2012-11-22 16:00:16.005', 'YYYY-MM-DD 
HH24:MI:SS.MS') and to_timestamp('2013-01-25 18:53:52.467', 'YYYY-MM-DD HH24:MI:SS.MS') ORDER BY 
executionTimeStamp asc;
 Sort  (cost=293125.41..293135.03 rows=3848 width=252) (actual time=28628.922..28647.952 rows=62403 loops=1)
   Sort Key: executiontimestamp
   Sort Method:  external merge  Disk: 17480kB
   ->  Bitmap Heap Scan on ruddersysevents  (cost=74359.66..292896.27 rows=3848 width=252) (actual time=1243.150..28338.927 rows=62403 loops=1)
         Recheck Cond: ((nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text) AND (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text))
         Filter: ((serial = 10) AND (executiontimestamp >= to_timestamp('2012-11-22 16:00:16.005'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)) AND (executiontimestamp <= to_timestamp('2013-01-25 18:53:52.467'::text, 'YYYY-MM-DD HH24:MI:SS.MS'::text)))
         ->  BitmapAnd  (cost=74359.66..74359.66 rows=90079 width=0) (actual time=1228.610..1228.610 rows=0 loops=1)
               ->  Bitmap Index Scan on nodeid_idx  (cost=0.00..25795.17 rows=716237 width=0) (actual time=421.365..421.365 rows=690503 loops=1)
                     Index Cond: (nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'::text)
               ->  Bitmap Index Scan on configurationruleid_idx  (cost=0.00..48562.32 rows=1386538 width=0) (actual time=794.490..794.490 rows=1381391 loops=1)
                     Index Cond: (ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'::text)
 Total runtime: 28657.352 ms



I'm surprised that the executiontimestamp index is not used, since it seems to be where most of 
the query time is spent.


This use pattern is quite similar to one I used to have problems with. The key problem here is that 
the planner wants to BitmapAnd indexes that are spread over the whole table, across all timestamp 
values, regardless of the fact that you are interested in only a narrow timestamp window, and it is 
quite aggressive about using the bitmap scan feature. So the planner needs to be directed more 
precisely.


You could try the above again with:

SET enable_bitmapscan TO off ?

It helped in my case.
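If you only want to test this for the one query, the setting can be confined to the current session, for example:

```sql
SET enable_bitmapscan TO off;
-- re-run EXPLAIN ANALYZE on the query here and compare the plans
RESET enable_bitmapscan;
```

(Or use SET LOCAL inside a transaction so it cannot leak past COMMIT.)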

You may also try enclosing the timestamp condition in a preselecting CTE, and doing the rest of the 
finer filtering outside of it, like:


with p as (
    select * from RudderSysEvents
    where executiontimestamp between '2012-11-22 16:00:16.005'
                                 and '2013-01-25 18:53:52.467'
)
select executiondate, nodeid, ruleid, directiveid, serial, component,
       keyValue, executionTimeStamp, eventtype, policy, msg
from p
where nodeId = '31264061-5ecb-4891-9aa4-83824178f43d'
  and ruleId = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  and serial = 10;


As a side note, I think that all your indexes, except the timestamp one, are unnecessary, because of 
the low distribution of their values and, as you see, the confusion they cause the planner.


Alternatively, you could take one of the columns, one which is always used for filtering, as the 
second column of a two-column index together with the timestamp, and move its filter condition 
inside the CTE part.
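A sketch of that combination, assuming nodeid is the always-filtered column (the index name is made up):

```sql
CREATE INDEX ts_nodeid_idx
    ON ruddersysevents (executiontimestamp, nodeid);

WITH p AS (
    SELECT * FROM RudderSysEvents
    WHERE executiontimestamp BETWEEN '2012-11-22 16:00:16.005'
                                 AND '2013-01-25 18:53:52.467'
      AND nodeid = '31264061-5ecb-4891-9aa4-83824178f43d'
)
SELECT executiondate, nodeid, ruleid, directiveid, serial, component,
       keyvalue, executiontimestamp, eventtype, policy, msg
FROM p
WHERE ruleid = '61713ff1-aa6f-4c86-b3cb-7012bee707dd'
  AND serial = 10;
```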


HTH,
Irek.




Re: [PERFORM] Surprising no use of indexes - low performance

2013-02-14 Thread Jeff Janes
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
nicolas.char...@normation.com wrote:

 It contains 11018592 entries, with the following patterns:
 108492 distinct executiontimestamp
 14 distinct nodeid
 59 distinct directiveid
 26 distinct ruleid
 35 distinct serial

How many entries fall within a typical query interval of executiontimestamp?

...

 I'm surprised that the executiontimestamp index is not used, since it seems
 to be where most of the query time is spent.

I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?

 For all my tests, I removed all the incoming logs, so that this table has
 only selects and no writes

 I'm using Postgres 8.4, on a quite smallish VM, with some processes running,

A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out.  What is smallish?

 with the following non default configuration
 shared_buffers = 112MB
 work_mem = 8MB
 maintenance_work_mem = 48MB
 max_stack_depth = 3MB
 wal_buffers = 1MB
 effective_cache_size = 128MB

effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.

 checkpoint_segments = 6

 Increasing the shared_buffers to 384MB, 1GB or 1500MB didn't improve the
 performance (less than 10%). I would have expected it to improve, since the
 indexes would all fit in RAM

If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.


 create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
 serial, nodeid);

I wouldn't expect this to work well for this particular query.  Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure.  You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.



 2/ Removing nodeid from the index lowered the performance again
 create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
 serial);


I doubt that 84888.349 vs 83717.901 is really a meaningful difference.

 3/ Removing executiontimestamp from the composite index makes the query
 perform better at the beginning of its use (around 17 seconds), but over
 time it degrades (I'm logging queries longer than 20 seconds; they are
 very rare in the first half of the batch, and get more and more common
 at the end) to what is below

If the batch processing adds data, it is not surprising the query
slows down.  It looks like it is still faster at the end than the
previous two cases, right?


 So my question is :
 Why *not* indexing the column which is not used makes the query slower over
 time, while not slowing the application?

I don't know what column you are referring to here.  But it sounds
like you think that dropping the leading column from an index is a
minor change.  It is not.  It makes a fundamentally different index.

Cheers,

Jeff

