Re: [PERFORM] query using incorrect index

2012-08-03 Thread Robert Klemme
On Thu, Aug 2, 2012 at 4:54 PM, Russell Keane wrote:

> Using PG 9.0 and given 2 queries (messageq_current is a view on the
> messageq_table):
>
> select entity_id from messageq_current
> where entity_id = 123456;
>
> select entity_id from messageq_current
> where incoming = true
> and inactive = false
> and staff_ty = 2
> and staff_id = 2
> order by entity_id desc
> limit 1;
>
> and 2 indexes (there are 15 indexes in total but they are left out here
> for brevity):
>
> messageq1:
>
> CREATE INDEX messageq1
>   ON messageq_table
>   USING btree
>   (entity_id);
>
> And messageq4:
>
> CREATE INDEX messageq4
>   ON messageq_table
>   USING btree
>   (inactive, staff_ty, staff_id, incoming, tran_dt);

Of course *a lot* of detail is missing (full schema of the table, all the
other indexes) but with "inactive" being a boolean column I suspect
selectivity might not be too good here, so having it as the first column of a
covering index is at least questionable.  If query 2 is frequent you might
also want to consider creating a partial index on just (staff_ty, staff_id)
with the filtering criteria on incoming and inactive (as present in query 2)
in the index's WHERE clause.

Btw, why don't you formulate query 2 as a max query?

select max(entity_id) as entity_id
from messageq_current
where incoming = true
and inactive = false
and staff_ty = 2
and staff_id = 2;


> With the messageq1 index present, query 1 is very quick (0.094ms) and
> query 2 is very slow (241.515ms).
>
> If I remove messageq1 then query 2 uses messageq4 and is very quick
> (0.098ms) but then query 1 must use a different index and is therefore
> slower (> 5ms).
>
> So, to the Query plans:

Of which query?  Shouldn't there be four plans in total?  I'd post plans
here:
http://explain.depesz.com/


> With messageq1:
>
> "Limit  (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481
> rows=0 loops=1)"
>
> "  Output: messageq_table.entity_id"
>
> "  Buffers: shared hit=32 read=18870 written=12"
>
> "  ->  Index Scan Backward using messageq1 on
> prac_live_10112.messageq_table  (cost=0.00..66762.53 rows=25 width=4)
> (actual time=241.479..241.479 rows=0 loops=1)"
>
> "Output: messageq_table.entity_id"
>
> "Filter: (messageq_table.incoming AND (NOT
> messageq_table.inactive) AND (messageq_table.staff_ty = 2) AND
> (messageq_table.staff_id = 2) AND
> (aud_status_to_flag(messageq_table.aud_status) = 1))"
>
> "Buffers: shared hit=32 read=18870 written=12"
>
> "Total runtime: 241.515 ms"
>
> Without messageq1:
>
> "Limit  (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055
> rows=0 loops=1)"
>
> "  Output: messageq_table.entity_id"
>
> "  Buffers: shared read=3"
>
> "  ->  Sort  (cost=12534.45..12534.51 rows=25 width=4) (actual
> time=0.054..0.054 rows=0 loops=1)"
>
> "Output: messageq_table.entity_id"
>
> "Sort Key: messageq_table.entity_id"
>
> "Sort Method:  quicksort  Memory: 17kB"
>
> "->  Bitmap Heap Scan on prac_live_10112.messageq_table
> (cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0
> loops=1)"
>
> "  Output: messageq_table.entity_id"
>
> "  Recheck Cond: ((messageq_table.staff_ty = 2) AND
> (messageq_table.staff_id = 2))"
>
> "  Filter: (messageq_table.incoming AND (NOT
> messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status)
> = 1))"
>
> "  Buffers: shared read=3"
>
> "  ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08
> rows=4920 width=0) (actual time=0.040..0.040 rows=0 loops=1)"
>
> "Index Cond: ((messageq_table.inactive = false) AND
> (messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND
> (messageq_table.incoming = true))"
>
> "Buffers: shared read=3"
>
> "Total runtime: 0.098 ms"
>
> Clearly the statistics are off somehow but I really don’t know where to
> start.
>
> Any help you can give me would be very much appreciated.
>

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


Re: [PERFORM] query using incorrect index

2012-08-03 Thread Russell Keane
You're right, a lot of information is missing, but I'm not sure the other 
information would make much difference.
I could drop all the other indexes on the table which aren't used here and the 
queries would still use the indexes they are currently using.

I take the point that a boolean column's selectivity might not be great. 
I've just tried creating indexes as follows:
CREATE INDEX messageq17
  ON messageq_table
  USING btree
  (staff_ty, staff_id, incoming, inactive, entity_id);

CREATE INDEX messageq18
  ON messageq_table
  USING btree
  (staff_ty, staff_id);

When running query 2 as it stands, the same thing happens: it still uses the 
messageq1 index.

You are correct that the query is logically the same as using max, but it's 
generated on the fly, so the limit or the queried column may change.

The query plans were for the second query; I'm unsure that the first query is 
really relevant, as it was simply there to justify the messageq1 index.

Thanks,


Re: [PERFORM] query using incorrect index

2012-08-03 Thread Russell Keane
Settings query:
"version";"PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 32-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"lc_collate";"English_United Kingdom.1252"
"lc_ctype";"English_United Kingdom.1252"
"listen_addresses";"*"
"log_destination";"stderr"
"log_duration";"off"
"log_line_prefix";"%t "
"log_min_duration_statement";"1ms"
"log_statement";"none"
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5433"
"search_path";"prac_live_10112, prac_shared_10112, global"
"server_encoding";"UTF8"
"shared_buffers";"32MB"
"TimeZone";"Europe/London"
"work_mem";"1MB"

Hardware:
It's important to note that this is a (purposely) low-spec development machine, 
but the performance story is similar on our test setup, which is a lot 
closer to our live environment. (I'm in the process of getting figures on this.)
E8400 Core 2 Duo (2.99GHz)
4GB ram
xp (latest sp and all updates)
1 300GB SATA2 drive with 170 GB free space

Explain analyse with both indexes present but without the limit (uses the 
correct index):

"Sort  (cost=12534.90..12534.97 rows=25 width=4) (actual time=0.055..0.055 
rows=0 loops=1)"
"  Output: messageq_table.entity_id"
"  Sort Key: messageq_table.entity_id"
"  Sort Method:  quicksort  Memory: 17kB"
"  Buffers: shared read=3"
"  ->  Bitmap Heap Scan on prac_live_10112.messageq_table  
(cost=174.09..12534.32 rows=25 width=4) (actual time=0.040..0.040 rows=0 
loops=1)"
"Output: messageq_table.entity_id"
"Recheck Cond: ((messageq_table.staff_ty = 2) AND 
(messageq_table.staff_id = 2))"
"Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND 
(aud_status_to_flag(messageq_table.aud_status) = 1))"
"Buffers: shared read=3"
"->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 
width=0) (actual time=0.037..0.037 rows=0 loops=1)"
"  Index Cond: ((messageq_table.inactive = false) AND 
(messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND 
(messageq_table.incoming = true))"
"  Buffers: shared read=3"
"Total runtime: 0.092 ms"



-Original Message-
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov] 
Sent: 02 August 2012 21:13
To: Russell Keane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query using incorrect index

Russell Keane  wrote:
 
> Clearly the statistics are off somehow but I really don't know where 
> to start.
> 
> Any help you can give me would be very much appreciated.
 
It would help to know more about your hardware and PostgreSQL 
configuration.  The latter can probably best be communicated by copy/paste of 
the results of the query on this page:
 
http://wiki.postgresql.org/wiki/Server_Configuration
 
Can you also post the EXPLAIN ANALYZE output for the slow query with both 
indexes present but without the LIMIT clause?
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] query using incorrect index

2012-08-03 Thread Kevin Grittner
Russell Keane  wrote:
 
> "log_min_duration_statement";"1ms"
 
> "shared_buffers";"32MB"
> "work_mem";"1MB"
 
Those are pretty low values even for a 4GB machine.  I suggest the
following changes and additions, based on the fact that you seem to
have the active portion of the database fully cached.
 
shared_buffers = '160MB'
work_mem = '8MB'
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03
effective_cache_size = '2GB'
 
> Explain analyse with both indexes present but without the limit
> (uses the correct index):
 
> "Total runtime: 0.092 ms"
 
Part of the problem is that it thinks it will find a matching row fairly
quickly, and having done so using the index it chose will mean it is
the *right* row.  The problem is that there are no matching rows, so
it has to scan the entire index.  More fine-grained statistics
*might* help.  If other techniques don't help, you can rewrite the
query slightly to create an optimization fence, but that should be a
last resort.  I agree with Robert that if you have a lot of queries
that select on "incoming" and/or "inactive", a conditional index
(with a WHERE clause in its definition) is likely to be very
helpful.
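For what it's worth, in 9.0 a WITH query is always materialized and so
acts as such a fence; a sketch using the columns from the earlier
queries (illustrative only):

```sql
-- The CTE is planned and evaluated on its own, so the planner cannot
-- switch to a backward scan of the entity_id index just because of
-- the ORDER BY ... LIMIT on the outer query.
WITH matches AS (
    SELECT entity_id
    FROM messageq_current
    WHERE incoming = true
      AND inactive = false
      AND staff_ty = 2
      AND staff_id = 2
)
SELECT entity_id
FROM matches
ORDER BY entity_id DESC
LIMIT 1;
```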
 
-Kevin



Re: [PERFORM] query using incorrect index

2012-08-03 Thread Russell Keane
I tried creating the following index:

CREATE INDEX messageq17
  ON messageq_table
  USING btree
  (staff_ty, staff_id, entity_id)
  WHERE inactive = false;

'inactive = false' ('active' would be much easier, but this is legacy) records 
should make up a smaller proportion of the overall dataset (and many more of 
the queries will specify this clause), and the results are very promising.
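One thing worth double-checking (my understanding, not verified here): a
partial index is only considered when the query's WHERE clause implies the
index predicate, so query 2 must keep the inactive = false condition spelled
out, e.g.:

```sql
-- Without "inactive = false" in the WHERE clause the planner cannot
-- prove the predicate of messageq17 and will not use it.
EXPLAIN ANALYZE
SELECT entity_id
FROM messageq_table
WHERE staff_ty = 2
  AND staff_id = 2
  AND incoming = true
  AND inactive = false
ORDER BY entity_id DESC
LIMIT 1;
```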

I will also try changing the settings and report back.

Thanks again guys,





Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Laszlo Nagy


All the above are the exact same point in time merely stated as 
relevant to each location. Note that given a timestamp with time zone 
and a zone, PostgreSQL returns a timestamp without time zone (you know 
the zone since you specified it). 

Yes, I know the zone. But I don't know the offset from UTC.

Example:

template1=> set timezone to 'UTC';
SET
template1=> select ('2011-10-30 01:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time?
(1 row)

template1=> select ('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time? What is the
                           offset from UTC here? Can you tell me when it was in UTC?
(1 row)

template1=>

What is more:

template1=> select (('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest') is distinct from (('2011-10-30 
01:00:00'::timestamptz) at time zone 'Europe/Budapest');

 ?column?
----------
 f
(1 row)

template1=>

Yeah, we know what time zone it is in, but we don't know when it was, 
thanks a lot. :-( It would be unambiguous to store the UTC offset along 
with the value. But it is not how it was implemented.




Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Steve Crawford

On 08/03/2012 10:21 AM, Laszlo Nagy wrote:


All the above are the exact same point in time merely stated as 
relevant to each location. Note that given a timestamp with time zone 
and a zone, PostgreSQL returns a timestamp without time zone (you 
know the zone since you specified it). 

Yes, I know the zone. But I don't know the offset from UTC.

Example:

template1=> set timezone to 'UTC';
SET
template1=> select ('2011-10-30 01:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time?
(1 row)

template1=> select ('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time? What is the
                           offset from UTC here? Can you tell me when it was in UTC?
(1 row)

template1=>



I cannot, from the given information. Can you? The given information is 
ambiguous, as are all times during the hour of fall-back everywhere. That 
leaves developers with a choice: choose an interpretation or throw an 
error. PostgreSQL chooses to use an interpretation.


It would be nice if there were a specification as to how such ambiguous 
data should be interpreted. Perhaps someone can point me to one and to 
any relevant documentation detailing how PostgreSQL handles such data. 
As it is, you need to be aware of how each part of your system deals 
with such. For example (using my local time zone) using the date command 
on Linux I see that

"date -d '2012-11-04 0130'"
returns
"Sun Nov  4 01:30:00 PDT 2012" (Still in Daylight Saving Time)

But given the same input, PostgreSQL interprets it as standard time 
(offset -08):

select '2012-11-04 0130'::timestamptz;
      timestamptz
------------------------
 2012-11-04 01:30:00-08
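Spelling the offset out on input removes the ambiguity, since it pins down
the exact instant:

```sql
-- Both of these are valid on the fall-back day; the explicit offset
-- says which of the two 01:30s is meant.
select '2012-11-04 01:30:00-07'::timestamptz;  -- the PDT one
select '2012-11-04 01:30:00-08'::timestamptz;  -- the PST one
```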


What is more:

template1=> select (('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest') is distinct from (('2011-10-30 
01:00:00'::timestamptz) at time zone 'Europe/Budapest');

 ?column?
----------
 f
(1 row)

template1=>

Yeah, we know what time zone it is in, but we don't know when it was, 
thanks a lot. :-( It would be unambiguous to store the UTC offset 
along with the value. But it is not how it was implemented.



So you took two distinct points in time, threw away some critical 
information, and are surprised why they are now equal? Then don't do 
that. It's the equivalent of being surprised that www.microsoft.com is 
the same as www.apple.com when comparing them on the short hostname 
only. If you want to know if two points in time differ, just compare them.
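For instance, comparing the timestamptz values themselves, before any AT
TIME ZONE conversion throws away the offset, keeps the two instants
distinct:

```sql
-- The two instants differ by one hour, so this is true; it is only
-- the conversion to a zone-local timestamp WITHOUT time zone that
-- makes them collide.
select ('2011-10-30 00:00:00+00'::timestamptz)
       is distinct from
       ('2011-10-30 01:00:00+00'::timestamptz);
```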


Spending a couple hours reading 
http://www.postgresql.org/docs/current/static/datatype-datetime.html 
will be time well spent.


Cheers,
Steve




Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Laszlo Nagy


So you took two distinct points in time, threw away some critical 
information, and are surprised why they are now equal?
Well, I did not want to throw away any information. The actual 
representation could be something like:


"2011-10-30 02:00:00+01 in Europe/Budapest, winter time"

and

"2011-10-30 02:00:00+02 in Europe/Budapest, summer time".

It would be unambiguous, everybody would know the time zone, the UTC 
offset and the time value, and conversion back to UTC would be 
unambiguous too.


I presumed that the representation is like that. But I was wrong. I have 
checked other programming languages. As it turns out, nobody wants to 
change the representation just because there can be an ambiguous hour in 
every year. Now I think that most systems treat ambiguous time stamps as 
if they were in standard time. And who am I to go against the main flow? 
I'm sorry, I admit that the problem was in my head.





Re: [PERFORM] Linux memory zone reclaim

2012-08-03 Thread Josh Berkus
On 7/30/12 10:09 AM, Scott Marlowe wrote:
> I think the zone_reclaim gets turned on with a high ratio.  If the
> inter node costs were the same, and the intranode costs dropped in
> half, zone reclaim would likely get turned on at boot time.

We've been seeing a major problem with zone_reclaim and Linux, in that
Linux won't use the FS cache on the "distant" RAM *at all* if it thinks
that RAM is distant enough.  Thus, you get instances of seeing only half
of RAM used for FS cache, even though the database is 5X larger than RAM.

This is poor design on Linux's part, since even the distant RAM is
faster than disk.  For now, we've been disabling zone_reclaim entirely.
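For reference, the knob is vm.zone_reclaim_mode; we set it to zero, e.g. in
/etc/sysctl.conf (applied with `sysctl -p`; check your distro's conventions):

```
# /etc/sysctl.conf -- disable NUMA zone reclaim so the kernel will
# use remote-node memory for page cache instead of reclaiming.
vm.zone_reclaim_mode = 0
```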

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] Linux memory zone reclaim

2012-08-03 Thread Scott Marlowe
On Fri, Aug 3, 2012 at 4:30 PM, Josh Berkus  wrote:
> On 7/30/12 10:09 AM, Scott Marlowe wrote:
>> I think the zone_reclaim gets turned on with a high ratio.  If the
>> inter node costs were the same, and the intranode costs dropped in
>> half, zone reclaim would likely get turned on at boot time.
>
> We've been seeing a major problem with zone_reclaim and Linux, in that
> Linux won't use the FS cache on the "distant" RAM *at all* if it thinks
> that RAM is distant enough.  Thus, you get instances of seeing only half
> of RAM used for FS cache, even though the database is 5X larger than RAM.
>
> This is poor design on Linux's part, since even the distant RAM is
> faster than disk.  For now, we've been disabling zone_reclaim entirely.

I haven't run into this, but we were running ubuntu 10.04 LTS.  What
kernel were you running when this happened?  I'd love to see a test
case on this, as it seems like a major regression if it's on newer
kernels, and we're looking at running 12.04 LTS soon on one of our
bigger machines.



Re: [PERFORM] Linux memory zone reclaim

2012-08-03 Thread Josh Berkus

>> This is poor design on Linux's part, since even the distant RAM is
>> faster than disk.  For now, we've been disabling zone_reclaim entirely.
> 
> I haven't run into this, but we were running ubuntu 10.04 LTS.  What
> kernel were you running when this happened?  I'd love to see a test
> case on this, as it seems like a major regression if it's on newer
> kernels, and we're looking at running 12.04 LTS soon on one of our
> bigger machines.

Jeff Frost will have a blog up about it later; we're still collecting data.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



[PERFORM] slow query, different plans

2012-08-03 Thread Midge Brown
I'm having a problem with a query on our production server, but not on a laptop 
running a similar postgres version with a recent backup copy of the same table. 
I tried reindexing the table on the production server, but it didn't make any 
difference. Other queries on the same table are plenty fast. 

This query has been slow, but never like this, particularly during a period 
when there are only a couple of connections in use. 

Vacuum and analyze are run nightly (and show as such in pg_stat_user_tables) in 
addition to autovacuum during the day. Here are my autovacuum settings, but 
when I checked last_autovacuum & last_autoanalyze in pg_stat_user_tables those 
fields were blank. 

autovacuum = on 
log_autovacuum_min_duration = 10
autovacuum_max_workers = 3  
autovacuum_naptime = 1min   
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50   
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1   
autovacuum_freeze_max_age = 2   
autovacuum_vacuum_cost_delay = 10ms (changed earlier today from 1000ms)  
autovacuum_vacuum_cost_limit = -1

wal_level = minimal
wal_buffers = 16MB

The only recent change was moving the 3 databases we have from multiple raid 1 
drives with tablespaces spread all over to one large raid10 with indexes and 
data in pg_default. WAL for this table was moved as well.

Does anyone have any suggestions on where to look for the problem?  

clientlog table info:

Size: 1.94G

  Column  |Type | Modifiers 
--+-+---
 pid0 | integer | not null
 rid  | integer | not null
 verb | character varying(32)   | not null
 noun | character varying(32)   | not null
 detail   | text| 
 path | character varying(256)  | not null
 ts   | timestamp without time zone | 
 applies2 | integer | 
 toname   | character varying(128)  | 
 byname   | character varying(128)  | 
Indexes:
"clientlog_applies2" btree (applies2)
"clientlog_pid0_key" btree (pid0)
"clientlog_rid_key" btree (rid)
"clientlog_ts" btree (ts)

The query, hardware info, and links to both plans:

explain analyze select max(ts) as ts from clientlog where applies2=256;

Production server:
- 4 dual-core AMD Opteron 2212 processors, 2010.485 MHz
- 64GB RAM
- 464GB RAID10 drive 
- Linux 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 
GNU/Linux
 PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
20080704 (Red Hat 4.1.2-46), 64-bit

http://explain.depesz.com/s/8R4
 

From a laptop running Linux 2.6.34.9-69.fc13.868 with 3G ram, against a copy of 
the same table:
PostgreSQL 9.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.4.4 20100630 
(Red Hat 4.4.4-10), 32-bit

http://explain.depesz.com/s/NQl

Thank you,
Midge


Re: [PERFORM] slow query, different plans

2012-08-03 Thread Greg Williamson
Midge --

Sorry for top-quoting -- challenged mail.

Perhaps a difference in the stats estimates -- default_statistics_target ?

Can you show us a diff between the postgres config files for each instance ? 
Maybe something there ...

Greg Williamson




Re: [PERFORM] slow query, different plans

2012-08-03 Thread Tom Lane
"Midge Brown"  writes:
> I'm having a problem with a query on our production server, but not on a 
> laptop running a similar postgres version with a recent backup copy of the 
> same table. I tried reindexing the table on the production server, but it 
> didn't make any difference. Other queries on the same table are plenty fast. 

Reindexing won't help that.  The problem is a bad statistical estimate;
it thinks there are about 700 rows with applies2 = 256, when there's
really only one.  That means the "fast" plan is a lot faster than the
planner gives it credit for, and conversely the "slow" plan is a lot
slower than the planner is expecting.  Their estimated costs end up
nearly the same, which makes it a bit of a chance matter which one is
picked --- but the true costs are a lot different.  So you need to fix
that rowcount estimate.  Raising the stats target for the table might
help.
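Concretely, something like this (1000 is just an illustrative target; tune
to taste):

```sql
-- Raise the statistics target for the one misestimated column, then
-- re-ANALYZE so the larger sample is actually collected.
ALTER TABLE clientlog ALTER COLUMN applies2 SET STATISTICS 1000;
ANALYZE clientlog;
```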

regards, tom lane
