Re: [PERFORM] Increasing pattern index query speed

2008-11-24 Thread Richard Huxton
Andrus wrote:
 Both queries return the same result (19) and return the same data.
 The pattern query is much slower (93 sec) than the equality check (13 sec).
 How can I fix this?
 Using 8.1.4, utf-8 encoding, et-EE locale.

They're different queries. The fact that they return the same results is
a coincidence.

This

   WHERE rid.toode = '9910' 

Is a different condition to this

   WHERE rid.toode like '9910%'

You aren't going to get the same plans.
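
(A sketch, assuming rid.toode is a bpchar column and these index names are only
illustrative: with a non-C locale such as et-EE, a plain btree index serves the
equality form, while the LIKE 'prefix%' form only becomes indexable through a
pattern operator class, e.g.

CREATE INDEX rid_toode_idx ON rid (toode);                             -- serves toode = '9910'
CREATE INDEX rid_toode_pattern_idx ON rid (toode bpchar_pattern_ops);  -- lets toode LIKE '9910%' use an index

so the two conditions can legitimately pick different indexes and produce very
different plans.)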

Anyway, I think the problem is in the dok JOIN rid bit. Look:

 Aggregate  (cost=43.09..43.10 rows=1 width=0) (actual
 time=12674.675..12674.679 rows=1 loops=1)
   ->  Nested Loop  (cost=29.57..43.08 rows=1 width=0) (actual
 time=2002.045..12673.645 rows=19 loops=1)
         ->  Nested Loop  (cost=29.57..37.06 rows=1 width=24) (actual
 time=2001.922..12672.344 rows=19 loops=1)

 Aggregate  (cost=15.52..15.53 rows=1 width=0) (actual
 time=92966.501..92966.505 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..15.52 rows=1 width=0) (actual
 time=24082.032..92966.366 rows=19 loops=1)
         ->  Nested Loop  (cost=0.00..9.50 rows=1 width=24) (actual
 time=24081.919..92965.116 rows=19 loops=1)

These are the same but the times are different. I'd be very surprised if
you can reproduce these times reliably.


Can I give you some wider-ranging suggestions Andrus?
1. Fix the vacuuming issue in your hash-join question.
2. Monitor the system to make sure you know if/when disk activity is high.
3. *Then* start to profile individual queries and look into their plans.
Change the queries one at a time and monitor again.

Otherwise, it's very difficult to figure out whether changes you make
are effective.

HTH
-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread Andrus

Tomas,

Let's suppose you set a reasonable value (say 8096) instead of 2GB. That 
gives about 160MB.
Anyway this depends - if you have a lot of slow queries caused by on-disk 
sorts / hash tables, use a higher value. Otherwise leave it as it is.


Probably the product orders table is frequently joined with the product table.
Currently there was work_mem = 512 in the conf file.

I changed it to work_mem = 8096
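
For reference, the GUC is spelled work_mem, and in these versions a bare
number is interpreted as kilobytes. A sketch of checking the live value and of
trying a larger setting for one session only, before changing it globally:

SHOW work_mem;
SET work_mem = 8096;   -- per-session override, in kB; value only illustrative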


If it is all cached in memory, you may want to ensure that your
shared_buffers is a reasonable size so that there is less shuffling of data
from the kernel to postgres and back.  Generally, shared_buffers works best
between 5% and 25% of system memory.


currently shared_buffers = 15000


That's 120MB, i.e. about 6% of the memory. Might be a little bit higher, 
but seems reasonable.
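
(The arithmetic behind that, as a sketch: each shared buffer is an 8 kB page, so

SELECT 15000 * 8192 / (1024 * 1024) AS shared_buffers_mb;   -- about 117

which is roughly the 120MB figure quoted above.)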


I changed it to 2

Given the fact that the performance issues are caused by bloated tables 
and / or slow I/O subsystem, moving to a similar system won't help I 
guess.


I have run VACUUM FULL ANALYZE VERBOSE
and set MAX_FSM_PAGES = 15

So there is no bloat except the pg_shdepend indexes, which should not affect
query speed.


Andrus. 





Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Steve Clark

Glyn Astill wrote:

--- On Sat, 22/11/08, Scott Marlowe [EMAIL PROTECTED] wrote:
 


You really have two choices.  First is to try and use it as a plain
SCSI card, maybe with caching turned on, and do the raid in software.
Second is to cut it into pieces and make jewelry out of it.



Haha, I'm not really into jewelry, although I had thought of smacking it into a 
pile of dust with a lump hammer, that's much more my thing.



Anything before the Perc 6 series is seriously brain damaged, and the
Perc6 brings the dell raid array line squarely in line with a 5 year old
LSI megaraid, give or take.  And that's being generous.




Well this card thinks it's a 5 year old lsi megaraid. I've got a pile of perc5i 
megaraid paperweights on my desk at work, so this was kinda expected really.



I've tried writeback and write through modes, tried changing cache flush
times, disabled and enabled multiple PCI delayed transactions, all seem
to have little effect.

Yeah, it's like trying to performance tune a yugo.




Did I mention I drive a yugo?



Finally I decided to wave goodbye to Dell's firmware. LSI has it down as
a MegaRAID 493 elite 1600, so I flashed it with their latest firmware.
Doesn't seem to have helped either though.

Does it have a battery backup module?  Often you can't really turn on
write-back without one.  That would certainly slow things down.  But
you should certainly expect > 20 MB/s on a modern RAID controller
writing out to a 4 disk RAID-10.




Yeah the battery's on it, that and the 128Mb is really the only reason I 
thought I'd give it a whirl.





Is the battery functioning? We found that the unit had to be on and
charged before write back caching would work.



Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread tv
 Given the fact that the performance issues are caused by bloated tables
 and / or slow I/O subsystem, moving to a similar system won't help I
 guess.

 I have ran VACUUM FULL ANALYZE VERBOSE
 and set MAX_FSM_PAGES = 15

 So there is no any bloat except pg_shdepend indexes which should not
 affect to query speed.

OK, what was the number of unused pointer items in the VACUUM output?

The query performance is still the same as when the tables were bloated?
What are the outputs of iostat/vmstat/dstat/top when running the query?

regards
Tomas




Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Glyn Astill
--- On Mon, 24/11/08, Steve Clark [EMAIL PROTECTED] wrote:

  Yeah the battery's on it, that and the 128Mb is
 really the only reason I thought I'd give it a whirl.
  
  
 Is the battery  functioning? We found that the unit had to
 be on and charged before write back caching
 would work.

Yeah the battery is on there, and in the BIOS it says it's PRESENT and the 
status is GOOD.








Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Scott Marlowe
On Mon, Nov 24, 2008 at 7:49 AM, Glyn Astill [EMAIL PROTECTED] wrote:
 --- On Mon, 24/11/08, Steve Clark [EMAIL PROTECTED] wrote:

  Yeah the battery's on it, that and the 128Mb is
 really the only reason I thought I'd give it a whirl.
 
 
 Is the battery  functioning? We found that the unit had to
 be on and charged before write back caching
 would work.

 Yeah the battery is on there, and in the BIOS it says it's PRESENT and the 
 status is GOOD.

If I remember correctly, older LSI cards had pretty poor performance
in RAID 1+0 (or any layered RAID really).  Have you tried setting up
RAID-1 pairs on the card and then striping them with the OS?



Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Glyn Astill
--- Scott Marlowe [EMAIL PROTECTED] wrote:

 
  Yeah the battery is on there, and in the BIOS it says it's
 PRESENT and the status is GOOD.
 
 If I remember correctly, older LSI cards had pretty poor
 performance
 in RAID 1+0 (or any layered RAID really).  Have you tried setting
 up
 RAID-1 pairs on the card and then striping them with the OS?
 

Not yet no, but that's a good suggestion and I do intend to give it a
whirl.  I get about 27MB/s from raid 1 (10 is about the same) so
hopefully I can up the throughput to the speed of about one disk with
sw raid.

For kicks I did try raid 5 on it; 6.9MB/s made it hard to resist
going to get the hammer, which is still a very attractive option.



  



Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Scott Marlowe
On Mon, Nov 24, 2008 at 8:06 AM, Glyn Astill [EMAIL PROTECTED] wrote:
 --- Scott Marlowe [EMAIL PROTECTED] wrote:

 
  Yeah the battery is on there, and in the BIOS it says it's
 PRESENT and the status is GOOD.

 If I remember correctly, older LSI cards had pretty poor
 performance
 in RAID 1+0 (or any layered RAID really).  Have you tried setting
 up
 RAID-1 pairs on the card and then striping them with the OS?


 Not yet no, but that's a good suggestion and I do intend to give it a
 whirl.  I get about 27MB/s from raid 1 (10 is about the same) so
 hopefully I can up the throughput to the speed of about one disk with
 sw raid.

 For kicks I did try raid 5 on it; 6.9MB/s made it hard to resist
 going to get the hammer, which is still a very attractive option.

Well, I prefer making keychain fobs still, but from a technical
perspective, I guess either option is a good one.

Srsly, also look at running pure sw RAID on it with the controller
providing caching only.  I don't expect a PERC 3DC to win any awards,
but the less you give that card to do the better off you'll be.



Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Simon Waters
On Monday 24 November 2008 14:49:17 Glyn Astill wrote:
 --- On Mon, 24/11/08, Steve Clark [EMAIL PROTECTED] wrote:
   Yeah the battery's on it, that and the 128Mb is
 
  really the only reason I thought I'd give it a whirl.
 
 
  Is the battery  functioning? We found that the unit had to
  be on and charged before write back caching
  would work.

 Yeah the battery is on there, and in the BIOS it says it's PRESENT and
 the status is GOOD.

Sorry I deleted the beginning of this on getting back from a week off.

Writeback is configurable. You can enable write-back caching when the unit is 
not charged if you like. It is offered when you create the array (and can be 
changed later). It is arguably a silly thing to do, but it is an option.

I have some reasonable performance stats for this card, assuming you have a 
suitably recent version of the driver software. DELL used to ship a Linux 
kernel that had a broken driver for this card, resulting in very poor 
performance (i.e. substantially slower than software RAID). I have a note 
never to use it with Linux before 2.6.22, as the bundled LSI driver had 
issues; DELL themselves shipped (if you asked why performance was so bad) a 
Redhat kernel with a later driver for the card than the official Linux kernel.

That said, a couple of weeks back ours corrupted a volume on replacing a dead 
hard disk, so I'm never touching these cheap and tacky LSI RAID cards ever 
again. It is supposed to just start rebuilding the array when you insert the 
replacement drive; if it doesn't just work, schedule some down time and 
figure out exactly why. Don't (for example) blindly follow the instructions 
in the manual on what to do if it doesn't just work.



Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Scott Marlowe
On Mon, Nov 24, 2008 at 8:41 AM, Simon Waters [EMAIL PROTECTED] wrote:

 That said a couple of weeks back ours corrupted a volume on replacing a dead
 hard disk, so I'm never touching these cheap and tacky LSI RAID cards ever
 again. It is suppose to just start rebuilding the array when you insert the
 replacement drive, if it doesn't just work schedule some down time and
 figure out exactly why, don't (for example) blindly follow the instructions
 in the manual on what to do if it doesn't just work.

Reminds me of a horror story at a company I was at some years ago.
Another group was running Oracle on a nice little 4 way Xeon with a
gig of RAM (back when that was a monster server) and had an LSI card.
They unplugged the server to move it into the hosting center, and in
the move, the scsi cable came loose.  When the machine came up, the
LSI RAID marked every drive bad and the old 4xx series card had no
facility for forcing it to take back a drive.  All their work on the
db was gone; the newest backup was months old.  I'm pretty sure they now
understand why RAID5 is no replacement for a good backup plan.

I had a 438 in a dual ppro200, and it worked just fine, but I never
trusted it to auto rebuild anything, and made backups every night.  It
was slow (in the 30 MB/second range for reads on a 6 disk RAID 5, not
faster in RAID-10 for reads or writes) but reliable.

Newer LSI cards seem quite nice, but I'm now using an Areca 16xx
series and am so far very happy with its reliability and somewhat
happy with its performance.  Sequential read speed is meh, but random
performance is very good, so for a db server, it's a nice unit.



[PERFORM] Monitoring buffercache...

2008-11-24 Thread Kevin Kempter
Hi All;

I've installed pg_buffercache and I want to use it to help define the optimal 
shared_buffers size. 

Currently I run this each 15min via cron:
insert into buffercache_stats select now(), isdirty, count(*) as buffers, 
(count(*) * 8192) as memory from pg_buffercache group by 1,2;

and here's its explain plan
explain insert into buffercache_stats select now(), isdirty, count(*) as 
buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2;
QUERY PLAN
---
 Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
   ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
         ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00 
rows=1000 width=1)
(3 rows)


Then once a day I will pull a report from the buffercache_stats table. The 
buffercache_stats table is our own creation:

\d buffercache_stats
            Table "public.buffercache_stats"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 snap_timestamp | timestamp without time zone |
 isdirty        | boolean                     |
 buffers        | integer                     |
 memory         | integer                     |
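
For reference, the DDL behind it is just (a sketch reconstructed from the \d 
output above):

CREATE TABLE buffercache_stats (
    snap_timestamp  timestamp without time zone,
    isdirty         boolean,
    buffers         integer,
    memory          integer
);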


Here's my issue: the server that we'll eventually roll this out to is 
extremely busy, and the every-15-minutes query above has the potential to 
have a huge impact on performance.

Does anyone have any suggestions for a better approach, or maybe a way to 
improve the performance of the above query?

Thanks in advance...



Re: [PERFORM] Monitoring buffercache...

2008-11-24 Thread Brad Nicholson
On Mon, 2008-11-24 at 11:43 -0700, Kevin Kempter wrote:
 Hi All;
 
 I've installed pg_buffercache and I want to use it to help define the optimal 
 shared_buffers size. 
 
 Currently I run this each 15min via cron:
 insert into buffercache_stats select now(), isdirty, count(*) as buffers, 
 (count(*) * 8192) as memory from pg_buffercache group by 1,2;
 
 and here's it's explain plan
 explain insert into buffercache_stats select now(), isdirty, count(*) as 
 buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2;
 QUERY PLAN
 ---
  Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
    ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
          ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00 
 rows=1000 width=1)
 (3 rows)
 
 
 Then once a day I will pull a report from the buffercache_stats table. The 
 buffercache_stats table is our own creation :
 
 \d buffercache_stats
             Table "public.buffercache_stats"
      Column     |            Type             | Modifiers
 ----------------+-----------------------------+-----------
  snap_timestamp | timestamp without time zone |
  isdirty        | boolean                     |
  buffers        | integer                     |
  memory         | integer                     |
 
 
 Here's my issue, the server that we'll eventually roll this out to is 
 extremely busy and the every 15min query above has the potential to have a 
 huge impact on performance.

I wouldn't routinely run pg_buffercache on a busy database.  Plus, I
don't think that pg_buffercache will answer this question for you.  It
will tell you what's currently in the buffer pool and its clean/dirty
status, but that's not the first place I'd look; what you really
need is to figure out the hit ratio on the buffer pool and go from
there.

 Does anyone have any suggestions per a better approach or maybe a way to 
 improve the performance for the above query ?

You should be able to use the blocks hit vs block read data in the
pg_stat_database view (for the overall database), and drill down into
pg_statio_user_tables/pg_statio_all_tables to get more detailed data if
you want.
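
For example, something along these lines (a sketch only; blks_hit counts 
reads satisfied from shared buffers, blks_read counts blocks that had to be 
read in, both since the last stats reset):

SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();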

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: [PERFORM] Monitoring buffercache...

2008-11-24 Thread Scott Marlowe
On Mon, Nov 24, 2008 at 11:43 AM, Kevin Kempter
[EMAIL PROTECTED] wrote:
 Hi All;

 I've installed pg_buffercache and I want to use it to help define the optimal
 shared_buffers size.

 Currently I run this each 15min via cron:
 insert into buffercache_stats select now(), isdirty, count(*) as buffers,
 (count(*) * 8192) as memory from pg_buffercache group by 1,2;

 and here's it's explain plan
 explain insert into buffercache_stats select now(), isdirty, count(*) as
 buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2;
QUERY PLAN
 ---
  Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
    ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
          ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00
 rows=1000 width=1)
 (3 rows)


 Then once a day I will pull a report from the buffercache_stats table. The
 buffercache_stats table is our own creation :

 \d buffercache_stats
             Table "public.buffercache_stats"
      Column     |            Type             | Modifiers
 ----------------+-----------------------------+-----------
  snap_timestamp | timestamp without time zone |
  isdirty        | boolean                     |
  buffers        | integer                     |
  memory         | integer                     |


 Here's my issue, the server that we'll eventually roll this out to is
 extremely busy and the every 15min query above has the potential to have a
 huge impact on performance.

 Does anyone have any suggestions per a better approach or maybe a way to
 improve the performance for the above query ?

I wouldn't worry about running it every 15 minutes unless it's on a
REALLY slow machine.

I just ran it in a loop over and over on my 8 core opteron server and
it ran the load factor up by almost exactly 1.0.  Under our normal
daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
load of running that query over and over.  So, it doesn't seem to be
blocking or anything.



Re: [PERFORM] Monitoring buffercache...

2008-11-24 Thread Brad Nicholson
On Mon, 2008-11-24 at 12:46 -0700, Scott Marlowe wrote:
 On Mon, Nov 24, 2008 at 11:43 AM, Kevin Kempter
 [EMAIL PROTECTED] wrote:
  Hi All;
 
  I've installed pg_buffercache and I want to use it to help define the 
  optimal
  shared_buffers size.
 
  Currently I run this each 15min via cron:
  insert into buffercache_stats select now(), isdirty, count(*) as buffers,
  (count(*) * 8192) as memory from pg_buffercache group by 1,2;
 
  and here's it's explain plan
  explain insert into buffercache_stats select now(), isdirty, count(*) as
  buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2;
 QUERY PLAN
  ---
   Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
     ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
           ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00
  rows=1000 width=1)
  (3 rows)
 
 
  Then once a day I will pull a report from the buffercache_stats table. The
  buffercache_stats table is our own creation :
 
  \d buffercache_stats
              Table "public.buffercache_stats"
       Column     |            Type             | Modifiers
  ----------------+-----------------------------+-----------
   snap_timestamp | timestamp without time zone |
   isdirty        | boolean                     |
   buffers        | integer                     |
   memory         | integer                     |
 
 
  Here's my issue, the server that we'll eventually roll this out to is
  extremely busy and the every 15min query above has the potential to have a
  huge impact on performance.
 
  Does anyone have any suggestions per a better approach or maybe a way to
  improve the performance for the above query ?
 
 I wouldn't worry about running it every 15 minutes unless it's on a
 REALLY slow machine.
 
 I just ran it in a loop over and over on my 8 core opteron server and
 it ran the load factor up by almost exactly 1.0.  Under our normal
 daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
 load of running that query over and over.  So, it doesn't seem to be
 blocking or anything.

The internal docs for pg_buffercache_pages.c state:

To get a consistent picture of the buffer state, we must lock all
partitions of the buffer map.  Needless to say, this is horrible
for concurrency.  Must grab locks in increasing order to avoid
possible deadlocks.

I'd be concerned about that running routinely.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: [PERFORM] limit clause produces wrong query plan

2008-11-24 Thread Andrus

Scott,


And how exactly should it be optimized?  If a query is even moderately
interesting, with a few joins and a where clause, postgresql HAS to
create the rows that come before your offset in order to assure that
it's giving you the right rows.


SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

It should scan the primary key in index order for the first 200 keys and 
skip the first 100 keys.



SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100


That should be plenty fast.


The example which I posted shows that

SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100

is extremely *slow*: a seq scan is performed over the whole bigtable.


A standard workaround is to use some kind of sequential, or nearly so,
id field, and then use between on that field.

select * from table where idfield between x and x+100;


Users can delete and insert any rows in the table.
This approach requires updating x in every row of the big table after each
insert, delete or order column change, and is thus extremely slow.
So I don't understand how this can be used for large tables.
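
A keyset variant of that workaround, as a sketch, avoids renumbering rows:
remember the last intprimarykey value of the previous page and seek past it,
so each page becomes a fresh indexable range condition instead of an OFFSET:

SELECT ... FROM bigtable
WHERE intprimarykey > $1      -- $1 = last intprimarykey of the previous page
ORDER BY intprimarykey
LIMIT 100;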

Andrus.




Re: [PERFORM] Perc 3 DC

2008-11-24 Thread PFC



Not yet no, but that's a good suggestion and I do intend to give it a
whirl.  I get about 27MB/s from raid 1 (10 is about the same) so
hopefully I can up the throughput to the speed of about one disk with
sw raid.


	FYI I get more than 200 MB/s out of a Linux Software RAID5 of 3 SATA  
drives (the new Samsung Spinpoints...)

(Intel ICH8 chipset, Core 2 Duo).



Re: [PERFORM] Increasing pattern index query speed

2008-11-24 Thread Andrus

Richard,


These are the same but the times are different. I'd be very surprised if
you can reproduce these times reliably.


I re-tried today and got the same results: in the production database the 
pattern query is many times slower than the equality query.

The toode and rid tables contain only a single product starting with 9910,
so both queries should scan exactly the same number of rows.


Can I give you some wider-ranging suggestions Andrus?
1. Fix the vacuuming issue in your hash-join question.


I have run VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=15
So the issue was fixed before those tests.


2. Monitor the system to make sure you know if/when disk activity is high.


I optimized this system. Now there are short (a few seconds) sales queries 
roughly every 5 - 300 seconds, which cause little disk activity and add a few 
new rows to some tables.

I haven't seen this activity affect the test results.


3. *Then* start to profile individual queries and look into their plans.
Change the queries one at a time and monitor again.


How can I change the pattern matching query to be faster?

Andrus.

Btw.

I tried to reproduce this big difference on a test server running 8.3, using 
the sample data script below, and got a big difference but in the opposite 
direction.


explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date > '2006-01-01' and ...

different where clauses produce different results:

AND orders_products.product_id = '3370'        -- 880 .. 926 ms
AND orders_products.product_id like '3370%'    -- 41 .. 98 ms

So the pattern index is 10 .. 20 times (!) faster every time.
No idea why.
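
To see which index each form actually picks here, a quick sketch on the
orders_products table alone (the literal values are only illustrative):

EXPLAIN ANALYZE SELECT sum(1) FROM orders_products WHERE product_id = '3370';
EXPLAIN ANALYZE SELECT sum(1) FROM orders_products WHERE product_id like '3370%';

The script below creates only the bpchar_pattern_ops index on
orders_products.product_id, and plain equality is also allowed to use that
operator class, so comparing the two plans side by side is the quickest way
to see where the time goes.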

Test data creation script:

begin;
CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
$_$
SELECT 350;
$_$ LANGUAGE SQL;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT 
NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id 
CHAR(20),

 id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
  'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
FROM generate_series(0, Counter()/3 ) AS n;

SET work_mem TO 2097151;

INSERT INTO orders_products SELECT
  generate_series/3 as  order_id,
  ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS 
product_id

FROM generate_series(1, Counter());

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;


CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );


COMMIT;
SET work_mem TO DEFAULT;
ANALYZE; 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread Andrus

Tomas,


OK, what was the number of unused pointer items in the VACUUM output?


I posted it in this thread:

VACUUM FULL ANALYZE VERBOSE;
...
INFO:  free space map contains 14353 pages in 314 relations
DETAIL:  A total of 2 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are:  2 page slots, 1000 relations, using 182 KB.
NOTICE:  number of page slots needed (89664) exceeds max_fsm_pages (2)
HINT:  Consider increasing the configuration parameter max_fsm_pages to a
value over 89664.

Query returned successfully with no result in 10513335 ms.


The query performance is still the same as when the tables were bloated?


Seems to be the same.

However, yesterday (after the previous message) I changed other queries so 
that they no longer scan the whole product orders (rid) table.

Now there is only a little disk activity every 5-300 seconds, which does not 
seem to affect those query results. So the issue in this thread has gone away.

Now this query runs in a constant 8 seconds:

explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode = 'X05' AND dok.kuupaev='2008-09-01'
Aggregate  (cost=182210.77..182210.78 rows=1 width=0) (actual
time=8031.600..8031.604 rows=1 loops=1)
  ->  Nested Loop  (cost=74226.74..182149.27 rows=24598 width=0) (actual
time=2602.474..7948.121 rows=21711 loops=1)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1
width=24) (actual time=0.077..0.089 rows=1 loops=1)
              Index Cond: ('X05'::bpchar = toode)
        ->  Hash Join  (cost=74226.74..181897.28 rows=24598 width=24)
(actual time=2602.378..7785.315 rows=21711 loops=1)
              Hash Cond: ("outer".dokumnr = "inner".dokumnr)
              ->  Bitmap Heap Scan on rid  (cost=4084.54..101951.60
rows=270725 width=28) (actual time=1129.925..4686.601 rows=278417 loops=1)
                    Recheck Cond: (toode = 'X05'::bpchar)
                    ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..4084.54 rows=270725 width=0) (actual time=1123.202..1123.202
rows=278426 loops=1)
                          Index Cond: (toode = 'X05'::bpchar)
              ->  Hash  (cost=69419.29..69419.29 rows=112766 width=4)
(actual time=1251.496..1251.496 rows=111088 loops=1)
                    ->  Bitmap Heap Scan on dok  (cost=1492.68..69419.29
rows=112766 width=4) (actual time=70.837..776.249 rows=111088 loops=1)
                          Recheck Cond: (kuupaev = '2008-09-01'::date)
                          ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.68 rows=112766 width=0) (actual time=64.177..64.177
rows=111343 loops=1)
                                Index Cond: (kuupaev = '2008-09-01'::date)
Total runtime: 8031.905 ms


Interestingly using like is 80 times faster:

explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like 'X05' AND dok.kuupaev='2008-09-01'
Aggregate  (cost=88178.69..88178.70 rows=1 width=0) (actual
time=115.335..115.339 rows=1 loops=1)
  ->  Hash Join  (cost=71136.22..88117.36 rows=24532 width=0) (actual
time=115.322..115.322 rows=0 loops=1)
        Hash Cond: ("outer".toode = "inner".toode)
        ->  Hash Join  (cost=70163.36..86253.20 rows=24598 width=24)
(actual time=0.046..0.046 rows=0 loops=1)
              Hash Cond: ("outer".dokumnr = "inner".dokumnr)
              ->  Bitmap Heap Scan on rid  (cost=21.16..6307.52 rows=270725
width=28) (actual time=0.037..0.037 rows=0 loops=1)
                    Filter: (toode ~~ 'X05'::text)
                    ->  Bitmap Index Scan on rid_toode_pattern_idx
(cost=0.00..21.16 rows=1760 width=0) (actual time=0.028..0.028 rows=0
loops=1)
                          Index Cond: (toode ~=~ 'X05'::bpchar)
              ->  Hash  (cost=69419.29..69419.29 rows=112766 width=4)
(never executed)
                    ->  Bitmap Heap Scan on dok  (cost=1492.68..69419.29
rows=112766 width=4) (never executed)
                          Recheck Cond: (kuupaev = '2008-09-01'::date)
                          ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.68 rows=112766 width=0) (never executed)
                                Index Cond: (kuupaev = '2008-09-01'::date)
        ->  Hash  (cost=853.29..853.29 rows=13429 width=24) (actual
time=114.757..114.757 rows=13412 loops=1)
              ->  Seq Scan on toode  (cost=0.00..853.29 rows=13429
width=24) (actual time=0.014..58.319 rows=13412 loops=1)
Total runtime: 115.505 ms

I also posted a test script in the other thread which shows that like is
an order of magnitude faster than the equality check.

rid.toode = 'X05'

and

rid.toode like 'X05'

are exactly the same conditions, and there are indexes for both conditions.

So I don't understand why the results are so different.

In the other sample, which I posted in the thread Increasing pattern index 
query speed, like is 4 times slower:

SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode = '9910' AND 

Re: [PERFORM] Monitoring buffercache...

2008-11-24 Thread Scott Marlowe
On Mon, Nov 24, 2008 at 12:52 PM, Brad Nicholson
[EMAIL PROTECTED] wrote:
 I just ran it in a loop over and over on my 8 core opteron server and
 it ran the load factor up by almost exactly 1.0.  Under our normal
 daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
 load of running that query over and over.  So, it doesn't seem to be
 blocking or anything.

 The internal docs for pg_buffercache_pages.c state:

 To get a consistent picture of the buffer state, we must lock all
 partitions of the buffer map.  Needless to say, this is horrible
 for concurrency.  Must grab locks in increasing order to avoid
 possible deadlocks.

Well, the pg hackers tend to take a paranoid view (it's a good thing
TM) on things like this.  My guess is that the period of time for
which pg_buffercache takes locks on the buffer map is short enough
that it isn't a real big deal on a fast enough server.  On mine, it
certainly had no real negative effects for the 5 minutes or so it was
running in a loop.  None I could see, and we run hundreds of queries
per second on our system.

Of course, for certain other types of loads it could be a much bigger
issue.  But for our load, on our machine, it was virtually
unnoticeable.



Re: [PERFORM] limit clause produces wrong query plan

2008-11-24 Thread Andrus

it was very fast. To be honest I do not know what is happening?!


This is really weird.
It seems that PostgreSQL OFFSET / LIMIT are not optimized, and thus typical 
paging queries


SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

or even the first page query

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

cannot be used in PostgreSQL at all for big tables.

Do you have any idea how to fix this?

Andrus. 


