Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present

2010-06-03 Thread Matthew Wakeling

On Wed, 2 Jun 2010, Jori Jovanovich wrote:

(2) Making the query faster by making the string match LESS specific (odd,
seems like it should be MORE)


No, that's the way round it should be. The LIMIT changes it all. Consider 
if you have a huge table, and half of the entries match your WHERE clause. 
To fetch the ORDER BY ... LIMIT 20 using an index scan would involve 
accessing only on average 40 entries from the table referenced by the 
index. Therefore, the index is quick. However, consider a huge table that 
only has twenty matching entries. The index scan would need to touch every 
single row in the table to return the matching rows, so a sequential scan, 
filter, and sort would be much faster. Of course, if you had an index 
capable of answering the WHERE clause, that would be even better for that 
case.
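
The arithmetic behind this can be sketched in a few lines. This is a back-of-the-envelope model, not the actual planner cost function: assuming matches are spread uniformly through the index order, an ordered index scan must visit roughly LIMIT/selectivity heap rows before it collects enough matches.

```python
def rows_touched_by_index_scan(table_rows, matching_rows, limit):
    """Rough expected number of heap rows an ordered index scan visits
    before collecting `limit` matching rows, assuming matches are spread
    uniformly through the index order. A simplification for intuition,
    not the real PostgreSQL cost model."""
    if matching_rows == 0:
        return table_rows  # scans everything and finds nothing
    # limit / selectivity, computed to avoid rounding surprises
    return min(table_rows, limit * table_rows / matching_rows)

# Half of a 10M-row table matches: the index scan is cheap.
print(rows_touched_by_index_scan(10_000_000, 5_000_000, 20))  # 40.0
# Only 20 rows match: the scan effectively reads the whole table.
print(rows_touched_by_index_scan(10_000_000, 20, 20))  # 10000000
```

The same LIMIT of 20 costs 40 row visits in the first case and ten million in the second, which is why the less specific pattern gets the index plan.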


Matthew

--
Don't criticise a man until you have walked a mile in his shoes; and if
you do at least he will be a mile behind you and bare footed.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Merlin Moncure
On Wed, Jun 2, 2010 at 7:30 PM, Craig James craig_ja...@emolecules.com wrote:
 I'm testing/tuning a new midsize server and ran into an inexplicable
 problem.  With a RAID10 array, when I move the WAL to a separate RAID1
 array, TPS drops from over 1200 to less than 90!   I've checked everything
 and can't find a reason.

 Here are the details.

 8 cores (2x4 Intel Nehalem 2 GHz)
 12 GB memory
 12 x 7200 SATA 500 GB disks
 3WARE 9650SE-12ML RAID controller with bbu
  2 disks: RAID1  500GB ext4  blocksize=4096
  8 disks: RAID10 2TB, stripe size 64K, blocksize=4096 (ext4 or xfs - see
 below)
  2 disks: hot swap
 Ubuntu 10.04 LTS (Lucid)

 With xfs or ext4 on the RAID10 I got decent bonnie++ and pgbench results
 (this one is for xfs):

 Version 1.03e       --Sequential Output-- --Sequential Input-
 --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
 --Seeks--
 Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
 %CP
 argon        24064M 70491  99 288158  25 129918  16 65296  97 428210  23
 558.9   1
                    --Sequential Create-- Random
 Create
                    -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
 %CP
                 16 23283  81 +++++ +++ 13775  56 20143  74 +++++ +++ 15152
  54
 argon,24064M,70491,99,288158,25,129918,16,65296,97,428210,23,558.9,1,16,23283,81,+++++,+++,13775,56,20143\
 ,74,+++++,+++,15152,54

 pgbench -i -s 100 -U test
 pgbench -c 10 -t 1 -U test
    scaling factor: 100
    query mode: simple
    number of clients: 10
    number of transactions per client: 1
    number of transactions actually processed: 10/10
    tps = 1046.104635 (including connections establishing)
    tps = 1046.337276 (excluding connections establishing)

 Now the mystery: I moved the pg_xlog directory to a RAID1 array (same 3WARE
 controller, two more SATA 7200 disks).  Run the same tests and ...

    tps = 82.325446 (including connections establishing)
    tps = 82.326874 (excluding connections establishing)

 I thought I'd made a mistake, like maybe I moved the whole database to the
 RAID1 array, but I checked and double checked.  I even watched the lights
 blink - the WAL was definitely on the RAID1 and the rest of Postgres on the
 RAID10.

 So I moved the WAL back to the RAID10 array, and performance jumped right
 back up to the 1200 TPS range.

 Next I check the RAID1 itself:

  dd if=/dev/zero of=./bigfile bs=8192 count=200

 which yielded 98.8 MB/sec - not bad.  bonnie++ on the RAID1 pair showed good
 performance too:

 Version 1.03e       --Sequential Output-- --Sequential Input-
 --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
 --Seeks--
 Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
 %CP
 argon        24064M 68601  99 110057  18 46534   6 59883  90 123053   7
 471.3   1
                    --Sequential Create-- Random
 Create
                    -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
 %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++
 +++
 argon,24064M,68601,99,110057,18,46534,6,59883,90,123053,7,471.3,1,16,+++++,+++,+++++,+++,+++++,+++,+++++,\
 +++,+++++,+++,+++++,+++

 So ... anyone have any idea at all how TPS drops to below 90 when I move the
 WAL to a separate RAID1 disk?  Does this make any sense at all?  It's
 repeatable. It happens for both ext4 and xfs. It's weird.

 You can even watch the disk lights and see it: the RAID10 disks are on
 almost constantly when the WAL is on the RAID10, but when you move the WAL
 over to the RAID1, its lights are dim and flicker a lot, like it's barely
 getting any data, and the RAID10 disk's lights barely go on at all.

*) Is your RAID1 configured with writeback cache on the controller?
*) have you tried changing wal_sync_method to fdatasync?
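
For reference, Merlin's second suggestion is a one-line postgresql.conf change followed by a restart; fdatasync is often already the default on Linux, so it's worth checking the current value with SHOW wal_sync_method first. A sketch:

```
# postgresql.conf -- try an alternative WAL sync method, then re-run pgbench.
# Check the current value first with:  SHOW wal_sync_method;
wal_sync_method = fdatasync    # other options: fsync, open_sync, open_datasync
```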

merlin



Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present

2010-06-03 Thread Jori Jovanovich
hi,

I'm sorry for not posting this first.

The server is the following and is being used exclusively for this
PostgreSQL instance:

PostgreSQL 8.4.2 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.2.real (GCC)
4.2.4 (Ubuntu 4.2.4-1ubuntu4), 64-bit
Amazon EC2 Large Instance, 7.5GB memory, 64-bit

This is what is set in my postgresql.conf file:

max_connections = 100
ssl = true
shared_buffers = 24MB

ANALYZE VERBOSE EVENTS;
INFO:  analyzing public.events
INFO:  events: scanned 3 of 211312 pages, containing 1725088 live rows
and 0 dead rows; 3 rows in sample, 12151060 estimated total rows

Updating statistics did not affect the results -- it's still doing full
table scans (I had run ANALYZE before posting here as well, so this was
expected).

thank you

On Wed, Jun 2, 2010 at 8:49 PM, Bob Lunney bob_lun...@yahoo.com wrote:

 Jori,

 What is the PostgreSQL
 version/shared_buffers/work_mem/effective_cache_size/default_statistics_target?
 Are the statistics for the table up to date?  (Run analyze verbose
 tablename to update them.)  Table and index structure would be nice to
 know, too.

 If all else fails you can set enable_seqscan = off for the session, but
 that is a Big Hammer for what is probably a smaller problem.

 Bob Lunney

 --- On Wed, 6/2/10, Jori Jovanovich j...@dimensiology.com wrote:


 From: Jori Jovanovich j...@dimensiology.com
 Subject: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT
 present
 To: pgsql-performance@postgresql.org
 Date: Wednesday, June 2, 2010, 4:28 PM


 hi,

 I have a problem space where the main goal is to search backward in time
 for events.  Time can go back very far into the past, and so the
 table can get quite large.  However, the vast majority of queries are all
 satisfied by relatively recent data.  I have an index on the row creation
 date and I would like almost all of my queries to have a query plan looking
 something like:

  Limit ...
    ->  Index Scan Backward using server_timestamp_idx on events
  (cost=0.00..623055.91 rows=8695 width=177)
    ...

 However, PostgreSQL frequently tries to do a full table scan.  Often what
 controls whether a scan is performed or not is dependent on the size of the
 LIMIT and how detailed the WHERE clause is.  In practice, the scan is always
 the wrong answer for my use cases (where always is defined to be 99.9%).

 Some examples:

 (1) A sample query that devolves to a full table scan

   EXPLAIN
SELECT events.id, events.client_duration, events.message,
 events.created_by, events.source, events.type, events.event,
 events.environment,
   events.server_timestamp, events.session_id, events.reference,
 events.client_uuid
  FROM events
 WHERE client_uuid ~* E'^foo bar so what'
  ORDER BY server_timestamp DESC
 LIMIT 20;
 QUERY PLAN (BAD!)
 --
  Limit  (cost=363278.56..363278.61 rows=20 width=177)
    ->  Sort  (cost=363278.56..363278.62 rows=24 width=177)
  Sort Key: server_timestamp
  ->  Seq Scan on events  (cost=0.00..363278.01 rows=24 width=177)
Filter: (client_uuid ~* '^foo bar so what'::text)


 (2) Making the query faster by making the string match LESS specific (odd,
 seems like it should be MORE)

   EXPLAIN
SELECT events.id, events.client_duration, events.message,
 events.created_by, events.source, events.type, events.event,
 events.environment,
   events.server_timestamp, events.session_id, events.reference,
 events.client_uuid
  FROM events
 WHERE client_uuid ~* E'^foo'
  ORDER BY server_timestamp DESC
 LIMIT 20;
 QUERY PLAN (GOOD!)


 
  Limit  (cost=0.00..1433.14 rows=20 width=177)
    ->  Index Scan Backward using server_timestamp_idx on events
  (cost=0.00..623055.91 rows=8695 width=177)
  Filter: (client_uuid ~* '^foo'::text)


 (3) Alternatively making the query faster by using a smaller limit

   EXPLAIN
SELECT events.id, events.client_duration, events.message,
 events.created_by, events.source, events.type, events.event,
 events.environment,
   events.server_timestamp, events.session_id, events.reference,
 events.client_uuid
  FROM events
 WHERE client_uuid ~* E'^foo bar so what'
  ORDER BY server_timestamp DESC
 LIMIT 10;
 QUERY PLAN (GOOD!)


 --
  Limit  (cost=0.00..259606.63 rows=10 width=177)
    ->  Index Scan Backward using server_timestamp_idx on events
  (cost=0.00..623055.91 rows=24 width=177)
  Filter: (client_uuid ~* '^foo bar so what'::text)


 I find myself wishing I could just put a SQL HINT on the query to force the
 index to be used but I understand that HINTs are 

Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Greg Smith

Craig James wrote:
I'm testing/tuning a new midsize server and ran into an inexplicable 
problem.  With a RAID10 array, when I move the WAL to a separate 
RAID1 array, TPS drops from over 1200 to less than 90!


Normally 100 TPS means that the write cache on the WAL drive volume is 
disabled (or set to write-through instead of write-back).  When things 
in this area get fishy, I will usually download sysbench and have it 
specifically test how many fsync calls can happen per second.  
http://projects.2ndquadrant.com/talks , "Database Hardware 
Benchmarking", page 28 has an example of the right incantation for that.
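
If sysbench isn't handy, the same measurement can be approximated with a few lines of Python. This is a rough sketch (not Greg's exact incantation): it counts how many fsync calls per second a file on the target volume can sustain, which is the number that collapses when a write cache is disabled.

```python
import os
import tempfile
import time

def fsync_rate(directory=".", seconds=2.0):
    """Approximate fsync calls/second on the volume holding `directory`.
    A write-through disk lands near its rotation rate (~120/s at
    7200 RPM); a working write-back cache gives thousands."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        count = 0
        deadline = time.time() + seconds
        while time.time() < deadline:
            os.write(fd, b"x" * 8192)   # one WAL-page-sized write
            os.fsync(fd)                # force it to stable storage
            count += 1
        return count / seconds
    finally:
        os.close(fd)
        os.unlink(path)

print(f"{fsync_rate():.0f} fsyncs/sec")
```

Run it with `directory` pointing at the pg_xlog mount to test that specific volume.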


Also, make sure you run 3ware's utilities and confirm all the disks have 
finished their initialization and verification stages.  If you just 
adjusted the disk layout and immediately launched into benchmarks, the 
results are useless until the background cleanup is done.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Craig James

On 6/2/10 4:40 PM, Mark Kirkwood wrote:

On 03/06/10 11:30, Craig James wrote:

I'm testing/tuning a new midsize server and ran into an inexplicable
problem. With a RAID10 array, when I move the WAL to a separate RAID1
array, TPS drops from over 1200 to less than 90! I've checked
everything and can't find a reason.


Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones?


Yes.


Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?


It was the barriers.  barrier=1 isn't just a bad idea on ext4, it's a 
disaster.

pgbench -i -s 100 -U test
pgbench -c 10 -t 1 -U test

Change WAL to barrier=0

tps = 1463.264981 (including connections establishing)
tps = 1463.725687 (excluding connections establishing)

Change WAL to noatime, nodiratime, barrier=0

tps = 1479.331476 (including connections establishing)
tps = 1479.810545 (excluding connections establishing)

Change WAL to barrier=1

tps = 82.325446 (including connections establishing)
tps = 82.326874 (excluding connections establishing)

This is really hard to believe, because the bonnie++ numbers and dd(1) numbers look good 
(see my original post).  But it's totally repeatable.  It must be some really unfortunate 
"just missed the next sector going by the write head" problem.

So with ext4, bonnie++ and dd aren't the whole story.

BTW, I also learned that if you edit /etc/fstab and use mount -o remount, it WON'T change barrier=0/1 
unless the option is explicit in the fstab file.  That is, if you put barrier=0 into /etc/fstab and remount, it will 
switch to no barriers.  But if you then remove it from /etc/fstab, remounting won't change it back to the default; you have to 
explicitly put barrier=1 in fstab to get back to the default.  This seems like a bug to me, and it made this really 
hard to track down.  mount -o remount is not the same as umount/mount!

Craig



Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Matthew Wakeling

On Thu, 3 Jun 2010, Craig James wrote:

Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?


It was the barriers.  barrier=1 isn't just a bad idea on ext4, it's a 
disaster.


This worries me a little. Does your array have a battery-backed cache? If 
so, then it should be fast regardless of barriers (although barriers may 
make a small difference). If it does not, then it is likely that the fast 
speed you are seeing with barriers off is unsafe.


There should be no "just missed the sector going past" problem for writes 
ever with a battery-backed cache.


Matthew

--
There once was a limerick .sig
that really was not very big
It was going quite fine
Till it reached the fourth line



Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 On Thu, 3 Jun 2010, Craig James wrote:
 Also, are barriers *on* on the RAID1 mount and off on the RAID10
one?

 It was the barriers.  barrier=1 isn't just a bad idea on ext4,
 it's a disaster.
 
 This worries me a little. Does your array have a battery-backed
 cache? If so, then it should be fast regardless of barriers
 (although barriers may make a small difference). If it does not,
 then it is likely that the fast speed you are seeing with barriers
 off is unsafe.
 
I've seen this, too (with xfs).  Our RAID controller, in spite of
having BBU cache configured for writeback, waits for actual
persistence on disk for write barriers (unlike for fsync).  This
does strike me as surprising to the point of bordering on qualifying
as a bug.  It means that you can't take advantage of the BBU cache
and get the benefit of write barriers in OS cache behavior.  :-(
 
-Kevin



[PERFORM] slow query performance

2010-06-03 Thread Anj Adu
I can't seem to pinpoint why this query is slow.  No full table scans
are being done.  The hash join is taking the most time.  The table
dev4_act_action has only 3 rows.

box is a 2 cpu quad core intel 5430 with 32G RAM... Postgres 8.4.0
1G work_mem
20G effective_cache
random_page_cost=1
default_statistics_target=1000

The largest table in the inner query is dev4_act_dy_fact, which is
partitioned into 3 partitions per month. Each partition has about 25
million rows.
The rest of the tables are very small (100-1000 rows).

explain analyze
select ipconvert(srctest_num),CASE targetpt::character varying
WHEN NULL::text THEN serv.targetsrv
ELSE targetpt::character varying
END AS targetsrv, sesstype,hits as cons,bytes, srcz.srcarea as
srcz, dstz.dstarea as dstz from
(
select srctest_num, targetpt,targetsrv_id, sesstype_id, sum(total) as
hits, sum(bin) + sum(bout) as bts, sourcearea_id, destinationarea_id
 from dev4_act_dy_fact a, dev4_act_action act where thedate between
'2010-05-22' and '2010-05-22'
 and a.action_id = act.action_id and action in ('rejected','sess_rejected')
 and guardid_id in (select guardid_id from dev4_act_guardid where
guardid like 'cust00%')
 and node_id=(select node_id from dev4_act_node where node='10.90.100.2')
 group by srctest_num,targetpt,targetsrv_id,sesstype_id,
sourcearea_id, destinationarea_id
  order by (sum(bin) + sum(bout)) desc
 limit 1000
 ) a left outer join dev4_act_dstarea dstz on a.destinationarea_id =
dstz.dstarea_id
 left outer join dev4_act_srcarea srcz on a.sourcearea_id = srcz.srcarea_id
 left outer join  dev4_act_targetsrv serv on a.targetsrv_id = serv.targetsrv_id
 left outer join dev4_sesstype proto on a.sesstype_id = proto.sesstype_id
 order by bytes desc



Nested Loop Left Join  (cost=95392.32..95496.13 rows=20 width=510)
(actual time=164533.831..164533.831 rows=0 loops=1)
  ->  Nested Loop Left Join  (cost=95392.32..95473.43 rows=20
width=396) (actual time=164533.830..164533.830 rows=0 loops=1)
->  Nested Loop Left Join  (cost=95392.32..95455.83 rows=20
width=182) (actual time=164533.829..164533.829 rows=0 loops=1)
  ->  Nested Loop Left Join  (cost=95392.32..95410.17
rows=20 width=186) (actual time=164533.829..164533.829 rows=0
loops=1)
->  Limit  (cost=95392.32..95392.37 rows=20
width=52) (actual time=164533.828..164533.828 rows=0 loops=1)
  InitPlan 1 (returns $0)
->  Index Scan using dev4_act_node_uindx
on dev4_act_node  (cost=0.00..2.27 rows=1 width=4) (actual
time=0.052..0.052 rows=0 loops=1)
  Index Cond: ((node)::text =
'10.90.100.2'::text)
  ->  Sort  (cost=95390.05..95390.10 rows=20
width=52) (actual time=164533.826..164533.826 rows=0 loops=1)
Sort Key: ((sum(a.bin) + sum(a.bout)))
Sort Method:  quicksort  Memory: 17kB
->  HashAggregate
(cost=95389.22..95389.62 rows=20 width=52) (actual
time=164533.796..164533.796 rows=0 loops=1)
  ->  Nested Loop Semi Join
(cost=7.37..95388.77 rows=20 width=52) (actual
time=164533.793..164533.793 rows=0 loops=1)
->  Hash Join
(cost=7.37..94836.75 rows=2043 width=56) (actual
time=164533.792..164533.792 rows=0 loops=1)
  Hash Cond:
(a.action_id = act.action_id)
  ->  Append
(cost=2.80..94045.71 rows=204277 width=60) (actual
time=164533.790..164533.790 rows=0 loops=1)
->  Bitmap
Heap Scan on dev4_act_dy_fact a  (cost=2.80..3.82 rows=1 width=60)
(actual time=0.064..0.064 rows=0 loops=1)
  Recheck
Cond: ((node_id = $0) AND (thedate >= '2010-05-22 00:00:00'::timestamp
without time zone) AND (thedate <= '2010-05-22 00:00:00'::timestamp
without time zone))
  ->
BitmapAnd  (cost=2.80..2.80 rows=1 width=0) (actual time=0.062..0.062
rows=0 loops=1)

->  Bitmap Index Scan on dev4_act_dy_dm_nd_indx  (cost=0.00..1.27
rows=3 width=0) (actual time=0.062..0.062 rows=0 loops=1)

Index Cond: (node_id = $0)

->  Bitmap Index Scan on dev4_act_dy_dm_cd_indx  (cost=0.00..1.28
rows=3 width=0) (never executed)

Index Cond: ((thedate >= '2010-05-22 00:00:00'::timestamp without
time zone) AND (thedate <= '2010-05-22 00:00:00'::timestamp without
time zone))
->  Index
Scan using dev4_act_dy_fact_2010_05_t3_thedate on
dev4_act_dy_fact_2010_05_t3 a  (cost=0.00..94041.89 rows=204276
width=60) (actual time=164533.725..164533.725 rows=0 loops=1)
  Index
Cond: ((thedate >= '2010-05-22 00:00:00'::timestamp without time zone)
AND (thedate 

Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Greg Smith

Kevin Grittner wrote:

I've seen this, too (with xfs).  Our RAID controller, in spite of
having BBU cache configured for writeback, waits for actual
persistence on disk for write barriers (unlike for fsync).  This
does strike me as surprising to the point of bordering on qualifying
as a bug.
Completely intentional, and documented at 
http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F


The issue is that XFS will actually send the full "flush your cache" 
call to the controller, rather than just the usual fsync call, and that 
eliminates the benefit of having a write cache there in the first 
place.  Good controllers respect that and flush their whole write cache 
out.  And ext4 has adopted the same mechanism.  This is very much a good 
thing from the perspective of database reliability for people with 
regular hard drives who don't have a useful write cache on their cheap 
hard drives.  It allows them to keep the disk's write cache on for other 
things, while still getting the proper cache flushes when the database 
commits demand them.  It does mean that everyone with a non-volatile 
battery backed cache, via RAID card typically, needs to turn barriers 
off manually.


I've already warned on this list that PostgreSQL commit performance on 
ext4 is going to appear really terrible to many people.  If you 
benchmark and don't recognize ext3 wasn't operating in a reliable mode 
before, the performance drop now that ext4 is doing the right thing with 
barriers looks impossibly bad.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Greg Smith

Craig James wrote:
This is really hard to believe, because the bonnie++ numbers and dd(1) 
numbers look good (see my original post).  But it's totally 
repeatable.  It must be some really unfortunate "just missed the next 
sector going by the write head" problem.


Commit performance is a separate number to measure that is not reflected 
in any benchmark that tests sequential performance.  I consider it the 
fourth axis of disk system performance (seq read, seq write, random 
IOPS, commit rate), and directly measure it with the sysbench fsync test 
I recommended already.  (You can do it with the right custom pgbench 
script too).


You only get one commit per rotation on a drive, which is exactly what 
you're seeing:  a bit under the 120 spins/second @ 7200 RPM.  Attempts 
to time things just right to catch more than one sector per spin are 
extremely difficult to accomplish, I spent a week on that once without 
making any good progress.  You can easily get 100MB/s on reads and 
writes but only manage 100 commits/second.
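
The rotation ceiling Greg describes is simple arithmetic; a quick sanity check, under the one-commit-per-revolution assumption:

```python
def max_commits_per_second(rpm: int, commits_per_rev: int = 1) -> float:
    """Upper bound on synchronous commits/sec for one spinning disk with
    its write cache disabled: at most `commits_per_rev` commits can be
    made durable per platter revolution."""
    return rpm / 60.0 * commits_per_rev

print(max_commits_per_second(7200))   # 120.0 -- consistent with the ~82 TPS observed
print(max_commits_per_second(15000))  # 250.0 -- ceiling for a 15k RPM drive
```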


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 Kevin Grittner wrote:
 I've seen this, too (with xfs).  Our RAID controller, in spite of
 having BBU cache configured for writeback, waits for actual
 persistence on disk for write barriers (unlike for fsync).  This
 does strike me as surprising to the point of bordering on
 qualifying as a bug.
 Completely intentional, and documented at 

http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
 
Yeah, I read that long ago and I've disabled write barriers because
of it; however, it still seems wrong that the RAID controller
insists on flushing to the drives in write-back mode.  Here are my
reasons for wishing it was otherwise:
 
(1)  We've had batteries on our RAID controllers fail occasionally. 
The controller automatically degrades to write-through, and we get
an email from the server and schedule a tech to travel to the site
and replace the battery; but until we take action we are now exposed
to possible database corruption.  Barriers don't automatically come
on when the controller flips to write-through mode.
 
(2)  It precludes any possibility of moving from fsync techniques to
write barrier techniques for ensuring database integrity.  If the OS
respected write barriers and the controller considered the write
satisfied when it hit BBU cache, write barrier techniques would
work, and checkpoints could be made smoother.  Think how nicely that
would inter-operate with point (1).
 
So, while I understand it's Working As Designed, I think the design
is surprising and sub-optimal.
 
-Kevin



Re: [PERFORM] slow query performance

2010-06-03 Thread Andy Colson

On 6/3/2010 12:47 PM, Anj Adu wrote:

I can't seem to pinpoint why this query is slow.  No full table scans
are being done.  The hash join is taking the most time.  The table
dev4_act_action has only 3 rows.

box is a 2 cpu quad core intel 5430 with 32G RAM... Postgres 8.4.0
1G work_mem
20G effective_cache
random_page_cost=1
default_statistics_target=1000

The largest table in the inner query is dev4_act_dy_fact, which is
partitioned into 3 partitions per month. Each partition has about 25
million rows.
The rest of the tables are very small (100-1000 rows).

explain analyze
select ipconvert(srctest_num),CASE targetpt::character varying
 WHEN NULL::text THEN serv.targetsrv
 ELSE targetpt::character varying
 END AS targetsrv, sesstype,hits as cons,bytes, srcz.srcarea as
srcz, dstz.dstarea as dstz from
(
select srctest_num, targetpt,targetsrv_id, sesstype_id, sum(total) as
hits, sum(bin) + sum(bout) as bts, sourcearea_id, destinationarea_id
  from dev4_act_dy_fact a, dev4_act_action act where thedate between
'2010-05-22' and '2010-05-22'
  and a.action_id = act.action_id and action in ('rejected','sess_rejected')
  and guardid_id in (select guardid_id from dev4_act_guardid where
guardid like 'cust00%')
  and node_id=(select node_id from dev4_act_node where node='10.90.100.2')
  group by srctest_num,targetpt,targetsrv_id,sesstype_id,
sourcearea_id, destinationarea_id
   order by (sum(bin) + sum(bout)) desc
  limit 1000
  ) a left outer join dev4_act_dstarea dstz on a.destinationarea_id =
dstz.dstarea_id
  left outer join dev4_act_srcarea srcz on a.sourcearea_id = srcz.srcarea_id
  left outer join  dev4_act_targetsrv serv on a.targetsrv_id = serv.targetsrv_id
  left outer join dev4_sesstype proto on a.sesstype_id = proto.sesstype_id
  order by bytes desc





Wow, the word wrap on that makes it hard to read... can you paste it 
at http://explain.depesz.com and send us a link?




Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Scott Marlowe
On Thu, Jun 3, 2010 at 12:40 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 Yeah, I read that long ago and I've disabled write barriers because
 of it; however, it still seems wrong that the RAID controller
 insists on flushing to the drives in write-back mode.  Here are my
 reasons for wishing it was otherwise:

I think it's a case of the quickest, simplest answer to semi-new tech.
 Not sure what to do with barriers?  Just flush the whole cache.

I'm guessing that this will get optimized in the future.

BTW, I'll have LSI Megaraid latest and greatest to test on in a month,
and older Areca 1680s as well. I'll be updating the firmware on the
arecas, and I'll run some tests on the whole barrier behaviour to see
if it's gotten any better lately.



Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Kevin Grittner
Scott Marlowe scott.marl...@gmail.com wrote:
 
 I think it's a case of the quickest, simplest answer to semi-new
 tech.  Not sure what to do with barriers?  Just flush the whole
 cache.
 
 I'm guessing that this will get optimized in the future.
 
Let's hope so.
 
That reminds me, the write barrier concept is at least on the
horizon as a viable technology; does anyone know if the asynchronous
graphs concept in this (one page) paper ever came to anything?  (I
haven't heard anything about it lately.)
 
http://www.usenix.org/events/fast05/wips/burnett.pdf
 
-Kevin



Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Greg Smith

Scott Marlowe wrote:

I think it's a case of the quickest, simplest answer to semi-new tech.
 Not sure what to do with barriers?  Just flush the whole cache.
  


Well, that really is the only useful thing you can do with regular SATA 
drives; the ATA command set isn't any finer grained than that in a way 
that's useful for this context.  And it's also quite reasonable for a 
RAID controller to respond to that "flush the whole cache" call by 
flushing its cache.  So it's not just the simplest first answer, I 
believe it's the only answer until a better ATA command set becomes 
available.


I think this can only be resolved usefully for all of us at the RAID 
firmware level.  If the controller had some logic that said "it's OK to 
not flush the cache when that call comes in if my battery is working 
fine", that would make this whole problem go away.  I don't expect it's 
possible to work around the exact set of concerns Kevin listed any other 
way, because as he pointed out the right thing to do is very dependent 
on the battery health, which the OS also doesn't know (again, it would 
require some new command-set verbiage).


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Scott Marlowe
On Thu, Jun 3, 2010 at 1:31 PM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:

 I think it's a case of the quickest, simplest answer to semi-new tech.
  Not sure what to do with barriers?  Just flush the whole cache.


 Well, that really is the only useful thing you can do with regular SATA
 drives; the ATA command set isn't any finer grained than that in a way
 that's useful for this context.  And it's also quite reasonable for a RAID
 controller to respond to that flush the whole cache call by flushing its
 cache.  So it's not just the simplest first answer, I believe it's the only
 answer until a better ATA command set becomes available.

 I think this can only be resolved usefully for all of us at the RAID
 firmware level.  If the controller had some logic that said it's OK to not
 flush the cache when that call comes in if my battery is working fine,

That's what already happens for fsync on a BBU controller, so I don't
think the code to do so would be something fancy and new, just a
simple change of logic on which code path to take.



Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 I think this can only be resolved usefully for all of us at the
 RAID firmware level.  If the controller had some logic that said
 "it's OK to not flush the cache when that call comes in if my
 battery is working fine", that would make this whole problem go
 away.
 
That is exactly what I've been trying to suggest.  Sorry for not
being more clear about it.
 
-Kevin



Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-03 Thread Scott Marlowe
On Thu, Jun 3, 2010 at 11:19 AM, Torsten Zühlsdorff
f...@meisterderspiele.de wrote:
 Scott Marlowe schrieb:

 On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff
 f...@meisterderspiele.de wrote:

 Hello,

 I have a set of unique data which is about 150,000,000 rows. Regularly
 I get a list of data which contains many times more rows than the
 already stored set, often around 2,000,000,000 rows. Within these rows
 are many duplicates, and often the already-stored set as well.
 I want to store only each entry which is not already stored. Also I do
 not want to store duplicates. Example:

 The standard method in pgsql is to load the data into a temp table
 then insert where not exists in old table.

 Sorry, I didn't get it. I've googled some examples, but none match my
 case. Every example I found was a single insert which should be done or
 ignored if the row is already stored.

 But in my case I have a bulk of rows with duplicates. Either your tip
 doesn't match my case or I didn't understand it correctly. Can you
 provide a simple example?

create table main (id int primary key, info text);
create table loader (id int, info text);
insert into main values (1,'abc'),(2,'def'),(3,'ghi');
insert into loader values (1,'abc'),(4,'xyz');
select * from main;
 id | info
----+------
  1 | abc
  2 | def
  3 | ghi
(3 rows)

select * from loader;
 id | info
----+------
  1 | abc
  4 | xyz
(2 rows)

insert into main select * from loader except select * from main;
select * from main;
 id | info
----+------
  1 | abc
  2 | def
  3 | ghi
  4 | xyz
(4 rows)

Note that for the where not exists to work the fields would need to be
all the same, or you'd need a more complex query.  If the info field
here was different you'd get an error and no insert / update.  For that
case you might want to use where not in:

insert into main select * from loader where id not in (select id from main);

If you wanted the new rows to update pre-existing rows, then you could
run an update first where the ids matched, then the insert where no id
matches.
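A minimal end-to-end sketch of this load-then-merge pattern, using Python's sqlite3 module purely as a stand-in engine for illustration (the statement shapes mirror the Postgres example above):

```python
import sqlite3

# In-memory stand-in database; the SQL mirrors the Postgres example above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, info TEXT)")
cur.execute("CREATE TABLE loader (id INTEGER, info TEXT)")
cur.executemany("INSERT INTO main VALUES (?, ?)",
                [(1, "abc"), (2, "def"), (3, "ghi")])
cur.executemany("INSERT INTO loader VALUES (?, ?)",
                [(1, "abc"), (4, "xyz")])

# Merge step: insert only the loader rows not already present in main.
cur.execute("INSERT INTO main SELECT * FROM loader "
            "EXCEPT SELECT * FROM main")
conn.commit()

print(cur.execute("SELECT id, info FROM main ORDER BY id").fetchall())
# [(1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'xyz')]
```

In the real workload the loader table would be filled with COPY rather than individual inserts, but the merge statement is the same.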



Re: [PERFORM] slow query performance

2010-06-03 Thread Anj Adu
Link to plan

http://explain.depesz.com/s/kHa

On Thu, Jun 3, 2010 at 11:43 AM, Andy Colson a...@squeakycode.net wrote:
 On 6/3/2010 12:47 PM, Anj Adu wrote:

 I can't seem to pinpoint why this query is slow. No full table scans
 are being done. The hash join is taking maximum time. The table
 dev4_act_action has only 3 rows.

 box is a 2 cpu quad core intel 5430 with 32G RAM... Postgres 8.4.0
 1G work_mem
 20G effective_cache
 random_page_cost=1
 default_statistics_target=1000

 The largest table in the inner query is dev4_act_dy_fact which is
 partitioned into 3 partitions per month. Each partition has about 25
 million rows.
 The rest of the tables are very small (100 - 1000 rows)

 explain analyze
 select ipconvert(srctest_num),CASE targetpt::character varying
             WHEN NULL::text THEN serv.targetsrv
             ELSE targetpt::character varying
         END AS targetsrv, sesstype,hits as cons,bytes, srcz.srcarea as
 srcz, dstz.dstarea as dstz from
 (
 select srctest_num, targetpt,targetsrv_id, sesstype_id, sum(total) as
 hits, sum(bin) + sum(bout) as bts, sourcearea_id, destinationarea_id
  from dev4_act_dy_fact a, dev4_act_action act where thedate between
 '2010-05-22' and '2010-05-22'
  and a.action_id = act.action_id and action in
 ('rejected','sess_rejected')
  and guardid_id in (select guardid_id from dev4_act_guardid where
 guardid like 'cust00%')
  and node_id=(select node_id from dev4_act_node where node='10.90.100.2')
  group by srctest_num,targetpt,targetsrv_id,sesstype_id,
 sourcearea_id, destinationarea_id
   order by (sum(bin) + sum(bout)) desc
  limit 1000
  ) a left outer join dev4_act_dstarea dstz on a.destinationarea_id =
 dstz.dstarea_id
  left outer join dev4_act_srcarea srcz on a.sourcearea_id =
 srcz.srcarea_id
  left outer join  dev4_act_targetsrv serv on a.targetsrv_id =
 serv.targetsrv_id
  left outer join dev4_sesstype proto on a.sesstype_id = proto.sesstype_id
  order by bytes desc




 Wow, the word wrap on that makes it hard to read... can you paste it here
 and send us a link?

 http://explain.depesz.com





Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-03 Thread Cédric Villemain
2010/6/1 Torsten Zühlsdorff f...@meisterderspiele.de:
 Hello,

 I have a set of unique data which is about 150,000,000 rows. Regularly I
 get a list of data which contains many times more rows than the already
 stored set, often around 2,000,000,000 rows. Within these rows are many
 duplicates, and often the already-stored set as well.
 I want to store only each entry which is not already stored. Also I do
 not want to store duplicates. Example:

 Already stored set:
 a,b,c

 Given set:
 a,b,a,c,d,a,c,d,b

 Expected set after import:
 a,b,c,d

 I am now looking for a faster way to do the import. At the moment I
 import the new data with COPY into a table 'import'; then I remove the
 duplicates and insert every row which is not already known. After that,
 'import' is truncated.

 Is there a faster way? Should I just insert every row and ignore it if
 the unique constraint fails?

 Here the simplified table-schema. in real life it's with partitions:
 test=# \d urls
                         Tabelle »public.urls«
  Spalte |   Typ   |                       Attribute
 --------+---------+-------------------------------------------------------
  url_id | integer | not null default nextval('urls_url_id_seq'::regclass)
  url    | text    | not null
 Indexe:
    »urls_url« UNIQUE, btree (url)
    »urls_url_id« btree (url_id)

 Thanks for every hint or advice! :)

I think you need to have a look at pgloader. It does COPY with error
handling. Very effective.

http://pgloader.projects.postgresql.org/


 Greetings from Germany,
 Torsten
 --
 http://www.dddbl.de - a database layer which abstracts working with 8
 different database systems, separates queries from the application, and
 can automatically evaluate the query results.





-- 
Cédric Villemain   2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support



[PERFORM] slow query

2010-06-03 Thread Anj Adu
I am reposting, as my original query was mangled.

The link to the explain plan is here as it does not paste well into
the email body.

http://explain.depesz.com/s/kHa


The machine is a 2-CPU quad-core 5430 with 32G RAM and a single RAID-10
array of 6 x 450G 15K disks

1G work_mem
default_statistics_target=1000
random_page_cost=1

I am curious why the hash join takes so long. The main table
dev4_act_dy_fact_2010_05_t has 25 million rows. The table is
partitioned into 3 parts per month. Remaining tables are very small
(< 1000 rows)



[PERFORM] Performance tuning for postgres

2010-06-03 Thread Yogesh Naik
Hi

I am performing DB inserts and updates for 3000+ records, and while doing
so I see CPU utilization at 100%, with 67% of the CPU used by postgres.

I have also optimized the queries...

Is there any way to optimize the CPU utilization for postgres?

I am currently using postgres version 8.3...

Help will be appreciated

Regards

Yogesh Naik
