Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-14 Thread Greg Stark
Dan Harris <[EMAIL PROTECTED]> writes:

> Well, once every day, but there aren't a ton of inserts or updates  going on a
> daily basis.  Maybe 1,000 total inserts?

It's actually deletes and updates that matter. not inserts.

> I have a feeling I'm going to need to do a cluster soon.  I have done  several
> mass deletes and reloads on it.

CLUSTER effectively does a VACUUM FULL but takes a different approach and
writes out a whole new table, which if there's lots of free space is faster
than moving records around to compact the table.

> I tried that, and indeed it was using an index, although after  reading 
> Simon's
> post, I realize that was kind of dumb to have an  index on a bool. I have 
> since
> removed it.

If there are very few records (like well under 10%) with that column equal to
false (or very few equal to true) then it's not necessarily useless. But
probably more useful is a partial index on some other column.

Something like 

CREATE INDEX ON pk WHERE flag = false;

> No foreign keys or triggers.

Note that I'm talking about foreign keys in *other* tables that refer to
columns in this table. Every update on this table would have to scan those
other tables looking for records referencing the updated rows.


> Ok, so I remounted this drive as ext2 shortly before sending my first  email
> today.  It wasn't enough time for me to notice the ABSOLUTELY  HUGE difference
> in performance change.  Ext3 must really be crappy  for postgres, or at least
> is on this box.  Now that it's ext2, this  thing is flying like never before.
> My CPU utilization has  skyrocketed, telling me that the disk IO was
> constraining it immensely.
> 
> I always knew that it might be a little faster, but the box feels  like it can
> "breathe" again and things that used to be IO intensive  and run for an hour 
> or
> more are now running in < 5 minutes.  I'm a  little worried about not having a
> journalized file system, but that  performance difference will keep me from
> switching back ( at least to  ext3! ).  Maybe someday I will try XFS.

@spock(Fascinating).

I wonder if ext3 might be issuing IDE cache flushes on every fsync (to sync
the journal) whereas ext2 might not be issuing any cache flushes at all.

If the IDE cache is never being flushed then you'll see much better
performance but run the risk of data loss in a power failure or hardware
failure. (But not in the case of an OS crash, or at least no more than
otherwise.)

You could also try using the "-O journal_dev" option to put the ext3 journal
on a separate device.

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-14 Thread Dan Harris


On Jul 14, 2005, at 9:47 AM, Alvaro Herrera wrote:


On Thu, Jul 14, 2005 at 12:28:05AM -0600, Dan Harris wrote:


.  Ext3 must really be crappy
for postgres, or at least is on this box.


Were you using the default journal settings for ext3?


Yes, I was.  Next time I get a chance to reboot this box, I will try  
writeback and compare the benchmarks to my previous config.  Thanks  
for the tip.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-14 Thread Alvaro Herrera
On Thu, Jul 14, 2005 at 12:28:05AM -0600, Dan Harris wrote:

> Ok, so I remounted this drive as ext2 shortly before sending my first  
> email today.  It wasn't enough time for me to notice the ABSOLUTELY  
> HUGE difference in performance change.  Ext3 must really be crappy  
> for postgres, or at least is on this box.  Now that it's ext2, this  
> thing is flying like never before.   My CPU utilization has  
> skyrocketed, telling me that the disk IO was constraining it immensely.

Were you using the default journal settings for ext3?

An interesting experiment would be to use the other journal options
(particularly data=writeback).  From the mount manpage:

   data=journal / data=ordered / data=writeback
  Specifies  the  journalling  mode  for  file  data.  Metadata is
  always journaled.  To use modes other than ordered on  the  root
  file system, pass the mode to the kernel as boot parameter, e.g.
  rootflags=data=journal.

  journal
 All data is committed into the  journal  prior  to  being
 written into the main file system.

  ordered
 This  is  the  default mode.  All data is forced directly
 out to the main file system prior to its  metadata  being
 committed to the journal.

  writeback
 Data ordering is not preserved - data may be written into
 the main file system after its metadata has been  commit-
 ted  to the journal.  This is rumoured to be the highest-
 throughput option.  It guarantees  internal  file  system
 integrity,  however  it  can  allow old data to appear in
 files after a crash and journal recovery.


-- 
Alvaro Herrera ()
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 14, 2005, at 12:12 AM, Greg Stark wrote:


Dan Harris <[EMAIL PROTECTED]> writes:



I keep the entire database vacuumed regularly.



How often is "regularly"?
Well, once every day, but there aren't a ton of inserts or updates  
going on a daily basis.  Maybe 1,000 total inserts?


Also, if you've done occasional massive batch updates like you  
describe here
you may need a VACUUM FULL or alternatively a CLUSTER command to  
compact the
table -- vacuum identifies the free space but if you've doubled the  
size of
your table with a large update that's a lot more free space than  
you want

hanging around waiting to be used.

I have a feeling I'm going to need to do a cluster soon.  I have done  
several mass deletes and reloads on it.




For example, as I'm writing this, I am running an UPDATE  
statement  that will
affect a small part of the table, and is querying on an  indexed  
boolean field.



...

update eventactivity set ftindex = false where ftindex = true;   
( added the

where clause because I don't want to alter where ftindex  is null )



It's definitely worthwhile doing an "EXPLAIN UPDATE..." to see if  
this even

used the index. It sounds like it did a sequential scan.



I tried that, and indeed it was using an index, although after  
reading Simon's post, I realize that was kind of dumb to have an  
index on a bool. I have since removed it.


Sequential scans during updates are especially painful. If there  
isn't free
space lying around in the page where the updated record lies then  
another page
has to be used or a new page added. If you're doing a massive  
update you can
exhaust the free space available making the update have to go back  
and forth
between the page being read and the end of the table where pages  
are being

written.


This is great info, thanks.





#

vmstat output ( as I am waiting for this to finish ):
procs ---memory-- ---swap-- -io --system--
cpu
r  b   swpd   freebuff   cache   si   sobibo   in 
cs  us sy id wa
0  1   5436 2823908  26140 918370401  2211   540  694
336   9  2 76 13




[I assume you ran "vmstat 10" or some other interval and then  
waited for at

least the second line? The first line outputted from vmstat is mostly
meaningless]


Yeah, this was at least 10 or so down the list ( the last one before  
ctrl-c )




Um. That's a pretty meager i/o rate. Just over 2MB/s. The cpu is  
76% idle
which sounds fine but that could be one processor pegged at 100%  
while the
others are idle. If this query is the only one running on the  
system then it

would behave just like that.
Well, none of my processors had ever reached 100% until I changed to  
ext2 today ( read below for more info )


Is it possible you have some foreign keys referencing these records  
that
you're updating? In which case every record being updated might be  
causing a
full table scan on another table (or multiple other tables). If  
those tables
are entirely in cache then it could cause these high cpu low i/o  
symptoms.




No foreign keys or triggers.


Ok, so I remounted this drive as ext2 shortly before sending my first  
email today.  It wasn't enough time for me to notice the ABSOLUTELY  
HUGE difference in performance change.  Ext3 must really be crappy  
for postgres, or at least is on this box.  Now that it's ext2, this  
thing is flying like never before.   My CPU utilization has  
skyrocketed, telling me that the disk IO was constraining it immensely.


I always knew that it might be a little faster, but the box feels  
like it can "breathe" again and things that used to be IO intensive  
and run for an hour or more are now running in < 5 minutes.  I'm a  
little worried about not having a journalized file system, but that  
performance difference will keep me from switching back ( at least to  
ext3! ).  Maybe someday I will try XFS.


I would be surprised if everyone who ran ext3 had this kind of  
problem, maybe it's specific to my kernel, raid controller, I don't  
know.  But, this is amazing.  It's like I have a new server.


Thanks to everyone for their valuable input and a big thanks to all  
the dedicated pg developers on here who make this possible!


-Dan


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Greg Stark
Dan Harris <[EMAIL PROTECTED]> writes:

> I keep the entire database vacuumed regularly.

How often is "regularly"? We get frequent posts from people who think daily or
every 4 hours is often enough. If the table is very busy you can need vacuums
as often as every 15 minutes. 

Also, if you've done occasional massive batch updates like you describe here
you may need a VACUUM FULL or alternatively a CLUSTER command to compact the
table -- vacuum identifies the free space but if you've doubled the size of
your table with a large update that's a lot more free space than you want
hanging around waiting to be used.

> For example, as I'm writing this, I am running an UPDATE statement  that will
> affect a small part of the table, and is querying on an  indexed boolean 
> field.
...
> update eventactivity set ftindex = false where ftindex = true;  ( added the
> where clause because I don't want to alter where ftindex  is null )

It's definitely worthwhile doing an "EXPLAIN UPDATE..." to see if this even
used the index. It sounds like it did a sequential scan.

Sequential scans during updates are especially painful. If there isn't free
space lying around in the page where the updated record lies then another page
has to be used or a new page added. If you're doing a massive update you can
exhaust the free space available making the update have to go back and forth
between the page being read and the end of the table where pages are being
written.

> #
> 
> vmstat output ( as I am waiting for this to finish ):
> procs ---memory-- ---swap-- -io --system--
> cpu
> r  b   swpd   freebuff   cache   si   sobibo   incs  us sy id 
> wa
> 0  1   5436 2823908  26140 918370401  2211   540  694   336   9  2 76 
> 13

[I assume you ran "vmstat 10" or some other interval and then waited for at
least the second line? The first line outputted from vmstat is mostly
meaningless]

Um. That's a pretty meager i/o rate. Just over 2MB/s. The cpu is 76% idle
which sounds fine but that could be one processor pegged at 100% while the
others are idle. If this query is the only one running on the system then it
would behave just like that.

Is it possible you have some foreign keys referencing these records that
you're updating? In which case every record being updated might be causing a
full table scan on another table (or multiple other tables). If those tables
are entirely in cache then it could cause these high cpu low i/o symptoms.

Or are there any triggers on this table?


-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Simon Riggs
On Wed, 2005-07-13 at 12:54 -0600, Dan Harris wrote:
> For example, as I'm writing this, I am running an UPDATE statement  
> that will affect a small part of the table, and is querying on an  
> indexed boolean field.

An indexed boolean field?

Hopefully, ftindex is false for very few rows of the table?

Try changing the ftindex to be a partial index, so only index the false
values. Or don't index it at all.

Split the table up into smaller pieces.

Don't use an UPDATE statement. Keep a second table, and insert records
into it when you would have updated previously. If a row is not found,
you know that it has ftindex=true. That way, you'll never have row
versions building up in the main table, which you'll still get even if
you VACUUM.

Best Regards, Simon Riggs




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Vivek Khera


On Jul 13, 2005, at 2:54 PM, Dan Harris wrote:


4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller



if it is taking that long to update about 25% of your table, then you  
must be I/O bound. check I/o while you're running a big query.


also, what RAID controller are you running?  be sure you have the  
latest BIOS and drivers for it.


on a pair of dual opterons, I can do large operations on tables with  
100 million rows much faster than you seem to be able.  I have  
MegaRAID 320-2x controllers with 15kRPM drives.


Vivek Khera, Ph.D.
+1-301-869-4449 x806




smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 2:17 PM, Stephen Frost wrote:


Could you come up w/ a test case that others could reproduce where
explain isn't returning?


This was simply due to my n00bness :)  I had always been doing  
explain analyze, instead of just explain.  Next time one of these  
queries comes up, I will be sure to do the explain without analyze.


FYI that update query I mentioned in the initial thread just finished  
after updating 8.3 million rows.


-Dan


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Alvaro Herrera
On Wed, Jul 13, 2005 at 01:16:25PM -0600, Dan Harris wrote:

> On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
> 
> >I might be wrong, but there may be something much more substantially
> >wrong than slow i/o.
> 
> Yes, I'm afraid of that too.  I just don't know what tools I should  
> use to figure that out.  I have some 20 other databases on this  
> system, same schema but varying sizes, and the small ones perform  
> very well.  It feels like there is an O(n) increase in wait time that  
> has recently become very noticeable on the largest of them.

I'd guess it's stuck on some lock.  Try that EXPLAIN, and when it
blocks, watch the pg_locks view for locks not granted to the process
executing the EXPLAIN.  Then check what else is holding the locks.

-- 
Alvaro Herrera ()
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Stephen Frost
* Dan Harris ([EMAIL PROTECTED]) wrote:
> On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:
> >I might be wrong, but there may be something much more substantially
> >wrong than slow i/o.
> 
> Yes, I'm afraid of that too.  I just don't know what tools I should  
> use to figure that out.  I have some 20 other databases on this  
> system, same schema but varying sizes, and the small ones perform  
> very well.  It feels like there is an O(n) increase in wait time that  
> has recently become very noticeable on the largest of them.

Could you come up w/ a test case that others could reproduce where
explain isn't returning?  I think that would be very useful towards
solving at least that issue...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris


On Jul 13, 2005, at 1:11 PM, John A Meinel wrote:



I might be wrong, but there may be something much more substantially
wrong than slow i/o.
John



Yes, I'm afraid of that too.  I just don't know what tools I should  
use to figure that out.  I have some 20 other databases on this  
system, same schema but varying sizes, and the small ones perform  
very well.  It feels like there is an O(n) increase in wait time that  
has recently become very noticeable on the largest of them.


-Dan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread John A Meinel

Dan Harris wrote:

Gurus,



> even the  explain never

finishes when I try that.


Just a short bit. If "EXPLAIN SELECT" doesn't return, there seems to be
a very serious problem. Because I think EXPLAIN doesn't actually run the
query, just has the query planner run. And the query planner shouldn't
ever get heavily stuck.

I might be wrong, but there may be something much more substantially
wrong than slow i/o.
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris

So sorry, I forgot to mention I'm running version 8.0.1

Thanks


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Quad Opteron stuck in the mud

2005-07-13 Thread Dan Harris

Gurus,

A table in one of my databases has just crossed the 30 million row  
mark and has begun to feel very sluggish for just about anything I do  
with it.  I keep the entire database vacuumed regularly.  And, as  
long as I'm not doing a sequential scan, things seem reasonably quick  
most of the time.  I'm now thinking that my problem is IO because  
anything that involves heavy ( like a seq scan ) IO seems to slow to  
a crawl.  Even if I am using indexed fields to grab a few thousand  
rows, then going to sequential scans it gets very very slow.


I have also had the occurrence where queries will not finish for days  
( I eventually have to kill them ).  I was hoping to provide an  
explain analyze for them, but if they never finish... even the  
explain never finishes when I try that.


For example, as I'm writing this, I am running an UPDATE statement  
that will affect a small part of the table, and is querying on an  
indexed boolean field.


I have been waiting for over an hour and a half as I write this and  
it still hasn't finished.  I'm thinking "I bet Tom, Simon or Josh  
wouldn't put up with this kind of wait time..", so I thought I would  
see if anyone here had some pointers.  Maybe I have a really stupid  
setting in my conf file that is causing this.  I really can't believe  
I am at the limits of this hardware, however.



The query:
update eventactivity set ftindex = false where ftindex = true;  
( added the where clause because I don't want to alter where ftindex  
is null )




The table:
  Column|Type | Modifiers
-+-+---
entrydate   | timestamp without time zone |
incidentid  | character varying(40)   |
statustype  | character varying(20)   |
unitid  | character varying(20)   |
recordtext  | character varying(255)  |
recordtext2 | character varying(255)  |
insertdate  | timestamp without time zone |
ftindex | boolean |
Indexes: eventactivity1 btree (incidentid),
 eventactivity_entrydate_idx btree (entrydate),
 eventactivity_ftindex_idx btree (ftindex),
 eventactivity_oid_idx btree (oid)




The hardware:

4 x 2.2GHz Opterons
12 GB of RAM
4x10k 73GB Ultra320 SCSI drives in RAID 0+1
1GB hardware cache memory on the RAID controller

The OS:
Fedora, kernel 2.6.6-1.435.2.3smp ( redhat stock kernel )
filesystem is mounted as ext2

#

vmstat output ( as I am waiting for this to finish ):
procs ---memory-- ---swap-- -io --system--  
cpu
r  b   swpd   free   buff  cache   si   sobibo   incs us  
sy id wa
0  1   5436 2823908  26140 918370401  2211   540  694   336   
9  2 76 13


#

iostat output ( as I am waiting for this to finish ):
avg-cpu:  %user   %nice%sys %iowait   %idle
   9.190.002.19   13.08   75.53

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0  329.26 17686.03  4317.57  161788630   39496378


#
This is a dedicated postgresql server, so maybe some of these  
settings are more liberal than they should be?


relevant ( I hope ) postgresql.conf options are:

shared_buffers = 5
effective_cache_size = 1348000
random_page_cost = 3
work_mem = 512000
max_fsm_pages = 8
log_min_duration_statement = 6
fsync = true ( not sure if I'm daring enough to run without this )
wal_buffers = 1000
checkpoint_segments = 64
checkpoint_timeout = 3000


# FOR PG_AUTOVACUUM --#
stats_command_string = true
stats_row_level = true

Thanks in advance,
Dan








---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster