File systems (RE: [PERFORM] Sanity check requested)

2003-07-18 Thread Nick Fankhauser

Thanks for the suggestions on FS types- especially the Debian-oriented
info. I'll start by playing with the memory allocation parameters that I
originally listed (it seems like they should provide results in a way that is
unaffected by the disk IO). Then, once I have them at optimal values, I'll move
on to trying different file systems.

I assume that as I make changes that affect the disk IO performance, I'll
then need to do some testing to find new values for the IO cost for the
planner- do you folks have some ballpark numbers to start with for this,
based on your experience? I'm departing in three ways from the simple IDE
model that (I presume) the default random page cost of 4 is based on- the
disks are SCSI & RAID, and the FS would be different.

At this point, I can't think of any better way to test this than simply
running my local test suite with various values and recording the wall-clock
results. Is there a different approach that might make more sense? (This
means that my results will be skewed to my environment, but I'll post them
anyway.)
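For what it's worth, the wall-clock approach can be scripted. Here's a minimal sketch (not anything from this thread- the database name "testdb" and query file "suite.sql" are placeholders for your own test suite, and it assumes GNU time and a running server):

```shell
#!/bin/sh
# Sketch: time a fixed query suite under several random_page_cost values.
# PGOPTIONS applies the setting per-session, so no config reload is needed.
for rpc in 1 2 3 4 6 8; do
    echo "random_page_cost = $rpc"
    PGOPTIONS="-c random_page_cost=$rpc" \
        /usr/bin/time -f "%e seconds" psql -q -d testdb -f suite.sql >/dev/null
done
```

Running each value several times and discarding the first (cold-cache) run would make the numbers less noisy.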

I'll post results back to the list as I get to it- it might be a slow
process. Since I spend about 18 hours of each day keeping the business
running, I'll have to cut back on sleep & do this in the other 10 hours. <g>

-NF


> Shridhar Daithankar wrote:
>> I appreciate your approach, but it's been pretty much proven that ext2 is
>> not the best and fastest out there.
>
> Agreed.

Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with a custom 2.4.20 kernel) with ext3fs,
> reiserfs and jfs. I can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs, one after
> the other, on an IDE disk, and ran pgbench and osdb-x0.15-0 on it.
>
> jfs has been underperforming for me; somehow its CPU usage is higher
> than with the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing and post it to the list.
>
> --
> Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
> GNU/Linux
>  2:30pm  up 204 days,  5:35,  5 users,  load average: 5.50, 5.18, 5.13
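The tune2fs route mentioned above is a one-line change. A sketch, assuming the data lives on its own partition (/dev/sda5 is a placeholder device- substitute your own, and do this with the partition unmounted or mounted read-only):

```shell
# Add a journal to an existing ext2 filesystem, turning it into ext3.
# Existing data is untouched; only a journal file is created.
tune2fs -j /dev/sda5

# Then change the filesystem type in /etc/fstab so it mounts as ext3:
# /dev/sda5  /var/lib/pgsql  ext3  defaults  0  2
```

If the kernel lacks ext3 support, the partition still mounts fine as ext2- the journal is simply ignored.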



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Sanity check requested

2003-07-18 Thread scott.marlowe
On Fri, 18 Jul 2003, Ang Chin Han wrote:

> Shridhar Daithankar wrote:
>> On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
>>> I'm using ext2. For now, I'll leave this and the OS version alone.
>>
>> I appreciate your approach, but it's been pretty much proven that ext2 is
>> not the best and fastest out there.
>
> Agreed.

Huh?  How can journaled file systems hope to outrun a simple unjournaled
file system?  There's just less overhead for ext2, so it's quicker; it's
just not as reliable.

I point you to this link from IBM:

http://www-124.ibm.com/developerworks/opensource/linuxperf/iozone/iozone.php

While ext3 is a clear loser to jfs and reiserfs, ext2 wins most of the contests
against both reiser and jfs.  Note that xfs wasn't tested there.  But in
general, ext2 is quite fast nowadays.

 
>> IMO, you can safely change that to reiserfs or XFS. Of course, testing is
>> always recommended.
>
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with a custom 2.4.20 kernel) with ext3fs,
> reiserfs and jfs. I can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs, one after
> the other, on an IDE disk, and ran pgbench and osdb-x0.15-0 on it.
>
> jfs has been underperforming for me; somehow its CPU usage is higher
> than with the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing and post it to the list.

I would like to see some tests on how they behave on top of large fast 
RAID arrays, like a 10 disk RAID5 or something.  It's likely that on a 
single IDE drive the most limiting factor is the bandwidth of the drive, 
whereas on a large array, the limiting factor would likely be the file 
system code.




Re: [PERFORM] Sanity check requested

2003-07-18 Thread Vincent van Leeuwen
On 2003-07-17 10:41:35 -0500, Nick Fankhauser wrote:
> I'm using ext2. For now, I'll leave this and the OS version alone.

I'd upgrade to a journaling filesystem as soon as possible for reliability.
Testing in our own environment has shown that PostgreSQL performs best on ext3
(yes, better than XFS, JFS or ReiserFS) with a Linux 2.4.21 kernel. Be sure to
mount with noatime, and to create the ext3 partition with the correct stripe
size for your RAID array using the '-R stride=foo' option (see man mke2fs).
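A sketch of that mke2fs invocation (the device and chunk size are made-up examples, not anything measured here; compute stride as RAID chunk size divided by filesystem block size):

```shell
# Example: 64 kB RAID chunks, 4 kB filesystem blocks -> stride = 64/4 = 16.
# -j adds the ext3 journal; /dev/md0 is a placeholder for your array device.
mke2fs -j -b 4096 -R stride=16 /dev/md0

# Mount with noatime so reads don't generate inode writes; fstab entry:
# /dev/md0  /var/lib/pgsql  ext3  defaults,noatime  0  2
```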

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/



Re: [PERFORM] Sanity check requested

2003-07-18 Thread Oliver Scheit
> Be sure to mount noatime

I did chattr -R +A /var/lib/pgsql/data;
that should do the trick as well, or am I wrong?

regards,
Oli



Re: [PERFORM] Sanity check requested

2003-07-18 Thread Vincent van Leeuwen
On 2003-07-18 18:20:55 +0200, Oliver Scheit wrote:
>> Be sure to mount noatime
>
> I did chattr -R +A /var/lib/pgsql/data
> that should do the trick as well or am I wrong?
 

According to the man page it has the same effect. There are a few things you
should consider though:
- new files won't be created with the same options (I think), so you'll have
  to run this command as a daily cronjob or something to that effect
- chattr is probably more filesystem-specific than a noatime mount, although
  this isn't a problem on ext[23], of course
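If you do stay with chattr, the daily cronjob can be as simple as the following sketch (the path follows the /var/lib/pgsql/data example above; adjust to your installation):

```shell
# Re-apply the no-atime-updates attribute recursively, catching any files
# created since the last run (needs root or ownership of the files).
chattr -R +A /var/lib/pgsql/data

# Or as a root crontab line, run nightly at 03:30:
# 30 3 * * *  chattr -R +A /var/lib/pgsql/data
```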

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/



Re: [PERFORM] Sanity check requested

2003-07-18 Thread Oliver Scheit
>>> Be sure to mount noatime
>>
>> I did chattr -R +A /var/lib/pgsql/data
>> that should do the trick as well or am I wrong?
>
> According to the man page it gives the same effect.
> There are a few things you should consider though:
> - new files won't be created with the same options (I think),
>   so you'll have to run this command as a daily cronjob or
>   something to that effect

This would be a really interesting point to know.
I will look into this.

I think the advantage of chattr is that the last access time
is still available for the rest of the filesystem.
(Of course, you could have a separate filesystem just for the
database stuff, in which case that advantage would be moot.)

regards,
Oli



Re: [PERFORM] Sanity check requested

2003-07-18 Thread Nick Fankhauser
I'm confused:

Ang Chin Han wrote:

> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)

Vincent van Leeuwen wrote:

> I'd upgrade to a journaling filesystem as soon as possible for
> reliability.

...About one year ago I considered moving to a journaling file system, but
opted not to because it seems like that's what WAL already does for us. How
does putting a journaling file system under it add more reliability?

I also guessed that a journaling file system would add overhead because now
a write to the WAL file could itself be deferred and logged elsewhere.

...So now I'm really puzzled because folks are weighing in with solid
anecdotal evidence saying that I'll get both better reliability and
performance. Can someone explain what I'm missing about the concept?

-A puzzled Nick




Re: [PERFORM] Sanity check requested

2003-07-18 Thread Josh Berkus
Nick,

> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

It lets you restart your server quickly after an unexpected power outage.  Ext2
is notoriously bad at this.

Also, WAL cannot necessarily recover properly if the underlying filesystem is 
corrupted.

> I also guessed that a journaling file system would add overhead because now
> a write to the WAL file could itself be deferred and logged elsewhere.

You are correct.


-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Sanity check requested

2003-07-18 Thread Rod Taylor
> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

WAL only works if the WAL files are actually written to disk and can be
read off it again.  Ext2 has a number of deficiencies which can cause
problems with this basic operation (inode corruptions, etc).  Journaling
does not directly help.




Re: File systems (RE: [PERFORM] Sanity check requested)

2003-07-18 Thread Tom Lane
Nick Fankhauser [EMAIL PROTECTED] writes:
> I'm departing in three ways from the simple IDE
> model that (I presume) the default random page cost of 4 is based on- the
> disks are SCSI & RAID and the FS would be different.

Actually, the default of 4 is based on experiments I did quite a while back
on HP-UX (with a SCSI disk) and Linux (with an IDE disk, and a different
filesystem).  I didn't see too much difference between 'em.  RAID might
alter the equation, or not.

regards, tom lane



Re: [PERFORM] Sanity check requested

2003-07-17 Thread Nick Fankhauser
Shridhar-

I appreciate your thoughts- I'll be running some before  after tests on
this using one of our development/hot-swap boxes, so I'll report the results
back to the list.

A few more thoughts/questions:

> 1. 30 users does not seem to be much of an overhead. If possible,
> try doing away with connection pooling.

The application needs to scale up gracefully. We actually have about 200
users that could decide to log on at the same time- 30 is just a typical
load. We'd also prefer to have 20,000 subscribers so we can start making a
living with this business <g>.

> 2. While increasing sort memory, try 4/8/16 in that order. That
> way you will get a better picture of load behaviour. Though whatever
> you put appears reasonable, having more data always helps.

I'll try that approach while testing. Is it the case that the sort memory is
allocated for each connection and becomes unavailable to other processes
while the connection exists? If so, since I'm using a connection pool, I
should be able to control total usage precisely. Without a connection pool,
I could start starving the rest of the system for resources if the number of
users spiked unexpectedly. Correct?



> 3. I don't know how this affects SCSI drives, but which file
> system are you using? Can you try different ones?
>
> 4. OK, this is too much, but Linux kernel 2.6 is in test and has
> vastly improved IO...

I'm using ext2. For now, I'll leave this and the OS version alone. If I
change too many variables, I won't be able to discern which one is causing a
change. Although I understand that there's an element of art to tuning, I'm
enough of a neophyte that I don't have a feeling for the tuning parameters
yet and hence I have to take a scientific approach of just tweaking a few
variables in an otherwise controlled and unchanged environment. If I can't
reach my goals with the simple approach, I'll consider some of the more
radical ideas.

Again, thanks for the ideas- I'll feed the results back after I've done some
tests

-Nick








Re: [PERFORM] Sanity check requested

2003-07-17 Thread Josh Berkus
Nick,

> I'll try that approach while testing. Is it the case that the sort memory
> is allocated for each connection and becomes unavailable to other processes
> while the connection exists? If so, since I'm using a connection pool, I
> should be able to control total usage precisely. Without a connection pool,
> I could start starving the rest of the system for resources if the number
> of users spiked unexpectedly. Correct?

Wrong, actually.  Sort memory is allocated *per sort*, not per connection or
per query.  So a single complex query could easily use 4x sort_mem if it has
several merge joins ... and a pooled connection could use many times sort_mem
depending on activity.  Thus connection pooling does not help you with
sort_mem usage at all, unless your pooling mechanism can control the rate at
which queries are fed to the planner.
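A back-of-envelope worst case makes the point (a sketch with made-up numbers, not measurements from this thread): 30 pooled connections, each running a query with 4 concurrent sorts, at sort_mem = 8192 kB:

```shell
# Worst-case sort memory = connections * sorts per query * sort_mem (kB).
connections=30
sorts_per_query=4
sort_mem_kb=8192
total_kb=$((connections * sorts_per_query * sort_mem_kb))
echo "${total_kb} kB"   # 983040 kB, i.e. ~960 MB
```

That's far more than most 2003-era servers would want to hand to sorts alone, which is why per-sort allocation matters.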

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Sanity check requested

2003-07-17 Thread Nick Fankhauser

> Wrong, actually.  Sort memory is allocated *per sort*, not per connection or
> per query.  So a single complex query could easily use 4x sort_mem if it
> has several merge joins ...

Thanks for the correction- it sounds like this is one where usage can't be
precisely controlled in a dynamic user environment; I just need to get a
feel for what works under a load that approximates my production system.

-Nick




Re: [PERFORM] Sanity check requested

2003-07-17 Thread Joe Conway
Nick Fankhauser wrote:
> Thanks for the correction- it sounds like this is one where usage
> can't be precisely controlled in a dynamic user environment; I just
> need to get a feel for what works under a load that approximates my
> production system.
I think the most important point here is that if you set sort_mem too 
high, and you have a lot of simultaneous sorts, you can drive the server 
into swapping, which obviously is a very bad thing. You want it set as 
high as possible, but not so high given your usage patterns that you 
wind up swapping.

Joe


