Re: [PERFORM] File Systems Compared

2006-12-17 Thread Bruno Wolff III
On Fri, Dec 15, 2006 at 10:44:39 -0600,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 
 The other feature I would like is to be able to use write barriers with
 encrypted file systems. I haven't found anything on whether or not there
 are near-term plans by anyone to support that.

I asked about this on the dm-crypt list and was told that write barriers
work pre 2.6.19. There was a change for 2.6.19 that might break things for
SMP systems. But that will probably get fixed eventually.
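
(For concreteness, the kind of stack in question is a dm-crypt mapping with
barriers requested when the file system is mounted; a minimal sketch, with
device name, cipher, and mount point purely illustrative:)

  # map an encrypted volume with dm-crypt (plain mapping; cipher illustrative)
  cryptsetup -c aes-cbc-essiv:sha256 create pgcrypt /dev/sdb1
  mkfs.ext3 /dev/mapper/pgcrypt
  # ask for write barriers on the resulting file system
  mount -o barrier=1 /dev/mapper/pgcrypt /mnt/pgdata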



Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
The reply wasn't (directly) copied to the performance list, but I will
copy this one back.

On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
  On Thu, Dec 14, 2006 at 01:39:00 -0500,
Jim Nasby [EMAIL PROTECTED] wrote:
  On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
  This appears to be changing under Linux. Recent kernels have write  
  barriers implemented using cache flush commands (which 
  some drives ignore,  so you need to be careful).
 
 Is it true that some drives ignore this; or is it mostly
 an urban legend that was started by testers who didn't
 have kernels with write barrier support?   I'd be especially
 interested in knowing if there are any currently available
 drives which ignore those commands.
 
  In very recent kernels, software raid using raid 1 will also
  handle write barriers. To get this feature, you are supposed to
  mount ext3 file systems with the barrier=1 option. For other file  
  systems, the parameter may need to be different.
 
 With XFS the default is apparently to enable write barrier
 support unless you explicitly disable it with the nobarrier mount option.
 It also will warn you in the system log if the underlying device
 doesn't have write barrier support.
 
 SGI recommends that you use the nobarrier mount option if you do
 have a persistent (battery backed) write cache on your raid device.
 
   http://oss.sgi.com/projects/xfs/faq.html#wcache
 
 
  But would that actually provide a meaningful benefit? When you  
  COMMIT, the WAL data must hit non-volatile storage of some kind,  
  which without a BBU or something similar, means hitting the platter.  
  So I don't see how enabling the disk cache will help, unless of  
  course it's ignoring fsync.
 
 With write barriers, fsync() waits for the physical disk; but I believe
 the background writes from write() done by pdflush don't have to; so
 it's kinda like only disabling the cache for WAL files and the filesystem's
 journal, but having it enabled for the rest of your write activity (the
 tables except at checkpoints?  the log file?).
 
  Note the use case for this is more for hobbyists or development boxes. You can
  only use it on software raid (md) 1, which rules out most real systems.
  
 
 Ugh.  Looking for where that's documented; and hoping it is or will soon
 work on software 1+0 as well.



Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
  On Thu, Dec 14, 2006 at 01:39:00 -0500,
Jim Nasby [EMAIL PROTECTED] wrote:
  On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
  This appears to be changing under Linux. Recent kernels have write  
  barriers implemented using cache flush commands (which 
  some drives ignore,  so you need to be careful).
 
 Is it true that some drives ignore this; or is it mostly
 an urban legend that was started by testers who didn't
 have kernels with write barrier support?   I'd be especially
 interested in knowing if there are any currently available
 drives which ignore those commands.

I saw posts claiming this, but no specific drives were mentioned. I did see one
post that claimed that the cache flush command was mandated (not optional)
by the spec.

  In very recent kernels, software raid using raid 1 will also
  handle write barriers. To get this feature, you are supposed to
  mount ext3 file systems with the barrier=1 option. For other file  
  systems, the parameter may need to be different.
 
 With XFS the default is apparently to enable write barrier
 support unless you explicitly disable it with the nobarrier mount option.
 It also will warn you in the system log if the underlying device
 doesn't have write barrier support.

I think there might be a similar patch for ext3 going into 2.6.19. I haven't
checked a 2.6.19 kernel to make sure though.
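
(For reference, the mount options under discussion look roughly like the
following; device and mount point names are illustrative:)

  # ext3: barriers are off by default, so they have to be requested explicitly
  mount -o barrier=1 /dev/md0 /var/lib/pgsql
  # XFS: barriers are on by default; nobarrier is the documented way to turn
  # them off when a battery-backed cache sits in front of the disks
  mount -o nobarrier /dev/sdb1 /data
  # XFS complains in the kernel log if the underlying device can't do barriers
  dmesg | grep -i barrier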

 
 SGI recommends that you use the nobarrier mount option if you do
 have a persistent (battery backed) write cache on your raid device.
 
   http://oss.sgi.com/projects/xfs/faq.html#wcache
 
 
  But would that actually provide a meaningful benefit? When you  
  COMMIT, the WAL data must hit non-volatile storage of some kind,  
  which without a BBU or something similar, means hitting the platter.  
  So I don't see how enabling the disk cache will help, unless of  
  course it's ignoring fsync.
 
 With write barriers, fsync() waits for the physical disk; but I believe
 the background writes from write() done by pdflush don't have to; so
 it's kinda like only disabling the cache for WAL files and the filesystem's
 journal, but having it enabled for the rest of your write activity (the
 tables except at checkpoints?  the log file?).

Not exactly. Whenever you commit the file system log or fsync the WAL file,
all previously written blocks will be flushed to the disk platter before
any new write requests are honored. So journalling semantics will work
properly.
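
(On the PostgreSQL side, the settings whose durability depends on this are
fsync and wal_sync_method; an illustrative postgresql.conf fragment:)

  # postgresql.conf -- illustrative; fdatasync is the usual default on Linux,
  # fsync and open_sync are the other common choices
  fsync = on
  wal_sync_method = fdatasync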

  Note the use case for this is more for hobbyists or development boxes. You can
  only use it on software raid (md) 1, which rules out most real systems.
  
 
 Ugh.  Looking for where that's documented; and hoping it is or will soon
 work on software 1+0 as well.

I saw a comment somewhere that raid 0 posed some problems and the suggestion
was to handle the barrier at a different level (though I don't know how you
could). So I don't believe 1+0 or 5 are currently supported or will be in the
near term.

The other feature I would like is to be able to use write barriers with
encrypted file systems. I haven't found anything on whether or not there
are near-term plans by anyone to support that.



Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
On Fri, Dec 15, 2006 at 10:34:15 -0600,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 The reply wasn't (directly) copied to the performance list, but I will
 copy this one back.

Sorry about this one, I meant to intersperse my replies and hit the 'y'
key at the wrong time. (And there ended up being a copy on performance
anyway from the news gateway.)



Re: [PERFORM] File Systems Compared

2006-12-14 Thread Bruno Wolff III
On Thu, Dec 14, 2006 at 01:39:00 -0500,
  Jim Nasby [EMAIL PROTECTED] wrote:
 On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
 
This appears to be changing under Linux. Recent kernels have write
barriers implemented using cache flush commands (which some drives
ignore, so you need to be careful). In very recent kernels, software
raid using raid 1 will also handle write barriers. To get this feature,
you are supposed to mount ext3 file systems with the barrier=1 option.
For other file systems, the parameter may need to be different.
 
 But would that actually provide a meaningful benefit? When you  
 COMMIT, the WAL data must hit non-volatile storage of some kind,  
 which without a BBU or something similar, means hitting the platter.  
 So I don't see how enabling the disk cache will help, unless of  
 course it's ignoring fsync.

When you do an fsync, the OS sends a cache flush command to the drive; on most
drives (though supposedly there are ones that ignore this command) the flush
doesn't complete until all of the cached pages have been written to the
platter, and the fsync doesn't return until the flush is complete.
While this writes more sectors than you really need, it is safe. And it still
allows caching to speed up some things (though not as much as having queued
commands would).

I have done some tests on my systems and the speeds I am getting make it
clear that write barriers slow things down to about the same range as having
caches disabled. So I believe that it is likely working as advertised.
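
(A rough way to reproduce that kind of check, assuming a GNU dd with oflag
support; file name and counts are illustrative:)

  # time 1000 synchronous 8kB writes; with barriers working (or the caches
  # off) a single 10K RPM spindle should manage at most a few hundred per
  # second, while a drive that just caches the flushes will report far more
  time dd if=/dev/zero of=sync-test bs=8k count=1000 oflag=dsync
  rm sync-test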

Note the use case for this is more for hobbyists or development boxes. You can
only use it on software raid (md) 1, which rules out most real systems.
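
(For completeness, the md raid 1 setup being talked about is just a two-disk
mirror, something like the following; device names illustrative:)

  # two-disk md mirror; recent kernels pass write barriers through this level
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkfs.ext3 /dev/md0
  mount -o barrier=1 /dev/md0 /var/lib/pgsql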



Re: [PERFORM] File Systems Compared

2006-12-14 Thread Ron Mayer
Bruno Wolff III wrote:
 On Thu, Dec 14, 2006 at 01:39:00 -0500,
   Jim Nasby [EMAIL PROTECTED] wrote:
 On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
 This appears to be changing under Linux. Recent kernels have write  
 barriers implemented using cache flush commands (which 
 some drives ignore,  so you need to be careful).

Is it true that some drives ignore this; or is it mostly
an urban legend that was started by testers who didn't
have kernels with write barrier support?   I'd be especially
interested in knowing if there are any currently available
drives which ignore those commands.

 In very recent kernels, software raid using raid 1 will also
 handle write barriers. To get this feature, you are supposed to
 mount ext3 file systems with the barrier=1 option. For other file  
 systems, the parameter may need to be different.

With XFS the default is apparently to enable write barrier
support unless you explicitly disable it with the nobarrier mount option.
It also will warn you in the system log if the underlying device
doesn't have write barrier support.

SGI recommends that you use the nobarrier mount option if you do
have a persistent (battery backed) write cache on your raid device.

  http://oss.sgi.com/projects/xfs/faq.html#wcache


 But would that actually provide a meaningful benefit? When you  
 COMMIT, the WAL data must hit non-volatile storage of some kind,  
 which without a BBU or something similar, means hitting the platter.  
 So I don't see how enabling the disk cache will help, unless of  
 course it's ignoring fsync.

With write barriers, fsync() waits for the physical disk; but I believe
the background writes from write() done by pdflush don't have to; so
it's kinda like only disabling the cache for WAL files and the filesystem's
journal, but having it enabled for the rest of your write activity (the
tables except at checkpoints?  the log file?).

 Note the use case for this is more for hobbyists or development boxes. You can
 only use it on software raid (md) 1, which rules out most real systems.
 

Ugh.  Looking for where that's documented; and hoping it is or will soon
work on software 1+0 as well.



Re: [PERFORM] File Systems Compared

2006-12-13 Thread Jim Nasby

On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:

On Wed, Dec 06, 2006 at 08:55:14 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:

Anyone run their RAIDs with disk caches enabled, or is this akin to
having fsync off?

Disk write caches are basically always akin to having fsync off.  The
only time a write-cache is (more or less) safe to enable is when it is
backed by a battery or in some other way made non-volatile.

So a RAID controller with a battery-backed write cache can enable its
own write cache, but can't safely enable the write-caches on the disk
drives it manages.

This appears to be changing under Linux. Recent kernels have write
barriers implemented using cache flush commands (which some drives
ignore, so you need to be careful). In very recent kernels, software
raid using raid 1 will also handle write barriers. To get this feature,
you are supposed to mount ext3 file systems with the barrier=1 option.
For other file systems, the parameter may need to be different.

But would that actually provide a meaningful benefit? When you COMMIT,
the WAL data must hit non-volatile storage of some kind, which without
a BBU or something similar, means hitting the platter. So I don't see
how enabling the disk cache will help, unless of course it's ignoring
fsync.


Now, I have heard something about drives using their stored  
rotational energy to flush out the cache... but I tend to suspect  
urban legend there...

--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] File Systems Compared

2006-12-11 Thread Bruno Wolff III
On Wed, Dec 06, 2006 at 08:55:14 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:
  Anyone run their RAIDs with disk caches enabled, or is this akin to
  having fsync off?
 
 Disk write caches are basically always akin to having fsync off.  The
 only time a write-cache is (more or less) safe to enable is when it is
 backed by a battery or in some other way made non-volatile.
 
 So a RAID controller with a battery-backed write cache can enable its
 own write cache, but can't safely enable the write-caches on the disk
 drives it manages.

This appears to be changing under Linux. Recent kernels have write barriers
implemented using cache flush commands (which some drives ignore, so you
need to be careful). In very recent kernels, software raid using raid 1
will also handle write barriers. To get this feature, you are supposed to
mount ext3 file systems with the barrier=1 option. For other file systems,
the parameter may need to be different.



Re: [PERFORM] File Systems Compared

2006-12-07 Thread Merlin Moncure

On 12/6/06, Brian Wipf [EMAIL PROTECTED] wrote:

 Hmmm.   Something is not right.  With a 16 HD RAID 10 based on 10K
 rpm HDs, you should be seeing higher absolute performance numbers.

 Find out what HW the Areca guys and Tweakers guys used to test the
 1280s.
 At LW2006, Areca was demonstrating all-in-cache reads and writes of
 ~1600MBps and ~1300MBps respectively along with RAID 0 Sustained
 Rates of ~900MBps read, and ~850MBps write.

 Luke, I know you've managed to get higher IO rates than this with
 this class of HW.  Is there an OS or SW config issue Brian should
 closely investigate?

I wrote 1280 by mistake. It's actually a 1260. Sorry about that.
The IOP341 class of cards weren't available when we ordered the parts
for the box, so we had to go with the 1260. The box(es) we build next
month will either have the 1261ML or 1280 depending on whether we go
16 or 24 disk.

I noticed Bucky got almost 800 random seeks per second on her 6 disk
1 RPM SAS drive Dell PowerEdge 2950. The random seek performance
of this box disappointed me the most. Even running 2 concurrent
bonnies, the random seek performance only increased from 644 seeks/
sec to 813 seeks/sec. Maybe there is some setting I'm missing? This
card looked pretty impressive on tweakers.net.


I've been looking a lot at the SAS enclosures lately and am starting
to feel like that's the way to go.  Performance is amazing and the
flexibility of choosing low-cost SATA or high-speed SAS drives is
great.  Not only that, but more and more SAS is coming out in 2.5"
drives, which seems to be a better fit for databases...more spindles.
With a 2.5" drive enclosure they can stuff 10 hot-swap drives into a
1U enclosure...that's pretty amazing.

One downside of SAS is most of the HBAs are PCI-Express only, which can
limit your options unless your server is very new.  Also, you don't
want to skimp on the HBA; get the best available, which looks to be
LSI Logic at the moment (the Dell PERC 5/E is an LSI Logic controller, as is
the Intel SAS HBA)...others?

merlin



[PERFORM] File Systems Compared

2006-12-06 Thread Brian Wipf

All tests are with bonnie++ 1.03a

Main components of system:
16 WD Raptor 150GB 10K RPM drives all in a RAID 10
ARECA 1280 PCI-Express RAID adapter with 1GB BB Cache (Thanks for the  
recommendation, Ron!)

32 GB RAM
Dual Intel 5160 Xeon Woodcrest 3.0 GHz processors
OS: SUSE Linux 10.1

All runs are with the write cache disabled on the hard disks, except
for one additional test for xfs where it was enabled. I tested with
ordered and writeback journaling modes for ext3 to see if writeback
journaling would help over the default of ordered. The 1GB of battery
backed cache on the RAID card was enabled for all tests as well.
Tests are in order of increasing random seek performance. In my tests
on this hardware, xfs is the decisive winner, beating all of the
other file systems in performance on every single metric. 658 random
seeks per second, 433 MB/sec sequential read, and 350 MB/sec
sequential write seems decent enough, but not as high as numbers
other people have suggested are attainable with a 16 disk RAID 10.
350 MB/sec sequential write with disk caches enabled versus 280 MB/sec
sequential write with disk caches disabled sure makes enabling
the disk write cache tempting. Anyone run their RAIDs with disk
caches enabled, or is this akin to having fsync off?
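
(On directly attached SATA disks the per-drive write cache is usually toggled
with something like hdparm; behind a RAID controller such as the Areca it is
normally a controller setting instead. Device name illustrative:)

  hdparm -W0 /dev/sda   # turn the on-drive write cache off for a test run
  hdparm -W1 /dev/sda   # turn it back on afterwards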


ext3 (writeback data journaling mode):
/usr/local/sbin/bonnie++ -d bonnie -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 78625  91 279921  51 112346  13 89463  96 417695  22 545.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5903  99 +++++ +++ +++++ +++  6112  99 +++++ +++ 18620 100
hulk4,64368M,78625,91,279921,51,112346,13,89463,96,417695,22,545.7,0,16,5903,99,+++++,+++,+++++,+++,6112,99,+++++,+++,18620,100


ext3 (ordered data journaling mode):
/usr/local/sbin/bonnie++ -d bonnie -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 74902  89 250274  52 123637  16 88992  96 417222  23 548.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5941  97 +++++ +++ +++++ +++  6270  99 +++++ +++ 18670  99
hulk4,64368M,74902,89,250274,52,123637,16,88992,96,417222,23,548.3,0,16,5941,97,+++++,+++,+++++,+++,6270,99,+++++,+++,18670,99



reiserfs:
/usr/local/sbin/bonnie++ -d bonnie -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 81004  99 269191  50 128322  16 87865  96 407035  28 550.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
hulk4,64368M,81004,99,269191,50,128322,16,87865,96,407035,28,550.3,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++


jfs:
/usr/local/sbin/bonnie++ -d bonnie/ -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 73246  80 268886  28 110465   9 89516  96 413897  21 639.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3756   5 +++++ +++ +++++ +++ 23763  90 +++++ +++ 22371  70
hulk4,64368M,73246,80,268886,28,110465,9,89516,96,413897,21,639.5,0,16,3756,5,+++++,+++,+++++,+++,23763,90,+++++,+++,22371,70


xfs (with write cache disabled on disks):
/usr/local/sbin/bonnie++ -d bonnie/ -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 90621  99 283916  35 105871  11 88569  97 433890  23 644.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 28435  95 +++++ +++ 28895  82 28523  91 +++++ +++ 24369  86
hulk4,64368M,90621,99,283916,35,105871,11,88569,97,433890,23,644.5,0,16,28435,95,+++++,+++,28895,82,28523,91,+++++,+++,24369,86

Re: [PERFORM] File Systems Compared

2006-12-06 Thread Brian Hurt

Brian Wipf wrote:


All tests are with bonnie++ 1.03a


Thanks for posting these tests.  Now I have actual numbers to beat our 
storage server provider about the head and shoulders with.  Also, I 
found them interesting in and of themselves.


These numbers are close enough to bus-saturation rates that I'd strongly 
advise new people setting up systems to go this route over spending 
money on some fancy storage area network solution -- unless you need more
HD space than fits nicely in one of these raids.  If reliability is a
concern, buy 2 servers and implement Slony for failover.


Brian




Re: [PERFORM] File Systems Compared

2006-12-06 Thread Alexander Staubo

On Dec 6, 2006, at 16:40 , Brian Wipf wrote:


All tests are with bonnie++ 1.03a

[snip]

Care to post these numbers *without* word wrapping? Thanks.

Alexander.



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Luke Lonergan
Brian,

On 12/6/06 8:02 AM, Brian Hurt [EMAIL PROTECTED] wrote:

 These numbers are close enough to bus-saturation rates

PCI-X is 1GB/s+ and the memory architecture is 20GB/s+, though each CPU is
likely to obtain only 2-3GB/s.

We routinely achieve 1GB/s I/O rate on two 3Ware adapters and 2GB/s on the
Sun X4500 with ZFS.

 advise new people setting up systems to go this route over spending
 money on some fancy storage area network solution

People buy SANs for interesting reasons, some of them having to do with the
manageability features of high end SANs.  I've heard it said in those cases
that performance doesn't matter much.

As you suggest, database replication provides one of those features, and
Solaris ZFS has many of the data management features found in high end SANs.
Perhaps we can get the best of both?

In the end, I think SAN vs. server storage is a religious battle.

- Luke





Re: [PERFORM] File Systems Compared

2006-12-06 Thread Markus Schiltknecht

Hi,

Alexander Staubo wrote:

Care to post these numbers *without* word wrapping? Thanks.


How is one supposed to do that? Care giving an example?

Markus




Re: [PERFORM] File Systems Compared

2006-12-06 Thread Joshua D. Drake

 As you suggest, database replication provides one of those features, and
 Solaris ZFS has many of the data management features found in high end SANs.
 Perhaps we can get the best of both?
 
 In the end, I think SAN vs. server storage is a religious battle.

I agree. I have many people that want to purchase a SAN because someone
told them that is what they need... Yet they can spend 20% of the cost
on two external arrays and get incredible performance...

We are seeing great numbers from the following config:

(2) HP MSA30s (loaded) dual bus
(2) HP 6402, one connected to each MSA.

The performance for the money is incredible.

Sincerely,

Joshua D. Drake



 
 - Luke
 
 
 
 
-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] File Systems Compared

2006-12-06 Thread Brian Hurt

Luke Lonergan wrote:

Brian,

On 12/6/06 8:02 AM, Brian Hurt [EMAIL PROTECTED] wrote:

 These numbers are close enough to bus-saturation rates

PCI-X is 1GB/s+ and the memory architecture is 20GB/s+, though each CPU is
likely to obtain only 2-3GB/s.

We routinely achieve 1GB/s I/O rate on two 3Ware adapters and 2GB/s on the
Sun X4500 with ZFS.


For some reason I'd got it stuck in my head that PCI-Express maxed out
at a theoretical 533 MByte/sec, at which point getting 480 MByte/sec
across it is pretty dang good.  But actually looking things up, I see
that PCI-Express has a theoretical 8 Gbit/sec, or about 800 MByte/sec.
It's PCI-X that's 533 MByte/sec.  So there's still some headroom
available there.


Brian



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Steinar H. Gunderson
On Wed, Dec 06, 2006 at 05:31:01PM +0100, Markus Schiltknecht wrote:
 Care to post these numbers *without* word wrapping? Thanks.
 How is one supposed to do that? Care giving an example?

This is a rather long sentence without any kind of word wrapping except what 
would be imposed on your own side -- how to set that up properly depends on the 
sending e-mail client, but in mine it's just a matter of turning off the word 
wrapping in your editor :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Florian Weimer
* Brian Wipf:

 Anyone run their RAIDs with disk caches enabled, or is this akin to
 having fsync off?

If your cache is backed by a battery, enabling write cache shouldn't
be a problem.  You can check if the whole thing is working well by
running this test script: http://brad.livejournal.com/2116715.html

Enabling write cache leads to various degrees of data corruption in
case of a power outage (possibly including file system corruption
requiring manual recovery).

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Mark Lewis
 Anyone run their RAIDs with disk caches enabled, or is this akin to
 having fsync off?

Disk write caches are basically always akin to having fsync off.  The
only time a write-cache is (more or less) safe to enable is when it is
backed by a battery or in some other way made non-volatile.

So a RAID controller with a battery-backed write cache can enable its
own write cache, but can't safely enable the write-caches on the disk
drives it manages.

-- Mark Lewis



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Luke Lonergan
Brian,

On 12/6/06 8:40 AM, Brian Hurt [EMAIL PROTECTED] wrote:

 But actually looking things up, I see that PCI-Express has a theoretical 8
 Gbit/sec, or about 800Mbyte/sec. It's PCI-X that's 533 MByte/sec.  So there's
 still some headroom available there.

See here for the official specifications of both:
  http://www.pcisig.com/specifications/pcix_20/

Note that PCI-X version 1.0 at 133MHz runs at 1GB/s.  It's a parallel bus,
64 bits wide (8 bytes) and runs at 133MHz, so 8 x 133 ~= 1 gigabyte/second.

PCI Express with 16 lanes (PCIe x16) can transfer data at 4GB/s.  The Arecas
use PCIe x8 (see here:
http://www.areca.com.tw/products/html/pcie-sata.htm), so they can do 2GB/s.

- Luke 





Re: [PERFORM] File Systems Compared

2006-12-06 Thread Markus Schiltknecht

Hi,

Steinar H. Gunderson wrote:

This is a rather long sentence without any kind of word wrapping except what 
would be imposed on your own side -- how to set that up properly depends on the 
sending e-mail client, but in mine it's just a matter of turning off the word 
wrapping in your editor :-)


Duh!

Cool, thank you for the example :-)  I thought the MTA or at least the
mailing list would wrap mails at some limit. I've now set word-wrap to
characters (it seems not possible to turn it off completely in Thunderbird).
But when writing, I'm now getting one long line.

What's common practice? What's it on the pgsql mailing lists?

Regards

Markus




Re: [PERFORM] File Systems Compared

2006-12-06 Thread Arnaud Lesauvage

Markus Schiltknecht a écrit :

What's common practice? What's it on the pgsql mailing lists?


Netiquette usually advises mailers to wrap after 72 characters
on mailing lists.
This does not apply to format=flowed, I guess (that's the format
used in Steinar's message).




Re: [PERFORM] File Systems Compared

2006-12-06 Thread Michael Stone

On Wed, Dec 06, 2006 at 06:59:12PM +0100, Arnaud Lesauvage wrote:

Markus Schiltknecht a écrit :

What's common practice? What's it on the pgsql mailing lists?


The netiquette usually advise mailers to wrap after 72 characters 
on mailing lists.
This does not apply for format=flowed I guess (that's the format 
used in Steinar's message).


It would apply to either; format=flowed can be wrapped at the receiver's 
end, but still be formatted to a particular column for readers that 
don't understand format=flowed. (Which is likely to be many, since 
that's a standard that never really took off.) No-wrap netiquette
applies to formatted text blocks, which are unreadable if wrapped (such
as bonnie or EXPLAIN output).


Mike Stone



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Merlin Moncure

On 12/6/06, Luke Lonergan [EMAIL PROTECTED] wrote:

People buy SANs for interesting reasons, some of them having to do with the
manageability features of high end SANs.  I've heard it said in those cases
that performance doesn't matter much.


There is movement in the industry right now away from tape systems to
managed disk storage for backups and data retention.  In these cases
performance requirements are not very high -- and a single server can
manage a huge amount of storage.  In theory, you can do the same thing
attached via SAS expanders, but FC networking is imo more flexible and
scalable.

The manageability features of SANs are a mixed bag and decidedly
overrated, but they have their place, imo.

merlin



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Brian Hurt

Luke Lonergan wrote:


Brian,

On 12/6/06 8:40 AM, Brian Hurt [EMAIL PROTECTED] wrote:

 But actually looking things up, I see that PCI-Express has a theoretical 8
 Gbit/sec, or about 800Mbyte/sec. It's PCI-X that's 533 MByte/sec.  So there's
 still some headroom available there.

See here for the official specifications of both:
  http://www.pcisig.com/specifications/pcix_20/

Note that PCI-X version 1.0 at 133MHz runs at 1GB/s.  It's a parallel bus,
64 bits wide (8 bytes) and runs at 133MHz, so 8 x 133 ~= 1 gigabyte/second.

PCI Express with 16 lanes (PCIe x16) can transfer data at 4GB/s.  The Arecas
use PCIe x8 (see here:
http://www.areca.com.tw/products/html/pcie-sata.htm), so they can do 2GB/s.

- Luke

Thanks.  I stand corrected (again).

Brian



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Bruno Wolff III
On Wed, Dec 06, 2006 at 18:45:56 +0100,
  Markus Schiltknecht [EMAIL PROTECTED] wrote:
 
 Cool, thank you for the example :-)  I thought the MTA or at least the
 mailing list would wrap mails at some limit. I've now set word-wrap to  
 characters (it seems not possible to turn it off completely in 
 thunderbird). But when writing, I'm now getting one long line.
 
 What's common practice? What's it on the pgsql mailing lists?

If you do this you should set format=flowed (see RFC 2646). If you do that,
then clients can break the lines in an appropriate way. This is actually
better than fixing the line width in the original message, since the
recipient may not have the same number of characters (or pixels) of display
as the sender.
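
(Concretely, format=flowed is just a parameter on the message's Content-Type
header, e.g.:)

  Content-Type: text/plain; charset=US-ASCII; format=flowed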



Re: [PERFORM] File Systems Compared

2006-12-06 Thread Ron

At 10:40 AM 12/6/2006, Brian Wipf wrote:

All tests are with bonnie++ 1.03a

Main components of system:
16 WD Raptor 150GB 10K RPM drives all in a RAID 10
ARECA 1280 PCI-Express RAID adapter with 1GB BB Cache (Thanks for the 
recommendation, Ron!)

32 GB RAM
Dual Intel 5160 Xeon Woodcrest 3.0 GHz processors
OS: SUSE Linux 10.1


xfs (with write cache disabled on disks):
/usr/local/sbin/bonnie++ -d bonnie/ -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 90621  99 283916  35 105871  11 88569  97 433890  23 644.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 28435  95 +++++ +++ 28895  82 28523  91 +++++ +++ 24369  86
hulk4,64368M,90621,99,283916,35,105871,11,88569,97,433890,23,644.5,0,16,28435,95,+++++,+++,28895,82,28523,91,+++++,+++,24369,86


xfs (with write cache enabled on disks):
/usr/local/sbin/bonnie++ -d bonnie -s 64368:8k
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hulk4        64368M 90861  99 348401  43 131887  14 89412  97 432964  23 658.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 28871  90 +++++ +++ 28923  91 30879  93 +++++ +++ 28012  94
hulk4,64368M,90861,99,348401,43,131887,14,89412,97,432964,23,658.7,0,16,28871,90,+++++,+++,28923,91,30879,93,+++++,+++,28012,94

Hmmm.   Something is not right.  With a 16 HD RAID 10 based on 10K 
rpm HDs, you should be seeing higher absolute performance numbers.


Find out what HW the Areca guys and Tweakers guys used to test the 1280s.
At LW2006, Areca was demonstrating all-in-cache reads and writes of 
~1600MBps and ~1300MBps respectively along with RAID 0 Sustained 
Rates of ~900MBps read, and ~850MBps write.


Luke, I know you've managed to get higher IO rates than this with 
this class of HW.  Is there an OS or SW config issue Brian should
closely investigate?


Ron Peacetree




Re: [PERFORM] File Systems Compared

2006-12-06 Thread Brian Wipf
Hmmm.   Something is not right.  With a 16 HD RAID 10 based on 10K  
rpm HDs, you should be seeing higher absolute performance numbers.


Find out what HW the Areca guys and Tweakers guys used to test the  
1280s.
At LW2006, Areca was demonstrating all-in-cache reads and writes of  
~1600MBps and ~1300MBps respectively along with RAID 0 Sustained  
Rates of ~900MBps read, and ~850MBps write.


Luke, I know you've managed to get higher IO rates than this with  
this class of HW.  Is there an OS or SW config issue Brian should
closely investigate?


I wrote 1280 by mistake. It's actually a 1260. Sorry about that.
The IOP341 class of cards weren't available when we ordered the parts
for the box, so we had to go with the 1260. The box(es) we build next
month will either have the 1261ML or 1280 depending on whether we go
16 or 24 disk.

I noticed Bucky got almost 800 random seeks per second on her 6 disk
1 RPM SAS drive Dell PowerEdge 2950. The random seek performance of
this box disappointed me the most. Even running 2 concurrent bonnies,
the random seek performance only increased from 644 seeks/sec to 813
seeks/sec. Maybe there is some setting I'm missing? This card looked
pretty impressive on tweakers.net.
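
(A concurrent run of that sort can be as simple as two instances pointed at
separate directories, roughly as follows; paths illustrative:)

  /usr/local/sbin/bonnie++ -d bonnie1 -s 64368:8k > bonnie1.out 2>&1 &
  /usr/local/sbin/bonnie++ -d bonnie2 -s 64368:8k > bonnie2.out 2>&1 &
  wait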





Re: [PERFORM] File Systems Compared

2006-12-06 Thread Greg Smith

On Wed, 6 Dec 2006, Alexander Staubo wrote:


Care to post these numbers *without* word wrapping?


Brian's message was sent with format=flowed and therefore it's easy to 
re-assemble into original form if your software understands that.  I just 
checked with two e-mail clients (Thunderbird and Pine) and all his 
bonnie++ results were perfectly readable on both as soon as I made the 
display wide enough.  If you had trouble reading it, you might consider 
upgrading your mail client to one that understands that standard. 
Statistically, though, if you have this problem you're probably using 
Outlook and there may not be a useful upgrade path for you.  I know it's 
been added to the latest Express version (which even defaults to sending 
messages flowed, driving many people crazy), but am not sure if any of the 
Office Outlooks know what to do with flowed messages yet.


And those of you pointing people at the RFC's, that's a bit hardcore--the 
RFC documents themselves could sure use some better formatting. 
https://bugzilla.mozilla.org/attachment.cgi?id=134270&action=view has a
readable introduction to the encoding of flowed messages, 
http://mailformat.dan.info/body/linelength.html gives some history to how 
we all got into this mess in the first place, and 
http://joeclark.org/ffaq.html also has some helpful (albeit out of date in 
spots) comments on this subject.


Even if it is correct netiquette to disable word-wrapping for long lines 
like bonnie output (there are certainly two sides with valid points in 
that debate), to make them more compatible with flow-impaired clients, you 
can't expect that mail composition software is sophisticated enough to 
allow doing that for one section while still wrapping the rest of the text 
correctly.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
