Re: [PERFORM] SSD + RAID

2010-03-03 Thread Ron Mayer
Greg Smith wrote: Bruce Momjian wrote: I always assumed SCSI disks had a write-through cache and therefore didn't need a drive cache flush comment. Some do. Some SCSI disks have write-back caches. Some have both(!) - a write-back cache, but the user can explicitly send write-through requests.
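A minimal sketch (not code from the thread) of what an explicit write-through request looks like from userspace, assuming Linux and the POSIX O_DSYNC flag; the file name is hypothetical, and the assumption is that the kernel satisfies O_DSYNC with FUA writes or a drive cache flush:

    /* write_through.c - per-write write-through via O_DSYNC (illustrative sketch) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DSYNC: write() returns only after the data itself is durable. */
        int fd = open("waltest.dat", O_WRONLY | O_CREAT | O_DSYNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        char page[8192];
        memset(page, 'x', sizeof page);
        if (write(fd, page, sizeof page) != (ssize_t) sizeof page) {
            perror("write");
            return 1;
        }
        close(fd);
        return 0;
    }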

Re: [PERFORM] SSD + RAID

2010-03-02 Thread Pierre C
I always assumed SCSI disks had a write-through cache and therefore didn't need a drive cache flush comment. Maximum performance can only be reached with a writeback cache so the drive can reorder and cluster writes, according to the realtime position of the heads and platter rotation.

Re: [PERFORM] SSD + RAID

2010-03-01 Thread Bruce Momjian
Ron Mayer wrote: Bruce Momjian wrote: Greg Smith wrote: Bruce Momjian wrote: I have added documentation about the ATAPI drive flush command, and the If one of us goes back into that section one day to edit again it might be worth mentioning that FLUSH CACHE EXT is the actual

Re: [PERFORM] SSD + RAID

2010-03-01 Thread Bruce Momjian
Greg Smith wrote: Ron Mayer wrote: Linux apparently sends FLUSH_CACHE commands to IDE drives in the exact same places it sends SYNCHRONIZE CACHE commands to SCSI drives[2]. [2] http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114 Well, that's old enough to not even

Re: [PERFORM] SSD + RAID

2010-03-01 Thread Greg Smith
Bruce Momjian wrote: I always assumed SCSI disks had a write-through cache and therefore didn't need a drive cache flush comment. There's more detail on all this mess at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks and it includes this perception, which I've recently come to

Re: [PERFORM] SSD + RAID

2010-02-27 Thread Greg Smith
Bruce Momjian wrote: I have added documentation about the ATAPI drive flush command, and the typical SSD behavior. If one of us goes back into that section one day to edit again it might be worth mentioning that FLUSH CACHE EXT is the actual ATAPI-6 command that a drive needs to support

Re: [PERFORM] SSD + RAID

2010-02-27 Thread Ron Mayer
Bruce Momjian wrote: Greg Smith wrote: Bruce Momjian wrote: I have added documentation about the ATAPI drive flush command, and the If one of us goes back into that section one day to edit again it might be worth mentioning that FLUSH CACHE EXT is the actual ATAPI-6 command that a

Re: [PERFORM] SSD + RAID

2010-02-27 Thread Greg Smith
Ron Mayer wrote: Linux apparently sends FLUSH_CACHE commands to IDE drives in the exact same places it sends SYNCHRONIZE CACHE commands to SCSI drives[2]. [2] http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114 Well, that's old enough to not even be completely right anymore

Re: [PERFORM] SSD + RAID

2010-02-26 Thread Bruce Momjian
I have added documentation about the ATAPI drive flush command, and the typical SSD behavior. --- Greg Smith wrote: Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie

Re: [PERFORM] SSD + RAID

2010-02-24 Thread Dave Crooke
It's always possible to rebuild into a consistent configuration by assigning a precedence order; for parity RAID, the data drives take precedence over parity drives, and for RAID-1 sets it assigns an arbitrary master. You *should* never lose a whole stripe ... for example, RAID-5 updates do read
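Since RAID-5 parity is just the XOR of the data chunks in a stripe, the "data takes precedence" repair rule amounts to recomputing parity from the surviving data. A tiny sketch (stripe width and chunk size are illustrative, not from the thread):

    /* raid5_parity.c - recompute a RAID-5 parity chunk from the data chunks. */
    #include <stddef.h>
    #include <string.h>

    #define CHUNK_BYTES 4096
    #define DATA_DRIVES 3          /* illustrative stripe width */

    void rebuild_parity(const unsigned char data[DATA_DRIVES][CHUNK_BYTES],
                        unsigned char parity[CHUNK_BYTES])
    {
        memset(parity, 0, CHUNK_BYTES);
        for (size_t d = 0; d < DATA_DRIVES; d++)
            for (size_t i = 0; i < CHUNK_BYTES; i++)
                parity[i] ^= data[d][i];   /* parity is the XOR of all data chunks */
    }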

Re: [PERFORM] SSD + RAID

2010-02-23 Thread david
On Mon, 22 Feb 2010, Ron Mayer wrote: Also worth noting - Linux's software raid stuff (MD and LVM) needs to handle this right as well - and last I checked (sometime last year) the default setups didn't. I think I saw some stuff in the last few months on this issue on the kernel mailing

Re: [PERFORM] SSD + RAID

2010-02-23 Thread Pierre C
Note that's power draw per bit. dram is usually much more densely packed (it can be with fewer transistors per cell) so the individual chips for each may have similar power draws while the dram will be 10 times as densely packed as the sram. Differences between SRAM and DRAM : - price per

Re: [PERFORM] SSD + RAID

2010-02-23 Thread Nikolas Everett
On Tue, Feb 23, 2010 at 6:49 AM, Pierre C li...@peufeu.com wrote: Note that's power draw per bit. dram is usually much more densely packed (it can be with fewer transistors per cell) so the individual chips for each may have similar power draws while the dram will be 10 times as densely

Re: [PERFORM] SSD + RAID

2010-02-23 Thread Scott Carey
On Feb 23, 2010, at 3:49 AM, Pierre C wrote: Now I wonder about something. SSDs use wear-leveling which means the information about which block was written where must be kept somewhere. Which means this information must be updated. I wonder how crash-safe and how atomic these updates

Re: [PERFORM] SSD + RAID

2010-02-23 Thread david
On Tue, 23 Feb 2010, da...@lang.hm wrote: On Mon, 22 Feb 2010, Ron Mayer wrote: Also worth noting - Linux's software raid stuff (MD and LVM) needs to handle this right as well - and last I checked (sometime last year) the default setups didn't. I think I saw some stuff in the last few

Re: [PERFORM] SSD + RAID

2010-02-23 Thread Aidan Van Dyk
* da...@lang.hm da...@lang.hm [100223 15:05]: However, one thing that you do not get protection against with software raid is the potential for the writes to hit some drives but not others. If this happens the software raid cannot know what the correct contents of the raid stripe are, and

Re: [PERFORM] SSD + RAID

2010-02-23 Thread david
On Tue, 23 Feb 2010, Aidan Van Dyk wrote: * da...@lang.hm da...@lang.hm [100223 15:05]: However, one thing that you do not get protection against with software raid is the potential for the writes to hit some drives but not others. If this happens the software raid cannot know what the

Re: [PERFORM] SSD + RAID

2010-02-23 Thread Mark Mielke
On 02/23/2010 04:22 PM, da...@lang.hm wrote: On Tue, 23 Feb 2010, Aidan Van Dyk wrote: * da...@lang.hm da...@lang.hm [100223 15:05]: However, one thing that you do not get protection against with software raid is the potential for the writes to hit some drives but not others. If this happens

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Bruce Momjian
Greg Smith wrote: Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Bruce Momjian
Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Ron Mayer
Bruce Momjian wrote: Greg Smith wrote: If you have a regular SATA drive, it almost certainly supports proper cache flushing OK, but I have a few questions. Is a write to the drive and a cache flush command the same? I believe they're different as of ATAPI-6 from 2001. Which

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Mark Mielke
On 02/22/2010 08:04 PM, Greg Smith wrote: Arjen van der Meijden wrote: That's weird. Intel's SSD's didn't have a write cache afaik: I asked Intel about this and it turns out that the DRAM on the Intel drive isn't used for user data because of the risk of data loss, instead it is used as

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Greg Smith
Ron Mayer wrote: I know less about other file systems. Apparently the NTFS guys are aware of such stuff - but don't know what kinds of fsync equivalent you'd need to make it happen. It's actually pretty straightforward--better than ext3. Windows with NTFS has been perfectly aware how to
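For reference, the fsync equivalent on Windows/NTFS is FlushFileBuffers(). A minimal sketch, assuming the Win32 API and a hypothetical file name; whether the flush also reaches the drive's own cache still depends on how the drive's write caching is configured in Windows:

    /* ntfs_flush.c - Windows analogue of write()+fsync() (illustrative sketch) */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileA("waltest.dat", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) { fprintf(stderr, "CreateFile failed\n"); return 1; }

        char page[8192] = {0};
        DWORD written = 0;
        WriteFile(h, page, (DWORD) sizeof page, &written, NULL);

        /* The fsync analogue: block until pending writes for this file are flushed. */
        if (!FlushFileBuffers(h)) { fprintf(stderr, "FlushFileBuffers failed\n"); return 1; }
        CloseHandle(h);
        return 0;
    }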

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 6:39 PM, Greg Smith g...@2ndquadrant.com wrote: Mark Mielke wrote: I had read the above when posted, and then looked up SRAM. SRAM seems to suggest it will hold the data even after power loss, but only for a period of time. As long as power can restore within a few

Re: [PERFORM] SSD + RAID

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 7:21 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 6:39 PM, Greg Smith g...@2ndquadrant.com wrote: Mark Mielke wrote: I had read the above when posted, and then looked up SRAM. SRAM seems to suggest it will hold the data even after power

Re: [PERFORM] SSD + RAID

2010-02-21 Thread Bruce Momjian
Scott Carey wrote: On Feb 20, 2010, at 3:19 PM, Bruce Momjian wrote: Dan Langille wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Bruce Momjian wrote: Matthew Wakeling wrote: On Fri, 13 Nov 2009, Greg Smith wrote: In order for a drive to work reliably for database use such

Re: [PERFORM] SSD + RAID

2010-02-21 Thread Ron Mayer
Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a cache: Under 120GB Maxtor

Re: [PERFORM] SSD + RAID

2010-02-21 Thread Greg Smith
Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a

Re: [PERFORM] SSD + RAID

2010-02-21 Thread Arjen van der Meijden
On 22-2-2010 6:39 Greg Smith wrote: But the point of this whole testing exercise coming back into vogue again is that SSDs have returned this negligent behavior to the mainstream again. See http://opensolaris.org/jive/thread.jspa?threadID=121424 for a discussion of this in a ZFS context just

Re: [PERFORM] SSD + RAID

2010-02-20 Thread Bruce Momjian
Matthew Wakeling wrote: On Fri, 13 Nov 2009, Greg Smith wrote: In order for a drive to work reliably for database use such as for PostgreSQL, it cannot have a volatile write cache. You either need a write cache with a battery backup (and a UPS doesn't count), or to turn the cache off.

Re: [PERFORM] SSD + RAID

2010-02-20 Thread Dan Langille
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Bruce Momjian wrote: Matthew Wakeling wrote: On Fri, 13 Nov 2009, Greg Smith wrote: In order for a drive to work reliably for database use such as for PostgreSQL, it cannot have a volatile write cache. You either need a write cache with a

Re: [PERFORM] SSD + RAID

2010-02-20 Thread Bruce Momjian
Dan Langille wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Bruce Momjian wrote: Matthew Wakeling wrote: On Fri, 13 Nov 2009, Greg Smith wrote: In order for a drive to work reliably for database use such as for PostgreSQL, it cannot have a volatile write cache. You either need

Re: [PERFORM] SSD + RAID

2009-12-03 Thread Scott Carey
On 11/19/09 1:04 PM, Greg Smith g...@2ndquadrant.com wrote: That won't help. Once the checkpoint is done, the problem isn't just that the WAL segments are recycled. The server isn't going to use them even if they were there. The reason why you can erase/recycle them is that you're doing

Re: [PERFORM] SSD + RAID

2009-11-30 Thread Bruce Momjian
Greg Smith wrote: Bruce Momjian wrote: I thought our only problem was testing the I/O subsystem --- I never suspected the file system might lie too. That email indicates that a large percentage of our install base is running on unreliable file systems --- why have I not heard about this

Re: [PERFORM] SSD + RAID

2009-11-30 Thread Ron Mayer
Bruce Momjian wrote: For example, ext3 fsync() will issue write barrier commands if the inode was modified; but not if the inode wasn't. See test program here: http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg272253.html and read two paragraphs further to see how touching the
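A rough sketch of the behavior being tested there, assuming ext3 with barriers enabled (the path and the fchmod trick are illustrative, not the linked test program itself): dirtying the inode before each fsync() is what makes ext3 emit the drive cache-flush barrier.

    /* ext3_fsync_barrier.c - force an inode modification before fsync (sketch) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[512] = "data";
        for (int i = 0; i < 100; i++) {
            if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf) {
                perror("pwrite");
                return 1;
            }

            /* Dirty the inode so ext3's fsync() issues a write barrier. */
            fchmod(fd, (i & 1) ? 0600 : 0644);

            if (fsync(fd) != 0) { perror("fsync"); return 1; }
        }
        close(fd);
        return 0;
    }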

Re: [PERFORM] SSD + RAID

2009-11-30 Thread Ron Mayer
Bruce Momjian wrote: Greg Smith wrote: Bruce Momjian wrote: I thought our only problem was testing the I/O subsystem --- I never suspected the file system might lie too. That email indicates that a large percentage of our install base is running on unreliable file systems --- why have I not

Re: [PERFORM] SSD + RAID

2009-11-30 Thread Bruce Momjian
Ron Mayer wrote: Bruce Momjian wrote: Greg Smith wrote: Bruce Momjian wrote: I thought our only problem was testing the I/O subsystem --- I never suspected the file system might lie too. That email indicates that a large percentage of our install base is running on unreliable file

Re: [PERFORM] SSD + RAID

2009-11-29 Thread Ron Mayer
Bruce Momjian wrote: Greg Smith wrote: A good test program that is a bit better at introducing and detecting the write cache issue is described at http://brad.livejournal.com/2116715.html Wow, I had not seen that tool before. I have added a link to it from our documentation, and also
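The linked tool is a client/server script; the core idea can be sketched in a few lines (greatly simplified to a single machine, file name arbitrary): write numbered blocks, fsync each one, and record the last sequence number acknowledged as durable, then pull the plug and check that every acknowledged block survived.

    /* plugtest.c - simplified sketch of a write-cache pull-the-plug test. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("plugtest.dat", O_WRONLY | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        unsigned char block[512];
        for (uint64_t seq = 0; ; seq++) {
            memset(block, 0, sizeof block);
            memcpy(block, &seq, sizeof seq);

            if (pwrite(fd, block, sizeof block,
                       (off_t)(seq % 1024) * (off_t) sizeof block) < 0) {
                perror("pwrite");
                return 1;
            }
            if (fsync(fd) != 0) { perror("fsync"); return 1; }

            /* Only sequence numbers printed here were acknowledged as durable;
               after a power pull, all of them must still be readable on disk. */
            printf("%llu\n", (unsigned long long) seq);
            fflush(stdout);
        }
    }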

Re: [PERFORM] SSD + RAID

2009-11-29 Thread Bruce Momjian
Ron Mayer wrote: Bruce Momjian wrote: Greg Smith wrote: A good test program that is a bit better at introducing and detecting the write cache issue is described at http://brad.livejournal.com/2116715.html Wow, I had not seen that tool before. I have added a link to it from our

Re: [PERFORM] SSD + RAID

2009-11-29 Thread Greg Smith
Bruce Momjian wrote: I thought our only problem was testing the I/O subsystem --- I never suspected the file system might lie too. That email indicates that a large percentage of our install base is running on unreliable file systems --- why have I not heard about this before? Do the write

Re: [PERFORM] SSD + RAID

2009-11-28 Thread Bruce Momjian
Greg Smith wrote: Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. The funny thing about Murphy is that he doesn't visit when things are quiet. It's quite possible the window for data

Re: [PERFORM] SSD + RAID

2009-11-21 Thread Merlin Moncure
On Fri, Nov 20, 2009 at 7:27 PM, Greg Smith g...@2ndquadrant.com wrote: Richard Neill wrote: The key issue for short,fast transactions seems to be how fast an fdatasync() call can run, forcing the commit to disk, and allowing the transaction to return to userspace. Attached is a short C

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Axel Rau
On 13.11.2009 at 14:57, Laszlo Nagy wrote: I was thinking about ARECA 1320 with 2GB memory + BBU. Unfortunately, I cannot find information about using ARECA cards with SSD drives. They told me: currently not supported, but they have positive customer reports. No date yet for

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Matthew Wakeling
On Thu, 19 Nov 2009, Greg Smith wrote: This is why turning the cache off can tank performance so badly--you're going to be writing a whole 128K block no matter what if it's forced to disk without caching, even if it's just to write an 8K page to it. Theoretically, this does not need to be the

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Jeff Janes
On Wed, Nov 18, 2009 at 8:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Carey sc...@richrelevance.com writes: For your database DATA disks, leaving the write cache on is 100% acceptable, even with power loss, and without a RAID controller. And even in high write environments. Really? How

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Richard Neill
Axel Rau wrote: On 13.11.2009 at 14:57, Laszlo Nagy wrote: I was thinking about ARECA 1320 with 2GB memory + BBU. Unfortunately, I cannot find information about using ARECA cards with SSD drives. They told me: currently not supported, but they have positive customer reports. No date yet for

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Greg Smith
Richard Neill wrote: The key issue for short,fast transactions seems to be how fast an fdatasync() call can run, forcing the commit to disk, and allowing the transaction to return to userspace. Attached is a short C program which may be of use. Right. I call this the commit rate of the storage,
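A sketch along the same lines as the short C program referred to above (not that program itself; file name and iteration count are arbitrary): time write()+fdatasync() pairs to estimate the storage's commit rate.

    /* commit_rate.c - estimate fdatasync()-bound commit rate (illustrative sketch) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("syncfile", O_WRONLY | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        char buf[8192];
        memset(buf, 0, sizeof buf);

        const int iterations = 1000;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (int i = 0; i < iterations; i++) {
            if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf) {
                perror("pwrite");
                return 1;
            }
            if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.0f commits/sec\n", iterations / secs);
        close(fd);
        return 0;
    }

If the reported rate is far higher than the drive's rotation rate would allow (for example, thousands per second from a single 7200 RPM disk), that is a strong hint a volatile write cache is absorbing the flushes.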

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Craig Ringer
On 19/11/2009 12:22 PM, Scott Carey wrote: 3: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/ overwriting any WAL segments. This would be a trivial feature to add to a postgres release, I think. How does that help? It doesn't provide any

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Greg Smith
Scott Carey wrote: For your database DATA disks, leaving the write cache on is 100% acceptable, even with power loss, and without a RAID controller. And even in high write environments. That is what the XLOG is for, isn't it? That is where this behavior is critical. But that has completely

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Karl Denninger
Greg Smith wrote: Scott Carey wrote: For your database DATA disks, leaving the write cache on is 100% acceptable, even with power loss, and without a RAID controller. And even in high write environments. That is what the XLOG is for, isn't it? That is where this behavior is critical.

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Greg Smith
Scott Carey wrote: Moral of the story: Nothing is 100% safe, so sometimes a small bit of KNOWN risk is perfectly fine. There is always UNKNOWN risk. If one risks losing 256K of cached data on an SSD if you're really unlucky with timing, how dangerous is that versus the chance that the raid

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Scott Marlowe
On Thu, Nov 19, 2009 at 10:01 AM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 18, 2009 at 11:39 PM, Scott Carey sc...@richrelevance.com wrote: Well, that is sort of true for all benchmarks, but I do find that bonnie++ is the worst of the bunch.  I consider it relatively useless

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Anton Rommerskirchen
On Thursday, 19 November 2009 at 13:29:56, Craig Ringer wrote: On 19/11/2009 12:22 PM, Scott Carey wrote: 3: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/ overwriting any WAL segments. This would be a trivial feature to add to a

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Brad Nicholson
On Thu, 2009-11-19 at 19:01 +0100, Anton Rommerskirchen wrote: On Thursday, 19 November 2009 at 13:29:56, Craig Ringer wrote: On 19/11/2009 12:22 PM, Scott Carey wrote: 3: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/ overwriting any

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Greg Smith
Scott Carey wrote: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/ overwriting any WAL segments. This would be a trivial feature to add to a postgres release, I think. Actually, it already exists! Turn on log archiving, and have the script

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Greg Smith
Scott Marlowe wrote: On Thu, Nov 19, 2009 at 10:01 AM, Merlin Moncure mmonc...@gmail.com wrote: pgbench is actually a pretty awesome i/o tester assuming you have big enough scaling factor Seeing as how pgbench only goes to scaling factor of 4000, are there any plans on enlarging that number?

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Merlin Moncure
On Thu, Nov 19, 2009 at 4:10 PM, Greg Smith g...@2ndquadrant.com wrote: You can use pgbench to either get interesting peak read results, or peak write ones, but it's not real useful for things in between.  The standard test basically turns into a huge stack of writes to a single table, and the

Re: [PERFORM] SSD + RAID

2009-11-19 Thread Scott Marlowe
On Thu, Nov 19, 2009 at 2:39 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Nov 19, 2009 at 4:10 PM, Greg Smith g...@2ndquadrant.com wrote: You can use pgbench to either get interesting peak read results, or peak write ones, but it's not real useful for things in between.  The standard

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Kenny Gorman
I found a bit of time to play with this. I started up a test with 20 concurrent processes all inserting into the same table and committing after each insert. The db was achieving about 5000 inserts per second, and I kept it running for about 10 minutes. The host was doing about 5MB/s of
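A sketch of what one of those worker processes might look like using libpq (table name, connection string, and loop count are hypothetical; the original poster's harness isn't shown in the archive). Each INSERT runs as its own transaction, so every row costs a WAL flush; run roughly 20 copies in parallel to approximate the workload. Build with: cc insert_bench.c -lpq

    /* insert_bench.c - autocommit single-row inserts in a tight loop (sketch) */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        for (int i = 0; i < 100000; i++) {
            /* One statement per transaction: every insert forces a commit flush. */
            PGresult *res = PQexec(conn, "INSERT INTO t(v) VALUES (1)");
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }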

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Scott Carey
On 11/13/09 10:21 AM, Karl Denninger k...@denninger.net wrote: One caution for those thinking of doing this - the incremental improvement of this setup on PostgreSQL in a WRITE SIGNIFICANT environment isn't NEARLY as impressive. Indeed the performance in THAT case for many workloads may

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Scott Carey
On 11/15/09 12:46 AM, Craig Ringer cr...@postnewspapers.com.au wrote: Possible fixes for this are: - Don't let the drive lie about cache flush operations, ie disable write buffering. - Give Pg some way to find out, from the drive, when particular write operations have actually hit disk.

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes: For your database DATA disks, leaving the write cache on is 100% acceptable, even with power loss, and without a RAID controller. And even in high write environments. Really? How hard have you tested that configuration? That is what the XLOG is

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Scott Carey
On 11/17/09 10:51 AM, Greg Smith g...@2ndquadrant.com wrote: Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. The funny thing about Murphy is that he doesn't visit when things are

Re: [PERFORM] SSD + RAID

2009-11-18 Thread Scott Carey
On 11/17/09 10:58 PM, da...@lang.hm da...@lang.hm wrote: keep in mind that bonnie++ isn't always going to reflect your real performance. I have run tests on some workloads that were definitely I/O limited where bonnie++ results that differed by a factor of 10x made no measurable

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Merlin Moncure
2009/11/13 Greg Smith g...@2ndquadrant.com: As far as what real-world apps have that profile, I like SSDs for small to medium web applications that have to be responsive, where the user shows up and wants their randomly distributed and uncached data with minimal latency. SSDs can also be used

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Brad Nicholson
On Tue, 2009-11-17 at 11:36 -0500, Merlin Moncure wrote: 2009/11/13 Greg Smith g...@2ndquadrant.com: As far as what real-world apps have that profile, I like SSDs for small to medium web applications that have to be responsive, where the user shows up and wants their randomly distributed

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Scott Marlowe
On Tue, Nov 17, 2009 at 9:54 AM, Brad Nicholson bnich...@ca.afilias.info wrote: On Tue, 2009-11-17 at 11:36 -0500, Merlin Moncure wrote: 2009/11/13 Greg Smith g...@2ndquadrant.com: As far as what real-world apps have that profile, I like SSDs for small to medium web applications that have to

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Peter Eisentraut
On Tue, 2009-11-17 at 11:36 -0500, Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. I am becoming increasingly suspicious that peter's results are not representative: given that 90% of

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Greg Smith
Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. The funny thing about Murphy is that he doesn't visit when things are quiet. It's quite possible the window for data loss on the drive is

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Merlin Moncure
On Tue, Nov 17, 2009 at 1:51 PM, Greg Smith g...@2ndquadrant.com wrote: Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. The funny thing about Murphy is that he doesn't visit when things

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Mark Mielke
On 11/17/2009 01:51 PM, Greg Smith wrote: Merlin Moncure wrote: I am right now talking to someone on postgresql irc who is measuring 15k iops from x25-e and no data loss following power plug test. The funny thing about Murphy is that he doesn't visit when things are quiet. It's quite possible

Re: [PERFORM] SSD + RAID

2009-11-17 Thread Greg Smith
Merlin Moncure wrote: But what's up with the 400 iops measured from bonnie++? I don't know really. SSD writes are really sensitive to block size and the ability to chunk writes into larger chunks, so it may be that Peter has just found the worst-case behavior and everybody else is seeing

Re: [PERFORM] SSD + RAID

2009-11-17 Thread david
On Wed, 18 Nov 2009, Greg Smith wrote: Merlin Moncure wrote: But what's up with the 400 iops measured from bonnie++? I don't know really. SSD writes are really sensitive to block size and the ability to chunk writes into larger chunks, so it may be that Peter has just found the worst-case

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Craig Ringer
On 15/11/2009 11:57 AM, Laszlo Nagy wrote: Ok, I'm getting confused here. There is the WAL, which is written sequentially. If the WAL is not corrupted, then it can be replayed on next database startup. Please somebody enlighten me! In my mind, fsync is only needed for the WAL. If I could

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Laszlo Nagy
A change has been written to the WAL and fsync()'d, so Pg knows it's hit disk. It can now safely apply the change to the tables themselves, and does so, calling fsync() to tell the drive containing the tables to commit those changes to disk. The drive lies, returning success for the fsync when

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Craig Ringer
On 15/11/2009 2:05 PM, Laszlo Nagy wrote: A change has been written to the WAL and fsync()'d, so Pg knows it's hit disk. It can now safely apply the change to the tables themselves, and does so, calling fsync() to tell the drive containing the tables to commit those changes to disk. The

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Laszlo Nagy
- Pg doesn't know the erase block sizes or positions. It can't group writes up by erase block except by hoping that, within a given file, writing in page order will get the blocks to the disk in roughly erase-block order. So your write caching isn't going to do anywhere near as good a job as

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Craig James
I've wondered whether this would work for a read-mostly application: Buy a big RAM machine, like 64GB, with a crappy little single disk. Build the database, then make a really big RAM disk, big enough to hold the DB and the WAL. Then build a duplicate DB on another machine with a decent disk

Re: [PERFORM] SSD + RAID

2009-11-15 Thread Heikki Linnakangas
Craig James wrote: I've wondered whether this would work for a read-mostly application: Buy a big RAM machine, like 64GB, with a crappy little single disk. Build the database, then make a really big RAM disk, big enough to hold the DB and the WAL. Then build a duplicate DB on another machine

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Lists
Laszlo Nagy wrote: Hello, I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report: http://techreport.com/articles.x/16255/9 http://techreport.com/articles.x/16255/10 Here are my concerns: * I need at least 32GB disk space. So DRAM based SSD is not a

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Ivan Voras
Lists wrote: Laszlo Nagy wrote: Hello, I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report: http://techreport.com/articles.x/16255/9 http://techreport.com/articles.x/16255/10 Here are my concerns: * I need at least 32GB disk space. So DRAM based

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Heikki Linnakangas
Merlin Moncure wrote: 2009/11/13 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: Laszlo Nagy wrote: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy.

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Merlin Moncure
On Sat, Nov 14, 2009 at 6:17 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: lots of ram doesn't help you if: *) your database gets written to a lot and you have high performance requirements When all the (hot) data is cached, all writes are sequential writes to the WAL,

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Heikki Linnakangas
Merlin Moncure wrote: On Sat, Nov 14, 2009 at 6:17 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: lots of ram doesn't help you if: *) your database gets written to a lot and you have high performance requirements When all the (hot) data is cached, all writes are sequential

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy
Heikki Linnakangas wrote: Laszlo Nagy wrote: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy. At 32GB database size, I'd seriously consider just

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Robert Haas
2009/11/14 Laszlo Nagy gand...@shopzeus.com: 32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe. Power out - data

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Merlin Moncure
On Sat, Nov 14, 2009 at 8:47 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Merlin Moncure wrote: On Sat, Nov 14, 2009 at 6:17 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: lots of ram doesn't help you if: *) your database gets written to a lot and you

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy
Robert Haas wrote: 2009/11/14 Laszlo Nagy gand...@shopzeus.com: 32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe.

Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy
* I could buy two X25-E drives and have 32GB disk space, and some redundancy. This would cost about $1600, not counting the RAID controller. It is on the edge. This was the solution I went with (4 drives in a raid 10 actually). Not a cheap solution, but the performance is

[PERFORM] SSD + RAID

2009-11-13 Thread Laszlo Nagy
Hello, I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report: http://techreport.com/articles.x/16255/9 http://techreport.com/articles.x/16255/10 Here are my concerns: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Laszlo Nagy
Note that some RAID controllers (3Ware in particular) refuse to recognize the MLC drives, in particular, they act as if the OCZ Vertex series do not exist when connected. I don't know what they're looking for (perhaps some indication that actual rotation is happening?) but this is a potential

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Marcos Ortiz Valmaseda
This is very fast. On IT Toolbox there are many whitepapers about it, in the ERP and DataCenter sections specifically. We should share all the tests we do on the Project Wiki. Regards On Nov 13, 2009, at 7:02 AM, Karl Denninger wrote: Laszlo Nagy wrote: Hello, I'm about to

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Scott Marlowe
2009/11/13 Laszlo Nagy gand...@shopzeus.com: Hello, I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report: http://techreport.com/articles.x/16255/9 http://techreport.com/articles.x/16255/10 Here are my concerns: * I need at least 32GB disk space.

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Merlin Moncure
On Fri, Nov 13, 2009 at 9:48 AM, Scott Marlowe scott.marl...@gmail.com wrote: I think RAID6 is gonna reduce the throughput due to overhead to something far less than what a software RAID-10 would achieve. I was wondering about this. I think raid 5/6 might be a better fit for SSD than

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Heikki Linnakangas
Laszlo Nagy wrote: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy. At 32GB database size, I'd seriously consider just buying a server with a regular hard

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Merlin Moncure
2009/11/13 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: Laszlo Nagy wrote: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy. At 32GB database size,

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Scott Carey
On 11/13/09 7:29 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Nov 13, 2009 at 9:48 AM, Scott Marlowe scott.marl...@gmail.com wrote: I think RAID6 is gonna reduce the throughput due to overhead to something far less than what a software RAID-10 would achieve. I was wondering

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Karl Denninger
Greg Smith wrote: In order for a drive to work reliably for database use such as for PostgreSQL, it cannot have a volatile write cache. You either need a write cache with a battery backup (and a UPS doesn't count), or to turn the cache off. The SSD performance figures you've been looking

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Greg Smith
Karl Denninger wrote: If power is unexpectedly removed from the system, this is true. But the caches on the SSD controllers are BUFFERS. An operating system crash does not disrupt the data in them or cause corruption. An unexpected disconnection of the power source from the drive (due to

Re: [PERFORM] SSD + RAID

2009-11-13 Thread Karl Denninger
Greg Smith wrote: Karl Denninger wrote: If power is unexpectedly removed from the system, this is true. But the caches on the SSD controllers are BUFFERS. An operating system crash does not disrupt the data in them or cause corruption. An unexpected disconnection of the power source from
