Re: [PERFORM] understanding postgres issues/bottlenecks
At 03:28 PM 1/8/2009, Merlin Moncure wrote:
> On Thu, Jan 8, 2009 at 9:42 AM, Stefano Nichele wrote:
>> Merlin Moncure wrote:
>>> IIRC that's the 'perc 6ir' card...no write caching. You are getting
>>> killed with syncs. If you can restart the database, you can test with
>>> fsync=off comparing load to confirm this. (another way is to compare
>>> select only vs regular transactions on pgbench).
>>
>> I'll try next Saturday.
>
> just be aware of the danger. hard reset (power off) class of failure
> when fsync = off means you are loading from backups.
>
> merlin

That's what redundant power conditioning UPS's are supposed to help prevent ;-)

Merlin is of course absolutely correct that you are taking a bigger risk if you turn fsync off. I would not recommend fsync = off unless you have other safety measures in place to protect against data loss from a power event. (At least for most DB applications.)

...and of course, those lucky few with bigger budgets can use SSDs and not care what fsync is set to.

Ron

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, Jan 10, 2009 at 5:40 AM, Ron wrote:
> At 03:28 PM 1/8/2009, Merlin Moncure wrote:
>> just be aware of the danger. hard reset (power off) class of failure
>> when fsync = off means you are loading from backups.
>
> That's what redundant power conditioning UPS's are supposed to help prevent
> ;-)

But of course they can't prevent them, only reduce the likelihood of their occurrence. Everyone who's working in large hosting environments has at least one horror story to tell about a power outage that never should have happened.

> I would not recommend fsync = off if you do not have other safety measures
> in place to protect against data loss because of a power event.
> (At least for most DB applications.)

Agreed. Keep in mind that you'll be losing whatever wasn't transferred to the backup machines.

> ...and of course, those lucky few with bigger budgets can use SSD's and not
> care what fsync is set to.

Would that prevent any corruption if the writes got out of order because of lack of fsync? Or partial writes? Or wouldn't fsync still need to be turned on to keep the data safe?
Re: [PERFORM] understanding postgres issues/bottlenecks
"Scott Marlowe" writes: > On Sat, Jan 10, 2009 at 5:40 AM, Ron wrote: >> At 03:28 PM 1/8/2009, Merlin Moncure wrote: >>> just be aware of the danger . hard reset (power off) class of failure >>> when fsync = off means you are loading from backups. >> >> That's what redundant power conditioning UPS's are supposed to help prevent >> ;-) > > But of course, they can't prevent them, but only reduce the likelihood > of their occurrance. Everyone who's working in large hosting > environments has at least one horror story to tell about a power > outage that never should have happened. Or a system crash. If the kernel panics for any reason when it has dirty buffers in memory the database will need to be restored. >> ...and of course, those lucky few with bigger budgets can use SSD's and not >> care what fsync is set to. > > Would that prevent any corruption if the writes got out of order > because of lack of fsync? Or partial writes? Or wouldn't fsync still > need to be turned on to keep the data safe. I think the idea is that with SSDs or a RAID with a battery backed cache you can leave fsync on and not have any significant performance hit since the seek times are very fast for SSD. They have limited bandwidth but bandwidth to the WAL is rarely an issue -- just latency. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] block device benchmarking
Hi,

I'm fiddling with a hand-made block device benchmarking thingie, with which I want to run random reads and writes of relatively small blocks (somewhat similar to databases). I'm much less interested in measuring throughput than in latency. Besides varying block sizes, I'm also testing with a varying number of concurrent threads and varying read/write ratios. As a result, I'm interested in roughly the following graphs:

* (single thread) i/o latency vs. seek distance
* (single thread) throughput vs. (actuator) position
* (single thread) i/o latency vs. no. of concurrent threads
* total requests per second + throughput vs. no. of concurrent threads
* total requests per second + throughput vs. read/write ratio
* total requests per second + throughput vs. block size
* distribution of access times (histogram)

(Of course, not all of these are relevant for all types of storage.)

Does a tool giving (most of) these measures already exist? Am I missing something interesting? What would you expect from a block device benchmarking tool?

Regards

Markus Wanner
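As a rough illustration, the first of those graphs could be collected with something like the sketch below (a hand-rolled example, not an existing tool; it uses buffered I/O for simplicity, so a real run would need O_DIRECT with aligned buffers, or a raw device, to keep the page cache from hiding the seeks):

```python
import os
import random
import time

def latency_vs_seek_distance(path, block=8192, samples=200):
    """Single-thread read latency, bucketed by seek distance.

    Buckets are log2 of the distance in bytes from the previous read,
    which gives a rough latency-vs-seek-distance curve.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    buckets = {}  # log2(seek distance in bytes) -> list of latencies
    try:
        pos = 0
        for _ in range(samples):
            target = random.randrange(0, size - block)
            distance = abs(target - pos)
            start = time.perf_counter()
            os.pread(fd, block, target)   # positioned read, no seek syscall
            buckets.setdefault(distance.bit_length(), []).append(
                time.perf_counter() - start)
            pos = target + block
    finally:
        os.close(fd)
    # average latency per seek-distance bucket
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
```

Run against a large file (or device node) on the storage under test; the per-bucket averages are the points of the latency-vs-distance graph.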
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, 10 Jan 2009, Gregory Stark wrote:
>>> ...and of course, those lucky few with bigger budgets can use SSD's and not
>>> care what fsync is set to.
>>
>> Would that prevent any corruption if the writes got out of order
>> because of lack of fsync? Or partial writes? Or wouldn't fsync still
>> need to be turned on to keep the data safe.
>
> I think the idea is that with SSDs or a RAID with a battery backed cache you
> can leave fsync on and not have any significant performance hit since the seek
> times are very fast for SSD. They have limited bandwidth but bandwidth to the
> WAL is rarely an issue -- just latency.

I don't think that this is true. Even if your SSD is battery-backed RAM (as opposed to the flash-based devices that have slower writes than high-end hard drives), you can complete 'writes' to the system RAM faster than the OS can get the data to the drive, so if you don't do an fsync you can still lose a lot in a power outage.

RAID controllers with battery-backed RAM cache will make the fsyncs very cheap (until the cache fills up, anyway).

With SSDs having extremely good read speeds, but poor (at least by comparison) write speeds, I wonder if any of the RAID controllers are going to get a mode where they cache writes, but don't cache reads, leaving all of your cache to handle writes.

David Lang
Re: [PERFORM] understanding postgres issues/bottlenecks
Hi,

da...@lang.hm wrote:
> On Sat, 10 Jan 2009, Gregory Stark wrote:
>> I think the idea is that with SSDs or a RAID with a battery backed
>> cache you can leave fsync on and not have any significant performance
>> hit since the seek times are very fast for SSD. They have limited
>> bandwidth but bandwidth to the WAL is rarely an issue -- just latency.

That's also my understanding.

> with SSDs having extremely good read speeds, but poor (at least by
> comparison) write speeds I wonder if any of the RAID controllers are
> going to get a mode where they cache writes, but don't cache reads,
> leaving all of your cache to handle writes.

My understanding of SSDs so far is that they are not that bad at writing *on average*, but to perform wear-leveling, they sometimes have to shuffle around multiple blocks at once. So there are pretty awful spikes in writing latency (IIRC more than 100 ms has been measured on cheaper disks). A battery-backed cache could theoretically flatten those, as long as your avg. WAL throughput is below the SSD's avg. writing throughput.

Regards

Markus Wanner
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, Jan 10, 2009 at 12:00 PM, Markus Wanner wrote:
> da...@lang.hm wrote:
>> On Sat, 10 Jan 2009, Gregory Stark wrote:
>>> I think the idea is that with SSDs or a RAID with a battery backed
>>> cache you can leave fsync on and not have any significant performance
>>> hit since the seek times are very fast for SSD. They have limited
>>> bandwidth but bandwidth to the WAL is rarely an issue -- just latency.
>
> That's also my understanding.
>
>> with SSDs having extremely good read speeds, but poor (at least by
>> comparison) write speeds I wonder if any of the RAID controllers are
>> going to get a mode where they cache writes, but don't cache reads,
>> leaving all of your cache to handle writes.
>
> My understanding of SSDs so far is, that they are not that bad at
> writing *on average*, but to perform wear-leveling, they sometimes have
> to shuffle around multiple blocks at once. So there are pretty awful
> spikes for writing latency (IIRC more than 100ms has been measured on
> cheaper disks).

Multiply it by 10, and apply to both reads and writes, for most cheap SSDs when doing random writes and reads mixed together. Which is why so many discussions specifically mention the Intel X25-M series: they don't suck like that. They keep good access times even under several random read/write threads. A review of the others was posted here a while back, and it was astounding how slow the others became in a mixed read/write benchmark.
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, 10 Jan 2009, Markus Wanner wrote:
> da...@lang.hm wrote:
>> On Sat, 10 Jan 2009, Gregory Stark wrote:
>>> I think the idea is that with SSDs or a RAID with a battery backed cache
>>> you can leave fsync on and not have any significant performance hit since
>>> the seek times are very fast for SSD. They have limited bandwidth but
>>> bandwidth to the WAL is rarely an issue -- just latency.
>
> That's also my understanding.
>
>> with SSDs having extremely good read speeds, but poor (at least by
>> comparison) write speeds I wonder if any of the RAID controllers are
>> going to get a mode where they cache writes, but don't cache reads,
>> leaving all of your cache to handle writes.
>
> My understanding of SSDs so far is, that they are not that bad at
> writing *on average*, but to perform wear-leveling, they sometimes have
> to shuffle around multiple blocks at once. So there are pretty awful
> spikes for writing latency (IIRC more than 100ms has been measured on
> cheaper disks).

Well, I have one of those cheap disks.

Brand new out of the box, format the 32G drive, then copy large files to it (~1G per file). This should do almost no wear-leveling, but its write performance is still poor and it has occasional 1 second pauses.

For my initial tests I hooked it up to a USB->SATA adapter, and the write speed is showing about half of what I can get on a 1.5TB SATA drive hooked to the same system. The write speed is fairly comparable to what you can do with slow laptop drives (even ignoring the pauses). Read speed is much better (and I think limited by the USB).

The key thing with any new storage technology (including RAID controllers) is that you need to do your own testing. Treat the manufacturer's specs as ideal conditions, or 'we guarantee that the product will never do better than this' specs.

Imation has a white paper on their site about solid state drive performance that is interesting. Among other things it shows that high-speed SCSI drives are still a significant win in random-write workloads at this point. If I was speccing out a new high-end system I would be looking at and testing something like the following:

SSD for read-mostly items (OS, possibly some indexes)
15K SCSI drives for heavy writing (WAL, indexes, temp tables, etc)
SATA drives for storage capacity (table contents)

David Lang
Re: [PERFORM] understanding postgres issues/bottlenecks
da...@lang.hm writes:
> On Sat, 10 Jan 2009, Markus Wanner wrote:
>
>> My understanding of SSDs so far is, that they are not that bad at
>> writing *on average*, but to perform wear-leveling, they sometimes have
>> to shuffle around multiple blocks at once. So there are pretty awful
>> spikes for writing latency (IIRC more than 100ms has been measured on
>> cheaper disks).

That would be fascinating. And frightening. A lot of people have been recommending these for WAL disks, and this would make them actually *worse* than regular drives.

> well, I have one of those cheap disks.
>
> brand new out of the box, format the 32G drive, then copy large files to it
> (~1G per file). this should do almost no wear-leveling, but it's write
> performance is still poor and it has occasional 1 second pauses.

This isn't similar to the way WAL behaves though. What you're testing is the behaviour when the bandwidth to the SSD is saturated. At that point, some point in the stack -- whether in the SSD, the USB hardware or driver, or the OS buffer cache -- can start to queue up writes. The stalls you see could be the behaviour when that queue fills up and it needs to push back to higher layers.

To simulate WAL you want to transfer smaller volumes of data, well below the bandwidth limit of the drive, fsync the data, then pause a bit and repeat. Time each fsync and see whether the time they take is proportional to the amount of data written in the meantime, or whether they randomly spike upwards.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
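The recipe above might be sketched roughly like this (file path, chunk size, and pause interval are arbitrary; for a meaningful result the file must live on the device under test, not a tmpfs):

```python
import os
import time

def time_fsyncs(path, chunk=8192, iterations=50, pause=0.1):
    """Write a small chunk (well below the drive's bandwidth limit),
    fsync it, and record how long the fsync takes; then sleep briefly
    and repeat.  Flat times suggest the drive handles syncs gracefully;
    random spikes suggest wear-leveling (or cache-flush) stalls.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    latencies = []
    try:
        payload = b"x" * chunk  # a small, WAL-record-sized write
        for _ in range(iterations):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)        # force the data to stable storage
            latencies.append(time.perf_counter() - start)
            time.sleep(pause)   # stay well under the bandwidth limit
    finally:
        os.close(fd)
    return latencies
```

Comparing max() against the average of the returned list, or varying `chunk` and checking whether fsync time scales with it, answers the question Greg poses.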
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, 10 Jan 2009, Gregory Stark wrote:
> da...@lang.hm writes:
>> On Sat, 10 Jan 2009, Markus Wanner wrote:
>>> My understanding of SSDs so far is, that they are not that bad at
>>> writing *on average*, but to perform wear-leveling, they sometimes have
>>> to shuffle around multiple blocks at once. So there are pretty awful
>>> spikes for writing latency (IIRC more than 100ms has been measured on
>>> cheaper disks).
>
> That would be fascinating. And frightening. A lot of people have been
> recommending these for WAL disks and this would make them actually *worse*
> than regular drives.
>
>> well, I have one of those cheap disks.
>>
>> brand new out of the box, format the 32G drive, then copy large files to it
>> (~1G per file). this should do almost no wear-leveling, but it's write
>> performance is still poor and it has occasional 1 second pauses.
>
> This isn't similar to the way WAL behaves though. What you're testing is the
> behaviour when the bandwidth to the SSD is saturated. At that point some point
> in the stack, whether in the SSD, the USB hardware or driver, or OS buffer
> cache can start to queue up writes. The stalls you see could be the behaviour
> when that queue fills up and it needs to push back to higher layers.
>
> To simulate WAL you want to transfer smaller volumes of data, well below the
> bandwidth limit of the drive, fsync the data, then pause a bit and repeat.
> Time each fsync and see whether the time they take is proportional to the
> amount of data written in the meantime or whether they randomly spike upwards.

If you have a specific benchmark for me to test I would be happy to do this.

The test that I did is basically the best case for the SSD (more-or-less sequential writes, where the vendors claim that the drives match or slightly outperform the traditional disks). For random writes the vendors put SSDs at fewer IOPS than 5400 rpm drives, let alone 15K rpm drives.

Take a look at this paper: http://www.imation.com/PageFiles/83/Imation-SSD-Performance-White-Paper.pdf

This is not one of the low-performance drives; they include a SanDisk drive in the paper that shows significantly less performance (but the same basic pattern) than the Imation drives.

David Lang
Re: [PERFORM] understanding postgres issues/bottlenecks
At 10:36 AM 1/10/2009, Gregory Stark wrote:
> "Scott Marlowe" writes:
>> On Sat, Jan 10, 2009 at 5:40 AM, Ron wrote:
>>> At 03:28 PM 1/8/2009, Merlin Moncure wrote:
>>>> just be aware of the danger. hard reset (power off) class of failure
>>>> when fsync = off means you are loading from backups.
>>>
>>> That's what redundant power conditioning UPS's are supposed to help prevent
>>> ;-)
>>
>> But of course they can't prevent them, only reduce the likelihood
>> of their occurrence. Everyone who's working in large hosting
>> environments has at least one horror story to tell about a power
>> outage that never should have happened.
>
> Or a system crash. If the kernel panics for any reason when it has dirty
> buffers in memory the database will need to be restored.

A power conditioning UPS should prevent a building-wide or circuit-level bad power event, caused by either dirty power or a power loss, from affecting the host. Within the design limits of the UPS in question, of course.

So the real worry with fsync = off in an environment with redundant, decent UPS's is pretty much limited to host-level HW failures, SW crashes, and unlikely catastrophes like building collapses, lightning strikes, floods, etc. Not that your fsync setting is going to matter much in the event of catastrophes in the physical environment...

Like anything else, there is usually more than one way to reduce risk while at the same time meeting (realistic) performance goals. If you need the performance implied by fsync off, then you have to take other steps to reduce the risk of data corruption down to about the same statistical level as running with fsync on. Or you have to decide that you are willing to live with the increased risk (NOT my recommendation for most DB hosting scenarios).

>>> ...and of course, those lucky few with bigger budgets can use SSD's and not
>>> care what fsync is set to.
>>
>> Would that prevent any corruption if the writes got out of order
>> because of lack of fsync? Or partial writes? Or wouldn't fsync still
>> need to be turned on to keep the data safe.
>
> I think the idea is that with SSDs or a RAID with a battery backed cache you
> can leave fsync on and not have any significant performance hit since the seek
> times are very fast for SSD. They have limited bandwidth but bandwidth to the
> WAL is rarely an issue -- just latency.

Yes, Greg understands what I meant here. In the case of SSDs, the performance hit of fsync = on is essentially zero. In the case of battery-backed RAM caches for RAID arrays, the efficacy is dependent on how the size of the cache compares with the working set of the disk access pattern.

Ron
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, 10 Jan 2009, Ron wrote:
> A power conditioning UPS should prevent a building wide or circuit level bad
> power event, caused by either dirty power or a power loss, from affecting
> the host. Within the design limits of the UPS in question of course.
>
> So the real worry with fsync = off in a environment with redundant decent
> UPS's is pretty much limited to host level HW failures, SW crashes, and
> unlikely catastrophes like building collapses, lightning strikes, floods, etc.

I've seen datacenters with redundant UPS's go dark unexpectedly. It's less common, but it does happen.

> Not that your fsync setting is going to matter much in the event of
> catastrophes in the physical environment...

Questionable, but sometimes true. In physical-environment disasters you will lose access to your data for a while, but after the drives are dug out of the rubble (or dried out from the flood) the data can probably be recovered. For crying out loud, they were able to recover most of the data from the hard drives in the latest shuttle disaster.

> Like anything else, there is usually more than one way to reduce risk while
> at the same time meeting (realistic) performance goals.

Very true.

> Yes, Greg understands what I meant here. In the case of SSDs, the
> performance hit of fsync = on is essentially zero.

This is definitely not the case. With fsync off, the data stays in memory and may never end up being sent to the drive. RAM speeds are several orders of magnitude faster than the interfaces to the drives (or even to the RAID controllers in high-speed slots).

It may be that it's fast enough (see the other posts disputing that), but don't think that it's the same.

David Lang

> In the case of battery-backed RAM caches for RAID arrays, the efficacy is
> dependent on how the size of the cache compares with the working set of the
> disk access pattern.
>
> Ron
Re: [PERFORM] understanding postgres issues/bottlenecks
Ron writes:
> At 10:36 AM 1/10/2009, Gregory Stark wrote:
>> Or a system crash. If the kernel panics for any reason when it has dirty
>> buffers in memory the database will need to be restored.
>
> A power conditioning UPS should prevent a building wide or circuit level bad
> power event

Except of course those caused *by* a faulty UPS. Or for that matter by the power supply in the computer or drive array, or someone just accidentally hitting the wrong power button.

I'm surprised people are so confident in their kernels though. I know some computers with uptimes measured in years, but I know far more which don't.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sat, 10 Jan 2009, Luke Lonergan wrote:
> The new MLC based SSDs have better wear leveling tech and don't suffer
> the pauses. Intel X25-M 80 and 160 GB SSDs are both pause-free. See
> Anandtech's test results for details.

They don't suffer the pauses, but they still don't have fantastic write speeds.

David Lang

> Intel's SLC SSDs should also be good enough but they're smaller.
>
> - Luke
Re: [PERFORM] understanding postgres issues/bottlenecks
The new MLC based SSDs have better wear leveling tech and don't suffer the pauses. Intel X25-M 80 and 160 GB SSDs are both pause-free. See Anandtech's test results for details.

Intel's SLC SSDs should also be good enough but they're smaller.

- Luke
Re: [PERFORM] block device benchmarking
Markus Wanner wrote: > Hi, > > I'm fiddling with a hand-made block device based benchmarking thingie, > which I want to run random reads and writes of relatively small blocks > (somewhat similar to databases). I'm much less interested in measuring > throughput, but rather in latency. Besides varying block sizes, I'm also > testing with a varying number of concurrent threads and varying > read/write ratios. As a result, I'm interested in roughly the following > graphs: > > * (single thread) i/o latency vs. seek distance > * (single thread) throughput vs. (accurator) position > * (single thread) i/o latency vs. no of concurrent threads > * total requests per second + throughput vs. no of concurrent threads > * total requests per second + throughput vs. read/write ratio > * total requests per second + throughput vs. block size > * distribution of access times (histogram) > > (Of course, not all of these are relevant for all types of storages.) > > Does there already exist a tool giving (most of) these measures? Am I > missing something interesting? What would you expect from a block device > benchmarking tool? > > Regards > > Markus Wanner > Check out the work of Jens Axboe and Alan Brunelle, specifically the packages "blktrace" and "fio". "blktrace" acts as a "sniffer" for I/O, recording the path of every I/O operation through the block I/O layer. Using another tool in the package, "btreplay/btrecord", you can translate the captured trace into a benchmark that re-issues the I/Os. And the third tool in the package, "btt", does statistical analysis. I don't think you really need "benchmarks" if you can extract this kind of detail from a real application. :) However, if you do want to build a benchmark, "fio" is a customizable benchmark utility. In the absence of real-world traces, you can emulate any I/O activity pattern with "fio". "fio" is what Mark Wong's group has been using to characterize filesystem behavior. 
I'm not sure where the presentations are at the moment, but there is some
of it at http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

There are also some more generic filesystem benchmarks like "iozone" and
"bonnie++". They're good general tools for comparing filesystems and I/O
subsystems, but the other tools are more useful if you have a specific
workload, for example a PostgreSQL application.

BTW ... I am working on my blktrace howto even as I type this. I don't
have an ETA; that's going to depend on how long it takes me to get the
PostgreSQL benchmarks I'm using to work on my machine. But everything
will be on GitHub at
http://github.com/znmeb/linux_perf_viz/tree/master/blktrace-howto
as it evolves.
Re: [PERFORM] understanding postgres issues/bottlenecks
I believe they write at 200MB/s, which is outstanding for sequential BW.
Not sure about the write latency, though the Anandtech benchmark results
showed high detail and IIRC the write latencies were very good.

- Luke

----- Original Message -----
From: da...@lang.hm
To: Luke Lonergan
Cc: st...@enterprisedb.com ; mar...@bluegap.ch ;
scott.marl...@gmail.com ; rjpe...@earthlink.net ;
pgsql-performance@postgresql.org
Sent: Sat Jan 10 16:03:32 2009
Subject: Re: [PERFORM] understanding postgres issues/bottlenecks

On Sat, 10 Jan 2009, Luke Lonergan wrote:

> The new MLC based SSDs have better wear leveling tech and don't suffer
> the pauses. Intel X25-M 80 and 160 GB SSDs are both pause-free. See
> Anandtech's test results for details.

they don't suffer the pauses, but they still don't have fantastic write
speeds.

David Lang

> Intel's SLC SSDs should also be good enough but they're smaller.
>
> - Luke
>
> ----- Original Message -----
> From: pgsql-performance-ow...@postgresql.org
> To: Gregory Stark
> Cc: Markus Wanner ; Scott Marlowe ; Ron ;
> pgsql-performance@postgresql.org
> Sent: Sat Jan 10 14:40:51 2009
> Subject: Re: [PERFORM] understanding postgres issues/bottlenecks
>
> On Sat, 10 Jan 2009, Gregory Stark wrote:
>
>> da...@lang.hm writes:
>>
>>> On Sat, 10 Jan 2009, Markus Wanner wrote:
>>>
>>>> My understanding of SSDs so far is, that they are not that bad at
>>>> writing *on average*, but to perform wear-leveling, they sometimes
>>>> have to shuffle around multiple blocks at once. So there are pretty
>>>> awful spikes for writing latency (IIRC more than 100ms has been
>>>> measured on cheaper disks).
>>
>> That would be fascinating. And frightening. A lot of people have been
>> recommending these for WAL disks and this would make them actually
>> *worse* than regular drives.
>>
>>> well, I have one of those cheap disks.
>>>
>>> brand new out of the box, format the 32G drive, then copy large
>>> files to it (~1G per file).
>>> this should do almost no wear-leveling, but its write performance is
>>> still poor and it has occasional 1 second pauses.
>>
>> This isn't similar to the way WAL behaves though. What you're testing
>> is the behaviour when the bandwidth to the SSD is saturated. At that
>> point, some point in the stack, whether in the SSD, the USB hardware
>> or driver, or the OS buffer cache, can start to queue up writes. The
>> stalls you see could be the behaviour when that queue fills up and it
>> needs to push back to higher layers.
>>
>> To simulate WAL you want to transfer smaller volumes of data, well
>> below the bandwidth limit of the drive, fsync the data, then pause a
>> bit, and repeat. Time each fsync and see whether the time they take
>> is proportional to the amount of data written in the meantime or
>> whether they randomly spike upwards.
>
> if you have a specific benchmark for me to test I would be happy to do
> this.
>
> the test that I did is basically the best-case for the SSD
> (more-or-less sequential writes, where the vendors claim that the
> drives match or slightly outperform the traditional disks). for random
> writes the vendors put SSDs at fewer IOPS than 5400 rpm drives, let
> alone 15K rpm drives.
>
> take a look at this paper
> http://www.imation.com/PageFiles/83/Imation-SSD-Performance-White-Paper.pdf
>
> this is not one of the low-performance drives; they include a SanDisk
> drive in the paper that shows significantly less performance (but the
> same basic pattern) than the Imation drives.
>
> David Lang
Re: [PERFORM] understanding postgres issues/bottlenecks
da...@lang.hm wrote:
> On Sat, 10 Jan 2009, Luke Lonergan wrote:
>
>> The new MLC based SSDs have better wear leveling tech and don't
>> suffer the pauses. Intel X25-M 80 and 160 GB SSDs are both
>> pause-free. See Anandtech's test results for details.
>
> they don't suffer the pauses, but they still don't have fantastic
> write speeds.
>
> David Lang
>
>> Intel's SLC SSDs should also be good enough but they're smaller.

From what I can see, SLC SSDs are still quite superior for reliability
and (write) performance. However they are too small and too expensive
right now. Hopefully the various manufacturers are working on improving
the size/price issue for SLC, as well as improving the
performance/reliability area for the MLC products.

regards

Mark
Re: [PERFORM] understanding postgres issues/bottlenecks
On Sun, 11 Jan 2009, Mark Kirkwood wrote:

> da...@lang.hm wrote:
>> On Sat, 10 Jan 2009, Luke Lonergan wrote:
>>
>>> The new MLC based SSDs have better wear leveling tech and don't
>>> suffer the pauses. Intel X25-M 80 and 160 GB SSDs are both
>>> pause-free. See Anandtech's test results for details.
>>
>> they don't suffer the pauses, but they still don't have fantastic
>> write speeds.
>>
>>> Intel's SLC SSDs should also be good enough but they're smaller.
>
> From what I can see, SLC SSDs are still quite superior for reliability
> and (write) performance. However they are too small and too expensive
> right now. Hopefully the various manufacturers are working on
> improving the size/price issue for SLC, as well as improving the
> performance/reliability area for the MLC products.

the very nature of the technology means that SLC will never be as cheap
as MLC, and MLC will never be as reliable as SLC.

take a look at
http://www.imation.com/PageFiles/83/SSD-Reliability-Lifetime-White-Paper.pdf
for a good writeup of the technology.

for both technologies the price will continue to drop, and the
reliability and performance will continue to climb, but I don't see
anything that would improve one without the other (well, I could see MLC
gaining a 50% capacity boost if they can get to 3 bits per cell vs the
current 2, but that would come at the cost of reliability again).

for write performance I don't think there is as much of a difference
between the two technologies. today there is a huge difference in most
of the shipping products, but Intel has now demonstrated that it's
mostly due to the controller chip, so I expect much of that difference
to vanish in the next year or so (as new generations of controller chips
ship).

David Lang