Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
On Tue, Aug 25, 2015 at 4:31 PM, Gavin Flower wrote:

> On 26/08/15 05:54, David Kerr wrote:
>
>> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>>
>>> However, I know from experience that's not entirely true (although
>>> it's not always easy to measure all aspects of your I/O bandwidth).
>>> Am I missing something?
>>>
>>> Two things I can think of:
>>>
>>> Transaction log writes are entirely sequential. If you have disks
>>> assigned for just this purpose, then the heads will always be in the
>>> right spot, and the writes go through more quickly.
>>>
>>> A database server process waits until the transaction logs are
>>> written and then returns control to the client. The data writes can
>>> be done in the background while the client goes on to do other
>>> things. Splitting up data and logs means that there is less chance
>>> the disk controller will cause data writes to interfere with log
>>> files.
>>>
>>> Kind regards,
>>> Andomar
>>
>> hmm, yeah those are both what I'd lump into "I/O bandwidth".
>> If your disk subsystem is fast enough, or you're on a RAIDed SAN
>> or EBS you'd either overcome that, or not necessarily be able to.
>
> Back when I actually understood the various timings of disc accessing on
> a mainframe system, back in the 1980s (disc layout & accessing is way
> more complicated now!), I found that there was a considerable difference
> between mainly sequential & mostly random access - easily greater than a
> factor of 5 (from memory) in terms of throughput.
>
> That comes from the time to move heads between tracks and from rotational
> latency (caused by not reading sequential blocks on the same track). There
> are other complications, which I have glossed over!

It can go even further now with the use of SSDs. You can put the xlogs on an
SSD and the rest of the database on a mechanical drive. The same can be said
about partitions: you can place the most accessed partition on an SSD and the
rest of the db on a mechanical drive.

-Joseph Kregloh

> Cheers,
> Gavin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
On 26/08/15 05:54, David Kerr wrote:

> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>
>> However, I know from experience that's not entirely true (although
>> it's not always easy to measure all aspects of your I/O bandwidth).
>> Am I missing something?
>>
>> Two things I can think of:
>>
>> Transaction log writes are entirely sequential. If you have disks
>> assigned for just this purpose, then the heads will always be in the
>> right spot, and the writes go through more quickly.
>>
>> A database server process waits until the transaction logs are
>> written and then returns control to the client. The data writes can
>> be done in the background while the client goes on to do other
>> things. Splitting up data and logs means that there is less chance
>> the disk controller will cause data writes to interfere with log
>> files.
>>
>> Kind regards,
>> Andomar
>
> hmm, yeah those are both what I'd lump into "I/O bandwidth".
> If your disk subsystem is fast enough, or you're on a RAIDed SAN
> or EBS you'd either overcome that, or not necessarily be able to.

Back when I actually understood the various timings of disc accessing on a
mainframe system, back in the 1980s (disc layout & accessing is way more
complicated now!), I found that there was a considerable difference between
mainly sequential & mostly random access - easily greater than a factor of 5
(from memory) in terms of throughput.

That comes from the time to move heads between tracks and from rotational
latency (caused by not reading sequential blocks on the same track). There
are other complications, which I have glossed over!

Cheers,
Gavin
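As a rough illustration of the seek-plus-rotational-latency arithmetic Gavin describes, here is a back-of-the-envelope sketch. Every drive figure in it (seek time, spindle speed, sequential rate) is an assumption picked for illustration, not a number from this thread, and the model ignores transfer time, caching, and command queuing.

```python
def random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Estimate random I/Os per second for a single spindle.

    Each random access pays an average seek plus, on average, half a
    revolution of rotational latency. Transfer time is ignored.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def random_throughput_mb_s(avg_seek_ms: float, rpm: float, block_kb: float) -> float:
    """Throughput when every block of block_kb requires its own random access."""
    return random_iops(avg_seek_ms, rpm) * block_kb / 1024.0


# Hypothetical drive parameters (assumptions, not measurements).
AVG_SEEK_MS = 8.0        # assumed average seek time
RPM = 7200               # assumed spindle speed
BLOCK_KB = 8             # 8 kB, PostgreSQL's default block size
SEQUENTIAL_MB_S = 100.0  # assumed sustained sequential rate

rand_mb_s = random_throughput_mb_s(AVG_SEEK_MS, RPM, BLOCK_KB)
print(f"random:     {rand_mb_s:.2f} MB/s")
print(f"sequential: {SEQUENTIAL_MB_S:.2f} MB/s")
print(f"ratio:      {SEQUENTIAL_MB_S / rand_mb_s:.0f}x")
```

With these assumed figures the gap comes out far larger than a factor of 5, which fits the remark that the 1980s estimate was conservative for purely random access on a spinning disk.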
Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
> On Aug 25, 2015, at 10:45 AM, Bill Moran wrote:
>
> On Tue, 25 Aug 2015 10:08:48 -0700
> David Kerr wrote:
>
>> Howdy All,
>>
>> For a very long time I've held the belief that splitting PGDATA and xlog
>> on Linux systems fairly universally gives a decent performance benefit
>> for many common workloads.
>> (I've seen up to 20% personally).
>>
>> I was under the impression that this had to do with regular fsync() calls
>> from the WAL interfering with and over-reaching writing out the
>> filesystem buffers.
>>
>> Basically, I think I was conflating fsync() with sync().
>>
>> So if it's not that, then that just leaves bandwidth (ignoring all of the
>> other best-practice reasons for reliability, etc.). So, in theory if
>> you're not swamping your disk I/O then you won't really benefit from
>> relocating your XLOGs.
>
> Disk performance can be a bit more complicated than just "swamping."

Funny, on revision of my question, I left out basically that exact line for
simplicity's sake. =)

> Even if you're not maxing out the IO bandwidth, you could be getting
> enough that some writes are waiting on other writes before they can be
> processed. Consider the fact that old-style Ethernet was only able to hit
> ~80% of its theoretical capacity in the real world, because the chance of
> collisions increased with the amount of data, and each collision slowed
> down the overall transfer speed. Contrasted with modern Ethernet that
> doesn't do collisions, you can get much closer to 100% of the rated
> bandwidth because the communications are effectively partitioned from
> each other.
>
> In the worst-case scenario, if two processes (due to horrible luck)
> _always_ try to write at the same time, the overall responsiveness will be
> lousy, even if the bandwidth usage is only a small percent of the
> available. Of course, that worst case doesn't happen in actual practice,
> but as the usage goes up, the chance of hitting that interference
> increases, and the effective response goes down, even when there's
> bandwidth still available.
>
> Separate the competing processes, and the chance of conflict is 0. So your
> responsiveness is pretty much at best-case all the time.

Understood.

Now in my previous delve into this issue, I showed minimal/no disk queuing,
the SAN showed nothing on its queues and no retries (of course,
#NeverTrustTheSANGuy), but I still got a 20% performance increase by
splitting the WAL and $PGDATA.

But that's beside the point, and my data on that environment is long gone.

I'm content to leave this at "I/O is complicated". I just wanted to make sure
that I wasn't correct but for a slightly wrong reason.

Thanks!
Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:

>> However, I know from experience that's not entirely true (although it's
>> not always easy to measure all aspects of your I/O bandwidth).
>>
>> Am I missing something?
>
> Two things I can think of:
>
> Transaction log writes are entirely sequential. If you have disks
> assigned for just this purpose, then the heads will always be in the
> right spot, and the writes go through more quickly.
>
> A database server process waits until the transaction logs are
> written and then returns control to the client. The data writes can
> be done in the background while the client goes on to do other
> things. Splitting up data and logs means that there is less chance
> the disk controller will cause data writes to interfere with log
> files.
>
> Kind regards,
> Andomar

hmm, yeah those are both what I'd lump into "I/O bandwidth".
If your disk subsystem is fast enough, or you're on a RAIDed SAN
or EBS you'd either overcome that, or not necessarily be able to.
Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
On Tue, 25 Aug 2015 10:08:48 -0700
David Kerr wrote:

> Howdy All,
>
> For a very long time I've held the belief that splitting PGDATA and xlog
> on Linux systems fairly universally gives a decent performance benefit for
> many common workloads.
> (I've seen up to 20% personally).
>
> I was under the impression that this had to do with regular fsync() calls
> from the WAL interfering with and over-reaching writing out the filesystem
> buffers.
>
> Basically, I think I was conflating fsync() with sync().
>
> So if it's not that, then that just leaves bandwidth (ignoring all of the
> other best-practice reasons for reliability, etc.). So, in theory if you're
> not swamping your disk I/O then you won't really benefit from relocating
> your XLOGs.

Disk performance can be a bit more complicated than just "swamping." Even if
you're not maxing out the IO bandwidth, you could be getting enough that some
writes are waiting on other writes before they can be processed. Consider the
fact that old-style Ethernet was only able to hit ~80% of its theoretical
capacity in the real world, because the chance of collisions increased with
the amount of data, and each collision slowed down the overall transfer speed.
Contrasted with modern Ethernet that doesn't do collisions, you can get much
closer to 100% of the rated bandwidth because the communications are
effectively partitioned from each other.

In the worst-case scenario, if two processes (due to horrible luck) _always_
try to write at the same time, the overall responsiveness will be lousy, even
if the bandwidth usage is only a small percent of the available. Of course,
that worst case doesn't happen in actual practice, but as the usage goes up,
the chance of hitting that interference increases, and the effective response
goes down, even when there's bandwidth still available.

Separate the competing processes, and the chance of conflict is 0. So your
responsiveness is pretty much at best-case all the time.

> However, I know from experience that's not entirely true (although it's not
> always easy to measure all aspects of your I/O bandwidth).
>
> Am I missing something?

--
Bill Moran
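One way to put numbers on Bill's point that response time degrades well before bandwidth runs out is a textbook M/M/1 queue, where mean response time is service time divided by (1 - utilization). The queueing model is an assumption brought in here for illustration; nobody in the thread proposed it, real storage does not behave exactly like M/M/1, and the utilization figures are hypothetical.

```python
def mm1_response_time(service_ms: float, utilization: float) -> float:
    """Mean time in system (waiting plus service) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


# Hypothetical figures, not measurements from the thread.
SERVICE_MS = 1.0   # time for one write once the device starts on it
WAL_UTIL = 0.10    # fraction of device time consumed by WAL writes
DATA_UTIL = 0.50   # fraction consumed by data-file writes

# WAL and data files on one device: both streams share a single queue.
shared = mm1_response_time(SERVICE_MS, WAL_UTIL + DATA_UTIL)
# WAL on its own device: it only queues behind other WAL writes.
dedicated = mm1_response_time(SERVICE_MS, WAL_UTIL)

print(f"WAL write, shared device:    {shared:.2f} ms")
print(f"WAL write, dedicated device: {dedicated:.2f} ms")

# Response time climbs steeply even while spare bandwidth remains.
for util in (0.2, 0.5, 0.8, 0.9):
    print(f"utilization {util:.0%}: {mm1_response_time(SERVICE_MS, util):.1f} ms")
```

In this toy model a device running at 60% utilization is nowhere near "swamped", yet a WAL write on it takes more than twice as long as on a dedicated device, which is the kind of gain that would not show up as an obviously saturated disk.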
Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
> However, I know from experience that's not entirely true (although it's not
> always easy to measure all aspects of your I/O bandwidth).
>
> Am I missing something?

Two things I can think of:

Transaction log writes are entirely sequential. If you have disks assigned
for just this purpose, then the heads will always be in the right spot, and
the writes go through more quickly.

A database server process waits until the transaction logs are written and
then returns control to the client. The data writes can be done in the
background while the client goes on to do other things. Splitting up data and
logs means that there is less chance the disk controller will cause data
writes to interfere with log files.

Kind regards,
Andomar
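The second point, that a commit is acknowledged only after the log write has reached stable storage, can be sketched with a toy append-only log. This is not PostgreSQL's WAL implementation; `commit_records` and the record sizes are hypothetical names and values chosen for illustration. It shows why the latency of each sequential append-and-flush sits directly on the client's critical path.

```python
import os
import statistics
import tempfile
import time


def commit_records(path: str, records: list) -> list:
    """Append each record to a log file and fsync before 'acknowledging' it.

    Returns the per-commit latency in seconds. Toy model only: one record
    per commit, no group commit, no checksums.
    """
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for record in records:
            start = time.perf_counter()
            os.write(fd, record)   # sequential append to the end of the log
            os.fsync(fd)           # wait until this file's data is on disk
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


with tempfile.TemporaryDirectory() as tmpdir:
    log_path = os.path.join(tmpdir, "toy_wal.log")
    sample = [b"x" * 512 for _ in range(20)]   # hypothetical 512-byte records
    lat = commit_records(log_path, sample)
    print(f"commits: {len(lat)}")
    print(f"median commit latency: {statistics.median(lat) * 1000:.3f} ms")
```

Running the same loop while another process writes heavily to the same device, and again with the log on a separate device, is one way to observe the interference described above. The result depends heavily on the hardware, filesystem, and caching involved.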
[GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?
Howdy All,

For a very long time I've held the belief that splitting PGDATA and xlog on
Linux systems fairly universally gives a decent performance benefit for many
common workloads.
(I've seen up to 20% personally).

I was under the impression that this had to do with regular fsync() calls
from the WAL interfering with and over-reaching writing out the filesystem
buffers.

Basically, I think I was conflating fsync() with sync().

So if it's not that, then that just leaves bandwidth (ignoring all of the
other best-practice reasons for reliability, etc.). So, in theory if you're
not swamping your disk I/O then you won't really benefit from relocating your
XLOGs.

However, I know from experience that's not entirely true (although it's not
always easy to measure all aspects of your I/O bandwidth).

Am I missing something?

Thanks
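For reference, the fsync() versus sync() distinction mentioned above can be shown with Python's thin wrappers around the two system calls. This is a minimal sketch, assuming a Unix-like system where `os.sync` is available; `write_durably` and `flush_everything` are hypothetical helper names. How much unrelated dirty data an fsync() ends up flushing in practice can depend on the filesystem, which this sketch does not try to show.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> None:
    """Write payload and flush only this one file to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)   # fsync(2): scoped to the file behind this descriptor
    finally:
        os.close(fd)


def flush_everything() -> None:
    """Ask the kernel to write out all dirty buffers, for every file."""
    os.sync()          # sync(2): system-wide, not tied to any descriptor


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "example.dat")
    write_durably(target, b"commit record")
    print(f"wrote {os.path.getsize(target)} bytes with fsync()")
    # flush_everything() is deliberately not called here: it would force
    # out dirty data belonging to every other process on the machine.
```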