Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread Joseph Kregloh

On Tue, Aug 25, 2015 at 4:31 PM, Gavin Flower  wrote:

> On 26/08/15 05:54, David Kerr wrote:
>
>> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>>
>>>> However, I know from experience that's not entirely true (although it's
>>>> not always easy to measure all aspects of your I/O bandwidth).
>>>>
>>>> Am I missing something?
>>>
>>> Two things I can think of:
>>>
>>> Transaction writes are entirely sequential.  If you have disks
>>> assigned for just this purpose, then the heads will always be in the
>>> right spot, and the writes go through more quickly.
>>>
>>> A database server process waits until the transaction logs are
>>> written and then returns control to the client. The data writes can
>>> be done in the background while the client goes on to do other
>>> things.  Splitting up data and logs means that there is less chance
>>> the disk controller will cause data writes to interfere with log
>>> files.
>>>
>>> Kind regards,
>>> Andomar
>>
>> hmm, yeah those are both what I'd lump into "I/O bandwidth".
>> If your disk subsystem is fast enough, or you're on a RAIDed SAN
>> or EBS you'd either overcome that, or not necessarily be able to.
>
> Back when I actually understood the various timings of disc accessing on
> a mainframe system, back in the 1980s (disc layout & accessing is way
> more complicated now!), I found that there was a considerable difference
> between mainly sequential & mostly random access - easily greater than a
> factor of 5 (from memory) in terms of throughput.
>
> That comes from the time to move heads between tracks, plus rotational
> latency (caused by not reading sequential blocks on the same track).  There
> are other complications, which I have glossed over!
>
>
It can go even further now with the use of SSDs. You can put the xlogs on
an SSD and the rest of the database on a mechanical drive. The same can be
said about partitions: you can place the most accessed partition on an SSD
and the rest of the db on a mechanical drive.
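
One common way to do this split is to stop the server, move the `pg_xlog` directory to the other device, and leave a symbolic link at the old location; for partitions, tablespaces play the same role. The Python sketch below only demonstrates the move-and-symlink step on throwaway temporary directories. The directory names (`pgdata`, `ssd_mount`, `segment.demo`) are stand-ins invented for the example, and on a real cluster the server must be shut down before anything is moved.

```python
import os
import shutil
import tempfile

def relocate_dir(src: str, dest_parent: str) -> str:
    """Move src under dest_parent and leave a symlink at the old path."""
    dest = os.path.join(dest_parent, os.path.basename(src))
    shutil.move(src, dest)                             # move the directory itself
    os.symlink(dest, src, target_is_directory=True)    # old path now points at the new home
    return dest

with tempfile.TemporaryDirectory() as root:
    pgdata = os.path.join(root, "pgdata")      # stand-in for $PGDATA
    ssd = os.path.join(root, "ssd_mount")      # stand-in for the SSD mount point
    os.makedirs(os.path.join(pgdata, "pg_xlog"))
    os.makedirs(ssd)
    with open(os.path.join(pgdata, "pg_xlog", "segment.demo"), "w") as f:
        f.write("demo")

    new_home = relocate_dir(os.path.join(pgdata, "pg_xlog"), ssd)
    is_link = os.path.islink(os.path.join(pgdata, "pg_xlog"))
    # The file is still reachable through the original path via the symlink.
    still_readable = os.path.exists(os.path.join(pgdata, "pg_xlog", "segment.demo"))
```

Because the old path still resolves, nothing that refers to `$PGDATA/pg_xlog` needs to change; only the physical device underneath differs.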

-Joseph Kregloh



>
> Cheers,
> Gavin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread Gavin Flower


On 26/08/15 05:54, David Kerr wrote:

> On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
>
>>> However, I know from experience that's not entirely true (although it's not
>>> always easy to measure all aspects of your I/O bandwidth).
>>>
>>> Am I missing something?
>>
>> Two things I can think of:
>>
>> Transaction writes are entirely sequential.  If you have disks
>> assigned for just this purpose, then the heads will always be in the
>> right spot, and the writes go through more quickly.
>>
>> A database server process waits until the transaction logs are
>> written and then returns control to the client. The data writes can
>> be done in the background while the client goes on to do other
>> things.  Splitting up data and logs means that there is less chance
>> the disk controller will cause data writes to interfere with log
>> files.
>>
>> Kind regards,
>> Andomar
>
> hmm, yeah those are both what I'd lump into "I/O bandwidth".
> If your disk subsystem is fast enough, or you're on a RAIDed SAN
> or EBS you'd either overcome that, or not necessarily be able to.



Back when I actually understood the various timings of disc accessing on
a mainframe system, back in the 1980s (disc layout & accessing is way
more complicated now!), I found that there was a considerable difference
between mainly sequential & mostly random access - easily greater than a
factor of 5 (from memory) in terms of throughput.


That comes from the time to move heads between tracks, plus rotational
latency (caused by not reading sequential blocks on the same track).  There
are other complications, which I have glossed over!
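
The arithmetic behind that gap can be sketched in a few lines of Python. Every figure below is an assumption chosen to resemble a commodity 7200 rpm drive, not the 1980s hardware described above and not a measurement: an 8.5 ms average seek, 100 MB/s sequential transfer, and 8 kB blocks (PostgreSQL's default page size). Each random access is modelled as one seek plus, on average, half a rotation.

```python
def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Approximate random I/Os per second: one seek plus half a rotation each."""
    half_rotation_ms = 0.5 * 60_000.0 / rpm   # average rotational latency
    return 1000.0 / (avg_seek_ms + half_rotation_ms)

def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Throughput when every I/O transfers one block."""
    return iops * block_kb / 1024.0

AVG_SEEK_MS = 8.5       # assumed average seek time
RPM = 7200              # assumed spindle speed
BLOCK_KB = 8            # PostgreSQL's default page size
SEQUENTIAL_MB_S = 100   # assumed sustained sequential rate

iops = random_iops(AVG_SEEK_MS, RPM)
random_mb_s = throughput_mb_s(iops, BLOCK_KB)
ratio = SEQUENTIAL_MB_S / random_mb_s

print(f"random: {iops:.0f} IOPS, {random_mb_s:.2f} MB/s")
print(f"sequential is roughly {ratio:.0f}x faster under these assumptions")
```

Under this simple model the head movement and rotation dominate the cost of a small random read, which is why a log that is only ever appended to behaves so differently from scattered data-file writes.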



Cheers,
Gavin



Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread David Kerr


> On Aug 25, 2015, at 10:45 AM, Bill Moran  wrote:
> 
> On Tue, 25 Aug 2015 10:08:48 -0700
> David Kerr  wrote:
> 
>> Howdy All,
>> 
>> For a very long time I've held the belief that splitting PGDATA and xlog on
>> Linux systems fairly universally gives a decent performance benefit for many
>> common workloads.
>> (I've seen up to 20% personally).
>>
>> I was under the impression that this had to do with regular fsync()s from
>> the WAL interfering with, and over-reaching into, the writing out of the
>> filesystem buffers.
>>
>> Basically, I think I was conflating fsync() with sync().
>>
>> So if it's not that, then that just leaves bandwidth (ignoring all of the
>> other best practice reasons for reliability, etc.). So, in theory if you're
>> not swamping your disk I/O then you won't really benefit from relocating
>> your XLOGs.
> 
> Disk performance can be a bit more complicated than just "swamping." Even if

Funny, on revision of my question, I left out basically that exact line for
simplicity's sake. =)

> you're not maxing out the IO bandwidth, you could be getting enough that some
> writes are waiting on other writes before they can be processed. Consider the
> fact that old-style ethernet was only able to hit ~80% of its theoretical
> capacity in the real world, because the chance of collisions increased with
> the amount of data, and each collision slowed down the overall transfer speed.
> Contrast that with modern ethernet, which doesn't do collisions: you can get
> much closer to 100% of the rated bandwidth because the communications are
> effectively partitioned from each other.
>
> In the worst case scenario, if two processes (due to horrible luck) _always_
> try to write at the same time, the overall responsiveness will be lousy, even
> if the bandwidth usage is only a small percent of the available. Of course,
> that worst case doesn't happen in actual practice, but as the usage goes up,
> the chance of hitting that interference increases, and the effective response
> goes down, even when there's bandwidth still available.
> 
> Separate the competing processes, and the chance of conflict is 0. So your
> responsiveness is pretty much at best-case all the time.

Understood. Now in my previous delve into this issue, I showed minimal/no disk
queuing, and the SAN showed nothing on its queues and no retries (of course
#NeverTrustTheSANGuy), but I still got a 20% performance increase by
splitting the WAL and $PGDATA.

But that's beside the point and my data on that environment is long gone.

I'm content to leave this at "I/O is complicated". I just wanted to make sure
that I wasn't correct but for a slightly wrong reason.

Thanks!


Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread David Kerr

On Tue, Aug 25, 2015 at 10:16:37AM PDT, Andomar wrote:
> >However, I know from experience that's not entirely true (although it's not
> >always easy to measure all aspects of your I/O bandwidth).
> >
> >Am I missing something?
> >
> Two things I can think of:
> 
> Transaction writes are entirely sequential.  If you have disks
> assigned for just this purpose, then the heads will always be in the
> right spot, and the writes go through more quickly.
> 
> A database server process waits until the transaction logs are
> written and then returns control to the client. The data writes can
> be done in the background while the client goes on to do other
> things.  Splitting up data and logs means that there is less chance
> the disk controller will cause data writes to interfere with log
> files.
> 
> Kind regards,
> Andomar
> 

hmm, yeah those are both what I'd lump into "I/O bandwidth".
If your disk subsystem is fast enough, or you're on a RAIDed SAN
or EBS you'd either overcome that, or not necessarily be able to.




Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread Bill Moran

On Tue, 25 Aug 2015 10:08:48 -0700
David Kerr  wrote:

> Howdy All,
> 
> For a very long time I've held the belief that splitting PGDATA and xlog on
> Linux systems fairly universally gives a decent performance benefit for many
> common workloads.
> (I've seen up to 20% personally).
>
> I was under the impression that this had to do with regular fsync()s from
> the WAL interfering with, and over-reaching into, the writing out of the
> filesystem buffers.
>
> Basically, I think I was conflating fsync() with sync().
>
> So if it's not that, then that just leaves bandwidth (ignoring all of the
> other best practice reasons for reliability, etc.). So, in theory if you're
> not swamping your disk I/O then you won't really benefit from relocating your
> XLOGs.

Disk performance can be a bit more complicated than just "swamping." Even if
you're not maxing out the IO bandwidth, you could be getting enough traffic that
some writes are waiting on other writes before they can be processed. Consider
the fact that old-style ethernet was only able to hit ~80% of its theoretical
capacity in the real world, because the chance of collisions increased with
the amount of data, and each collision slowed down the overall transfer speed.
Contrast that with modern ethernet, which doesn't do collisions: you can get
much closer to 100% of the rated bandwidth because the communications are
effectively partitioned from each other.

In the worst case scenario, if two processes (due to horrible luck) _always_
try to write at the same time, the overall responsiveness will be lousy, even
if the bandwidth usage is only a small percent of the available. Of course,
that worst case doesn't happen in actual practice, but as the usage goes up,
the chance of hitting that interference increases, and the effective response
goes down, even when there's bandwidth still available.

Separate the competing processes, and the chance of conflict is 0. So your
responsiveness is pretty much at best-case all the time.
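
A rough way to put numbers on "responsiveness drops before bandwidth runs out" is the textbook single-server queue (M/M/1), where mean response time is the idle service time divided by (1 - utilization). This is a simplifying model chosen for illustration, not a description of any particular disk or controller, and the 5 ms service time is an assumed figure.

```python
def mean_response_time(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)

SERVICE_MS = 5.0  # assumed time for one write on an otherwise idle device

table = {rho: mean_response_time(SERVICE_MS, rho) for rho in (0.1, 0.5, 0.8, 0.9)}
for rho, ms in table.items():
    print(f"utilization {rho:.0%}: mean response {ms:.1f} ms")
```

In this model a device that is only half busy already takes twice as long per request as an idle one, so moving the log's writes onto their own device can help well before the shared device looks saturated.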

> However, I know from experience that's not entirely true (although it's not
> always easy to measure all aspects of your I/O bandwidth).
> 
> Am I missing something?

-- 
Bill Moran


Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread Andomar


> However, I know from experience that's not entirely true (although it's not
> always easy to measure all aspects of your I/O bandwidth).
>
> Am I missing something?


Two things I can think of:

Transaction writes are entirely sequential.  If you have disks assigned 
for just this purpose, then the heads will always be in the right spot, 
and the writes go through more quickly.


A database server process waits until the transaction logs are written 
and then returns control to the client. The data writes can be done in 
the background while the client goes on to do other things.  Splitting
up data and logs means that there is less chance the disk controller will
cause data writes to interfere with log files.
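
The second point can be sketched as a toy write-ahead store in Python. This is an illustration of the idea only, not how PostgreSQL is implemented; the class and file names are made up. `commit()` appends to the log and waits for `os.fsync()` before returning, while changed data stays in memory until a later `checkpoint()` writes it out.

```python
import os
import tempfile

class ToyStore:
    """Toy write-ahead store: commits wait on the log, data writes do not."""

    def __init__(self, log_path: str, data_path: str) -> None:
        self.log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.data_path = data_path
        self.dirty = {}  # changes not yet written to the data file

    def commit(self, key: str, value: str) -> None:
        os.write(self.log_fd, f"{key}={value}\n".encode())  # sequential append
        os.fsync(self.log_fd)       # the caller waits for this flush
        self.dirty[key] = value     # the data file is updated later

    def checkpoint(self) -> int:
        """Background step: write dirty entries to the data file."""
        flushed = len(self.dirty)
        with open(self.data_path, "a") as f:
            for key, value in self.dirty.items():
                f.write(f"{key}={value}\n")
            f.flush()
            os.fsync(f.fileno())
        self.dirty.clear()
        return flushed

    def close(self) -> None:
        os.close(self.log_fd)

with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(os.path.join(tmp, "log"), os.path.join(tmp, "data"))
    store.commit("a", "1")
    store.commit("b", "2")
    pending_before = len(store.dirty)   # both commits returned, data file untouched
    flushed = store.checkpoint()
    pending_after = len(store.dirty)
    store.close()
```

Only the log flush sits on the client's critical path, so anything that delays it (such as a burst of data-file writes on the same device) is felt directly as commit latency.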


Kind regards,
Andomar



[GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread David Kerr

Howdy All,

For a very long time I've held the belief that splitting PGDATA and xlog on
Linux systems fairly universally gives a decent performance benefit for many
common workloads.
(I've seen up to 20% personally).

I was under the impression that this had to do with regular fsync()s from the
WAL interfering with, and over-reaching into, the writing out of the
filesystem buffers.

Basically, I think I was conflating fsync() with sync().
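
The difference between the two calls can be shown with a minimal Python sketch, assuming a Unix-like system. It is not PostgreSQL code (the server picks its actual WAL flush call via `wal_sync_method`), and the file name is invented for the example. `os.fsync()` flushes one file descriptor's data, whereas `os.sync()` asks the kernel to write out every dirty buffer on the system.

```python
import os
import tempfile

def write_durably(path: str, payload: bytes) -> int:
    """Append payload and flush only this file to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = os.write(fd, payload)
        os.fsync(fd)  # affects this one descriptor, not other files' buffers
        return written
    finally:
        os.close(fd)

def flush_everything() -> None:
    """Ask the kernel to write out all dirty buffers, for every file."""
    os.sync()  # system-wide; available on Unix only

with tempfile.TemporaryDirectory() as tmp:
    wal_like = os.path.join(tmp, "wal_segment.demo")  # hypothetical file name
    n = write_durably(wal_like, b"commit record\n")
    size = os.path.getsize(wal_like)
```

A per-file flush like `write_durably()` does not by itself force unrelated dirty data-file buffers to disk, which is why the sync()-style explanation does not account for the speedup.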

So if it's not that, then that just leaves bandwidth (ignoring all of the other
best practice reasons for reliability, etc.). So, in theory if you're not
swamping your disk I/O then you won't really benefit from relocating your XLOGs.

However, I know from experience that's not entirely true (although it's not
always easy to measure all aspects of your I/O bandwidth).

Am I missing something?

Thanks


