Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-13 Thread Jeff Janes
On Thu, Jan 12, 2012 at 7:26 PM, Greg Smith g...@2ndquadrant.com wrote: On 1/11/12 9:25 AM, Andres Freund wrote: The heavy pressure putting it directly in the writeback queue leads to less efficient io because quite often it won't reorder sensibly with other io anymore and the like. At least

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-12 Thread Greg Smith
On 1/11/12 9:25 AM, Andres Freund wrote: The heavy pressure putting it directly in the writeback queue leads to less efficient io because quite often it won't reorder sensibly with other io anymore and the like. At least that was my experience in using it in another application. Sure, this

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Simon Riggs
On Wed, Jan 11, 2012 at 4:38 AM, Greg Smith g...@2ndquadrant.com wrote: On 1/10/12 9:14 PM, Robert Haas wrote: Based on that, I whipped up the attached patch, which, if sync_file_range is available, simply iterates through everything that will eventually be fsync'd before beginning the write

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Florian Weimer
* Greg Smith: One idea I was thinking about here was building a little hash table inside of the fsync absorb code, tracking how many absorb operations have happened for whatever the most popular relation files are. The idea is that we might say use sync_file_range every time N calls for a

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Simon Riggs
On Wed, Jan 11, 2012 at 9:28 AM, Simon Riggs si...@2ndquadrant.com wrote: It does open the door to various other uses, so I think this work will be useful. Yes, I think this would allow a better design for the checkpointer. Checkpoint scan will collect buffers to write for checkpoint and sort

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Andres Freund
On Wednesday, January 11, 2012 03:14:31 AM Robert Haas wrote: Greg Smith muttered a while ago about wanting to do something with sync_file_range to improve checkpoint behavior on Linux. I thought he was talking about trying to sync only the range of blocks known to be dirty, which didn't seem

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Andres Freund
On Wednesday, January 11, 2012 10:28:11 AM Simon Riggs wrote: On Wed, Jan 11, 2012 at 4:38 AM, Greg Smith g...@2ndquadrant.com wrote: On 1/10/12 9:14 PM, Robert Haas wrote: Based on that, I whipped up the attached patch, which, if sync_file_range is available, simply iterates through

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Andres Freund
On Wednesday, January 11, 2012 10:33:47 AM Florian Weimer wrote: * Greg Smith: One idea I was thinking about here was building a little hash table inside of the fsync absorb code, tracking how many absorb operations have happened for whatever the most popular relation files are. The idea

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Robert Haas
On Tue, Jan 10, 2012 at 11:38 PM, Greg Smith g...@2ndquadrant.com wrote: What you're doing here doesn't care though, and I hadn't considered that SYNC_FILE_RANGE_WRITE could be used that way on my last pass through its docs.  Used this way, it's basically fsync without the wait or guarantee; it

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Greg Smith
On 1/11/12 4:33 AM, Florian Weimer wrote: Isn't this pretty much like tuning vm.dirty_bytes? We generally set it to pretty low values, and it seems to help smooth out the checkpoints. When I experimented with dropping the actual size of the cache, checkpoint spikes improved, but things like

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Greg Smith
On 1/11/12 7:46 AM, Andres Freund wrote: I played around with this before and my problem was that sync_file_range is not really a hint. It actually starts writeback *directly* and only returns when the io is placed inside the queue (at least that's the way it was back then). Which very quickly

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-11 Thread Andres Freund
On Wednesday, January 11, 2012 03:20:09 PM Greg Smith wrote: On 1/11/12 7:46 AM, Andres Freund wrote: I played around with this before and my problem was that sync_file_range is not really a hint. It actually starts writeback *directly* and only returns when the io is placed inside the

Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-10 Thread Greg Smith
On 1/10/12 9:14 PM, Robert Haas wrote: Based on that, I whipped up the attached patch, which, if sync_file_range is available, simply iterates through everything that will eventually be fsync'd before beginning the write phase and tells the Linux kernel to put them all under write-out. I
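
For readers unfamiliar with the call discussed in this thread, here is a minimal sketch of the general technique: asking the Linux kernel to start write-out for a set of files before they are later fsync'd. It is not the patch from the thread. The helper names start_writeback and start_writeback_all are hypothetical, and the sketch assumes Linux with glibc, where sync_file_range() is declared in <fcntl.h> once _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread's patch): ask the kernel to
 * begin write-out of the dirty pages of one file without waiting for
 * the I/O to complete. Returns 0 on success, -1 on failure with errno
 * set by the call that failed.
 */
static int
start_writeback(const char *path)
{
    int fd;
    int rc;
    int saved_errno;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* offset 0 with nbytes 0 covers the file from its start to its end. */
    rc = sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);

    /* Preserve errno from sync_file_range across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return rc;
}

/*
 * Hypothetical helper: start write-out for every file in a list and
 * return how many files could not be processed. This gives no
 * durability guarantee, so the caller still has to fsync() each file.
 */
static size_t
start_writeback_all(const char *const *paths, size_t npaths)
{
    size_t failures = 0;
    size_t i;

    for (i = 0; i < npaths; i++)
    {
        if (start_writeback(paths[i]) != 0)
            failures++;
    }
    return failures;
}
```

SYNC_FILE_RANGE_WRITE on its own only initiates write-out. It does not wait for completion and does not flush file metadata, so it cannot replace the later fsync. As Andres Freund notes in the thread, the call may also block while requests are placed in the I/O queue, so it should not be treated as a free hint.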