Re: [HACKERS] posix_fadvsise in base backups

2011-09-25 Thread Andres Freund
Hi Greg,

On Sunday, September 25, 2011 03:25:50 AM Greg Stark wrote:
 On Sat, Sep 24, 2011 at 4:16 PM, Magnus Hagander mag...@hagander.net wrote:
  I was assuming the kernel was smart enough to read this as "*this*
  process is not going to use this file anymore", not "nobody on the
  whole machine is going to use this file anymore". And the process
  running the base backup is certainly not going to read it again.
  
  But that's a good point - do you know if that is the case, or does it
  need more testing?
 It's not the case on Linux. I used to use DONTNEED to flush pages from
 cache before running a benchmark. I verified with mincore that the
 pages were actually getting removed from cache. Sometimes there was
 the occasional straggler but nearly all got flushed and after a second
 or third pass the stragglers were gone too.
Not sure what exactly is not the case on Linux. Your answer could be read 
either as saying that fadvise(DONTNEED) adheres to some sort of refcounting 
scheme (which, AFAIK, it does not) or as saying that it doesn't.

 In case you're wondering, this was because using /proc/.../drop_caches
 caused flaky benchmarks. My theory was that it was causing pages of
 the executable to trigger page faults in the middle of the benchmark.
That should be easy to rule out by preloading the application and its 
libraries, no?
I think there were plans to teach the dynamic linker to do so, but I am not 
sure they were ever carried out.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Magnus Hagander
Attached patch adds a simple call to posix_fadvise with
POSIX_FADV_DONTNEED on all the files being read when doing a base
backup, to help the kernel not to trash the filesystem cache.

Seems like a simple enough fix - in fact, I don't remember why I took
it out of the original patch :O

Any reason not to put this in? Is it even safe enough to put into 9.1
(probably not, but maybe?)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 4841095..54c4d13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -781,6 +781,15 @@ sendFile(char *readfilename, char *tarfilename, struct stat * statbuf)
 		pq_putmessage('d', buf, pad);
 	}
 
+	/*
+	 * If we have posix_fadvise(), send a note to the kernel that we are not
+	 * going to need this data anytime soon, so that it can be discarded
+	 * from the filesystem cache.
+	 */
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+	(void) posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_DONTNEED);
+#endif
+
 	FreeFile(fp);
 }
 



Re: [HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Andres Freund
Hi,

On Saturday, September 24, 2011 05:08:17 PM Magnus Hagander wrote:
 Attached patch adds a simple call to posix_fadvise with
 POSIX_FADV_DONTNEED on all the files being read when doing a base
 backup, to help the kernel not to trash the filesystem cache.
 
 Seems like a simple enough fix - in fact, I don't remember why I took
 it out of the original patch :O
 
 Any reason not to put this in? Is it even safe enough to put into 9.1
 (probably not, but maybe?)
Won't that possibly throw a formerly fully cached database out of the cache?

Andres



Re: [HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Magnus Hagander
On Sat, Sep 24, 2011 at 17:14, Andres Freund and...@anarazel.de wrote:
 Hi,

 On Saturday, September 24, 2011 05:08:17 PM Magnus Hagander wrote:
 Attached patch adds a simple call to posix_fadvise with
 POSIX_FADV_DONTNEED on all the files being read when doing a base
 backup, to help the kernel not to trash the filesystem cache.

 Seems like a simple enough fix - in fact, I don't remember why I took
 it out of the original patch :O

 Any reason not to put this in? Is it even safe enough to put into 9.1
 (probably not, but maybe?)
 Won't that possibly throw a formerly fully cached database out of the cache?

I was assuming the kernel was smart enough to read this as "*this*
process is not going to use this file anymore", not "nobody on the
whole machine is going to use this file anymore". And the process
running the base backup is certainly not going to read it again.

But that's a good point - do you know if that is the case, or does it
need more testing?




Re: [HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Andres Freund
Hi,

On Saturday, September 24, 2011 05:16:48 PM Magnus Hagander wrote:
 On Sat, Sep 24, 2011 at 17:14, Andres Freund and...@anarazel.de wrote:
  On Saturday, September 24, 2011 05:08:17 PM Magnus Hagander wrote:
  Attached patch adds a simple call to posix_fadvise with
  POSIX_FADV_DONTNEED on all the files being read when doing a base
  backup, to help the kernel not to trash the filesystem cache.
  Seems like a simple enough fix - in fact, I don't remember why I took
  it out of the original patch :O
  Any reason not to put this in? Is it even safe enough to put into 9.1
  (probably not, but maybe?)
  Won't that possibly throw a formerly fully cached database out of the
  cache?
 I was assuming the kernel was smart enough to read this as "*this*
 process is not going to use this file anymore", not "nobody on the
 whole machine is going to use this file anymore". And the process
 running the base backup is certainly not going to read it again.
 But that's a good point - do you know if that is the case, or does it
 need more testing?
I am pretty, but not totally, sure that the kernel does not track each process 
that uses a page. For one, doing so would probably be prohibitively expensive. 
For another, I am pretty (but not ...) sure that I once restructured an 
application so that it would not fadvise(DONTNEED) memory that is also used by 
other processes.

Currently I can only think of two workarounds, both OS-specific:
- Use O_DIRECT for reading the base backup. It will be slow in fully cached 
situations, but should work well enough in all others. We need to be careful 
about the usual O_DIRECT pitfalls (page size, alignment, etc.).
- Use mmap()/mincore() to find out whether the data is in cache, and restore 
that state afterwards.
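A minimal sketch of the first workaround (not code from this thread), assuming Linux/glibc. DIRECT_ALIGN = 4096 is an assumed alignment that covers the logical block size of most devices; real code should query it rather than hard-code it. The function name read_whole_file_direct and its process callback are hypothetical. If O_DIRECT is not defined, or open() rejects it with EINVAL (as some filesystems, such as tmpfs on older kernels, do), it falls back to an ordinary buffered read.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed values: 4096 covers the logical block size of most Linux devices,
 * but real code should query the filesystem instead of hard-coding it. */
#define DIRECT_ALIGN 4096
#define DIRECT_CHUNK (64 * 1024)    /* must be a multiple of DIRECT_ALIGN */

/*
 * Hypothetical helper: read the whole file at 'path', bypassing the page cache
 * with O_DIRECT where possible. If O_DIRECT is not defined, or open() rejects
 * it with EINVAL, fall back to ordinary buffered reads. Each chunk is handed to
 * 'process' (may be NULL). Returns the number of bytes read, or -1 on error.
 */
long long read_whole_file_direct(const char *path,
                                 void (*process)(const char *buf, size_t len))
{
    int fd = -1;
#ifdef O_DIRECT
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno != EINVAL)
        return -1;
#endif
    if (fd < 0)
        fd = open(path, O_RDONLY);  /* buffered fallback */
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer as well as aligned offsets/lengths. */
    void *buf;
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_CHUNK) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, DIRECT_CHUNK);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)
            break;                  /* end of file */
        if (process != NULL)
            process(buf, (size_t) n);
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

The file offset stays aligned because every read except the last returns a full DIRECT_CHUNK; only the final read at end of file is short. A short read in the middle of a file would misalign the next read, and this sketch would then report an error rather than recover.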

Too bad that POSIX_FADV_NOREUSE is not really implemented.


Andres



Re: [HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Cédric Villemain
2011/9/24 Andres Freund and...@anarazel.de:
 Hi,

 On Saturday, September 24, 2011 05:16:48 PM Magnus Hagander wrote:
 On Sat, Sep 24, 2011 at 17:14, Andres Freund and...@anarazel.de wrote:
  On Saturday, September 24, 2011 05:08:17 PM Magnus Hagander wrote:
  Attached patch adds a simple call to posix_fadvise with
  POSIX_FADV_DONTNEED on all the files being read when doing a base
  backup, to help the kernel not to trash the filesystem cache.
  Seems like a simple enough fix - in fact, I don't remember why I took
  it out of the original patch :O
  Any reason not to put this in? Is it even safe enough to put into 9.1
  (probably not, but maybe?)
  Won't that possibly throw a formerly fully cached database out of the
  cache?
 I was assuming the kernel was smart enough to read this as "*this*
 process is not going to use this file anymore", not "nobody on the
 whole machine is going to use this file anymore". And the process
 running the base backup is certainly not going to read it again.
 But that's a good point - do you know if that is the case, or does it
 need more testing?
 I am pretty, but not totally, sure that the kernel does not track each process
 that uses a page. For one, doing so would probably be prohibitively expensive.
 For another, I am pretty (but not ...) sure that I once restructured an
 application so that it would not fadvise(DONTNEED) memory that is also used by
 other processes.

DONTNEED will remove pages from the cache. It may happen that it doesn't
(DONTNEED and WILLNEED are just advisory flags), but DONTNEED is honored
most of the time.
You can read the mincore() status of the pages beforehand to decide
whether you need to remove them afterwards (this is what some modified
versions of dd do).
You can also use pgfincore before and after the base backup to recover
the previous state of the page cache.
There are some ideas floating around pgfincore for doing seqscans
(pg_dump) with less impact on the page cache this way (probably possible
with the ExecStart/Stop hooks).


 Currently I can only think of two workarounds, both OS-specific:
 - Use O_DIRECT for reading the base backup. It will be slow in fully cached
 situations, but should work well enough in all others. We need to be careful
 about the usual O_DIRECT pitfalls (page size, alignment, etc.).
 - Use mmap()/mincore() to find out whether the data is in cache, and restore
 that state afterwards.

 Too bad that POSIX_FADV_NOREUSE is not really implemented.

yes.



 Andres





-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training



Re: [HACKERS] posix_fadvsise in base backups

2011-09-24 Thread Greg Stark
On Sat, Sep 24, 2011 at 4:16 PM, Magnus Hagander mag...@hagander.net wrote:
 I was assuming the kernel was smart enough to read this as "*this*
 process is not going to use this file anymore", not "nobody on the
 whole machine is going to use this file anymore". And the process
 running the base backup is certainly not going to read it again.

 But that's a good point - do you know if that is the case, or does it
 need more testing?

It's not the case on Linux. I used to use DONTNEED to flush pages from
cache before running a benchmark. I verified with mincore that the
pages were actually getting removed from cache. Sometimes there was
the occasional straggler but nearly all got flushed and after a second
or third pass the stragglers were gone too.

In case you're wondering, this was because using /proc/.../drop_caches
caused flaky benchmarks. My theory was that it was causing pages of
the executable to trigger page faults in the middle of the benchmark.



-- 
greg
