Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-12-12 Thread Amit Kapila
On Wednesday, December 12, 2012 5:23 AM Greg Smith wrote:
 On 11/23/12 5:57 AM, Amit kapila wrote:
  Let us try to see by example:
  Total RAM - 22G
  Database size - 16G
 ...
  Case -2 (Shared Buffers - 10G)
  a. Load all the files in OS buffers. In the best case OS buffers can
 contain 10-12G of data as the OS has 12G of memory available.
  b. Try to load all in Shared buffers. Last 10G will be there in shared
 buffers.
  c. Now as there is no direct correlation of data between Shared
 Buffers and OS buffers, so whenever PG has to access any data
  which is not there in Shared Buffers, good chances are there that
 it can lead to IO.
 
 I don't think either of these examples is very representative of
 real-world behavior.  The idea of "load all the files in OS buffers"
 assumes someone has used a utility like pg_prewarm or pgfincore.  It's
 not something that happens in normal use.  Being able to re-populate all
 of RAM using those utilities isn't realistic anyway.  Anyone who tries
 to load more than (memory - shared_buffers) that way is likely to be
 disappointed by the result.

True, I also think nobody will directly try to do it this way, but similar
situations can arise after a long run, for example if we assume that the
most-used pages fall within the range of RAM.

 
 Similarly, the problem you're describing here has been described as the
 "double buffering" one for a while now.  The old suggestion that
 shared_buffers not be set above 25% of RAM comes from this sort of
 concern.  If triggering a problem requires doing that, essentially
 misconfiguring the server, it's hard to get too excited about it.
 
 Anyway, none of that impacts on me mixing testing for this into what I'm
 working on.  The way most pgbench tests happen, it's hard to *not* have
 the important data in cache.  Once you run the init step, you have to
 either reboot or drop the OS cache to get those pages out of RAM.  That
 means the sort of cached setup you're using pg_prewarm to
 simulate--things are in the OS cache, but not the PostgreSQL one--is one
 that anyone running an init/test pair will often create.  You don't need
 pg_prewarm to do it.  

The way I have run the tests is to try to simulate scenarios where
invalidating buffers by bgwriter/checkpoint can have an advantage.

With Regards,
Amit Kapila.





Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-12-11 Thread Greg Smith

On 11/23/12 5:57 AM, Amit kapila wrote:

Let us try to see by example:
Total RAM - 22G
Database size - 16G
...
Case -2 (Shared Buffers - 10G)
a. Load all the files in OS buffers. In the best case OS buffers can contain 10-12G 
of data as the OS has 12G of memory available.
b. Try to load all in Shared buffers. Last 10G will be there in shared buffers.
c. Now as there is no direct correlation of data between Shared Buffers and OS 
buffers, so whenever PG has to access any data
which is not there in Shared Buffers, good chances are there that it can 
lead to IO.


I don't think either of these examples is very representative of 
real-world behavior.  The idea of "load all the files in OS buffers" 
assumes someone has used a utility like pg_prewarm or pgfincore.  It's 
not something that happens in normal use.  Being able to re-populate all 
of RAM using those utilities isn't realistic anyway.  Anyone who tries 
to load more than (memory - shared_buffers) that way is likely to be 
disappointed by the result.


Similarly, the problem you're describing here has been described as the 
"double buffering" one for a while now.  The old suggestion that 
shared_buffers not be set above 25% of RAM comes from this sort of 
concern.  If triggering a problem requires doing that, essentially 
misconfiguring the server, it's hard to get too excited about it.


Anyway, none of that impacts on me mixing testing for this into what I'm 
working on.  The way most pgbench tests happen, it's hard to *not* have 
the important data in cache.  Once you run the init step, you have to 
either reboot or drop the OS cache to get those pages out of RAM.  That 
means the sort of cached setup you're using pg_prewarm to 
simulate--things are in the OS cache, but not the PostgreSQL one--is one 
that anyone running an init/test pair will often create.  You don't need 
pg_prewarm to do it.  If you initialize the database, then restart the 
server to clear shared_buffers, the result will be similar to what 
you're doing.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit Kapila
Shouldn't that data be in the shared buffers if not the OS cache, and hence
approximately the same IO will be required?

 

I don't think so, as the data in the OS cache and PG shared buffers doesn't have
any direct relation; the OS can flush its buffers based on its own scheduling
algorithm.

 

Let us try to see by example:

Total RAM - 22G

Database size - 16G

 

Case -1 (Shared Buffers - 5G)

a. Load all the files in OS buffers. Chances are good that all 16G of data
will be there in OS buffers, as the OS still has 17G of memory available.

b. Try to load all of it into shared buffers. The last 5G will be there in shared
buffers.

c. Chances are high that access to the remaining 11G of buffers will not lead to IO,
as they will be in OS buffers.

 

Case -2 (Shared Buffers - 10G)

a. Load all the files in OS buffers. In the best case OS buffers can
contain 10-12G of data, as the OS has 12G of memory available.

b. Try to load all of it into shared buffers. The last 10G will be there in shared
buffers.

c. Now, as there is no direct correlation of data between shared buffers and
OS buffers, whenever PG has to access any data which is not there in shared
buffers, there is a good chance that it can lead to IO.
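
To put rough numbers on this example, here is a small back-of-the-envelope sketch
(illustrative only, not from the original mail): it assumes the pessimistic case
described above, namely that the contents of the OS cache are independent of what
is in shared buffers, and that accesses are spread uniformly over the database.

#include <stdio.h>

/*
 * Estimate the fraction of buffer accesses that need a physical read,
 * assuming the OS cache and shared buffers hold uncorrelated subsets of a
 * uniformly accessed database.  All sizes are in GB; the OS cache is
 * crudely taken to be whatever RAM is left after shared buffers.
 */
static double
physical_read_fraction(double ram, double db, double shared_buffers)
{
    double os_cache = ram - shared_buffers;

    if (os_cache > db)
        os_cache = db;          /* cannot cache more than the database */

    return (1.0 - shared_buffers / db) * (1.0 - os_cache / db);
}

int
main(void)
{
    /* Case-1 and Case-2 above: 22G RAM, 16G database */
    printf("shared_buffers =  5G: ~%.1f%% of accesses need disk reads\n",
           100.0 * physical_read_fraction(22, 16, 5));
    printf("shared_buffers = 10G: ~%.1f%% of accesses need disk reads\n",
           100.0 * physical_read_fraction(22, 16, 10));
    return 0;
}

With 5G of shared buffers the whole database still fits in the remaining OS cache,
so the estimate is ~0%; with 10G it comes out to roughly 9%, which is enough to
explain a noticeable drop once the data no longer fits on either side.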

 

 

 Again, the drop in the performance is so severe that it seems worth
investigating that further, especially because you can reproduce it
reliably.

 

   Yes, I agree that it is worth investigating, but IMO this is a different
problem which might not be addressed by the patch under discussion.

The 2 reasons I can think of for the dip in performance when shared buffers
increase beyond a certain threshold percentage of RAM are:

 a. the Buffer Management algorithm has some bottleneck, or

   b. the way data is managed between shared buffers and the OS buffer cache

 

The point I want to make is also explained at the link below:

http://blog.kimiensoftware.com/2011/05/postgresql-vs-oracle-differences-4-shared-memory-usage-257

 

So if the above is true, I think the performance will recover if shared
buffers are set to 16G in the test. I shall try that setting for a test run.

 

With Regards,

Amit Kapila.

 



Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit kapila
On Friday, November 23, 2012 11:15 AM Pavan Deolasee wrote:
On Thu, Nov 22, 2012 at 2:05 PM, Amit Kapila amit.kap...@huawei.com wrote:



Sorry, I haven't followed this thread at all, but the numbers (43171 and 
57920) in the last two runs of @mv-free-list for 32 clients look like
aberrations, no?  I wonder if
that's skewing the average.

Yes, that is one of the main reasons, but it is consistent across all runs that 
such numbers are observed for 32 clients or above.
Even Jeff pointed out a similar thing in one of his mails and suggested running 
the tests such that the first test runs “with patch” and the second “without 
patch”. 
After doing what he suggested, the observations are still similar.


Are we convinced that the jump that we are seeing is a real one then ? 

  Still not convinced, as the data has been collected only in my setup. 

I'm a bit surprised because it happens only with the patch and only for 32 
clients. How would you explain that ?

The reason this patch can improve performance is reduced contention in backends for 
BufFreeListLock and the partition locks (which BufferAlloc takes a. to remove an old 
page from a buffer or b. to see if a block is already in the buffer pool). As the 
number of backends increases, the chances of improved performance are much better. 
In particular, for 32 clients the results are not that skewed when the tests run for 
a longer time.

For 32 clients, as mentioned in the previous mail, when the test ran for 1 hour 
the difference is not very skewed:

 32 client / 32 thread for 1 hour
               @mv-free-lst    @9.3devl
 Single-run:   9842.019229     8050.357981

 I also looked at the Results.htm file down thread. There seems to be a 
 steep degradation when the shared buffers are increased from 5GB to 10GB, 
 both with and 
 without the patch. Is that expected ? If so, isn't that worth investigating 
 and possibly even fixing before we do anything else ?

 The reason for the decrease in performance is that when shared buffers are 
 increased from 5GB to 10GB, I/O starts, because after the increase the OS 
 buffers cannot hold all the data.


Shouldn't that data be in the shared buffers if not the OS cache, and hence 
approximately the same IO will be required?

I don't think so, as the data in the OS cache and PG shared buffers doesn't have any 
direct relation; the OS can flush its buffers based on its own scheduling algorithm.

Let us try to see by example:
Total RAM - 22G
Database size - 16G

Case -1 (Shared Buffers - 5G)
a. Load all the files in OS buffers. Chances are good that all 16G of data will be 
there in OS buffers, as the OS still has 17G of memory available.
b. Try to load all of it into shared buffers. The last 5G will be there in shared buffers.
c. Chances are high that access to the remaining 11G of buffers will not lead to IO, as 
they will be in OS buffers.

Case -2 (Shared Buffers - 10G)
a. Load all the files in OS buffers. In the best case OS buffers can contain 10-12G 
of data, as the OS has 12G of memory available.
b. Try to load all of it into shared buffers. The last 10G will be there in shared buffers.
c. Now, as there is no direct correlation of data between shared buffers and OS 
buffers, whenever PG has to access any data which is not there in shared buffers, 
there is a good chance that it can lead to IO.


 Again, the drop in the performance is so severe that it seems worth 
 investigating that further, especially because you can reproduce it reliably.

   Yes, I agree that it is worth investigating, but IMO this is a different 
problem which might not be addressed by the patch under discussion. 
The 2 reasons I can think of for the dip in performance when shared buffers 
increase beyond a certain threshold percentage of RAM are: 
   a. the Buffer Management algorithm has some bottleneck, or
   b. the way data is managed between shared buffers and the OS buffer cache

Any Suggestion/Comments?

With Regards,
Amit Kapila.



Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-22 Thread Amit Kapila
 

From: Pavan Deolasee [mailto:pavan.deola...@gmail.com] 
Sent: Thursday, November 22, 2012 12:26 PM
To: Amit kapila
Cc: Jeff Janes; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer
Management

 

 

 

On Mon, Nov 19, 2012 at 8:52 PM, Amit kapila amit.kap...@huawei.com wrote:

On Monday, November 19, 2012 5:53 AM Jeff Janes wrote:
On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com
wrote:
 On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:

Run the modes in reciprocating order?
 Sorry, I didn't understand this. What do you mean by modes in
reciprocating order?

 Sorry for the long delay.  In your scripts, it looks like you always
 run the unpatched first, and then the patched second.

   Yes, that's true.


 By reciprocating, I mean to run them in the reverse order, or in random
order.

Today, for some configurations, I have run them in reciprocating order.
Below are the readings:
Configuration
16GB (Database) - 7GB (Shared Buffers)

Here I ran them in the following order:
1. Run perf report with patch for 32 client
2. Run perf report without patch for 32 client
3. Run perf report with patch for 16 client
4. Run perf report without patch for 16 client

Each execution is 5 minutes.

     16 client / 16 thread     |     32 client / 32 thread
   @mv-free-lst    @9.3devl    |   @mv-free-lst    @9.3devl
---------------------------------------------------------------
       3669          4056      |       5356          5258
       3987          4121      |       4625          5185
       4840          4574      |       4502          6796
       6465          6932      |       4558          8233
       6966          7222      |       4955          8237
       7551          7219      |       9115          8269
       8315          7168      |      43171          8340
       9102          7136      |      57920          8349
---------------------------------------------------------------
       6362          6054      |      16775          7333

 

Sorry, I haven't followed this thread at all, but the numbers (43171 and
57920) in the last two runs of @mv-free-list for 32 clients look like
aberrations, no?  I wonder if that's skewing the average.

 

Yes, that is one of the main reasons, but it is consistent across all runs
that such numbers are observed for 32 clients or above.

Even Jeff pointed out a similar thing in one of his mails and suggested
running the tests such that the first test runs with patch and the second
without patch. 

After doing what he suggested, the observations are still similar.

 

 

 I also looked at the Results.htm file down thread. There seems to be a
steep degradation when the shared buffers are increased from 5GB to 10GB,
both with and 

 without the patch. Is that expected ? If so, isn't that worth
investigating and possibly even fixing before we do anything else ?

 

The reason for the decrease in performance is that when shared buffers are
increased from 5GB to 10GB, I/O starts, because after the increase the OS buffers
cannot hold all the data.

 

With Regards,

Amit Kapila



Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-22 Thread Pavan Deolasee
On Thu, Nov 22, 2012 at 2:05 PM, Amit Kapila amit.kap...@huawei.com wrote:


 Sorry, I haven't followed this thread at all, but the numbers (43171 and
 57920) in the last two runs of @mv-free-list for 32 clients look like
 aberrations, no?  I wonder if that's skewing the average.


 Yes, that is one of the main reasons, but it is consistent across all runs
 that such numbers are observed for 32 clients or above.

 Even Jeff pointed out a similar thing in one of his mails and suggested
 running the tests such that the first test runs “with patch” and the second
 “without patch”.

 After doing what he suggested, the observations are still similar.


Are we convinced that the jump that we are seeing is a real one then ? I'm
a bit surprised because it happens only with the patch and only for 32
clients. How would you explain that ?




  I also looked at the Results.htm file down thread. There seems to be
 a steep degradation when the shared buffers are increased from 5GB to 10GB,
 both with and 

  without the patch. Is that expected ? If so, isn't that worth
 investigating and possibly even fixing before we do anything else ?


 The reason for the decrease in performance is that when shared buffers are
 increased from 5GB to 10GB, I/O starts, because after the increase the OS
 buffers cannot hold all the data.


Shouldn't that data be in the shared buffers if not the OS cache, and hence
approximately the same IO will be required?  Again, the drop in the performance
is so severe that it seems worth investigating that further, especially
because you can reproduce it reliably.

Thanks,
Pavan


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-21 Thread Pavan Deolasee
On Mon, Nov 19, 2012 at 8:52 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Monday, November 19, 2012 5:53 AM Jeff Janes wrote:
 On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com
 wrote:
  On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:
 
 Run the modes in reciprocating order?
  Sorry, I didn't understand this. What do you mean by modes in
 reciprocating order?

  Sorry for the long delay.  In your scripts, it looks like you always
  run the unpatched first, and then the patched second.

Yes, that's true.

  By reciprocating, I mean to run them in the reverse order, or in random
 order.

 Today, for some configurations, I have run them in reciprocating order.
 Below are the readings:
 Configuration
 16GB (Database) - 7GB (Shared Buffers)

 Here I ran them in the following order:
 1. Run perf report with patch for 32 client
 2. Run perf report without patch for 32 client
 3. Run perf report with patch for 16 client
 4. Run perf report without patch for 16 client

 Each execution is 5 minutes.

      16 client / 16 thread     |     32 client / 32 thread
    @mv-free-lst    @9.3devl    |   @mv-free-lst    @9.3devl
 ---------------------------------------------------------------
        3669          4056      |       5356          5258
        3987          4121      |       4625          5185
        4840          4574      |       4502          6796
        6465          6932      |       4558          8233
        6966          7222      |       4955          8237
        7551          7219      |       9115          8269
        8315          7168      |      43171          8340
        9102          7136      |      57920          8349
 ---------------------------------------------------------------
        6362          6054      |      16775          7333


Sorry, I haven't followed this thread at all, but the numbers (43171 and
57920) in the last two runs of @mv-free-list for 32 clients look like
aberrations, no?  I wonder if that's skewing the average.

I also looked at the Results.htm file down thread. There seems to be a
steep degradation when the shared buffers are increased from 5GB to 10GB,
both with and without the patch. Is that expected ? If so, isn't that worth
investigating and possibly even fixing before we do anything else ?

Thanks,
Pavan


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-19 Thread Amit kapila
On Monday, November 19, 2012 6:05 AM Jeff Janes  wrote:
On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote:


 Today I have again collected the data for configuration shared_buffers 
 = 7G, along with vmstat.
 The data and vmstat information (bi) are attached with this mail. It is 
 observed from the vmstat info that I/O is happening in both cases; however, 
 after running for
 a long time, the I/O is also comparatively less with the new patch.

What I see in the vmstat report is that it takes 5.5 runs to get
really good and warmed up, and so it crawls for the first 5.5
benchmarks and then flies for the last 0.5 benchmark.  The way you
have your runs ordered, that last 0.5 of a benchmark is for the
patched version, and this drives up the average tps for the patched
case.


 Also, there is no theoretical reason to think that your patch would
 decrease the amount of IO needed (in fact, by invalidating buffers
 early, it could be expected to increase the amount of IO).  So this
 also argues that the increase in performance is caused by the decrease
 in IO, but the patch isn't causing that decrease, it merely benefits
 from it due to an accident of timing.

Today, I have run in the opposite order, and still for some readings I see a 
similar observation.
I am also not sure about the I/O part; I was just trying to interpret the data 
that way. However, 
maybe for some particular scenario it behaves that way due to OS buffer 
management.
As I am not aware of the OS buffer management algorithm, it's difficult to say 
whether such a change would have any impact on OS buffer management
that could yield better performance.

With Regards,
Amit Kapila.




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-18 Thread Jeff Janes
On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:

Run the modes in reciprocating order?
 Sorry, I didn't understand this. What do you mean by modes in reciprocating 
 order?

Sorry for the long delay.  In your scripts, it looks like you always
run the unpatched first, and then the patched second.

By reciprocating, I mean to run them in the reverse order, or in random order.

Also, for the select only transactions, I think that 20 minutes is
much longer than necessary.  I'd rather see many more runs, each one
being shorter.

Because you can't restart the server without wiping out the
shared_buffers, what I would do is make a test patch which introduces
a new guc.c setting which allows the behavior to be turned on and off
with a SIGHUP (pg_ctl reload).
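
For what it's worth, a toggle like that could be wired up roughly as below. This is
only an illustrative sketch: the setting name, the variable, and the places that
would consult it are hypothetical, and the entry layout follows the 9.2-era
ConfigureNamesBool[] format in guc.c only approximately.

/* Hypothetical flag; PGC_SIGHUP lets it be flipped with pg_ctl reload. */
bool        move_unused_buffers_to_freelist = true;

/* Candidate entry for ConfigureNamesBool[] in src/backend/utils/misc/guc.c */
{
    {"move_unused_buffers_to_freelist", PGC_SIGHUP, RESOURCES_BGWRITER,
        gettext_noop("Background writer returns clean, unused buffers to the freelist."),
        NULL
    },
    &move_unused_buffers_to_freelist,
    true,
    NULL, NULL, NULL
},

The patched code in SyncOneBuffer() would then check the flag before taking the new
invalidate/move-to-freelist path, so both behaviors could be compared in a single
server run without restarting and losing shared_buffers.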



I haven't been able to detect any reliable difference in performance
with this patch.  I've been testing with 150 scale factor with 4GB of
ram and 4 cores, over a variety of shared_buffers and concurrencies.

 I think the main reason for this is that when shared buffers are small, 
 there is no performance gain;
 I observed the same when I ran this test with shared buffers=2G, where 
 there is no performance gain.
 Please see the results for shared buffers=2G in the mail below:
 http://archives.postgresql.org/pgsql-hackers/2012-09/msg00422.php

True, but I think that testing with shared_buffers=2G when RAM is 4GB
(and pgbench scale is also lower) should behave different than doing
so when RAM is 24 GB.


 The reason I can think of is that when shared buffers are small, the clock 
 sweep runs very fast and there is no bottleneck.
 Only when shared buffers increase above some threshold does it spend reasonable 
 time in the clock sweep.

I am rather skeptical of this.  When the work set doesn't fit in
memory under a select-only workload, then about half the buffers will
be evictable at any given time, and half will have usagecount=1, and a
handful will have usagecount=4 (index meta, root and branch blocks).  This
will be the case over a wide range of shared_buffers, as long as it is
big enough to hold all index branch blocks but not big enough to hold
everything.  Given this state of affairs, the average clock sweep
should be about 2, regardless of the exact size of shared_buffers.

The one wrinkle I could think of is if all the usagecount=1 buffers
are grouped into a continuous chunk, and all the usagecount=0 are in
another chunk.  The average would still be 2, but the average would be
made up of N/2 runs of length 1, followed by one run of length N/2.
Now if 1 process is stuck in the N/2 stretch and all other processes
are waiting on that, maybe that somehow escalates the waits so that
they are larger when N is larger, but I still don't see how the math
works on that.
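
The expected sweep length of about 2 is easy to check with a toy simulation
(a sketch written for this discussion, not code from any patch): model a
select-only workload whose working set is much larger than shared_buffers, so
every allocation evicts the first buffer found with usage count 0, and the newly
loaded page is used once before the hand comes around again.

#include <stdio.h>
#include <stdlib.h>

#define NBUFFERS    100000
#define NALLOCS     2000000

int
main(void)
{
    static int  usage[NBUFFERS];
    long long   steps = 0;
    int         hand = 0;

    /* Start from a mixed state; the process settles quickly anyway. */
    for (int i = 0; i < NBUFFERS; i++)
        usage[i] = rand() % 2;

    for (int a = 0; a < NALLOCS; a++)
    {
        for (;;)
        {
            steps++;
            if (usage[hand] == 0)
            {
                /* victim found; the new page is used once, so count = 1 */
                usage[hand] = 1;
                hand = (hand + 1) % NBUFFERS;
                break;
            }
            /* sweep decrements the usage count and moves on */
            usage[hand]--;
            hand = (hand + 1) % NBUFFERS;
        }
    }

    printf("average buffers inspected per allocation: %.2f\n",
           (double) steps / NALLOCS);
    return 0;
}

Under these assumptions the printed average stays very close to 2.0 regardless of
NBUFFERS, which matches the argument that simply making shared_buffers larger
should not, by itself, make each sweep meaningfully more expensive.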

Are you working on this just because it was on the ToDo List, or
because you have actually run into a problem with it?  I've never seen
freelist lock contention be a problem on machines with less than 8
CPU, but both of us are testing on smaller machines.  I think we
really need to test this on something bigger.

Cheers,

Jeff




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-18 Thread Jeff Janes
On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote:


 Today I have again collected the data for configuration shared_buffers 
 = 7G, along with vmstat.
 The data and vmstat information (bi) are attached with this mail. It is 
 observed from the vmstat info that I/O is happening in both cases; however, after 
 running for
 a long time, the I/O is also comparatively less with the new patch.

What I see in the vmstat report is that it takes 5.5 runs to get
really good and warmed up, and so it crawls for the first 5.5
benchmarks and then flies for the last 0.5 benchmark.  The way you
have your runs ordered, that last 0.5 of a benchmark is for the
patched version, and this drives up the average tps for the patched
case.

Also, there is no theoretical reason to think that your patch would
decrease the amount of IO needed (in fact, by invalidating buffers
early, it could be expected to increase the amount of IO).  So this
also argues that the increase in performance is caused by the decrease
in IO, but the patch isn't causing that decrease, it merely benefits
from it due to an accident of timing.

Cheers,

Jeff




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-22 Thread Amit Kapila
On Saturday, October 20, 2012 11:07 PM  Jeff Janes wrote:
 On Fri, Oct 19, 2012 at 11:00 PM, Amit kapila amit.kap...@huawei.com
 wrote:
 
  Robert wrote an accounting patch a while ago that tallied how often a
  buffer was cleaned but then reclaimed for the same page before being
  evicted.  But now I can't find it.  If you can find that thread,
 there
  might be some benchmarks posted to it that would be useful.
 
  In my first-level search, I was also not able to find it. But now I am
 planning to check all
  mails from Robert Haas on the PostgreSQL site (which are approximately
 13,000).
  If you can tell me approximately how long ago (last year, 2 yrs back,
 ..) or whether such a patch was submitted
  to any CF or was just discussed in a mail chain, then it will be a little
 easier for me.
 
 It was just an instrumentation patch for doing experiments, not
 intended for commit.
 
  I've tracked it down to the thread "Initial 9.2 pgbench write
  results".  But I don't think it applies to the -S benchmark, because
  it records when the background writer cleaned a buffer by finding it
  dirty and writing it out to make it clean, while in this situation we
  would need something more like either "made the buffer clean and
  reusable" or "observed the buffer to already be clean and reusable".

Do you think an instrumentation patch which can give us how many times a
buffer is found by the clock sweep and how many times it is found on the freelist
would be useful?
I have written something along similar lines when testing this patch, to find
out how many times this patch can avoid the clock sweep.
My observation was that although the new implementation saves many cycles of
the clock sweep, with shared buffers up to 2-2.5G there is still no visible
performance gain.
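
As a rough illustration of what such instrumentation could look like (this is a
hypothetical sketch, not the counters Amit actually used, which were not posted):
two counters in src/backend/storage/buffer/freelist.c, bumped in the two branches
of StrategyGetBuffer() and reported to the server log every so often. The variable
names and the reporting interval are made up.

/* Hypothetical allocation-source counters for freelist.c */
static uint64 nalloc_from_freelist = 0;
static uint64 nalloc_from_clocksweep = 0;

    /* ... in StrategyGetBuffer(), when a buffer is handed out from the freelist: */
    nalloc_from_freelist++;

    /* ... in StrategyGetBuffer(), when the clock sweep picks the victim instead: */
    nalloc_from_clocksweep++;

    /* ... and, say, once every 100000 allocations: */
    if ((nalloc_from_freelist + nalloc_from_clocksweep) % 100000 == 0)
        elog(LOG, "buffer allocations: " UINT64_FORMAT " from freelist, "
             UINT64_FORMAT " from clock sweep",
             nalloc_from_freelist, nalloc_from_clocksweep);

Comparing the two counters across shared_buffers settings would show directly how
often the patch manages to satisfy an allocation without running the clock sweep.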

With Regards,
Amit Kapila.





Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-21 Thread Amit kapila
On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:
On Fri, Sep 7, 2012 at 6:14 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Thursday, September 06, 2012 2:38 PM Amit kapila wrote:
 On Tuesday, September 04, 2012 6:55 PM Amit kapila wrote:
 On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
 On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list

 The results for the updated code is attached with this mail.
 The scenario is same as in original mail.
1. Load all the files in to OS buffers (using pg_prewarm with 'read' 
 operation) of all tables and indexes.
2. Try to load all buffers with pgbench_accounts table and 
 pgbench_accounts_pkey pages (using pg_prewarm with 'buffers' operation).
3. Run the pgbench with select only for 20 minutes.

 Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

 Server Configuration:
shared_buffers = 5GB (1/4 th of RAM size)
Total data size = 16GB
 Pgbench configuration:
transaction type: SELECT only
scaling factor: 1200
query mode: simple
number of clients: varying from 8 to 64 
number of threads: varying from 8 to 64 
duration: 1200 s

 I shall take further readings for following configurations and post the 
 same:
 1. The intention for taking with below configuration is that, with the 
 defined testcase, there will be some cases where I/O can happen. So I 
 wanted to check the
 impact of it.

 Shared_buffers - 7 GB
 number of clients: varying from 8 to 64 
 number of threads: varying from 8 to 64 
 transaction type: SELECT only

 The data for shared_buffers = 7GB is attached with this mail. I have also 
 attached scripts used to take this data.

 Is this result reproducible?  Did you monitor IO (with something like
vmstat) to make sure there was no IO going on during the runs?  

Yes, I have reproduced it 2 times. However, I shall reproduce it once more and use 
vmstat as well. 
I have not observed it with vmstat, but it is observable in the data.
When I kept shared buffers = 5G, the tps is higher, and when I increased it 
to 7G, the tps is reduced, which shows that some I/O started happening.
When I increased it to 10G, the tps reduced drastically, which shows there is a lot 
of I/O. Tomorrow I will post the 10G shared buffers data as well.

Run the modes in reciprocating order?
Sorry, I didn't understand this. What do you mean by modes in reciprocating 
order?

 If you have 7GB of shared_buffers and 16GB of database, that comes out
 to 23GB of data to be held in 24GB of RAM.  In my experience it is
 hard to get that much data cached by simple prewarm; the newer data
 will drive out the older data even if technically there is room.  So
 then when you start running the benchmark, you still have to read in
 some of the data which dramatically slows down the benchmark.

Yes, with 7G the chances of doing I/O are high, but with 5G the chances are less, 
which is observed in the data as well (TPS in the 7G data is less than in 5G).
Please see the results for 5G shared buffers in the mail below:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00318.php 

In the 7G case, you can see in the data that without this patch the tps with 
the original code is quite low compared to the 5G data.
I am sorry, there is a typo in the 7G shared buffers data; it is wrongly 
mentioned as 5G in the heading.

I haven't been able to detect any reliable difference in performance
with this patch.  I've been testing with 150 scale factor with 4GB of
ram and 4 cores, over a variety of shared_buffers and concurrencies.

I think the main reason for this is that when shared buffers are small, there 
is no performance gain;
I observed the same when I ran this test with shared buffers=2G, where 
there is no performance gain.
Please see the results for shared buffers=2G in the mail below:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00422.php

The reason I can think of is that when shared buffers are small, the clock 
sweep runs very fast and there is no bottleneck.
Only when shared buffers increase above some threshold does it spend reasonable 
time in the clock sweep. 

I shall run once with the same configuration as mentioned by you, but I think 
it will not give any performance gain, due to the reason mentioned above.
Is it feasible for you to run with higher shared buffers and also somewhat 
larger data and RAM?
Basically I want to know if you can mimic the situation in the tests I 
have posted. In any case, I shall run the tests once again and post the data.


With Regards,
Amit Kapila.





Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-20 Thread Amit kapila
On Friday, October 19, 2012 9:15 PM Jeff Janes wrote:
On Tue, Sep 4, 2012 at 6:25 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
 On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list



 I have tried implementing it and taken the readings for Select when all the
 data is in either OS buffers

 or Shared Buffers.



 As I understood and analyzed based on the above, there is a problem in the 
 attached patch: in function
 InvalidateBuffer(), after UnlockBufHdr() and before the PartitionLock, if some 
 backend uses that buffer and increases the usage count to 1, 
 InvalidateBuffer() will still remove the buffer from the hash table and put it in 
 the freelist.
 I have modified the code to address the above by checking refcount & usage_count 
 inside the partition lock
 and LockBufHdr, and only after that moving it to the freelist, which is similar to 
 InvalidateBuffer.
 In the actual code we can optimize the current code by using an extra parameter in 
 InvalidateBuffer.

 Please let me know if I understood you correctly, or whether you wanted to say 
 something else by the above comment?

 Yes, I think that this is part of the risk I was hinting at.  I
 haven't evaluated your fix to it.  But assuming it is now safe, I
 still think it is a bad idea to invalidate a perfectly good buffer.
 Now a process that wants that page will have to read it in again, even
 though it is still sitting there.  This is particularly bad because
 the background writer is coded to always circle the buffer pool every
 2 minutes, whether that many clean buffers are needed or not.  I think
 that that is a bad idea, but having it invalidate buffers as it goes
 is even worse.

That is true, but isn't that the low-activity case? In general the bgwriter 
takes into account how many buffers were allocated and how many clock-sweep passes 
completed, to make sure it cleans the buffers appropriately.
One more doubt I have is whether this behavior (circling the buffer pool every 2 
minutes) can't be controlled by 'bgwriter_lru_maxpages', as this number can 
dictate how many buffers to clean in each cycle.

 I think the code for the free-list linked list is written so that it
 performs correctly for a valid buffer to be on the freelist, even
 though that does not happen under current implementations. 

 If you
 find that a buffer on the freelist has become pinned, used, or dirty
 since it was added (which can only happen if it is still valid), you
 just remove it and try again.

Is it actually possible in any use case that the buffer management algorithm can find 
a buffer on the freelist which is pinned or dirty?


 Also, do we want to actually invalidate the buffers?  If someone does
 happen to want one after it is put on the freelist, making it read it
 in again into a different buffer doesn't seem like a nice thing to do,
 rather than just letting it reclaim it.

 But even if bgwriter/checkpoint don't do it, a backend needing a new buffer will do 
 similar things (remove it from the hash table) for this buffer, as this is the next victim 
 buffer.

 Right, but only if it is the nextvictim, here we do it if it is
 nextvictim+N, for some largish values of N.  (And due to the 2 minutes
 rule, sometimes for very large values of N)

Can't we control this 2-minute rule using a new or existing GUC? Is there any 
harm in that, given that you also pointed out earlier in the mail chain that it is not good?
Such a parameter could make the flushing by the bgwriter more valuable.

I'm not sure how to devise a test case to prove that this can be important, 
though.

To start with, can't we do a simple test where all (or most) of the pages are in 
shared buffers and then run a pgbench select-only test?
We can run this test with various configurations of shared buffers.

I have done tests similar to the above, and they show good performance improvement for 
a shared buffers configuration of 25% of RAM.


 Robert wrote an accounting patch a while ago that tallied how often a
 buffer was cleaned but then reclaimed for the same page before being
 evicted.  But now I can't find it.  If you can find that thread, there
 might be some benchmarks posted to it that would be useful.

In my first-level search, I was also not able to find it. But now I am planning 
to check all
mails from Robert Haas on the PostgreSQL site (which are approximately 13,000).
If you can tell me approximately how long ago (last year, 2 yrs back, ..) or 
whether such a patch was submitted to any CF or was just discussed in a mail 
chain, then it will be a little easier for me.


Thank you for doing the initial review of this work.

With Regards,
Amit Kapila.







Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-20 Thread Jeff Janes
On Fri, Sep 7, 2012 at 6:14 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Thursday, September 06, 2012 2:38 PM Amit kapila wrote:
 On Tuesday, September 04, 2012 6:55 PM Amit kapila wrote:
 On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
 On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list

 The results for the updated code is attached with this mail.
 The scenario is same as in original mail.
1. Load all the files in to OS buffers (using pg_prewarm with 'read' 
 operation) of all tables and indexes.
2. Try to load all buffers with pgbench_accounts table and 
 pgbench_accounts_pkey pages (using pg_prewarm with 'buffers' operation).
3. Run the pgbench with select only for 20 minutes.

 Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

 Server Configuration:
shared_buffers = 5GB (1/4 th of RAM size)
Total data size = 16GB
 Pgbench configuration:
transaction type: SELECT only
scaling factor: 1200
query mode: simple
number of clients: varying from 8 to 64 
number of threads: varying from 8 to 64 
duration: 1200 s

 I shall take further readings for following configurations and post the same:
 1. The intention for taking with below configuration is that, with the 
 defined testcase, there will be some cases where I/O can happen. So I wanted 
 to check the
 impact of it.

 Shared_buffers - 7 GB
 number of clients: varying from 8 to 64 
 number of threads: varying from 8 to 64 
 transaction type: SELECT only

 The data for shared_buffers = 7GB is attached with this mail. I have also 
 attached scripts used to take this data.

Is this result reproducible?  Did you monitor IO (with something like
vmstat) to make sure there was no IO going on during the runs?  Run
the modes in reciprocating order?

If you have 7GB of shared_buffers and 16GB of database, that comes out
to 23GB of data to be held in 24GB of RAM.  In my experience it is
hard to get that much data cached by simple prewarm; the newer data
will drive out the older data even if technically there is room.  So
then when you start running the benchmark, you still have to read in
some of the data which dramatically slows down the benchmark.

I haven't been able to detect any reliable difference in performance
with this patch.  I've been testing with 150 scale factor with 4GB of
ram and 4 cores, over a variety of shared_buffers and concurrencies.

Cheers,

Jeff




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-20 Thread Jeff Janes
On Fri, Oct 19, 2012 at 11:00 PM, Amit kapila amit.kap...@huawei.com wrote:

 Robert wrote an accounting patch a while ago that tallied how often a
 buffer was cleaned but then reclaimed for the same page before being
 evicted.  But now I can't find it.  If you can find that thread, there
 might be some benchmarks posted to it that would be useful.

 In my first-level search, I was also not able to find it. But now I am 
 planning to check all
 mails from Robert Haas on the PostgreSQL site (which are approximately 13,000).
 If you can tell me approximately how long ago (last year, 2 yrs back, ..) or 
 whether such a patch was submitted
 to any CF or was just discussed in a mail chain, then it will be a little easier 
 for me.

It was just an instrumentation patch for doing experiments, not
intended for commit.

I've tracked it down to the thread "Initial 9.2 pgbench write
results".  But I don't think it applies to the -S benchmark, because
it records when the background writer cleaned a buffer by finding it
dirty and writing it out to make it clean, while in this situation we
would need something more like either "made the buffer clean and
reusable" or "observed the buffer to already be clean and reusable".


Cheers,

Jeff




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-10-19 Thread Jeff Janes
On Tue, Sep 4, 2012 at 6:25 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
 On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list



 I have tried implementing it and taken the readings for Select when all the
 data is in either OS buffers

 or Shared Buffers.



 The Patch has simple implementation for  bgwriter or checkpoint process
 moving the unused buffers (unpinned with ZERO usage_count buffers) into
 freelist.

 I don't think InvalidateBuffer can be safely used in this way.  It
  says "We assume
  that no other backend could possibly be interested in using the page",
  which is not true here.

 As I understood and analyzed based on the above, there is a problem in the attached 
 patch: in function
 InvalidateBuffer(), after UnlockBufHdr() and before the PartitionLock, if some 
 backend uses that buffer and increases the usage count to 1, 
 InvalidateBuffer() will still remove the buffer from the hash table and put it in 
 the freelist.
 I have modified the code to address the above by checking refcount & usage_count 
 inside the partition lock
 and LockBufHdr, and only after that moving it to the freelist, which is similar to 
 InvalidateBuffer.
 In the actual code we can optimize the current code by using an extra parameter in 
 InvalidateBuffer.

 Please let me know if I understood you correctly, or whether you wanted to say something 
 else by the above comment?

Yes, I think that this is part of the risk I was hinting at.  I
haven't evaluated your fix to it.  But assuming it is now safe, I
still think it is a bad idea to invalidate a perfectly good buffer.
Now a process that wants that page will have to read it in again, even
though it is still sitting there.  This is particularly bad because
the background writer is coded to always circle the buffer pool every
2 minutes, whether that many clean buffers are needed or not.  I think
that that is a bad idea, but having it invalidate buffers as it goes
is even worse.

I think the code for the free-list linked list is written so that it
performs correctly for a valid buffer to be on the freelist, even
though that does not happen under current implementations.  If you
find that a buffer on the freelist has become pinned, used, or dirty
since it was added (which can only happen if it is still valid), you
just remove it and try again.


 Also, do we want to actually invalidate the buffers?  If someone does
 happen to want one after it is put on the freelist, making it read it
 in again into a different buffer doesn't seem like a nice thing to do,
 rather than just letting it reclaim it.

 But even if bgwriter/checkpoint don't do it, a backend needing a new buffer will do 
 similar things (remove it from the hash table) for this buffer, as this is the next victim 
 buffer.

Right, but only if it is the nextvictim, here we do it if it is
nextvictim+N, for some largish values of N.  (And due to the 2 minutes
rule, sometimes for very large values of N)

I'm not sure how to devise a test case to prove that this can be
important, though.

Robert wrote an accounting patch a while ago that tallied how often a
buffer was cleaned but then reclaimed for the same page before being
evicted.  But now I can't find it.  If you can find that thread, there
might be some benchmarks posted to it that would be useful.


Cheers,

Jeff




Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-09-06 Thread Amit kapila

On Tuesday, September 04, 2012 6:55 PM Amit kapila wrote:
On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list



 I have tried implementing it and taken the readings for Select when all the
 data is in either OS buffers

 or Shared Buffers.



 The Patch has simple implementation for  bgwriter or checkpoint process
 moving the unused buffers (unpinned with ZERO usage_count buffers) into
 freelist.

 I don't think InvalidateBuffer can be safely used in this way.  It
 says "We assume
 that no other backend could possibly be interested in using the page",
 which is not true here.

 As I understood and analyzed based on the above, there is a problem in the attached 
 patch: in function
 InvalidateBuffer(), after UnlockBufHdr() and before the PartitionLock, if some 
 backend uses that buffer and 
 increases the usage count to 1, 
 InvalidateBuffer() will still remove the buffer from the hash table and put it in 
 the freelist.
 I have modified the code to address the above by checking refcount & usage_count 
 inside the partition lock
 and LockBufHdr, and only after that moving it to the freelist, which is similar to 
 InvalidateBuffer.
 In the actual code we can optimize the current code by using an extra parameter in 
 InvalidateBuffer.

 Please let me know if I understood you correctly, or whether you wanted to say something 
 else by the above comment?

The results for the updated code is attached with this mail.
The scenario is same as in original mail.
1. Load all the files in to OS buffers (using pg_prewarm with 'read' 
operation) of all tables and indexes. 
2. Try to load all buffers with pgbench_accounts table and 
pgbench_accounts_pkey pages (using pg_prewarm with 'buffers' operation). 
3. Run the pgbench with select only for 20 minutes. 

Platform details: 
Operating System: Suse-Linux 10.2 x86_64 
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) 
RAM : 24GB 

Server Configuration: 
shared_buffers = 5GB (1/4 th of RAM size) 
Total data size = 16GB
Pgbench configuration: 
transaction type: SELECT only 
scaling factor: 1200 
query mode: simple 
number of clients: varying from 8 to 64  
number of threads: varying from 8 to 64  
duration: 1200 s

I shall take further readings for the following configurations and post them:
1. The intention of taking readings with the below configuration is that, with the defined 
test case, there will be some cases where I/O can happen. So I wanted to check 
its impact.

Shared_buffers - 7 GB
number of clients: varying from 8 to 64  
 number of threads: varying from 8 to 64  
transaction type: SELECT only 


2. The intention of taking readings with the below configuration is that, with the defined 
test case, the memory kept for shared buffers is less than recommended. So I wanted to 
check its impact.
Shared_buffers - 2 GB
number of clients: varying from 8 to 64  
number of threads: varying from 8 to 64  
transaction type: SELECT only 


3. The intention of taking readings with the below configuration is that, with the defined 
test case, it will test a mix of DML operations where there will be I/O due to the DML 
operations. So I wanted to check its impact.
Shared_buffers - 5GB
number of clients: varying from 8 to 64  
number of threads: varying from 8 to 64  
transaction type: tpc_b

 One problem I could see with the proposed change is that in some cases the usage 
 count will get decremented immediately for a buffer allocated
 from the free list, as it can be the next victim buffer.
 However, there can be a solution to this problem.


With Regards,
Amit Kapila.














 
 
Original Postgres 9.3devel (each cell: tps / total transactions)

  SIZE      16GB-5GB            16GB-5GB            16GB-5GB            16GB-5GB
  Clients   8C / 8T             16C / 16T           32C / 32T           64C / 64T
  RUN-1     60269 / 72325329    52853 / 63425001    32562 / 39096275    15375 / 18502725
  RUN-2     60370 / 72451857    58453 / 70151866    33181 / 39841490    16348 / 19670518
  RUN-3     59292 / 71159080    58976 / 70782600    33584 / 40344977    16469 / 19801260
  Average   59977 / 71978755    56761 / 68119822    33109 / 39760914    16064 / 19324834

Bgwriter/Checkpoint process moving unused buffers to Free List modification

  SIZE      16GB-5GB            16GB-5GB            16GB-5GB            16GB-5GB
  Clients   8C / 8T             16C / 16T           32C / 32T           64C / 64T
  RUN-1     60020 / 72025508    59394 / 71297211    56372 / 67676069    26572 / 31987430
  RUN-2     60178 / 72218315    59069 / 7085        56079 / 67317399    28143 / 33804132
  RUN-3     59827 / [remainder of the attached table truncated in the archive]

Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-09-04 Thread Amit kapila
On Tuesday, September 04, 2012 12:42 AM Jeff Janes wrote:
On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list



 I have tried implementing it and taken the readings for Select when all the
 data is in either OS buffers

 or Shared Buffers.



 The Patch has simple implementation for  bgwriter or checkpoint process
 moving the unused buffers (unpinned with ZERO usage_count buffers) into
 freelist.

 I don't think InvalidateBuffer can be safely used in this way.  It
 says "We assume
 that no other backend could possibly be interested in using the page",
 which is not true here.

As I understood and analyzed based on the above, there is a problem in the attached 
patch: in function
InvalidateBuffer(), after UnlockBufHdr() and before the PartitionLock, if some 
backend uses that buffer and increases the usage count to 1, 
InvalidateBuffer() will still remove the buffer from the hash table and put it in 
the freelist. 
I have modified the code to address the above by checking refcount & usage_count 
inside the partition lock 
and LockBufHdr, and only after that moving it to the freelist, which is similar to 
InvalidateBuffer. 
In the actual code we can optimize the current code by using an extra parameter in 
InvalidateBuffer. 

Please let me know if I understood you correctly, or whether you wanted to say something 
else by the above comment?

 Also, do we want to actually invalidate the buffers?  If someone does
 happen to want one after it is put on the freelist, making it read it
 in again into a different buffer doesn't seem like a nice thing to do,
 rather than just letting it reclaim it.

But even if bgwriter/checkpoint don't do it, a backend needing a new buffer will do 
similar things (remove it from the hash table) for this buffer, as this is the next victim 
buffer. 
The main intention of doing the MoveBufferToFreeList is to avoid contention on the 
partition locks and BufFreeListLock among backends, which 
has given a performance improvement in high-contention scenarios.

One problem I could see with the proposed change is that in some cases the usage 
count will get decremented immediately for a buffer allocated 
from the free list, as it can be the next victim buffer.
However, there can be a solution to this problem.

Can you suggest some scenarios where I should do more performance tests?

With Regards,
Amit Kapila.

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dba19eb..87446cb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -955,6 +955,88 @@ retry:
 	StrategyFreeBuffer(buf);
 }
 
+
+/*
+ * MoveBufferToFreeList -- mark a shared buffer invalid and return it to the
+ * freelist, which is similar to the InvalidateBuffer function.
+ */
+static void
+MoveBufferToFreeList(volatile BufferDesc *buf)
+{
+	BufferTag	oldTag;
+	uint32		oldHash;			/* hash value for oldTag */
+	LWLockId	oldPartitionLock;	/* buffer partition lock for it */
+	BufFlags	oldFlags;
+
+	/* Save the original buffer tag before dropping the spinlock */
+	oldTag = buf->tag;
+
+	UnlockBufHdr(buf);
+
+	/*
+	 * Need to compute the old tag's hashcode and partition lock ID. XXX is it
+	 * worth storing the hashcode in BufferDesc so we need not recompute it
+	 * here?  Probably not.
+	 */
+	oldHash = BufTableHashCode(&oldTag);
+	oldPartitionLock = BufMappingPartitionLock(oldHash);
+
+
+	/*
+	 * Acquire exclusive mapping lock in preparation for changing the buffer's
+	 * association.
+	 */
+	LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
+
+	/* Re-lock the buffer header */
+	LockBufHdr(buf);
+
+	/* If it's changed while we were waiting for lock, do nothing */
+	if (!BUFFERTAGS_EQUAL(buf->tag, oldTag))
+	{
+		UnlockBufHdr(buf);
+		LWLockRelease(oldPartitionLock);
+		return;
+	}
+
+	/*
+	 * Validate whether we can add the buffer to the freelist or not
+	 */
+	if ((buf->refcount != 0) || (buf->usage_count != 0))
+	{
+		UnlockBufHdr(buf);
+		LWLockRelease(oldPartitionLock);
+		return;
+	}
+
+	/*
+	 * Clear out the buffer's tag and flags.  We must do this to ensure that
+	 * linear scans of the buffer array don't think the buffer is valid.
+	 */
+	oldFlags = buf->flags;
+	CLEAR_BUFFERTAG(buf->tag);
+	buf->flags = 0;
+	buf->usage_count = 0;
+
+	UnlockBufHdr(buf);
+
+	/*
+	 * Remove the buffer from the lookup hashtable, if it was in there.
+	 */
+	if (oldFlags & BM_TAG_VALID)
+		BufTableDelete(&oldTag, oldHash);
+
+	/*
+	 * Done with mapping lock.
+	 */
+	

[HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-09-03 Thread Amit kapila
This patch is based on below Todo Item:

Consider adding buffers the background writer finds reusable to the free list



I have tried implementing it and taken the readings for Select when all the 
data is in either OS buffers

or Shared Buffers.



The Patch has simple implementation for  bgwriter or checkpoint process moving 
the unused buffers (unpinned with ZERO usage_count buffers) into freelist.

Results (Results.html attached with mail) are taken with following 
configuration.

Current scenario is
1. Load all the files in to OS buffers (using pg_prewarm with 'read' 
operation) of all
   tables and indexes.
2. Try to load all buffers with pgbench_accounts table and 
pgbench_accounts_pkey
   pages (using pg_prewarm with 'buffers' operation).
3. Run the pgbench with select only for 20 minutes.

Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

Server Configuration:
shared_buffers = 6GB (1/4 th of RAM size)

Pgbench configuration:
transaction type: SELECT only
scaling factor: 1200
query mode: simple
number of clients: varying from 8 to 64 
number of threads: varying from 8 to 64 
duration: 1200 s





Comments or suggestions?



I am still collecting performance results for update and other operations with 
different database configurations.



With Regards,

Amit Kapila.
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dba19eb..2b9cfbb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1660,8 +1660,20 @@ SyncOneBuffer(int buf_id, bool skip_recently_used)
 
 	if (!(bufHdr->flags & BM_VALID) || !(bufHdr->flags & BM_DIRTY))
 	{
-		/* It's clean, so nothing to do */
-		UnlockBufHdr(bufHdr);
+		/*
+		 * If the buffer is unused then move it to freelist
+		 */
+		if ((bufHdr->flags & BM_VALID)
+			&& (bufHdr->refcount == 0 && bufHdr->usage_count == 0)
+			&& (bufHdr->freeNext == FREENEXT_NOT_IN_LIST))
+		{
+			InvalidateBuffer(bufHdr);
+		}
+		else
+		{
+			/* It's clean, so nothing to do */
+			UnlockBufHdr(bufHdr);
+		}
 		return result;
 	}
 
@@ -1677,6 +1689,20 @@ SyncOneBuffer(int buf_id, bool skip_recently_used)
 	LWLockRelease(bufHdr->content_lock);
 	UnpinBuffer(bufHdr, true);
 
+
+	/*
+	 * If the buffer is unused then move it to freelist
+	 */
+	LockBufHdr(bufHdr);
+	if (bufHdr->refcount == 0 && bufHdr->usage_count == 0)
+	{
+		InvalidateBuffer(bufHdr);
+	}
+	else
+	{
+		UnlockBufHdr(bufHdr);
+	}
+
 	return result | BUF_WRITTEN;
 }
 















 
 
  
  
  
  
  
  
  
  
  
 
 
Original Postgres 9.3devel (each cell: tps / total transactions)

  SIZE        16GB-5GB            16GB-5GB            16GB-5GB            16GB-5GB
  Clients     8C / 8T             16C / 16T           32C / 32T           64C / 64T
  RUN-1       60269 / 72325329    52853 / 63425001    32562 / 39096275    15375 / 18502725
  RUN-2       60370 / 72451857    58453 / 70151866    33181 / 39841490    16348 / 19670518
  RUN-3       59292 / 71159080    58976 / 70782600    33584 / 40344977    16469 / 19801260
  Average     59977 / 71978755    56761 / 68119822    33109 / 39760914    16064 / 19324834

Bgwriter/Checkpoint process moving unused buffers to Free List modification

  SIZE        16GB-5GB            16GB-5GB            16GB-5GB            16GB-5GB
  Clients     16C / 16T           16C / 16T           32C / 32T           64C / 64T
  RUN-1       57152 / 68591311    60072 / 72096257    57957 / 69574459    50240 / 60363537
  RUN-2       60707 / 72858156    60013 / 72026319    57939 / 69566401    50090 / 60115068
  RUN-3       60567 / 72689308    59853 / 71832898    57925 / 69546383    50297 / 60360896
  Average     59475 / 71379592    59979 / 71985158    57940 / 69562414    50209 / 60279834

Diff in %

  Difference  -0.837 / -0.8324    5.6694 / 5.6743     74.998 / 74.952     212.56 / 211.93
 
 
 
  
  
  
  
  
  
  
  
  
 
 














Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-09-03 Thread Jeff Janes
On Mon, Sep 3, 2012 at 7:15 AM, Amit kapila amit.kap...@huawei.com wrote:
 This patch is based on below Todo Item:

 Consider adding buffers the background writer finds reusable to the free
 list



 I have tried implementing it and taken the readings for Select when all the
 data is in either OS buffers

 or Shared Buffers.



 The Patch has simple implementation for  bgwriter or checkpoint process
 moving the unused buffers (unpinned with ZERO usage_count buffers) into
 freelist.

I don't think InvalidateBuffer can be safely used in this way.  It
says "We assume
that no other backend could possibly be interested in using the page",
which is not true here.

Also, do we want to actually invalidate the buffers?  If someone does
happen to want one after it is put on the freelist, making it read it
in again into a different buffer doesn't seem like a nice thing to do,
rather than just letting it reclaim it.

Cheers,

Jeff

