Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-20 Thread Craig Ringer
On 01/18/2013 11:50 PM, Robert Haas wrote:
 On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Any scenario that involves a non-trivial amount of investigation or
 development should result in us pulling the patch for rework and
 resubmission in a later 'fest... it's closing time, as they say :-).
 Amen.

OK, bumped to the next CF.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-18 Thread Craig Ringer
On 12/14/2012 09:57 PM, Amit Kapila wrote:

 I need to validate the vacuum results. It's possible that this is
 solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
 the question stands: do the results justify the change?  I'd argue
 'maybe' 
 We can try the change (assuming the change is small) and see if the performance
 gain is good, then discuss whether it really justifies it.
 I think the main reason for the Vacuum performance hit is that in the test,
 pages are getting dirty only due to the setting of hint bits
 by Vacuum. 

 -- I'd like to see the bulk insert performance hit reduced if
 possible.
 I think if we can improve performance for the bulk-insert case, then this
 patch has much more value. 
Has there been any movement on this - more benchmarks and data showing
that it really does improve performance, or that it really isn't helpful?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-18 Thread Merlin Moncure
On Fri, Jan 18, 2013 at 5:34 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 12/14/2012 09:57 PM, Amit Kapila wrote:

 I need to validate the vacuum results. It's possible that this is
 solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
 the question stands: do the results justify the change?  I'd argue
 'maybe'
 We can try the change (assuming the change is small) and see if the performance
 gain is good, then discuss whether it really justifies it.
 I think the main reason for the Vacuum performance hit is that in the test,
 pages are getting dirty only due to the setting of hint bits
 by Vacuum.

 -- I'd like to see the bulk insert performance hit reduced if
 possible.
 I think if we can improve performance for the bulk-insert case, then this
 patch has much more value.
 Has there been any movement on this - more benchmarks and data showing
 that it really does improve performance, or that it really isn't helpful?

Atri is working on that.  I don't know if he's going to pull it out
in time though, so we may have to pull the patch from this fest.  My
take on the current patch is that the upside case is pretty clear, but
the bulk insert performance impact needs to be figured out and
mitigated (that's what Atri is working on).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-18 Thread Atri Sharma


On 18-Jan-2013, at 17:04, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/14/2012 09:57 PM, Amit Kapila wrote:
 
  I need to validate the vacuum results. It's possible that this is
  solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
  the question stands: do the results justify the change?  I'd argue
  'maybe'
  We can try the change (assuming the change is small) and see if the performance
  gain is good, then discuss whether it really justifies it.
  I think the main reason for the Vacuum performance hit is that in the test,
  pages are getting dirty only due to the setting of hint bits
  by Vacuum. 
 
  -- I'd like to see the bulk insert performance hit reduced if
  possible.
  I think if we can improve performance for the bulk-insert case, then this
  patch has much more value.
  Has there been any movement on this - more benchmarks and data showing
  that it really does improve performance, or that it really isn't helpful?
 
 -- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Hello all,

Sorry for the delay in updating the hackers list with the current status.

I recently did some profiling using perf on PostgreSQL 9.2 with and without our 
patch.

I noticed that the maximum time is being spent in the heapgettup function. Further 
investigation and a bit of a hunch lead me to believe that we may be adversely 
affecting the visibility map optimisation that directly interacts with the 
visibility functions, which our patch affects straight away.

If this is the case, we may really need to get back down to the design of our patch, 
and maybe see which visibility function/functions we are affecting, and see if 
we can mitigate the effect.
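
For context, the optimisation in question is the per-page fast path in the
scan code. A simplified sketch of it (paraphrased from heapgetpage() in
heapam.c; a sketch, not the exact source):

/*
 * If the page is marked all-visible in the visibility map, the per-tuple
 * visibility checks -- and therefore the cache path our patch adds in
 * tqual.c -- are skipped entirely for this page.
 */
all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;

for (lineoff = FirstOffsetNumber; lineoff <= lines; lineoff++)
{
    bool valid;

    if (all_visible)
        valid = true;   /* fast path: no per-tuple visibility check */
    else
        valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);

    if (valid)
        scan->rs_vistuples[ntup++] = lineoff;
}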

Please let me know your inputs on this.

Regards,

Atri


 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-18 Thread Merlin Moncure
On Fri, Jan 18, 2013 at 8:57 AM, Atri Sharma atri.j...@gmail.com wrote:

 Hello all,

 Sorry for the delay in updating the hackers list with the current status.

 I recently did some profiling using perf on PostgreSQL 9.2 with and without 
 our patch.

 I noticed that the maximum time is being spent in the heapgettup function. Further 
 investigation and a bit of a hunch lead me to believe that we may be 
 adversely affecting the visibility map optimisation that directly interacts 
 with the visibility functions, which our patch affects straight away.

 If this is the case, we may really need to get back down to the design of our 
 patch, and maybe see which visibility function/functions we are affecting, 
 and see if we can mitigate the effect.

 Please let me know your inputs on this.

Any scenario that involves a non-trivial amount of investigation or
development should result in us pulling the patch for rework and
resubmission in a later 'fest... it's closing time, as they say :-).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2013-01-18 Thread Robert Haas
On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Any scenario that involves a non-trivial amount of investigation or
 development should result in us pulling the patch for rework and
 resubmission in a later 'fest... it's closing time, as they say :-).

Amen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-14 Thread Amit Kapila
On Thursday, December 13, 2012 8:02 PM Merlin Moncure wrote:
 On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu haribabu.ko...@huawei.com
 wrote:
  Please find the review of the patch.
 
 
 Thanks for detailed review!
 
  Basic stuff:
  
  - Patch applies with offsets.
  - Compiles cleanly with no warnings.
  - Regression tests pass.
 
  Code Review:
  -
  1. Better to set the hint bits for the tuples in a page, if
 the page
  is already dirty.
 
 This is true today but likely less true if/when page checksums come
 out.  Also it complicates the code a little bit.
 
  Default tables select :  64980.736149    64550.118693
  Unlogged tables select:  64874.974334    64550.118693
 
 So it looks like the extra tests in the visibility routines are causing a 0.7%
 performance hit.
 
  Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2,
  attached)
  sequential scan:  6.492680    6.382014
  Index scan:       1.386851    1.36234
 
  Single transaction bulk inserts: Select performance (refer scripts 3 & 4,
  attached)
  sequential scan:  6.49319     6.3800147
  Index scan:       1.384121    1.3615277
 
 The performance hit is higher here.  Almost 2%.  This is troubling.
 
  Long transaction open, then Vacuum & select performance in milliseconds
  (refer reports output)
  Testcase - 3:
  Single Vacuum Perf   : 128.302 ms    181.292 ms
  Single select perf   : 214.107 ms    177.057 ms
  Total                : 342.409 ms    358.349 ms
 
  I was not able to find the reason why in some of the cases the results are
  low, so please use the attached scripts to validate the same.
 
 I need to validate the vacuum results. It's possible that this is
 solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
 the question stands: do the results justify the change?  I'd argue
 'maybe' 

We can try the change (assuming the change is small) and see if the performance
gain is good, then discuss whether it really justifies it.
I think the main reason for the Vacuum performance hit is that in the test,
pages are getting dirty only due to the setting of hint bits
by Vacuum. 

 -- I'd like to see the bulk insert performance hit reduced if
 possible.

I think if we can improve performance for the bulk-insert case, then this patch
has much more value. 

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-13 Thread Hari Babu
On Thu, Dec 7, 2012 at 7:56 PM, Hari babu
haribabu(dot)kommi(at)Huawei(dot)com wrote:

On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com wrote:

Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
interested in that benchmarking framework as well.  Do you need help
setting up the scripts?

Presently I am testing with the pgbench custom query option & taking IO & VM
statistics in parallel.

The following is how I had written the test script for case 1:

./psql -f bench_test_1_init.sql postgres
iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
killall -s SIGINT iostat
killall -s SIGINT vmstat

 

Where the sql files are as follows:

-- bench_test_1.sql

select count(*) from bench where f1 is not null;

-- bench_test_1_init.sql

drop table if exists bench;
create table bench(f0 int primary key, f1 char(50));
insert into bench values (generate_series(1, 10), 'a');
insert into bench values (generate_series(11, 20), 'a');
...
insert into bench values (generate_series(981, 990), 'a');
insert into bench values (generate_series(991, 1000), 'a');
checkpoint;

I will provide the test results later.

 

Please find the review of the patch. 

Basic stuff: 
 
- Patch applies with offsets. 
- Compiles cleanly with no warnings. 
- Regression tests pass. 

Code Review: 
- 
1. Better to set the hint bits for the tuples in a page, if the page
is already dirty. 

Test cases: 
-- 
Test cases are already described in the following link. 
 
http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kap...@huawei.com 

Test Results: 
 
Machine details: 
CPU cores : 4 
RAM   : 24GB 
OS: Suse Linux 10 SP2 



Configuration: 
shared_buffers = 500MB 
autovacuum = off 
checkpoint_segments = 256 
checkpoint_timeout = 10min 

The following results are the average of 3 runs; each run is 5 min through
pgbench custom query. 

Test case - 1: 
Init: 
None 
Run: 
create temp table atri1 as select v from
generate_series(1,100) v; 
select * from atri1; 
VACUUM atri1; 
DROP TABLE atri1; 

Test case - 2: 
   Init: 
create table atri1 as select v from
generate_series(1,100) v;   
Run: 
select * from atri1; 

Test case - 3: (without pgbench) 
connect two sessions s1, s2 
s1 : start transaction; 
s2 : 
create table atri1 as select v from
generate_series(1,100) v; 
\timing 
select * from atri1; 
VACUUM atri1; 

Results: 
                  9.3devel (TPS)    HintbitIO (TPS) 
--- 
Test case - 1:    1.762946          1.922219 
Test case - 2:    4.038673          4.044356 

Pgbench without vacuum, scale factor 75, 8 threads & 8 clients (refer reports
attached) 
Default tables select :  64980.736149    64550.118693 
Unlogged tables select:  64874.974334    64550.118693 

Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2,
attached) 
sequential scan:  6.492680    6.382014 
Index scan:       1.386851    1.36234 

Single transaction bulk inserts: Select performance (refer scripts 3 & 4,
attached) 
sequential scan:  6.49319     6.3800147 
Index scan:       1.384121    1.3615277 

Long transaction open, then Vacuum & select performance in milliseconds
(refer reports output) 
Testcase - 3: 
Single Vacuum Perf   : 128.302 ms    181.292 ms 
Single select perf   : 214.107 ms    177.057 ms 
Total                : 342.409 ms    358.349 ms 

I was not able to find the reason why in some of the cases the results are low,
so please use the attached scripts to validate the same. 

I was not able to provide the IO statistics, as IOSTAT & VMSTAT were giving
the current snapshot 
but not the cumulative from start to end of the test execution. 

Documentation: 
- 
No user-visible changes, so no documentation needs to be updated. 




Regards,

Hari babu.



results.tar.gz
Description: Binary data


scripts.tar.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-13 Thread Atri Sharma
On Thu, Dec 13, 2012 at 6:36 PM, Hari Babu haribabu.ko...@huawei.comwrote:

 On Thu, Dec 7, 2012 at 7:56 PM, Hari babu
 haribabu(dot)kommi(at)Huawei(dot)com wrote:

 On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com
 wrote:

 Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
 interested in that benchmarking framework as well.  Do you need help
 setting up the scripts?

 Presently I am testing with the pgbench custom query option & taking IO & VM
 statistics in parallel.

 The following is how I had written the test script for case 1:

 ./psql -f bench_test_1_init.sql postgres
 iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
 vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
 ./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
 killall -s SIGINT iostat
 killall -s SIGINT vmstat

 Where the sql files are as follows:

 -- bench_test_1.sql

 select count(*) from bench where f1 is not null;

 -- bench_test_1_init.sql

 drop table if exists bench;
 create table bench(f0 int primary key, f1 char(50));
 insert into bench values (generate_series(1, 10), 'a');
 insert into bench values (generate_series(11, 20), 'a');
 ...
 insert into bench values (generate_series(981, 990), 'a');
 insert into bench values (generate_series(991, 1000), 'a');
 checkpoint;

 I will provide the test results later.

 Please find the review of the patch.

 Basic stuff:
 
 - Patch applies with offsets.
 - Compiles cleanly with no warnings.
 - Regression tests pass.

 Code Review:
 -
 1. Better to set the hint bits for the tuples in a page, if the
 page is already dirty.

 Test cases:
 --
 Test cases are already described in the following link.

 http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kap...@huawei.com

 Test Results:
 
 Machine details:
 CPU cores : 4
 RAM       : 24GB
 OS        : Suse Linux 10 SP2

 Configuration:
 shared_buffers = 500MB
 autovacuum = off
 checkpoint_segments = 256
 checkpoint_timeout = 10min

 The following results are the average of 3 runs; each run is 5 min through
 pgbench custom query.

 Test case - 1:
 Init:
 None
 Run:
 create temp table atri1 as select v from
 generate_series(1,100) v;
 select * from atri1;
 VACUUM atri1;
 DROP TABLE atri1;

 Test case - 2:
 Init:
 create table atri1 as select v from
 generate_series(1,100) v;
 Run:
 select * from atri1;

 Test case - 3: (without pgbench)
 connect two sessions s1, s2
 s1 : start transaction;
 s2 :
 create table atri1 as select v from
 generate_series(1,100) v;
 \timing
 select * from atri1;
 VACUUM atri1;

 Results:
                   9.3devel (TPS)    HintbitIO (TPS)
 ---
 Test case - 1:    1.762946          1.922219
 Test case - 2:    4.038673          4.044356

 Pgbench without vacuum, scale factor 75, 8 threads & 8 clients (refer
 reports attached)
 Default tables select :  64980.736149    64550.118693
 Unlogged tables select:  64874.974334    64550.118693

 Multiple transaction bulk inserts: Select performance (refer scripts 1 &
 2, attached)
 sequential scan:  6.492680    6.382014
 Index scan:       1.386851    1.36234

 Single transaction bulk inserts: Select performance (refer scripts 3 &
 4, attached)
 sequential scan:  6.49319     6.3800147
 Index scan:       1.384121    1.3615277

 Long transaction open, then Vacuum & select performance in milliseconds
 (refer reports output)
 Testcase - 3:
 Single Vacuum Perf   : 128.302 ms    181.292 ms
 Single select perf   : 214.107 ms    177.057 ms
 Total                : 342.409 ms    358.349 ms

 I was not able to find the reason why in some of the cases the results are
 low, so please use the attached scripts to validate the same.

 I was not able to provide the IO statistics, as IOSTAT & VMSTAT were giving
 the current snapshot
 but not the cumulative from start to end of the test execution.

 Documentation:
 -
 No user-visible changes, so no documentation needs to be updated.


 

 Regards,

 Hari babu.


Thanks for the review and tests.

The remarkable difference between 9.3devel and Hint Bit IO is present only
in test 3, right? I have the feeling that the original case we
discussed (vacuum setting the hint bits) is taking place here, and hence the
decrease in performance.

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-13 Thread Merlin Moncure
On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu haribabu.ko...@huawei.com wrote:
 Please find the review of the patch.


Thanks for detailed review!

 Basic stuff:
 
 - Patch applies with offsets.
 - Compiles cleanly with no warnings.
 - Regression tests pass.

 Code Review:
 -
 1. Better to set the hint bits for the tuples in a page, if the page
 is already dirty.

This is true today but likely less true if/when page checksums come
out.  Also it complicates the code a little bit.

 Default tables select :  64980.736149    64550.118693
 Unlogged tables select:  64874.974334    64550.118693

So it looks like the extra tests in the visibility routines are causing a 0.7%
performance hit.

 Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2,
 attached)
 sequential scan:  6.492680    6.382014
 Index scan:       1.386851    1.36234

 Single transaction bulk inserts: Select performance (refer scripts 3 & 4,
 attached)
 sequential scan:  6.49319     6.3800147
 Index scan:       1.384121    1.3615277

The performance hit is higher here.  Almost 2%.  This is troubling.

 Long transaction open, then Vacuum & select performance in milliseconds
 (refer reports output)
 Testcase - 3:
 Single Vacuum Perf   : 128.302 ms    181.292 ms
 Single select perf   : 214.107 ms    177.057 ms
 Total                : 342.409 ms    358.349 ms

 I was not able to find the reason why in some of the cases the results are
 low, so please use the attached scripts to validate the same.

I need to validate the vacuum results. It's possible that this is
solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
the question stands: do the results justify the change?  I'd argue
'maybe' -- I'd like to see the bulk insert performance hit reduced if
possible.  Let's see what we can do in the short term (and, if no
improvement can be found, I think this patch should be marked
'returned with feedback').

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-07 Thread Hari Babu
On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com wrote:
Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
interested in that benchmarking framework as well.  Do you need help
setting up the scripts?

Presently I am testing with the pgbench custom query option & taking IO & VM
statistics in parallel. 

The following is how I had written the test script for case 1:
 
./psql -f bench_test_1_init.sql postgres
iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
killall -s SIGINT iostat
killall -s SIGINT vmstat

Where the sql files are as follows:

-- bench_test_1.sql
select count(*) from  bench where f1 is not null;

--bench_test_1_init.sql

drop table if exists bench;
create table bench(f0 int primary key, f1 char(50));
insert into bench values (generate_series(1, 10), 'a');
insert into bench values (generate_series(11, 20), 'a');
...
insert into bench values (generate_series(981, 990), 'a');
insert into bench values (generate_series(991, 1000), 'a');
checkpoint;

I will provide the test results later.

Any suggestions/comments?

Regards,
Hari babu.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-06 Thread Amit Kapila
On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
 On 11/16/12 9:03 AM, Merlin Moncure wrote:
  Atri ran some quick n dirty tests to see if there
  were any regressions.  He benched a large scan followed by vacuum.  So
  far, results are inconclusive so better testing methodologies will
  definitely be greatly appreciated.  One of the challenges with working
  in this part of the code is that it's quite difficult to make changes
  without impacting at least some workloads.
 
 I'm adding something to pgbench-tools to test for some types of vacuum
 regressions that I've seen before.  And the checksum benchmarking I've
 already signed up for overlaps with this one quite a bit.  I'd suggest
 reviewers here focus on code quality, correctness, and targeted
 optimization testing.  I'm working heavily on a larger benchmarking
 framework that both this and checksums will fit into right now.

We are planning below performance tests for hint-bit I/O mitigation patch:

Test case - 1: Select performance in sequential scan and vacuum operation
with I/O statistics
Bulk operations happen in multiple transactions. 
1. Stop the auto-vacuum. 
2. Create table. 
3. Insert 10,000 records in one transaction, 1000 times. 
4A. Use pgbench to select all the records using sequential scan for 5
min - 8 threads. 
4B. Record the IO statistics. 
5. After completion of the test case, check VACUUM performance. 

Test case - 2: 
Select performance in index scan and vacuum operation with I/O
statistics 
Same as test case 1; change 4A as follows: 
4A. Use pgbench with -F option to select random records using
index scan for 5 min - 8 threads. 

Test case - 3: 
Select performance in sequential scan and vacuum operation with I/O
statistics 
When bulk operations happen in one transaction. 
1. Stop the auto-vacuum. 
2. Create table. 
3. Insert 10,000,000 records in one transaction. 
4A. Use pgbench to select all the records using sequential scan for 5
min - 8 threads. 
4B. Record the IO statistics. 
5. After completion of the test case, check VACUUM performance. 

Test case - 4: 
Select performance in index scan and vacuum operation with I/O
statistics 
Same as test case 3; change 4A as follows: 
4A. Use pgbench to select random records using index scan for 5
min - 8 threads. 

Test case - 5: 
Check original pgbench performance & vacuum operation 
1. For select-only and tpc-b performance suites with scale factors of
75 & 150, threads 8 & 16. 

Test case - 6: (Vacuum I/O may increase if vacuum needs to make the page dirty
only for setting the hint bits.)
1. Session - 1 : Open a long transaction.

2. Session - 2 : Insert some records & commit. 
Do the select - all the tuples. 
Checkpoint; 
Vacuum the table. 
Checkpoint; 
3. Record the IO statistics & time taken for Vacuum & the 2nd
Checkpoint.

Test case - 7: (This is also to check Vacuum I/O)
1. Have a replication setup. 
2. Insert some records & commit. 
3. Vacuum the table. 
4. Upgrade the standby. 
5. Select all the tuples on the new master. 
6. Vacuum the table on the new master. 
6B. Record the IO statistics & time taken for Vacuum on the new
master.

Suggestions/Feedback?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-06 Thread Merlin Moncure
On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila amit.kap...@huawei.com wrote:
 On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
 On 11/16/12 9:03 AM, Merlin Moncure wrote:
  Atri ran some quick n dirty tests to see if there
  were any regressions.  He benched a large scan followed by vacuum.  So
  far, results are inconclusive so better testing methodologies will
  definitely be greatly appreciated.  One of the challenges with working
  in this part of the code is that it's quite difficult to make changes
  without impacting at least some workloads.

 I'm adding something to pgbench-tools to test for some types of vacuum
 regressions that I've seen before.  And the checksum benchmarking I've
 already signed up for overlaps with this one quite a bit.  I'd suggest
 reviewers here focus on code quality, correctness, and targeted
 optimization testing.  I'm working heavily on a larger benchmarking
 framework that both this and checksums will fit into right now.

 We are planning below performance tests for hint-bit I/O mitigation patch:

 Test case - 1: Select performance in sequential scan and vacuum operation
 with I/O statistics
 Bulk operations happen in multiple transactions.
 1. Stop the auto-vacuum.
 2. Create table.
 3. Insert 10,000 records in one transaction, 1000 times.
 4A. Use pgbench to select all the records using sequential scan for 5
 min - 8 threads.
 4B. Record the IO statistics.
 5. After completion of the test case, check VACUUM performance.

 Test case - 2:
 Select performance in index scan and vacuum operation with I/O
 statistics
 Same as test case 1; change 4A as follows:
 4A. Use pgbench with -F option to select random records using
 index scan for 5 min - 8 threads.

 Test case - 3:
 Select performance in sequential scan and vacuum operation with I/O
 statistics
 When bulk operations happen in one transaction.
 1. Stop the auto-vacuum.
 2. Create table.
 3. Insert 10,000,000 records in one transaction.
 4A. Use pgbench to select all the records using sequential scan for 5
 min - 8 threads.
 4B. Record the IO statistics.
 5. After completion of the test case, check VACUUM performance.

 Test case - 4:
 Select performance in index scan and vacuum operation with I/O
 statistics
 Same as test case 3; change 4A as follows:
 4A. Use pgbench to select random records using index scan for 5
 min - 8 threads.

 Test case - 5:
 Check original pgbench performance & vacuum operation
 1. For select-only and tpc-b performance suites with scale factors of
 75 & 150, threads 8 & 16.

 Test case - 6: (Vacuum I/O may increase if vacuum needs to make the page dirty
 only for setting the hint bits.)
 1. Session - 1 : Open a long transaction.

 2. Session - 2 : Insert some records & commit.
 Do the select - all the tuples.
 Checkpoint;
 Vacuum the table.
 Checkpoint;
 3. Record the IO statistics & time taken for Vacuum & the 2nd
 Checkpoint.

 Test case - 7: (This is also to check Vacuum I/O)
 1. Have a replication setup.
 2. Insert some records & commit.
 3. Vacuum the table.
 4. Upgrade the standby.
 5. Select all the tuples on the new master.
 6. Vacuum the table on the new master.
 6B. Record the IO statistics & time taken for Vacuum on the new
 master.

Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
interested in that benchmarking framework as well.  Do you need help
setting up the scripts?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-28 Thread Alvaro Herrera
Merlin Moncure wrote:

 Maybe abstracting the 'last xid cache' along with hint bit management out
 of both transam.c and tqual.c into something like 'hints.c' is
 appropriate, but that's a more invasive change.

It would be good to have such a patch to measure/compare performance of
both approaches.  It does seem like the more invasive change might be
more expensive for both the transam.c and the tqual.c uses, though; and
it's not clear that there's anything to be gained by having them be
together in a single module.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-21 Thread Greg Smith

On 11/16/12 9:03 AM, Merlin Moncure wrote:

Atri ran some quick n dirty tests to see if there
were any regressions.  He benched a large scan followed by vacuum.  So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated.  One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.


I'm adding something to pgbench-tools to test for some types of vacuum 
regressions that I've seen before.  And the checksum benchmarking I've 
already signed up for overlaps with this one quite a bit.  I'd suggest 
reviewers here focus on code quality, correctness, and targeted 
optimization testing.  I'm working heavily on a larger benchmarking 
framework that both this and checksums will fit into right now.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-16 Thread Amit Kapila
On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
 On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com
 wrote:

 
 Sure, although in scans we are using ring buffer as well so in
 practical sense the results are pretty close.
 
  b. Considering sometimes people want VACUUM to run when the system is not
 busy,
  the chances of generating more overall I/O in the system can be
 higher.
 
 It's hard to imagine overall i/o load increasing.  However, longer
 vacuum times should be considered.   A bigger  issue is that we are
 missing an early opportunity to set the all visible bit. vacuum is
 doing:
 
  if (all_visible)
    {
        TransactionId xmin;
  
        if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
        {
            all_visible = false;
            break;
        }
 
 assuming the cache is working and vacuum rolls around after a scan,
 you lost the opportunity to set all_visible flag where previously it
 would have been set, thereby dismantling the positive effect of an
 index only scan.  AFAICT, this is the only case where vacuum is
 directly interfacing with hint bits.  This could be construed as a
 violation of heapam API?  Maybe if that's an issue we could proxy that
 check to a heaptuple/tqual.c maintained function (in the same manner
 as HeapTupleSetHintBits) so that the cache bit could be uniformly
 checked.

I think we need to devise some tests to check that no impact on Vacuum or
anything else has been created due to this change.
I will devise tests during review of this patch. 
However, if you have more ideas, please share them; that will make the tests of
this patch stronger.
For functional/performance testing of this patch, one of my colleagues, Hari Babu,
will also work along with me on it.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-16 Thread Merlin Moncure
On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila amit.kap...@huawei.com wrote:
 On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
 On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com
 wrote:


 Sure, although in scans we are using ring buffer as well so in
 practical sense the results are pretty close.

   b. Considering sometimes people want VACUUM to run when the system is not
  busy,
   the chances of generating more overall I/O in the system can be
  higher.

 It's hard to imagine overall i/o load increasing.  However, longer
 vacuum times should be considered.   A bigger  issue is that we are
 missing an early opportunity to set the all visible bit. vacuum is
 doing:

  if (all_visible)
    {
        TransactionId xmin;

        if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
        {
            all_visible = false;
            break;
        }

 assuming the cache is working and vacuum rolls around after a scan,
 you lost the opportunity to set all_visible flag where previously it
 would have been set, thereby dismantling the positive effect of an
  index only scan.  AFAICT, this is the only case where vacuum is
 directly interfacing with hint bits.  This could be construed as a
 violation of heapam API?  Maybe if that's an issue we could proxy that
 check to a heaptuple/tqual.c maintained function (in the same manner
 as HeapTupleSetHintBits) so that the cache bit could be uniformly
 checked.

 I think we need to devise some tests to check that no impact on Vacuum or
 anything else has been created due to this change.
 I will devise tests during review of this patch.
 However, if you have more ideas, please share them; that will make the tests of
 this patch stronger.
 For functional/performance testing of this patch, one of my colleagues, Hari Babu,
 will also work along with me on it.

Thanks.  So far, Atri ran some quick n dirty tests to see if there
were any regressions.  He benched a large scan followed by vacuum.  So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated.  One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-16 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 6:42 PM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
 So given that -- the patch simply adds an extra check when/where hint
 bit status is checked in the visibility routines (currently, only
 HeapTupleSatisfiesMVCC is done but all the applicable visibility
 routines should be done).  Basically, the way it works is like this:

 *) is the hint bit set?
 *) if not, does the examined xid match the last examined one?
 *) if so, and the cached hint bit matches the one we want, proceed as if
 the hint bit was set

 Can you clarify the difference between this and
 cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
 patch is accepted?

Very little, except:
*) the transam.c-managed cache is unable to influence the specific code
path through the visibility routines, at least not without significant
refactoring -- everything that happens in tqual.c should be inlined.
I played with doing it all in transam.c a while back and didn't much
like how it turned out.  That doesn't mean it can't work though.

*) There are a couple of important-looking code paths that communicate
directly with transam.c.  For example, in procarray.c,
ProcArrayApplyRecoveryInfo().  Removing the transam.c cache could turn
into a fairly significant regression if that code is performance
sensitive -- that would have to be studied before doing that.

Maybe abstracting the 'last xid cache' along with hint bit management out
of both transam.c and tqual.c into something like 'hints.c' is
appropriate, but that's a more invasive change.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-16 Thread Atri Sharma
On Fri, Nov 16, 2012 at 7:33 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila amit.kap...@huawei.com
 wrote:
  On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
  On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com
  wrote:
 
 
  Sure, although in scans we are using ring buffer as well so in
  practical sense the results are pretty close.
 
    b. Considering sometimes people want VACUUM to run when the system is not
   busy,
     the chances of generating more overall I/O in the system can be
   higher.
 
  It's hard to imagine overall i/o load increasing.  However, longer
  vacuum times should be considered.   A bigger  issue is that we are
  missing an early opportunity to set the all visible bit. vacuum is
  doing:
 
   if (all_visible)
     {
         TransactionId xmin;
  
         if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
         {
             all_visible = false;
             break;
         }
 
  assuming the cache is working and vacuum rolls around after a scan,
  you lost the opportunity to set all_visible flag where previously it
  would have been set, thereby dismantling the positive effect of an
   index only scan.  AFAICT, this is the only case where vacuum is
  directly interfacing with hint bits.  This could be construed as a
  violation of heapam API?  Maybe if that's an issue we could proxy that
  check to a heaptuple/tqual.c maintained function (in the same manner
  as HeapTupleSetHintBits) so that the cache bit could be uniformly
  checked.
 
   I think we need to devise some tests to check that no impact on Vacuum or
   anything else has been created due to this change.
   I will devise tests during review of this patch.
   However, if you have more ideas, please share them; that will make the
  tests of this patch stronger.
   For functional/performance testing of this patch, one of my colleagues,
  Hari Babu, will also work along with me on it.

 Thanks.  So far, Atri ran some quick n dirty tests to see if there
 were any regressions.  He benched a large scan followed by vacuum.  So
 far, results are inconclusive so better testing methodologies will
 definitely be greatly appreciated.  One of the challenges with working
 in this part of the code is that it's quite difficult to make changes
 without impacting at least some workloads.

 merlin


Thanks a ton, Amit and your colleague Hari, for volunteering to review the
patch.

I ran some benchmarks and came up with the following results:

With our code

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1
-n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 412
tps = 1.366142 (including connections establishing)
tps = 1.366227 (excluding connections establishing)


Without our code

atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1
-n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 378
tps = 1.244333 (including connections establishing)
tps = 1.27 (excluding connections establishing)

The SQL file is attached.

Please let us know if you need any more details.

Atri
-- 
Regards,

Atri
*l'apprenant*


bench2.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-15 Thread Amit Kapila
On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote:
On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma atri.j...@gmail.com wrote:

On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote:

On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote:
  Following the sig is a first cut at a patch (written by Atri) that

PFA below the sig the updated patch for the same. It maintains a cache
cachedVisibilityXid which holds the results of the clog visibility check;
cachedVisibilityXidStatus holds the commit status of the XID in
cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum()), an
additional check has been added to see if the commit status of the Xmin or Xmax
of a tuple can be retrieved from the cache.



 

1.  From your explanation and code, it is quite clear that it will
certainly give performance benefits in the scenarios mentioned by you.

    I can validate the performance numbers again and do the code
review for this patch during CF-3. 

    However, I am just not very sure whether the use case is sufficient.

    So I would like to ask the opinion of other people as well.

2.  After this patch, the tuple hint bit is not set by Select operations after
data is populated by one transaction. 

    This appears to be good, as it will save many ops (page dirty followed
by flush, clog inquiry). 

    Though it is not apparent, we should check whether it can cause
any other impact due to this:

a.  Maybe now VACUUM needs to set the hint bit, which may cause more
I/O during Vacuum.

    Hackers, any opinion/suggestions about the use case?

With Regards,

Amit Kapila.

 



Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com wrote:
 In each visibility function (except HeapTupleSatisfiesVacuum()), an
 additional check has been added to see if the commit status of the Xmin or Xmax
 of a tuple can be retrieved from the cache.

 1.  From your explanation and code, it is quite clear that it will
 certainly give performance benefits in the scenarios mentioned by you.

     I can validate the performance numbers again and do the code
 review for this patch during CF-3.

     However, I am just not very sure whether the use case is sufficient.

     So I would like to ask the opinion of other people as well.

sure.  I'd like to note though that hint bit i/o is a somewhat common
complaint.  it tends to most affect OLAP style workloads.  in
pathological workloads, it can really burn you -- it's not fun when
you are i/o starved via sequential scan.  This can still happen when
sweeping dead records (which this patch doesn't deal with, though
maybe it should).

 2.  After this patch, the tuple hint bit is not set by Select operations
 after data is populated by one transaction.

     This appears to be good, as it will save many ops (page dirty followed
 by flush, clog inquiry).

Technically it does not save clog fetch as transam.c has a very
similar cache mechanism.  However, it does save a page write i/o and a
lock on the page header, as well as a couple of other minor things.
In the best case, the page write is completely masked as the page gets
dirty for other reasons.  I think this is going to become more
important in checksum enabled scenarios.
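
For reference, a minimal sketch of that transam.c cache, modeled on
TransactionLogFetch() (simplified; SlruReadXidStatus() is a stand-in name
for the actual clog read, which goes through the slru machinery):

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus cachedFetchXidStatus;

static XidStatus
TransactionLogFetch(TransactionId transactionId)
{
    XidStatus xidstatus;

    /* fast path: a repeat lookup of the same xid skips the clog read */
    if (TransactionIdEquals(transactionId, cachedFetchXid))
        return cachedFetchXidStatus;

    /* slow path: read pg_clog, then remember the answer */
    xidstatus = SlruReadXidStatus(transactionId);

    /* cache only final states; an in-progress answer could change later */
    if (xidstatus == TRANSACTION_STATUS_COMMITTED ||
        xidstatus == TRANSACTION_STATUS_ABORTED)
    {
        cachedFetchXid = transactionId;
        cachedFetchXidStatus = xidstatus;
    }
    return xidstatus;
}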

     Though it is not apparent, we should check whether it can cause
 any other impact due to this:

 a.  Maybe now VACUUM needs to set the hint bit, which may cause more
 I/O during Vacuum.

IMNSHO, deferring non-critical i/o from a foreground process to a
background process is generally good.  VACUUM has nice features like
i/o throttling and in-place cancel, so latent management of internal
page optimization flags really belongs there ideally.  Also, the longer
you defer such I/O, the more opportunity there is for it to be masked
off by some other page-dirtying operation (again, this is more
important in the face of having to log hint bit changes).

There could be some good rebuttal analysis though.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-15 Thread Amit Kapila
On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote:
 On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com
 wrote:
 In each visibility function (except HeapTupleSatisfiesVacuum()), an
  additional check has been added to see if the commit status of the Xmin
 or Xmax
  of a tuple can be retrieved from the cache.
 
 
 
  1.  From your explanation and code, it is quite clear that it will
  certainly give performance benefits in the scenarios mentioned by
 you.
 
 I can once validate the performance numbers again and do the
 code
  review for this patch during CF-3.
 
 However, I am just not very sure whether the use case
  is sufficient.
 
 So I would like to ask opinion of other people as well.
 
 sure.  I'd like to note though that hint bit i/o is a somewhat common
 complaint.  it tends to most affect OLAP style workloads.  in
 pathological workloads, it can really burn you -- it's not fun when
 you are i/o starved via sequential scan.  This can still happen when
 sweeping dead records (which this patch doesn't deal with, though
 maybe it should).
 
  2.  After this patch, the tuple hint bit is not set by Select operations
 after
  data is populated by one transaction.
 
    This appears to be good, as it will save many ops (page dirty
 followed
  by flush, clog inquiry).
 
 Technically it does not save clog fetch as transam.c has a very
 similar cache mechanism.  However, it does save a page write i/o and a
 lock on the page header, as well as a couple of other minor things.
 In the best case, the page write is completely masked as the page gets
 dirty for other reasons.  I think this is going to become more
 important in checksum enabled scenarios.
 
    Though it is not apparent, we should check whether it can
 cause
  any other impact due to this:
 
  a.  Maybe now VACUUM needs to set the hint bit, which may
 cause more
  I/O during Vacuum.
 
 IMNSHO. deferring non-critical i/o from foreground process to
 background process is generally good.  

Yes, with regard to deferring, you are right. 
However, in this case, when the foreground process has to mark the page dirty
due to a hint bit, it may already be dirty, so no extra I/O; but when it is
done by VACUUM, the page may not be dirty. 

Also, due to the below points, doing it in VACUUM may cost more:
a. VACUUM has a ring buffer of fixed size, and if such pages are many, then a
write of the page needs to be done by VACUUM to replace the existing page 
   in the ring.
b. Considering sometimes people want VACUUM to run when the system is not busy,
the chances of generating more overall I/O in the system can be 
   higher.

Why can't we avoid setting hint bits in VACUUM?
Is it because it has to be done somewhere, so that hint bits
are properly set?
Or maybe I am missing something trivial?

Though the VACUUM case I am talking about occurs rarely in practice, the
point came to my mind, 
so I thought of sharing it with you.

 VACUUM has nice features like
 i/o throttling and in place cancel so latent management of internal
 page optimization flags really belong there ideally.  Also, the longer
 you defer such I/O the more opportunity there is for it to be masked
 off by some other page dirtying operation (again, this is more
 important in the face of having to log hint bit changes).
 
 There could be some good rebuttal analysis though.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-15 Thread Merlin Moncure
On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote:
 IMNSHO. deferring non-critical i/o from foreground process to
 background process is generally good.

 Yes, with regard to deferring, you are right.
 However, in this case, when the foreground process has to mark the page dirty
 due to a hint bit, it may already be dirty, so no extra I/O; but when it is
 done by VACUUM, the page may not be dirty.

Yeah.  We can try to be smart and set the hint bits in that case.  Not
sure that will work out with checksums having to WAL-log hint bits,
though (which, by reading the checksum threads, seems likely to happen).

 Also, due to the below points, doing it in VACUUM may cost more:
 a. VACUUM has a ring buffer of fixed size, and if such pages are many, then a
 write of the page needs to be done by VACUUM to replace the existing page
    in the ring.

Sure, although in scans we are using ring buffer as well so in
practical sense the results are pretty close.

 b. Considering sometimes people want VACUUM to run when the system is not busy,
 the chances of generating more overall I/O in the system can be
    higher.

It's hard to imagine overall i/o load increasing.  However, longer
vacuum times should be considered.  A bigger issue is that we are
missing an early opportunity to set the all visible bit. vacuum is
doing:

if (all_visible)
{
    TransactionId xmin;

    if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
    {
        all_visible = false;
        break;
    }

assuming the cache is working and vacuum rolls around after a scan,
you lost the opportunity to set all_visible flag where previously it
would have been set, thereby dismantling the positive effect of an
index only scan.  AFAICT, this is the only case where vacuum is
directly interfacing with hint bits.  This could be construed as a
violation of heapam API?  Maybe if that's an issue we could proxy that
check to a heaptuple/tqual.c maintained function (in the same manner
as HeapTupleSetHintBits) so that the cache bit could be uniformly
checked.
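
One hypothetical shape for that proxy (the name is invented here; a sketch
of the idea, not proposed code) -- vacuum would call this instead of testing
t_infomask directly, so the backend-local cache is honored uniformly:

static inline bool
HeapTupleHeaderXminCommittedHint(HeapTupleHeader tuple)
{
    /* on-tuple hint bit set: definitely committed */
    if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
        return true;

    /* otherwise, treat a cache hit for this xmin as equivalent */
    return cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) &&
           cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED;
}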

All other *setting* of hint bits is running through SetHintBits(), so
I think we are safe from vacuum point of view.  That's another thing
to test for though.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-15 Thread Jeff Davis
On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
 So given that -- the patch simply adds an extra check when/where hint
 bit status is checked in the visibility routines (currently, only
 HeapTupleSatisfiesMVCC is done but all the applicable visibility
 routines should be done).  Basically, the way it works is like this:
 
 *) is the hint bit set?
 *) if not, does the examined xid match the last examined one?
 *) if so, and the cached hint bit matches the one we want, proceed as if
 the hint bit was set
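
In code form, the check sequence above is roughly the following, using the
variable names from the patch posted elsewhere in the thread -- an
illustrative sketch, not the patch itself:

if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
{
    /* 1) hint bit already set: fast path */
}
else if (cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) &&
         cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED)
{
    /* 2) + 3) same xid as the last clog lookup, and it was committed:
     * proceed as if the hint bit were set -- no clog visit, no dirty page */
}
else
{
    /* slow path: consult clog via TransactionIdDidCommit() and possibly
     * set the hint bit through SetHintBits() */
}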

Can you clarify the difference between this and
cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
patch is accepted?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-14 Thread Atri Sharma
On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com
 wrote:
   Following the sig is a first cut at a patch (written by Atri) that
   attempts to mitigate hint bit i/o penalty when many pages worth of
   tuples are sequentially written out with the same transaction id.
   There have been other attempts to deal with this problem that fit
   niche cases (especially those that create the table in the same
   transaction as the one inserting) that work but don't really solve
  the
   problem generally.
 
  As we are now dealing with only the last xid(please refer to the details
  of the patch attached), the invalidation issues are not significant any
  more.
 
  I think you are right, but please check below the scenario I have in mind
  due to which I got confused:
 
  Session -1
  1. let's say for tup-1 on a page, cachedVisibilityXid is not set, and it goes
 inside SetHintBits and sets it to the xid of the tuple, which is let's say = 708
  2. now for all consecutive tuples which have same xmin (708), it can
  directly refer
 cached id and cached status, even if hint bit is not set.
 
  Other Sessions
  3. now from other sessions, large operations happened on all other tables
  except this table.
  4. The situation can reach where xid can wrap around.
 
  Session -1
  5. It again tries to query the same table, at this point it will compare
 the xid in tuple with cachedVisibilityXid.
 
  Now if all tuples' xids for that particular table are frozen in all
 cases
  then it seems to be okay, otherwise it might be a problem.
  I am not fully aware about this wraparound and frozen xid concept, that's
  why I had doubted
  that it might create a problem; on further thought, it appears that I was
  wrong.

 Well there's that.  But more to the point, for the cache to fail you'd
 have to have a scenario where a table didn't scan any records for 1
 billion+ transactions.  See note [3] above for reasoning why this is
 implausible.  Also we're already relying on this effect in transam.c.

 merlin


PFA below the sig the updated patch for the same. It maintains a cache
cachedVisibilityXid which holds the results of the clog visibility check.
cachedVisibilityXidStatus holds the commit status of the XID in
cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum()), an
additional check has been added to see if the commit status of the Xmin or Xmax
of a tuple can be retrieved from the cache.

So, in place of

  if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))

the condition is now

if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
    && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
         && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))

This checks if the commit status can be known from the cache or not before
proceeding.

I will be posting the patch to commit fest.

Thoughts/Feedback?

Atri

-- 
Regards,

Atri
*l'apprenant


*patch:

*** tqual--unchanged.c  2012-09-20 03:17:58.0 +0530
--- tqual.c  2012-11-14 23:27:30.470499857 +0530
***
*** 75,80 
--- 75,88 
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

+ /*
+  * Single-item cache for results of clog visibility check.  It's worth having
+  * such a cache to help reduce the amount of hint bit traffic when
+  * many sequentially touched tuples have the same XID.
+  */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;

  /*
   * SetHintBits()
***
*** 117,122 
--- 125,133 

  if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
      return;    /* not flushed yet, so don't set hint */
+
+ cachedVisibilityXid = xid;
+ cachedVisibilityXidStatus = infomask;
  }

  tuple->t_infomask |= infomask;
***
*** 164,170 
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer
buffer)
  {
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  {
  if (tuple->t_infomask & HEAP_XMIN_INVALID)
  return false;
--- 175,183 
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer
buffer)
  {
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
!     && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
!          && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
  {
  if (tuple->t_infomask & HEAP_XMIN_INVALID)
  return false;
***
*** 247,253 
  if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or
aborted */
  return true;

! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  {
  if (tuple->t_infomask & HEAP_IS_LOCKED)
  return true;
--- 260,268 
  if (tuple->t_infomask & HEAP_XMAX_INVALID)    /* xid invalid or
aborted 

Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-07 Thread Amit Kapila
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Merlin Moncure
 Sent: Wednesday, November 07, 2012 5:26 AM
 To: PostgreSQL-development
 Cc: Atri Sharma
 Subject: [HACKERS] WIP patch for hint bit i/o mitigation
 
 Following the sig is a first cut at a patch (written by Atri) that
 attempts to mitigate hint bit i/o penalty when many pages worth of
 tuples are sequentially written out with the same transaction id.
 There have been other attempts to deal with this problem that fit
 niche cases (especially those that create the table in the same
 transaction as the one inserting) that work but don't really solve the
 problem generally.
 
 I previously attacked this problem ([1], [2]) and came up with a patch
 that cached hint bits inside tqual.c.  The patch was pulled for a few
 reasons:
 
 1) a lot of complexity without proper justification
 2) sketchy cache replacement algorithm
 3) I managed to misspell 'committed' just about everywhere
 4) invalidation?
 
 Issues 1-3 could have been worked out but #4 was making me think the
 problem was a nonstarter, or at least, 'too much too soon'.  The tuple
 visibility routines are in a very tight code path and having to deal
 with various things in the backend that could cause the xid to become
 stale were making me nervous.   A smaller, simpler patch might be the
 ticket.

About invalidation, I think the cached xid can become invalid due to xid
wraparound. One way to handle that could be to invalidate the cache through
Vacuum.

Though I am not sure what else could make the cached xid invalid, I think
once we work out the other ways it can become invalid, we can see whether
there is a solution to address them.
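A hypothetical sketch of that suggestion (this helper is not in the posted
patch; invalidate_visibility_cache is an invented name): vacuum, or any
freeze pass, could simply reset the one-item cache so a stale pre-wraparound
entry can never match a reused xid:

#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId 0          /* never equals a normal tuple xid */

/* the patch's one-item cache, declared elsewhere (tqual.c) */
extern TransactionId cachedVisibilityXid;

/* Hypothetical: called from vacuum after freezing a table's tuples. */
static void
invalidate_visibility_cache(void)
{
    cachedVisibilityXid = InvalidTransactionId;
}

Because InvalidTransactionId is never assigned to a tuple, the next
visibility check after such a reset is guaranteed to miss the cache and go
through the normal clog path.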


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-07 Thread Atri Sharma
On 07-Nov-2012, at 15:46, Amit Kapila amit.kap...@huawei.com wrote:

 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Merlin Moncure
 Sent: Wednesday, November 07, 2012 5:26 AM
 To: PostgreSQL-development
 Cc: Atri Sharma
 Subject: [HACKERS] WIP patch for hint bit i/o mitigation
 
 Following the sig is a first cut at a patch (written by Atri) that
 attempts to mitigate hint bit i/o penalty when many pages worth of
 tuples are sequentially written out with the same transaction id.
 There have been other attempts to deal with this problem that fit
 niche cases (especially those that create the table in the same
 transaction as the one inserting) that work but don't really solve the
 problem generally.
 
 I previously attacked this problem ([1], [2]) and came up with a patch
 that cached hint bits inside tqual.c.  The patch was pulled for a few
 reasons:
 
 1) a lot of complexity without proper justification
 2) sketchy cache replacement algorithm
 3) I managed to misspell 'committed' just about everywhere
 4) invalidation?
 
 Issues 1-3 could have been worked out but #4 was making me think the
 problem was a nonstarter, or at least, 'too much too soon'.  The tuple
 visibility routines are in a very tight code path and having to deal
 with various things in the backend that could cause the xid to become
 stale were making me nervous.   A smaller, simpler patch might be the
 ticket.
 
 About invalidation, I think the cached xid can become invalid due to xid
 wraparound. One way to handle that could be to invalidate the cache through
 Vacuum.
 
 Though I am not sure what else could make the cached xid invalid, I think
 once we work out the other ways it can become invalid, we can see whether
 there is a solution to address them.
 
 
 With Regards,
 Amit Kapila.
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

As we are now dealing with only the last xid (please refer to the details of
the attached patch), the invalidation issues are not significant anymore.

Regards,

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-07 Thread Amit Kapila


 -----Original Message-----
 From: Atri Sharma [mailto:atri.j...@gmail.com]
 Sent: Wednesday, November 07, 2012 4:02 PM
 To: Amit Kapila
 Cc: Merlin Moncure; PostgreSQL-development
 Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
 
 On 07-Nov-2012, at 15:46, Amit Kapila amit.kap...@huawei.com wrote:
 
  From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
  ow...@postgresql.org] On Behalf Of Merlin Moncure
  Sent: Wednesday, November 07, 2012 5:26 AM
  To: PostgreSQL-development
  Cc: Atri Sharma
  Subject: [HACKERS] WIP patch for hint bit i/o mitigation
 
  Following the sig is a first cut at a patch (written by Atri) that
  attempts to mitigate hint bit i/o penalty when many pages worth of
  tuples are sequentially written out with the same transaction id.
  There have been other attempts to deal with this problem that fit
  niche cases (especially those that create the table in the same
  transaction as the one inserting) that work but don't really solve
 the
  problem generally.
 
 As we are now dealing with only the last xid (please refer to the details
 of the attached patch), the invalidation issues are not significant
 anymore.

I think you are right, but please check the scenario below that I had in
mind and that confused me:

Session-1
1. Let's say for tup-1 on a page, cachedVisibilityXid is not set, and it
   goes inside SetHintBits, which sets it to the xid of the tuple, let's
   say 708.
2. Now, for all consecutive tuples which have the same xmin (708), it can
   directly refer to the cached xid and cached status, even if the hint bit
   is not set.

Other Sessions
3. Now, from other sessions, large operations happen on all other tables
   except this table.
4. The situation can be reached where the xid wraps around.

Session-1
5. It again queries the same table; at this point it will compare the xid
   in each tuple with cachedVisibilityXid.

Now if all tuples' xids for that particular table are frozen in all cases,
then it seems to be okay; otherwise it might be a problem.
I am not fully aware of this wraparound and frozen-xid concept, which is why
I doubted that it might create a problem; on further thought, it appears
that I was wrong.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-07 Thread Atri Sharma
On Wed, Nov 7, 2012 at 5:31 PM, Amit Kapila amit.kap...@huawei.com wrote:



  -----Original Message-----
  From: Atri Sharma [mailto:atri.j...@gmail.com]
  Sent: Wednesday, November 07, 2012 4:02 PM
  To: Amit Kapila
  Cc: Merlin Moncure; PostgreSQL-development
  Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
 
  On 07-Nov-2012, at 15:46, Amit Kapila amit.kap...@huawei.com wrote:
 
   From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
   ow...@postgresql.org] On Behalf Of Merlin Moncure
   Sent: Wednesday, November 07, 2012 5:26 AM
   To: PostgreSQL-development
   Cc: Atri Sharma
   Subject: [HACKERS] WIP patch for hint bit i/o mitigation
  
   Following the sig is a first cut at a patch (written by Atri) that
   attempts to mitigate hint bit i/o penalty when many pages worth of
   tuples are sequentially written out with the same transaction id.
   There have been other attempts to deal with this problem that fit
   niche cases (especially those that create the table in the same
   transaction as the one inserting) that work but don't really solve
  the
   problem generally.
 
  As we are now dealing with only the last xid (please refer to the details
  of the attached patch), the invalidation issues are not significant
  anymore.

 I think you are right, but please check the scenario below that I had in
 mind and that confused me:

 Session-1
 1. Let's say for tup-1 on a page, cachedVisibilityXid is not set, and it
    goes inside SetHintBits, which sets it to the xid of the tuple, let's
    say 708.
 2. Now, for all consecutive tuples which have the same xmin (708), it can
    directly refer to the cached xid and cached status, even if the hint
    bit is not set.

 Other Sessions
 3. Now, from other sessions, large operations happen on all other tables
    except this table.
 4. The situation can be reached where the xid wraps around.

 Session-1
 5. It again queries the same table; at this point it will compare the xid
    in each tuple with cachedVisibilityXid.

 Now if all tuples' xids for that particular table are frozen in all cases,
 then it seems to be okay; otherwise it might be a problem.
 I am not fully aware of this wraparound and frozen-xid concept, which is
 why I doubted that it might create a problem; on further thought, it
 appears that I was wrong.

 With Regards,
 Amit Kapila.


AFAIK, xids are managed relative to reference xids, which gives a range of
+/- 2 billion xids. Once this limit is reached, the reference xids are moved
forward, and the xids that no longer fall within the reference xid +/- 2
billion are frozen. Hence, in the given scenario, I believe that once the
wraparound happens, since the xmin is the same for all the tuples in
session-1's table, there should be no problem, and all tuples' xids for that
particular table will be frozen.

Atri

-- 
Regards,

Atri
l'apprenant


Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-07 Thread Merlin Moncure
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote:
  Following the sig is a first cut at a patch (written by Atri) that
  attempts to mitigate hint bit i/o penalty when many pages worth of
  tuples are sequentially written out with the same transaction id.
  There have been other attempts to deal with this problem that fit
  niche cases (especially those that create the table in the same
  transaction as the one inserting) that work but don't really solve
 the
  problem generally.

 As we are now dealing with only the last xid (please refer to the details
 of the attached patch), the invalidation issues are not significant
 anymore.

 I think you are right, but please check the scenario below that I had in
 mind and that confused me:

 Session-1
 1. Let's say for tup-1 on a page, cachedVisibilityXid is not set, and it
    goes inside SetHintBits, which sets it to the xid of the tuple, let's
    say 708.
 2. Now, for all consecutive tuples which have the same xmin (708), it can
    directly refer to the cached xid and cached status, even if the hint
    bit is not set.

 Other Sessions
 3. Now, from other sessions, large operations happen on all other tables
    except this table.
 4. The situation can be reached where the xid wraps around.

 Session-1
 5. It again queries the same table; at this point it will compare the xid
    in each tuple with cachedVisibilityXid.

 Now if all tuples' xids for that particular table are frozen in all cases,
 then it seems to be okay; otherwise it might be a problem.
 I am not fully aware of this wraparound and frozen-xid concept, which is
 why I doubted that it might create a problem; on further thought, it
 appears that I was wrong.

Well there's that.  But more to the point, for the cache to fail you'd need
a scenario where a table's records weren't scanned for 1 billion+
transactions.  See note [3] above for the reasoning why this is implausible.
Also, we're already relying on this effect in transam.c.
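For reference, a self-contained sketch of that transam.c analogue (the
variable and function names are paraphrased from memory of the 9.2-era
TransactionLogFetch; clog_fetch() is a stub standing in for the real SLRU
lookup):

#include <stdint.h>

typedef uint32_t TransactionId;
typedef int      XidStatus;

#define InvalidTransactionId            0
#define TRANSACTION_STATUS_COMMITTED    1

/* stub for the real clog (SLRU) lookup, which may incur I/O */
static XidStatus
clog_fetch(TransactionId xid)
{
    (void) xid;
    return TRANSACTION_STATUS_COMMITTED;
}

/* one-item cache, same idea as the patch's cachedVisibilityXid */
static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus     cachedFetchXidStatus;

static XidStatus
transaction_log_fetch(TransactionId xid)
{
    XidStatus status;

    /* cache hit: skip the clog lookup entirely */
    if (xid == cachedFetchXid)
        return cachedFetchXidStatus;

    status = clog_fetch(xid);

    /* the real code caches only final (committed/aborted) states */
    cachedFetchXid = xid;
    cachedFetchXidStatus = status;
    return status;
}

In other words, the backend already tolerates a single remembered xid going
stale only across a full wraparound, which is the same exposure the patch's
cache has.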

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers