Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 01/18/2013 11:50 PM, Robert Haas wrote: On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure mmonc...@gmail.com wrote: Any scenario that involves a non-trivial amount of investigation or development should result in us pulling the patch for rework and resubmission in a later 'fest -- it's closing time, as they say :-). Amen. OK, bumped to the next CF. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 12/14/2012 09:57 PM, Amit Kapila wrote: I need to validate the vacuum results. It's possible that this is solvable by tweaking xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe' We can try with change (assuming change is small) and see if the performance gain is good, then discuss whether it really justifies. I think the main reason for Vacuum performance hit is that in the test pages are getting dirty only due to setting of hint bit by Vacuum. -- I'd like to see the bulk insert performance hit reduced if possible. I think if we can improve performance for bulk-insert case, then this patch has much more value. Has there been any movement in this - more benchmarks and data showing that it really does improve performance, or that it really isn't helpful? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Fri, Jan 18, 2013 at 5:34 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 12/14/2012 09:57 PM, Amit Kapila wrote: I need to validate the vacuum results. It's possible that this is solvable by tweaking xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe' We can try with change (assuming change is small) and see if the performance gain is good, then discuss whether it really justifies. I think the main reason for Vacuum performance hit is that in the test pages are getting dirty only due to setting of hint bit by Vacuum. -- I'd like to see the bulk insert performance hit reduced if possible. I think if we can improve performance for bulk-insert case, then this patch has much more value. Has there been any movement in this - more benchmarks and data showing that it really does improve performance, or that it really isn't helpful? Atri is working on that. I don't know if he's going to pull it out in time though, so we may have to pull the patch from this fest. My take on the current patch is that the upside case is pretty clear, but the bulk insert performance impact needs to be figured out and mitigated (that's what Atri is working on). merlin
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 18-Jan-2013, at 17:04, Craig Ringer cr...@2ndquadrant.com wrote: On 12/14/2012 09:57 PM, Amit Kapila wrote: I need to validate the vacuum results. It's possible that this is solvable by tweaking xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe' We can try with change (assuming change is small) and see if the performance gain is good, then discuss whether it really justifies. I think the main reason for Vacuum performance hit is that in the test pages are getting dirty only due to setting of hint bit by Vacuum. -- I'd like to see the bulk insert performance hit reduced if possible. I think if we can improve performance for bulk-insert case, then this patch has much more value. Has there been any movement in this - more benchmarks and data showing that it really does improve performance, or that it really isn't helpful? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services Hello all, Sorry for the delay in updating the hackers list with the current status. I recently did some profiling using perf on PostgreSQL 9.2, with and without our patch. I noticed that the maximum time is being spent in the heapgettup function. Further investigation and a bit of a hunch lead me to believe that we may be adversely affecting the visibility map optimisation, which directly interacts with the visibility functions that our patch affects. If this is the case, we may really need to get down to the design of our patch, see which visibility function(s) we are affecting, and see if we can mitigate the effect. Please let me know your inputs on this. Regards, Atri
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Fri, Jan 18, 2013 at 8:57 AM, Atri Sharma atri.j...@gmail.com wrote: Hello all, Sorry for the delay in updating the hackers list with the current status. I recently did some profiling using perf on PostgreSQL 9.2, with and without our patch. I noticed that the maximum time is being spent in the heapgettup function. Further investigation and a bit of a hunch lead me to believe that we may be adversely affecting the visibility map optimisation, which directly interacts with the visibility functions that our patch affects. If this is the case, we may really need to get down to the design of our patch, see which visibility function(s) we are affecting, and see if we can mitigate the effect. Please let me know your inputs on this. Any scenario that involves a non-trivial amount of investigation or development should result in us pulling the patch for rework and resubmission in a later 'fest -- it's closing time, as they say :-). merlin
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure mmonc...@gmail.com wrote: Any scenario that involves a non-trivial amount of investigation or development should result in us pulling the patch for rework and resubmission in a later 'fest -- it's closing time, as they say :-). Amen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, December 13, 2012 8:02 PM Merlin Moncure wrote: On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu haribabu.ko...@huawei.com wrote: Please find the review of the patch. Thanks for the detailed review! Basic stuff: - Patch applies with offsets. - Compiles cleanly with no warnings - Regression Test pass. Code Review: - 1. Better to set the hint bits for the tuples in a page, if the page is already dirty. This is true today but likely less true if/when page checksums come out. Also it complicates the code a little bit. Default tables select : 64980.736149 64550.118693 Unlogged tables select: 64874.974334 64550.118693 So it looks like the extra tests in the visibility routines are causing a 0.7% performance hit. Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached) sequential scan: 6.492680 6.382014 Index scan: 1.386851 1.36234 Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached) sequential scan: 6.49319 6.3800147 Index scan: 1.384121 1.3615277 The performance hit is higher here. Almost 2%. This is troubling. Long transaction open then Vacuum select performance in milliseconds. (refer reports output) Testcase - 3: Single Vacuum Perf : 128.302 ms 181.292 ms Single select perf : 214.107 ms 177.057 ms Total: 342.409 ms 358.349 ms I was not able to find the reason why in some of the cases results are low, so please use the attached scripts to validate the same. I need to validate the vacuum results. It's possible that this is solvable by tweaking the xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe'. We can try with the change (assuming the change is small) and see if the performance gain is good, then discuss whether it really justifies it. I think the main reason for the Vacuum performance hit is that in the test, pages are getting dirty only due to the setting of hint bits by Vacuum. -- I'd like to see the bulk insert performance hit reduced if possible. 
I think if we can improve performance for bulk-insert case, then this patch has much more value. With Regards, Amit Kapila.
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Dec 7, 2012 at 7:56 PM, Hari babu haribabu(dot)kommi(at)Huawei(dot)com wrote: On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com wrote: Thanks for that -- that's fairly comprehensive I'd say. I'm quite interested in that benchmarking framework as well. Do you need help setting up the scripts? Presently I am testing with the pgbench custom query option, taking IO & VM statistics in parallel. I have written the test script for case 1 the following way. ./psql -f bench_test_1_init.sql postgres iostat -t 1 -d > hint.test1.iostat.reading_3.txt & vmstat -n 1 > hint.test1.vmstat.reading_3.txt & ./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres killall -s SIGINT iostat killall -s SIGINT vmstat Where the sql files are as follows: -- bench_test_1.sql select count(*) from bench where f1 is not null; -- bench_test_1_init.sql drop table if exists bench; create table bench(f0 int primary key, f1 char(50)); insert into bench values (generate_series(1, 10), 'a'); insert into bench values (generate_series(11, 20), 'a'); ... insert into bench values (generate_series(981, 990), 'a'); insert into bench values (generate_series(991, 1000), 'a'); checkpoint; I will provide the test results later. Please find the review of the patch. Basic stuff: - Patch applies with offsets. - Compiles cleanly with no warnings - Regression Test pass. Code Review: - 1. Better to set the hint bits for the tuples in a page, if the page is already dirty. Test cases: -- Test cases are already described in the following link. http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kap...@huawei.com Test Results: Machine details: CPU cores : 4 RAM : 24GB OS: Suse Linux 10 SP2 Configuration: shared_buffers = 500MB autovacuum = off checkpoint_segments = 256 checkpoint_timeout = 10min Following results are the average of 3 runs; each run is of 5 min through pgbench custom query. 
Test case - 1: Init: None Run: create temp table atri1 as select v from generate_series(1,100) v; select * from atri1; VACUUM atri1; DROP TABLE atri1; Test case - 2: Init: create table atri1 as select v from generate_series(1,100) v; Run: select * from atri1; Test case - 3: (without pgbench) connect two sessions s1, s2 s1 : start transaction; s2 : create table atri1 as select v from generate_series(1,100) v; \timing select * from atri1; VACUUM atri1; Results: 9.3devel (Tps) HintbitIO (Tps) --- Test case - 1: 1.762946 1.922219 Test case - 2: 4.038673 4.044356 Pgbench without vacuum, scale factor 75, 8 threads & 8 clients. (refer reports attached) Default tables select : 64980.736149 64550.118693 Unlogged tables select: 64874.974334 64550.118693 Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached) sequential scan: 6.492680 6.382014 Index scan: 1.386851 1.36234 Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached) sequential scan: 6.49319 6.3800147 Index scan: 1.384121 1.3615277 Long transaction open then Vacuum select performance in milliseconds. (refer reports output) Testcase - 3: Single Vacuum Perf : 128.302 ms 181.292 ms Single select perf : 214.107 ms 177.057 ms Total: 342.409 ms 358.349 ms I was not able to find the reason why in some of the cases results are low, so please use the attached scripts to validate the same. I was not able to provide the IO statistics as IOSTAT & VMSTAT were giving the current snapshot but not the cumulative from start to end of test execution. Documentation: - No user visible changes, so no documentation update is needed. Regards, Hari babu. Attachments: results.tar.gz, scripts.tar.gz
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Dec 13, 2012 at 6:36 PM, Hari Babu haribabu.ko...@huawei.com wrote: On Thu, Dec 7, 2012 at 7:56 PM, Hari babu haribabu(dot)kommi(at)Huawei(dot)com wrote: On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com wrote: Thanks for that -- that's fairly comprehensive I'd say. I'm quite interested in that benchmarking framework as well. Do you need help setting up the scripts? Presently I am testing with the pgbench custom query option, taking IO & VM statistics in parallel. I have written the test script for case 1 the following way. ./psql -f bench_test_1_init.sql postgres iostat -t 1 -d > hint.test1.iostat.reading_3.txt & vmstat -n 1 > hint.test1.vmstat.reading_3.txt & ./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres killall -s SIGINT iostat killall -s SIGINT vmstat Where the sql files are as follows: -- bench_test_1.sql select count(*) from bench where f1 is not null; -- bench_test_1_init.sql drop table if exists bench; create table bench(f0 int primary key, f1 char(50)); insert into bench values (generate_series(1, 10), 'a'); insert into bench values (generate_series(11, 20), 'a'); ... insert into bench values (generate_series(981, 990), 'a'); insert into bench values (generate_series(991, 1000), 'a'); checkpoint; I will provide the test results later. Please find the review of the patch. Basic stuff: - Patch applies with offsets. - Compiles cleanly with no warnings - Regression Test pass. Code Review: - 1. Better to set the hint bits for the tuples in a page, if the page is already dirty. Test cases: -- Test cases are already described in the following link. 
http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kap...@huawei.com Test Results: Machine details: CPU cores : 4 RAM : 24GB OS: Suse Linux 10 SP2 Configuration: shared_buffers = 500MB autovacuum = off checkpoint_segments = 256 checkpoint_timeout = 10min Following results are the average of 3 runs; each run is of 5 min through pgbench custom query. Test case - 1: Init: None Run: create temp table atri1 as select v from generate_series(1,100) v; select * from atri1; VACUUM atri1; DROP TABLE atri1; Test case - 2: Init: create table atri1 as select v from generate_series(1,100) v; Run: select * from atri1; Test case - 3: (without pgbench) connect two sessions s1, s2 s1 : start transaction; s2 : create table atri1 as select v from generate_series(1,100) v; \timing select * from atri1; VACUUM atri1; Results: 9.3devel (Tps) HintbitIO (Tps) --- Test case - 1: 1.762946 1.922219 Test case - 2: 4.038673 4.044356 Pgbench without vacuum, scale factor 75, 8 threads & 8 clients. (refer reports attached) Default tables select : 64980.736149 64550.118693 Unlogged tables select: 64874.974334 64550.118693 Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached) sequential scan: 6.492680 6.382014 Index scan: 1.386851 1.36234 Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached) sequential scan: 6.49319 6.3800147 Index scan: 1.384121 1.3615277 Long transaction open then Vacuum select performance in milliseconds. (refer reports output) Testcase - 3: Single Vacuum Perf : 128.302 ms 181.292 ms Single select perf : 214.107 ms 177.057 ms Total: 342.409 ms 358.349 ms I was not able to find the reason why in some of the cases results are low, so please use the attached scripts to validate the same. I was not able to provide the IO statistics as IOSTAT & VMSTAT were giving the current snapshot but not the cumulative from start to end of test execution. 
Documentation: - No user visible changes, so no documentation update is needed. Regards, Hari babu. Thanks for the review and tests. The remarkable difference between 9.3devel and Hint Bit IO is present only in test 3, right? I have the feeling that the original case we discussed (vacuum setting the hint bits) is taking place here, and hence the decrease in performance. Atri -- Regards, Atri l'apprenant
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu haribabu.ko...@huawei.com wrote: Please find the review of the patch. Thanks for the detailed review! Basic stuff: - Patch applies with offsets. - Compiles cleanly with no warnings - Regression Test pass. Code Review: - 1. Better to set the hint bits for the tuples in a page, if the page is already dirty. This is true today but likely less true if/when page checksums come out. Also it complicates the code a little bit. Default tables select : 64980.736149 64550.118693 Unlogged tables select: 64874.974334 64550.118693 So it looks like the extra tests in the visibility routines are causing a 0.7% performance hit. Multiple transaction bulk inserts: Select performance (refer scripts 1 & 2, attached) sequential scan: 6.492680 6.382014 Index scan: 1.386851 1.36234 Single transaction bulk inserts: Select performance (refer scripts 3 & 4, attached) sequential scan: 6.49319 6.3800147 Index scan: 1.384121 1.3615277 The performance hit is higher here. Almost 2%. This is troubling. Long transaction open then Vacuum select performance in milliseconds. (refer reports output) Testcase - 3: Single Vacuum Perf : 128.302 ms 181.292 ms Single select perf : 214.107 ms 177.057 ms Total: 342.409 ms 358.349 ms I was not able to find the reason why in some of the cases results are low, so please use the attached scripts to validate the same. I need to validate the vacuum results. It's possible that this is solvable by tweaking the xmin check inside vacuum. Assuming that's fixed, the question stands: do the results justify the change? I'd argue 'maybe'. -- I'd like to see the bulk insert performance hit reduced if possible. Let's see what we can do in the short term (and, if no improvement can be found, I think this patch should be marked 'returned with feedback'). merlin
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure mmonc...@gmail.com wrote: Thanks for that -- that's fairly comprehensive I'd say. I'm quite interested in that benchmarking framework as well. Do you need help setting up the scripts? Presently I am testing with the pgbench custom query option, taking IO & VM statistics in parallel. I have written the test script for case 1 the following way. ./psql -f bench_test_1_init.sql postgres iostat -t 1 -d > hint.test1.iostat.reading_3.txt & vmstat -n 1 > hint.test1.vmstat.reading_3.txt & ./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres killall -s SIGINT iostat killall -s SIGINT vmstat Where the sql files are as follows: -- bench_test_1.sql select count(*) from bench where f1 is not null; -- bench_test_1_init.sql drop table if exists bench; create table bench(f0 int primary key, f1 char(50)); insert into bench values (generate_series(1, 10), 'a'); insert into bench values (generate_series(11, 20), 'a'); ... insert into bench values (generate_series(981, 990), 'a'); insert into bench values (generate_series(991, 1000), 'a'); checkpoint; I will provide the test results later. Any suggestions/comments? Regards, Hari babu.
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 22, 2012 3:00 AM Greg Smith wrote: On 11/16/12 9:03 AM, Merlin Moncure wrote: Atri ran some quick n dirty tests to see if there were any regressions. He benched a large scan followed by vacuum. So far, results are inconclusive so better testing methodologies will definitely be greatly appreciated. One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. I'm adding something to pgbench-tools to test for some types of vacuum regressions that I've seen before. And the checksum benchmarking I've already signed up for overlaps with this one quite a bit. I'd suggest reviewers here focus on code quality, correctness, and targeted optimization testing. I'm working heavily on a larger benchmarking framework that both this and checksums will fit into right now. We are planning the below performance tests for the hint-bit I/O mitigation patch: Test case -1: Select performance in sequential scan and vacuum operation with I/O statistics. Bulk operations happen in multiple transactions. 1. Stop the auto-vacuum. 2. Create table. 3. Insert 1 records in one transaction for 1000 times. 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads. 4B. Record the IO statistics. 5. After completion of the test case, check VACUUM performance. Test case -2: Select performance in index scan and vacuum operation with I/O statistics. Same as testcase - 1; change 4A as follows: 4A. Use pgbench with the -F option to select random records using index scan for 5 min - 8 threads. Test case -3: Select performance in sequential scan and vacuum operation with I/O statistics, when bulk operations happen in one transaction. 1. Stop the auto-vacuum. 2. Create table. 3. Insert 10,000,000 times. 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads. 4B. Record the IO statistics. 5. After completion of the test case, check VACUUM performance. 
Test case -4: Select performance in index scan and vacuum operation with I/O statistics. Same as testcase - 3; change 4A as follows: 4A. Use pgbench to select random records using index scan for 5 min - 8 threads. Test case -5: Check original pgbench performance & vacuum operation. 1. For select only and tpc_b performance suites with scale factors of 75 & 150, threads 8 & 16. Test case -6: (Vacuum I/O may increase if vacuum needs to make the page dirty only for setting the hint bits.) 1. Session - 1 : Open some long transaction. 2. Session - 2 : Insert some records & commit. Do the select - all the tuples. Checkpoint; Vacuum the table; Checkpoint; 3. Record the IO statistics & time taken for Vacuum & 2nd Checkpoint. Test case -7: (This is also to check Vacuum I/O) 1. Have replication setup. 2. Insert some records & commit. 3. Vacuum the table. 4. Upgrade the standby. 5. Select all the tuples on the new master. 6. Vacuum the tbl on the new master. 6B. Record the IO statistics & time taken for Vacuum on the new master. Suggestions/Feedback? With Regards, Amit Kapila.
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 22, 2012 3:00 AM Greg Smith wrote: On 11/16/12 9:03 AM, Merlin Moncure wrote: Atri ran some quick n dirty tests to see if there were any regressions. He benched a large scan followed by vacuum. So far, results are inconclusive so better testing methodologies will definitely be greatly appreciated. One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. I'm adding something to pgbench-tools to test for some types of vacuum regressions that I've seen before. And the checksum benchmarking I've already signed up for overlaps with this one quite a bit. I'd suggest reviewers here focus on code quality, correctness, and targeted optimization testing. I'm working heavily on a larger benchmarking framework that both this and checksums will fit into right now. We are planning the below performance tests for the hint-bit I/O mitigation patch: Test case -1: Select performance in sequential scan and vacuum operation with I/O statistics. Bulk operations happen in multiple transactions. 1. Stop the auto-vacuum. 2. Create table. 3. Insert 1 records in one transaction for 1000 times. 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads. 4B. Record the IO statistics. 5. After completion of the test case, check VACUUM performance. Test case -2: Select performance in index scan and vacuum operation with I/O statistics. Same as testcase - 1; change 4A as follows: 4A. Use pgbench with the -F option to select random records using index scan for 5 min - 8 threads. Test case -3: Select performance in sequential scan and vacuum operation with I/O statistics, when bulk operations happen in one transaction. 1. Stop the auto-vacuum. 2. Create table. 3. Insert 10,000,000 times. 4A. Use pgbench to select all the records using sequential scan for 5 min - 8 threads. 4B. 
Record the IO statistics. 5. After completion of the test case, check VACUUM performance. Test case -4: Select performance in index scan and vacuum operation with I/O statistics. Same as testcase - 3; change 4A as follows: 4A. Use pgbench to select random records using index scan for 5 min - 8 threads. Test case -5: Check original pgbench performance & vacuum operation. 1. For select only and tpc_b performance suites with scale factors of 75 & 150, threads 8 & 16. Test case -6: (Vacuum I/O may increase if vacuum needs to make the page dirty only for setting the hint bits.) 1. Session - 1 : Open some long transaction. 2. Session - 2 : Insert some records & commit. Do the select - all the tuples. Checkpoint; Vacuum the table; Checkpoint; 3. Record the IO statistics & time taken for Vacuum & 2nd Checkpoint. Test case -7: (This is also to check Vacuum I/O) 1. Have replication setup. 2. Insert some records & commit. 3. Vacuum the table. 4. Upgrade the standby. 5. Select all the tuples on the new master. 6. Vacuum the tbl on the new master. 6B. Record the IO statistics & time taken for Vacuum on the new master. Thanks for that -- that's fairly comprehensive I'd say. I'm quite interested in that benchmarking framework as well. Do you need help setting up the scripts? merlin
Re: [HACKERS] WIP patch for hint bit i/o mitigation
Merlin Moncure wrote: Maybe abstracting 'last xid cache' along with hint bit management out of both transam.c and tqual.c into something like 'hints.c' is appropriate, but that's a more invasive change. It would be good to have such a patch to measure/compare performance of both approaches. It does seem like the more invasive change might be more expensive for both the transam.c and the tqual.c uses, though; and it's not clear that there's anything to be gained by having them be together in a single module. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 11/16/12 9:03 AM, Merlin Moncure wrote: Atri ran some quick n dirty tests to see if there were any regressions. He benched a large scan followed by vacuum. So far, results are inconclusive so better testing methodologies will definitely be greatly appreciated. One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. I'm adding something to pgbench-tools to test for some types of vacuum regressions that I've seen before. And the checksum benchmarking I've already signed up for overlaps with this one quite a bit. I'd suggest reviewers here focus on code quality, correctness, and targeted optimization testing. I'm working heavily on a larger benchmarking framework that both this and checksums will fit into right now. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote: On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote: Sure, although in scans we are using ring buffer as well so in practical sense the results are pretty close. b. Considering sometimes people want VACUUM to run when system is not busy, the chances of generating more overall I/O in system can be more. It's hard to imagine overall i/o load increasing. However, longer vacuum times should be considered. A bigger issue is that we are missing an early opportunity to set the all visible bit. vacuum is doing: if (all_visible) { TransactionId xmin; if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { all_visible = false; break; } assuming the cache is working and vacuum rolls around after a scan, you lose the opportunity to set the all_visible flag where previously it would have been set, thereby dismantling the positive effect of an index only scan. AFAICT, this is the only case where vacuum is directly interfacing with hint bits. This could be construed as a violation of the heapam API? Maybe if that's an issue we could proxy that check to a heaptuple/tqual.c maintained function (in the same manner as HeapTupleSetHintBits) so that the cache bit could be uniformly checked. I think we need to think of some tests to check that Vacuum or anything else is not adversely impacted by this change. I will devise tests during review of this patch. However, if you have more ideas then share the same, which will make the tests of this patch stronger. For functional/performance tests of this patch, one of my colleagues, Hari Babu, will also work along with me on it. With Regards, Amit Kapila.
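Merlin's suggestion above -- proxying vacuum's direct infomask test through a tqual.c-maintained function, in the same spirit as HeapTupleSetHintBits, so the cached bit is checked uniformly -- might look roughly like the following standalone sketch. The names `hint_cache_remember` and `xmin_known_committed` are invented for illustration and are not from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define HEAP_XMIN_COMMITTED 0x0100  /* infomask bit, value as in htup.h */

/* One-entry cache of the last xid whose commit status was determined,
 * maintained alongside hint bit setting (the mechanism the patch adds). */
static TransactionId cached_xid = 0;       /* 0 = InvalidTransactionId */
static bool          cached_committed = false;

/* Record the status of the most recently examined xid. */
static void hint_cache_remember(TransactionId xid, bool committed)
{
    cached_xid = xid;
    cached_committed = committed;
}

/*
 * Hypothetical tqual.c-style proxy for vacuum's direct test of
 * t_infomask & HEAP_XMIN_COMMITTED: the all_visible decision then sees
 * both the on-tuple hint bit and the cached state, instead of only the
 * former, so a cached-but-unhinted tuple no longer breaks all_visible.
 */
static bool xmin_known_committed(uint16_t infomask, TransactionId xmin)
{
    if (infomask & HEAP_XMIN_COMMITTED)
        return true;
    return xmin == cached_xid && cached_committed;
}
```

This is only a model of the interface idea, not the patch itself; the real function would live beside HeapTupleSetHintBits and take a HeapTupleHeader.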
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote: On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote: Sure, although in scans we are using ring buffer as well so in practical sense the results are pretty close. b. Considering sometimes people want VACUUM to run when system is not busy, the chances of generating more overall I/O in system can be more. It's hard to imagine overall i/o load increasing. However, longer vacuum times should be considered. A bigger issue is that we are missing an early opportunity to set the all visible bit. vacuum is doing: if (all_visible) { TransactionId xmin; if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { all_visible = false; break; } assuming the cache is working and vacuum rolls around after a scan, you lose the opportunity to set the all_visible flag where previously it would have been set, thereby dismantling the positive effect of an index only scan. AFAICT, this is the only case where vacuum is directly interfacing with hint bits. This could be construed as a violation of the heapam API? Maybe if that's an issue we could proxy that check to a heaptuple/tqual.c maintained function (in the same manner as HeapTupleSetHintBits) so that the cache bit could be uniformly checked. I think we need to think of some tests to check that Vacuum or anything else is not adversely impacted by this change. I will devise tests during review of this patch. However, if you have more ideas then share the same, which will make the tests of this patch stronger. For functional/performance tests of this patch, one of my colleagues, Hari Babu, will also work along with me on it. Thanks. So far, Atri ran some quick n dirty tests to see if there were any regressions. He benched a large scan followed by vacuum. So far, results are inconclusive so better testing methodologies will definitely be greatly appreciated. 
One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 6:42 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote: So given that -- the patch simply adds an extra check when/where hint bit status is checked in the visibility routines (currently, only HeapTupleSatisfiesMVCC is done, but all the applicable visibility routines should be done). Basically, the way it works is like this: *) is the hint bit set? *) if not, does the examined xid match the last examined one? *) if so, and the cached hint bit matches the one we want, proceed as if the hint bit was set Can you clarify the difference between this and cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your patch is accepted? Very little, except: *) the transam.c-managed cache is unable to influence the specific code path through the visibility routines, at least not without significant refactoring -- everything that happens in tqual.c should be inlined. I played with doing it all in transam.c a while back and didn't much like how it turned out. That doesn't mean it can't work though. *) There are a couple of important-looking code paths that communicate directly with transam.c. For example, in procarray.c ProcArrayApplyRecoveryInfo(). Removing the transam.c cache could turn into a fairly significant regression if that code is performance sensitive -- that would have to be studied before doing that. Maybe abstracting the 'last xid cache' along with hint bit management out of both transam.c and tqual.c into something like 'hints.c' is appropriate, but that's a more invasive change. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Fri, Nov 16, 2012 at 7:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote: On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote: Sure, although in scans we are using the ring buffer as well, so in a practical sense the results are pretty close. b. Considering sometimes people want VACUUM to run when the system is not busy, the chances of generating more overall I/O in the system can be higher. It's hard to imagine overall i/o load increasing. However, longer vacuum times should be considered. A bigger issue is that we are missing an early opportunity to set the all-visible bit. vacuum is doing: if (all_visible) { TransactionId xmin; if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { all_visible = false; break; } Assuming the cache is working and vacuum rolls around after a scan, you lost the opportunity to set the all_visible flag where previously it would have been set, thereby dismantling the positive effect of an index-only scan. AFAICT, this is the only case where vacuum is directly interfacing with hint bits. This could be construed as a violation of the heapam API? Maybe if that's an issue we could proxy that check to a heaptuple/tqual.c-maintained function (in the same manner as HeapTupleSetHintBits) so that the cache bit could be uniformly checked. I think we need to think of some tests to check that Vacuum or anything else has not been impacted by this change. I will devise tests during review of this patch. However, if you have more ideas then share them, which will make the tests of this patch stronger. For functional/performance tests of this patch, one of my colleagues, Hari Babu, will also work along with me on it. Thanks. So far, Atri ran some quick-and-dirty tests to see if there were any regressions. He benched a large scan followed by vacuum.
So far, results are inconclusive, so better testing methodologies will definitely be greatly appreciated. One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. merlin Thanks a ton, Amit, and your colleague Hari, for volunteering to review the patch. I ran some benchmarks and came up with the following results: With our code: atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300 transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 8 number of threads: 1 duration: 300 s number of transactions actually processed: 412 tps = 1.366142 (including connections establishing) tps = 1.366227 (excluding connections establishing) Without our code: atri@atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1 -n -f bench2.sql -c 8 -T 300 transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 8 number of threads: 1 duration: 300 s number of transactions actually processed: 378 tps = 1.244333 (including connections establishing) tps = 1.27 (excluding connections establishing) The SQL file (bench2.sql) is attached. Please let us know if you need any more details. Atri -- Regards, Atri *l'apprenant* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote: On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma atri.j...@gmail.com wrote: On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote: Following the sig is a first cut at a patch (written by Atri) that PFA below the sig the updated patch for the same. It maintains a cache cachedVisibilityXid which holds the results of the clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid. In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. 1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can validate the performance numbers again and do the code review for this patch during CF-3. However, I am just not very sure about the use case, that is, whether it is a sufficient use case. So I would like to ask the opinion of other people as well. 2. After this patch, the tuple hint bit is not set by Select operations after data is populated by one transaction. This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry). Though it is not apparent, we should see whether it can cause any other impact: a. like maybe now VACUUM needs to set the hint bit, which may cause more I/O during Vacuum. Hackers, any opinion/suggestions about the use case? With Regards, Amit Kapila.
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com wrote: In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. 1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can validate the performance numbers again and do the code review for this patch during CF-3. However, I am just not very sure about the use case, that is, whether it is a sufficient use case. So I would like to ask the opinion of other people as well. sure. I'd like to note though that hint bit i/o is a somewhat common complaint. it tends to most affect OLAP-style workloads. in pathological workloads, it can really burn you -- it's not fun when you are i/o starved via sequential scan. This can still happen when sweeping dead records (which this patch doesn't deal with, though maybe it should). 2. After this patch, the tuple hint bit is not set by Select operations after data is populated by one transaction. This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry). Technically it does not save the clog fetch, as transam.c has a very similar cache mechanism. However, it does save a page write i/o and a lock on the page header, as well as a couple of other minor things. In the best case, the page write is completely masked as the page gets dirty for other reasons. I think this is going to become more important in checksum-enabled scenarios. Though it is not apparent, we should see whether it can cause any other impact: a. like maybe now VACUUM needs to set the hint bit, which may cause more I/O during Vacuum. IMNSHO, deferring non-critical i/o from a foreground process to a background process is generally good.
VACUUM has nice features like i/o throttling and in-place cancel, so latent management of internal page optimization flags really belongs there ideally. Also, the longer you defer such I/O, the more opportunity there is for it to be masked off by some other page-dirtying operation (again, this is more important in the face of having to log hint bit changes). There could be some good rebuttal analysis though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote: On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila amit.kap...@huawei.com wrote: In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. 1. From your explanation and code, it is quite clear that it will certainly give performance benefits in the scenarios mentioned by you. I can validate the performance numbers again and do the code review for this patch during CF-3. However, I am just not very sure about the use case, that is, whether it is a sufficient use case. So I would like to ask the opinion of other people as well. sure. I'd like to note though that hint bit i/o is a somewhat common complaint. it tends to most affect OLAP-style workloads. in pathological workloads, it can really burn you -- it's not fun when you are i/o starved via sequential scan. This can still happen when sweeping dead records (which this patch doesn't deal with, though maybe it should). 2. After this patch, the tuple hint bit is not set by Select operations after data is populated by one transaction. This appears to be good as it will save many ops (page dirty followed by flush, clog inquiry). Technically it does not save the clog fetch, as transam.c has a very similar cache mechanism. However, it does save a page write i/o and a lock on the page header, as well as a couple of other minor things. In the best case, the page write is completely masked as the page gets dirty for other reasons. I think this is going to become more important in checksum-enabled scenarios. Though it is not apparent, we should see whether it can cause any other impact: a. like maybe now VACUUM needs to set the hint bit, which may cause more I/O during Vacuum. IMNSHO, deferring non-critical i/o from a foreground process to a background process is generally good. Yes, in regard to deferring you are right.
However, in this case, maybe when the foreground process has to mark the page dirty due to the hint bit, it was already dirty, so no extra I/O; but when it is done by VACUUM, the page may not be dirty. Also, due to the below points, doing it in VACUUM may cost more: a. VACUUM has a ring buffer of fixed size, and if there are many such pages, then page writes need to be done by VACUUM to replace existing pages in the ring. b. Considering sometimes people want VACUUM to run when the system is not busy, the chances of generating more overall I/O in the system can be higher. Why can't we avoid setting the hint bit in VACUUM? Is it because it has to be done somewhere, so that hint bits are eventually set? Or maybe I am missing something trivial? Though the VACUUM case I am talking about occurs rarely in practice, the point came to my mind, so I thought of sharing it with you. VACUUM has nice features like i/o throttling and in-place cancel, so latent management of internal page optimization flags really belongs there ideally. Also, the longer you defer such I/O, the more opportunity there is for it to be masked off by some other page-dirtying operation (again, this is more important in the face of having to log hint bit changes). There could be some good rebuttal analysis though. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila amit.kap...@huawei.com wrote: IMNSHO, deferring non-critical i/o from a foreground process to a background process is generally good. Yes, in regard to deferring you are right. However, in this case, maybe when the foreground process has to mark the page dirty due to the hint bit, it was already dirty, so no extra I/O; but when it is done by VACUUM, the page may not be dirty. Yeah. We can try to be smart and set the hint bits in that case. Not sure that will work out with checksums having to WAL-log hint bits though (which by reading the checksum threads seems likely to happen). Also, due to the below points, doing it in VACUUM may cost more: a. VACUUM has a ring buffer of fixed size, and if there are many such pages, then page writes need to be done by VACUUM to replace existing pages in the ring. Sure, although in scans we are using the ring buffer as well, so in a practical sense the results are pretty close. b. Considering sometimes people want VACUUM to run when the system is not busy, the chances of generating more overall I/O in the system can be higher. It's hard to imagine overall i/o load increasing. However, longer vacuum times should be considered. A bigger issue is that we are missing an early opportunity to set the all-visible bit. vacuum is doing: if (all_visible) { TransactionId xmin; if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED)) { all_visible = false; break; } Assuming the cache is working and vacuum rolls around after a scan, you lost the opportunity to set the all_visible flag where previously it would have been set, thereby dismantling the positive effect of an index-only scan. AFAICT, this is the only case where vacuum is directly interfacing with hint bits. This could be construed as a violation of the heapam API? Maybe if that's an issue we could proxy that check to a heaptuple/tqual.c-maintained function (in the same manner as HeapTupleSetHintBits) so that the cache bit could be uniformly checked.
All other *setting* of hint bits runs through SetHintBits(), so I think we are safe from the vacuum point of view. That's another thing to test for though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote: So given that -- the patch simply adds an extra check when/where hint bit status is checked in the visibility routines (currently, only HeapTupleSatisfiesMVCC is done, but all the applicable visibility routines should be done). Basically, the way it works is like this: *) is the hint bit set? *) if not, does the examined xid match the last examined one? *) if so, and the cached hint bit matches the one we want, proceed as if the hint bit was set Can you clarify the difference between this and cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your patch is accepted? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote: Following the sig is a first cut at a patch (written by Atri) that attempts to mitigate the hint bit i/o penalty when many pages worth of tuples are sequentially written out with the same transaction id. There have been other attempts to deal with this problem that fit niche cases (especially those that create the table in the same transaction as the one inserting) that work but don't really solve the problem generally. As we are now dealing with only the last xid (please refer to the details of the attached patch), the invalidation issues are not significant any more. I think you are right, but please check the below scenario I have in mind, due to which I got confused: Session 1 1. let's say for tup-1 on a page, cachedVisibilityXid is not set, and it goes inside SetHintBits and sets it to the xid of the tuple, which is let's say = 708 2. now for all consecutive tuples which have the same xmin (708), it can directly refer to the cached xid and cached status, even if the hint bit is not set. Other Sessions 3. now from other sessions, large operations happened on all other tables except this table. 4. The situation can be reached where the xid can wrap around. Session 1 5. It again tries to query the same table; at this point it will compare the xid in the tuple with cachedVisibilityXid. Now if all tuple xids for that particular table are frozen in all cases, then it seems to be okay; otherwise it might be a problem. I am not fully aware of this wraparound and frozen xid concept; that's why I had doubted that it might create a problem. On further thought, it appears that I was wrong. Well there's that. But more to the point, for the cache to fail you'd have to have a scenario where a table didn't scan any records for 1 billion+ transactions. See note [3] above for reasoning why this is implausible. Also, we're already relying on this effect in transam.c.
merlin PFA below the sig the updated patch for the same. It maintains a cache cachedVisibilityXid which holds the results of the clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid. In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. So, in place of if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)) the condition is now if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED)) This checks if the commit status can be known from the cache before proceeding. I will be posting the patch to the commitfest. Thoughts/Feedback? Atri -- Regards, Atri *l'apprenant*

patch:

*** tqual--unchanged.c	2012-09-20 03:17:58.0 +0530
--- tqual.c	2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+
+ /*
+  * Single-item cache for results of clog visibility check. It's worth having
+  * such a cache to help reduce the amount of hint bit traffic when
+  * many sequentially touched tuples have the same XID.
+  */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;

  /*
   * SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----
  		if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
  			return;			/* not flushed yet, so don't set hint */
+
+ 		cachedVisibilityXid = xid;
+ 		cachedVisibilityXidStatus = infomask;
  	}

  	tuple->t_infomask |= infomask;
***************
*** 164,170 ****
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
--- 175,183 ----
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
! 		!(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) &&
! 		  cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
***************
*** 247,253 ****
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
! 	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
  		if (tuple->t_infomask & HEAP_IS_LOCKED)
  			return true;
--- 260,268 ----
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma atri.j...@gmail.com wrote: On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote: Following the sig is a first cut at a patch (written by Atri) that attempts to mitigate the hint bit i/o penalty when many pages worth of tuples are sequentially written out with the same transaction id. There have been other attempts to deal with this problem that fit niche cases (especially those that create the table in the same transaction as the one inserting) that work but don't really solve the problem generally. As we are now dealing with only the last xid (please refer to the details of the attached patch), the invalidation issues are not significant any more. I think you are right, but please check the below scenario I have in mind, due to which I got confused: Session 1 1. let's say for tup-1 on a page, cachedVisibilityXid is not set, and it goes inside SetHintBits and sets it to the xid of the tuple, which is let's say = 708 2. now for all consecutive tuples which have the same xmin (708), it can directly refer to the cached xid and cached status, even if the hint bit is not set. Other Sessions 3. now from other sessions, large operations happened on all other tables except this table. 4. The situation can be reached where the xid can wrap around. Session 1 5. It again tries to query the same table; at this point it will compare the xid in the tuple with cachedVisibilityXid. Now if all tuple xids for that particular table are frozen in all cases, then it seems to be okay; otherwise it might be a problem. I am not fully aware of this wraparound and frozen xid concept; that's why I had doubted that it might create a problem. On further thought, it appears that I was wrong. Well there's that. But more to the point, for the cache to fail you'd have to have a scenario where a table didn't scan any records for 1 billion+ transactions.
See note [3] above for reasoning why this is implausible. Also, we're already relying on this effect in transam.c. merlin PFA below the sig the updated patch for the same. It maintains a cache cachedVisibilityXid which holds the results of the clog visibility check. cachedVisibilityXidStatus holds the commit status of the XID in cachedVisibilityXid. In each visibility function (except HeapTupleSatisfiesVacuum()), an additional check has been added to see if the commit status of Xmin or Xmax of a tuple can be retrieved from the cache. So, in place of if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)) the condition is now if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED)) This checks if the commit status can be known from the cache before proceeding. I will be posting the patch to the commitfest. Thoughts/Feedback? Atri -- Regards, Atri *l'apprenant*

patch:

*** tqual--unchanged.c	2012-09-20 03:17:58.0 +0530
--- tqual.c	2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----
  /* local functions */
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+
+ /*
+  * Single-item cache for results of clog visibility check. It's worth having
+  * such a cache to help reduce the amount of hint bit traffic when
+  * many sequentially touched tuples have the same XID.
+  */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;

  /*
   * SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----
  		if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
  			return;			/* not flushed yet, so don't set hint */
+
+ 		cachedVisibilityXid = xid;
+ 		cachedVisibilityXidStatus = infomask;
  	}

  	tuple->t_infomask |= infomask;
***************
*** 164,170 ****
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
--- 175,183 ----
  bool
  HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
  {
! 	if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
! 		!(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple) &&
! 		  cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
  	{
  		if (tuple->t_infomask & HEAP_XMIN_INVALID)
  			return false;
***************
*** 247,253 ****
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
  		return true;
! 	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
  	{
Re: [HACKERS] WIP patch for hint bit i/o mitigation
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Merlin Moncure Sent: Wednesday, November 07, 2012 5:26 AM To: PostgreSQL-development Cc: Atri Sharma Subject: [HACKERS] WIP patch for hint bit i/o mitigation Following the sig is a first cut at a patch (written by Atri) that attempts to mitigate the hint bit i/o penalty when many pages worth of tuples are sequentially written out with the same transaction id. There have been other attempts to deal with this problem that fit niche cases (especially those that create the table in the same transaction as the one inserting) that work but don't really solve the problem generally. I previously attacked this problem ([1], [2]) and came up with a patch that cached hint bits inside tqual.c. The patch was pulled for a few reasons: 1) a lot of complexity without proper justification 2) sketchy cache replacement algorithm 3) I managed to misspell 'committed' just about everywhere 4) invalidation? Issues 1-3 could have been worked out, but #4 was making me think the problem was a nonstarter, or at least, 'too much too soon'. The tuple visibility routines are in a very tight code path, and having to deal with various things in the backend that could cause the xid to become stale was making me nervous. A smaller, simpler patch might be the ticket. About invalidation, I think the cached xid can become invalid due to xid wraparound. So one way could be to invalidate it through Vacuum. Though I am not sure what other things can make the cached xid invalid, I think once we work out what ways can invalidate it, we can see if there is a solution to address them. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 07-Nov-2012, at 15:46, Amit Kapila amit.kap...@huawei.com wrote: From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Merlin Moncure Sent: Wednesday, November 07, 2012 5:26 AM To: PostgreSQL-development Cc: Atri Sharma Subject: [HACKERS] WIP patch for hint bit i/o mitigation Following the sig is a first cut at a patch (written by Atri) that attempts to mitigate the hint bit i/o penalty when many pages worth of tuples are sequentially written out with the same transaction id. There have been other attempts to deal with this problem that fit niche cases (especially those that create the table in the same transaction as the one inserting) that work but don't really solve the problem generally. I previously attacked this problem ([1], [2]) and came up with a patch that cached hint bits inside tqual.c. The patch was pulled for a few reasons: 1) a lot of complexity without proper justification 2) sketchy cache replacement algorithm 3) I managed to misspell 'committed' just about everywhere 4) invalidation? Issues 1-3 could have been worked out, but #4 was making me think the problem was a nonstarter, or at least, 'too much too soon'. The tuple visibility routines are in a very tight code path, and having to deal with various things in the backend that could cause the xid to become stale was making me nervous. A smaller, simpler patch might be the ticket. About invalidation, I think the cached xid can become invalid due to xid wraparound. So one way could be to invalidate it through Vacuum. Though I am not sure what other things can make the cached xid invalid, I think once we work out what ways can invalidate it, we can see if there is a solution to address them. With Regards, Amit Kapila.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers As we are now dealing with only the last xid (please refer to the details of the attached patch), the invalidation issues are not significant any more. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
-Original Message- From: Atri Sharma [mailto:atri.j...@gmail.com] Sent: Wednesday, November 07, 2012 4:02 PM To: Amit Kapila Cc: Merlin Moncure; PostgreSQL-development Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation On 07-Nov-2012, at 15:46, Amit Kapila amit.kap...@huawei.com wrote: From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- ow...@postgresql.org] On Behalf Of Merlin Moncure Sent: Wednesday, November 07, 2012 5:26 AM To: PostgreSQL-development Cc: Atri Sharma Subject: [HACKERS] WIP patch for hint bit i/o mitigation Following the sig is a first cut at a patch (written by Atri) that attempts to mitigate the hint bit i/o penalty when many pages worth of tuples are sequentially written out with the same transaction id. There have been other attempts to deal with this problem that fit niche cases (especially those that create the table in the same transaction as the one inserting) that work but don't really solve the problem generally. As we are now dealing with only the last xid (please refer to the details of the attached patch), the invalidation issues are not significant any more. I think you are right, but please check the below scenario I have in mind, due to which I got confused: Session 1 1. let's say for tup-1 on a page, cachedVisibilityXid is not set, and it goes inside SetHintBits and sets it to the xid of the tuple, which is let's say = 708 2. now for all consecutive tuples which have the same xmin (708), it can directly refer to the cached xid and cached status, even if the hint bit is not set. Other Sessions 3. now from other sessions, large operations happened on all other tables except this table. 4. The situation can be reached where the xid can wrap around. Session 1 5. It again tries to query the same table; at this point it will compare the xid in the tuple with cachedVisibilityXid. Now if all tuple xids for that particular table are frozen in all cases, then it seems to be okay; otherwise it might be a problem. I am not fully aware of this wraparound and frozen xid concept; that's why I had doubted that it might create a problem. On further thought, it appears that I was wrong. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Wed, Nov 7, 2012 at 5:31 PM, Amit Kapila amit.kap...@huawei.com wrote:

>>> Following the sig is a first cut at a patch (written by Atri) that
>>> attempts to mitigate the hint bit i/o penalty when many pages' worth
>>> of tuples are sequentially written out with the same transaction id.
>>
>> As we are now dealing with only the last xid (please refer to the
>> details of the attached patch), the invalidation issues are not
>> significant any more.
>
> I think you are right, but please check the scenario below that I have
> in mind, which confused me:
>
> Session 1:
> 1. Say that for tup-1 on a page, cachedVisibilityXid is not set; we go
>    inside SetHintBits and set it to the tuple's xid, say 708.
> 2. Now, for all consecutive tuples that have the same xmin (708), we
>    can directly use the cached xid and cached status, even if the hint
>    bit is not set.
>
> Other sessions:
> 3. From other sessions, large operations happen on all other tables
>    except this one.
> 4. The situation can be reached where the xid wraps around.
>
> Session 1:
> 5. It again queries the same table; at this point it will compare the
>    xid in each tuple with cachedVisibilityXid. If all tuple xids for
>    that particular table are frozen, then in all cases it seems to be
>    okay; otherwise it might be a problem.
>
> I am not fully aware of the wraparound and frozen-xid concepts; that's
> why I doubted that it might create a problem. On further thought, it
> appears that I was wrong.

AFAIK, xids are managed relative to reference xids and have a usable
range of about ±2 billion. As that limit is approached, the reference
xids are moved forward, and xids that no longer fall within the
reference xid ±2 billion window are frozen. Hence, in the given
scenario, I believe that once wraparound happens, since the xmin is the
same for all the tuples in session 1's table, there should be no
problem: all tuple xids for that particular table will be frozen.

Atri

--
Regards,
Atri
*l'apprenant*
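[Editor's note: the ±2 billion window Atri describes comes from the modulo-2^32 ordering PostgreSQL uses for normal xids. The sketch below is a simplified re-creation of that comparison, not the real `TransactionIdPrecedes`, which additionally special-cases permanent xids such as FrozenTransactionId so they always compare as older than any normal xid.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified modulo-2^32 xid ordering: id1 "precedes" id2 when the
 * signed 32-bit difference is negative, which gives every xid a horizon
 * of roughly 2 billion xids in either direction.  Xids about to fall
 * out of that window must be frozen before the counter wraps, or these
 * comparisons would start giving wrong answers. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);
    return diff < 0;
}
```

Under this ordering a numerically large old xid still compares as preceding a small post-wraparound xid, as long as the two are less than ~2 billion apart, which is why freezing (and, as Merlin notes below in the thread, regular table access) keeps the cache comparison safe.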
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila amit.kap...@huawei.com wrote:

>> Following the sig is a first cut at a patch (written by Atri) that
>> attempts to mitigate the hint bit i/o penalty when many pages' worth
>> of tuples are sequentially written out with the same transaction id.
>
> I think you are right, but please check the scenario below that I have
> in mind, which confused me:
>
> Other sessions:
> 3. From other sessions, large operations happen on all other tables
>    except this one.
> 4. The situation can be reached where the xid wraps around.
>
> Session 1:
> 5. It again queries the same table; at this point it will compare the
>    xid in each tuple with cachedVisibilityXid. If all tuple xids for
>    that particular table are frozen, then in all cases it seems to be
>    okay; otherwise it might be a problem.
>
> I am not fully aware of the wraparound and frozen-xid concepts; that's
> why I doubted that it might create a problem. On further thought, it
> appears that I was wrong.

Well, there's that. But more to the point: for the cache to fail, you'd
have to have a scenario where a table's records weren't scanned at all
for 1 billion+ transactions. See note [3] above for reasoning on why
this is implausible. Also, we're already relying on this effect in
transam.c.
merlin