Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> Hello,
>
> The attached patches are revised according to the latest custom-plan
> interface patch (v11). The cache-scan module was re-implemented on the
> newer interface. I also noticed that the extension did not handle
> redirected tuples correctly, so I reworked the logic in
> ccache_vacuum_page() entirely: it now synchronizes the cached tuples per
> page, not per tuple, and also tries to merge t-tree chunks on a per-page
> basis.
>
> I also split the patches again, because the *demonstration* part is much
> larger than the patches to the core backend. This should help reviewing.
>
> * pgsql-v9.4-vacuum_page_hook.v11.patch
>   - Adds a hook invoked for each page being vacuumed; an extension needs
>     this to synchronize the state of the in-memory cache it manages.
> * pgsql-v9.4-mvcc_allows_cache.v11.patch
>   - Allows HeapTupleSatisfiesVisibility() to run against tuples held in
>     the in-memory cache, not only tuples on the heap.
> * pgsql-v9.4-example-cache_scan.v11.patch
>   - Demonstrates the usage of the two patches above. It allows scanning
>     a relation without storage access when possible.

All the patches look good. The cache-scan extension patch may need further
refinement for performance, but that can be handled later as well, so I am
marking the patch as ready for committer. Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Mar 18, 2014, Haribabu Kommi kommi.harib...@gmail.com wrote:
> All the patches look good. The cache-scan extension patch may need
> further refinement for performance, but that can be handled later as
> well, so I am marking the patch as ready for committer. Thanks for the
> patch.

Thanks for your dedicated efforts on these patches.

The smaller portions of this submission can be applied independently of
the custom-plan interface, and their scale is much smaller than the
contrib/ portion (about 30 lines in total). If someone could pick them up
separately from the extension portion, that would also make sense.

I intended to keep the extension portion as simple as I could, for
demonstration purposes rather than performance; even so, its scale is
about 2.5K lines. :-(  Yes, I know the time pressure towards the v9.4
final feature freeze.

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
Thanks for your efforts!

>              Head      patched    Diff
> Select 500K  772ms     2659ms     -200%
> Insert 400K  3429ms    1948ms     +43%  (I am not sure how it improved here)
> Delete 200K  2066ms    3978ms     -92%
> Update 200K  3915ms    5899ms     -50%
>
> This patch shows how custom scan can be used very well, but the patch as
> it is has some performance problems which need to be investigated. I
> attached the test script file used for the performance test.

First of all, it seems to me your test case uses too small a data set,
one that can be held entirely in memory - roughly 500K records of 200
bytes each will consume about 100MB. Your configuration allocates 512MB
of shared_buffers, and about 3GB of OS-level page cache is available.
(Note that Linux uses free memory as disk cache adaptively.)

This cache is designed to hide the latency of disk accesses, so this test
case does not match its intention. (Also, the primary purpose of this
module is a demonstration of heap_page_prune_hook to hook vacuuming, so
simple code was preferred over a more complicated implementation with
better performance.)

I could reproduce the overall trend: a plain scan is faster than the
cached scan when the buffers are already in memory. The penalty probably
comes from the cost of walking down the T-tree index by ctid for every
reference. The performance penalty on UPDATE and DELETE likely comes from
trigger invocation per row. I observed a small performance gain on
INSERT, which is strange to me as well. :-(

On the other hand, the discussion around the custom-plan interface
affects this module because it uses that API as its foundation. Please
wait a few days while I rebase the cache_scan module onto the newer
custom-plan interface that I submitted just a moment ago.

Also, is it really necessary to tune the performance in this example
module for heap_page_prune_hook? Even though I have a few ideas to
improve the cache performance, such as inserting multiple rows at once or
using a local chunk copy instead of a t-tree walk-down, I am not sure
whether that is productive in the current v9.4 timeframe. ;-(

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> First of all, it seems to me your test case uses too small a data set,
> one that can be held entirely in memory - roughly 500K records of 200
> bytes each will consume about 100MB. Your configuration allocates 512MB
> of shared_buffers, and about 3GB of OS-level page cache is available.
> (Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information, and a small correction: the total number of
records is 5 million. The select operation retrieves 500K records, and
the total table size is around 1GB.

Once I get your new patch re-based on the custom-scan patch, I will test
the performance again with a database size larger than RAM, and I will
also make sure that the memory available for disk cache is smaller.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> 2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
>> I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the
readings.

Configuration:
  shared_buffers        - 512MB
  cache_scan.num_blocks - 600
  checkpoint_segments   - 255

Machine:
  OS     - CentOS 6.4
  CPU    - 4 core, 2.5 GHz
  Memory - 4GB

             Head      patched    Diff
Select 500K  772ms     2659ms     -200%
Insert 400K  3429ms    1948ms     +43%  (I am not sure how it improved here)
Delete 200K  2066ms    3978ms     -92%
Update 200K  3915ms    5899ms     -50%

This patch shows how custom scan can be used very well, but the patch as
it is has some performance problems which need to be investigated. I
attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia

shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 600
checkpoint_segments = 255

\timing

-- cache scan, select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,5000000), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test
       FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test
       FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
drop table test cascade;

-- sequence scan, select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,5000000), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
drop table test cascade;

-- cache scan, select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,5000000), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test
       FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test
       FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
drop table test cascade;

-- sequence scan, select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,5000000), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1,
       'Australia software tech pvt ltd');
drop table test cascade;
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>> 4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
>>    + if (pchunk != NULL && pchunk != cchunk)
>>    +     ccache_merge_chunk(ccache, pchunk);
>>    + pchunk = cchunk;
>>
>> ccache_merge_chunk() is called only when the heap tuples are spread
>> across two cache chunks. Actually, one cache chunk can accommodate one
>> or more heap pages. It needs some other way of handling.
>
> I adjusted the logic to merge the chunks as follows:
>
> Once a tuple is vacuumed from a chunk, it also checks whether the chunk
> can be merged with its child leaves. A chunk has up to two child
> leaves: the left one holds ctids less than the parent's, the right one
> holds greater ctids. It means a chunk without a right child in the left
> subtree, or a chunk without a left child in the right subtree, is a
> neighbor of the chunk being vacuumed. In addition, if the vacuumed
> chunk does not have either (or both) of its children, it can be merged
> with its parent node.
>
> I modified ccache_vacuum_tuple() to merge chunks during the t-tree
> walk-down, if the vacuumed chunk has enough free space.

Patch looks good.

Regarding merging of the nodes: instead of checking whether a merge is
possible for every tuple that is vacuumed, can we use some kind of
threshold, such as whenever a node is 50% free, try to merge it from the
leaf nodes until it is 90% full? The remaining 10% would be left for
subsequent inserts into the node. What do you say?

I will update you later regarding the performance test results.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
> Patch looks good.

Thanks for your volunteering.

> Regarding merging of the nodes: instead of checking whether a merge is
> possible for every tuple that is vacuumed, can we use some kind of
> threshold, such as whenever a node is 50% free, try to merge it from
> the leaf nodes until it is 90% full? The remaining 10% would be left
> for subsequent inserts into the node. What do you say?

Hmm. Indeed, it makes sense. How about an idea that kicks chunk merging
only if the expected free space of the merged chunk is no less than 50%?
If the threshold depends on the (expected) usage of the merged chunk, it
can avoid over-merging.

> I will update you later regarding the performance test results.

Thanks. Also, I'll rebase the patch on top of the new custom-scan
interfaces according to Tom's suggestion, even though the main logic of
cache_scan is not changed.

Best regards,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
> Thanks for the information, I will apply the other patches also and
> start testing.
>
> When trying to run the pgbench test, by default the cache-scan plan is
> not chosen because of its higher cost. So I increased
> cpu_index_tuple_cost to a large value, or turned off index scans, so
> that the planner would choose cache_scan as the cheapest plan.

It's expected. When an index scan is available, its cost is obviously
cheaper than a cache scan, even though the cache scan issues no disk I/O.

> The configuration parameters changed during the test are:
>   shared_buffers        - 2GB
>   cache_scan.num_blocks - 1024
>   wal_buffers           - 16MB
>   checkpoint_segments   - 255
>   checkpoint_timeout    - 15 min
>   cpu_index_tuple_cost  - 10, or enable_indexscan = off
>
> Test procedure:
> 1. Initialize the database with pgbench at scale factor 75.
> 2. Create the triggers on pgbench_accounts.
> 3. Use a select query to load all the data into the cache.
> 4. Run a simple-update pgbench test.
>
> Plan details of the pgbench simple-update queries:
>
> postgres=# explain update pgbench_accounts set abalance = abalance
>            where aid = 10;
>                                 QUERY PLAN
> ----------------------------------------------------------------------
>  Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
>          (cost=0.43..18.44 rows=1 width=103)
>          Index Cond: (aid = 10)
>  Planning time: 0.045 ms
> (4 rows)
>
> postgres=# explain select abalance from pgbench_accounts where aid = 10;
>                                 QUERY PLAN
> ----------------------------------------------------------------------
>  Custom Scan (cache scan) on pgbench_accounts
>          (cost=0.00..99899.99 rows=1 width=4)
>    Filter: (aid = 10)
>  Planning time: 0.042 ms
> (3 rows)
>
> I am observing too much delay in the performance results. The
> performance test script is attached in the mail.

I want you to compare two different cases: a sequential scan where part
of the buffers have to be loaded from storage, versus a cache-only scan.
That would probably show a difference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:
> Thanks for the information, I will apply the other patches also and
> start testing.

When trying to run the pgbench test, by default the cache-scan plan is
not chosen because of its higher cost. So I increased
cpu_index_tuple_cost to a large value, or turned off index scans, so
that the planner would choose cache_scan as the cheapest plan.

The configuration parameters changed during the test are:
  shared_buffers        - 2GB
  cache_scan.num_blocks - 1024
  wal_buffers           - 16MB
  checkpoint_segments   - 255
  checkpoint_timeout    - 15 min
  cpu_index_tuple_cost  - 10, or enable_indexscan = off

Test procedure:
1. Initialize the database with pgbench at scale factor 75.
2. Create the triggers on pgbench_accounts.
3. Use a select query to load all the data into the cache.
4. Run a simple-update pgbench test.

Plan details of the pgbench simple-update queries:

postgres=# explain update pgbench_accounts set abalance = abalance
           where aid = 10;
                                QUERY PLAN
----------------------------------------------------------------------
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
         (cost=0.43..18.44 rows=1 width=103)
         Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
                                QUERY PLAN
----------------------------------------------------------------------
 Custom Scan (cache scan) on pgbench_accounts
         (cost=0.00..99899.99 rows=1 width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing too much delay in the performance results. The
performance test script is attached in the mail; please let me know if
you find any problem in the test.

Regards,
Hari Babu
Fujitsu Australia

Attachment: run_reading.sh (Bourne shell script)
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>> Getting some compilation warnings while compiling the extension, and
>> also I am not able to load the extension because of an undefined
>> symbol, get_restriction_qual_cost.
>
> Thanks for your testing. It seems to me you applied only the part-1
> portion of the custom-scan patches. get_restriction_qual_cost() is
> re-defined as an extern function (from a static one) in the part-2
> portion (ctidscan), to estimate the cost of the qualifiers in an
> extension. Also, bms_to_string() and bms_from_string() are added in the
> part-3 portion (postgres_fdw) to carry a bitmapset according to the
> copyObject manner. It may make sense to include these supplemental
> changes in the cache-scan feature as well, until either of them gets
> upstreamed.

Thanks for the information, I will apply the other patches also and
start testing.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> Hello,
>
> The attached patch is a revised one for the cache-only scan module on
> top of the custom-scan interface. Please check it.

Thanks for the revised patch. Please find some minor comments.

1.
+ memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE, tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different, but in the case of
ccache it is one continuous piece of memory. Better to write a comment
that even though the memory is continuous, it is treated as two separate
parts.

2.
+ uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already MAXALIGNed. There is no problem in aligning it again,
but the required-length calculation differs from function to function.
For example, in another part of the same function the same t_len is used
directly. It doesn't cause a problem, but it may cause some confusion.

3.
+ cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
+ if (pchunk != NULL && pchunk != cchunk)
+     ccache_merge_chunk(ccache, pchunk);
+ pchunk = cchunk;

ccache_merge_chunk() is called only when the heap tuples are spread
across two cache chunks. Actually, one cache chunk can accommodate one
or more heap pages; it needs some other way of handling.

4.
  for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

5. "columner" is present in the sgml file; correct it to "columnar".

6. The max_cached_attnum value is documented as 128 by default, but in
the code it is set to 256.

I will start regression and performance tests, and I will inform you
once I finish.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
> I reviewed all the three patches. The first two core PostgreSQL patches
> are fine, and I have comments on the third patch, related to cache scan.

Thanks for your volunteering.

> 1. +# contrib/dbcache/Makefile
>    The Makefile header comment does not match the file location.

Ahh, it's an old name from when I started the implementation.

> 2. + /*
>    +  * Estimation of average width of cached tuples - it does not make
>    +  * sense to construct a new cache if its average width is more than
>    +  * 30% of the raw data.
>    +  */
>    Move the estimation of the average width of cached tuples into the
>    case where the cache is not found; otherwise it is an overhead in the
>    cache-hit scenario.

You are right. If and when an existing cache is found and matches, its
width is obviously less than 30% of the total width.

> 3. + if (old_cache)
>    +     attrs_used = bms_union(attrs_used, old_cache->attrs_used);
>    Don't we need the check that the average width is more than 30% here?
>    During estimation it doesn't include the existing other attributes.

Indeed. It should drop some attributes from the existing cache if the
total average width grows beyond the threshold. Probably, we need a
statistical variable to track how many times, or how recently, each
attribute is referenced.

> 4. + lchunk->right = cchunk;
>    + lchunk->l_depth = TTREE_DEPTH(lchunk->right);
>    I think it should be lchunk->r_depth that is set in a clockwise
>    rotation.

Oops, nice catch.

> 5. Can you add some comments in the code about how a block is used?

Sorry, I'll add it. A block is consumed from the head to store pointers
to tuples, and from the tail to store the contents of the tuples. A
block can hold multiple tuples as long as the tuple pointers growing
from the head do not cross into the area for tuple contents. Anyway,
I'll put this in the source code.

> 6. In the do_insert_tuple function, I felt that moving the tuples and
>    rearranging their addresses is a little costly. How about the
>    following way?
>
>    Always insert the tuple from the bottom of the block, where the
>    empty space starts, and store the corresponding reference pointers
>    in an array at the start of the block. As new tuples are inserted,
>    this array grows from the block start while the tuples grow from the
>    block end. We only need to sort this array based on item pointers;
>    there is no need to update the reference pointers themselves. In
>    this case, movement is required only when a tuple is moved from one
>    block to another, or when there is not enough continuous free space
>    to insert a new tuple. You can decide based on how frequently the
>    sorting would happen in general.

It seems to me a reasonable suggestion. Probably, an easier
implementation is to replace an old block containing dead space with a
new block that contains only valid tuples, if and when the dead space
grows beyond a threshold of block usage.

> 7. In the ccache_find_tuple function, Assert(i_min + 1 < cchunk->ntups)
>    can go wrong when only one tuple is present in the block, with an
>    item pointer equal to the one we are searching for in the forward
>    scan direction.

It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Please assume cchunk->ntups == 1. In this case,
any given ctid matches one of the edge comparisons, because any ctid is
less than, equal to, or greater than the only cached tuple; thus the
search moves to the right or left node according to the scan direction.

> 8. I am not able to find a protection mechanism for insert/delete etc.
>    of a tuple in the T-tree. As this is shared memory, it can cause
>    problems.

For design simplification, I put a giant lock per columnar cache. So,
the routines in cscan.c acquire an exclusive lwlock prior to invocation
of ccache_insert_tuple / ccache_delete_tuple.

> 9. + /* merge chunks if this chunk has enough space to merge */
>    + ccache_merge_chunk(ccache, cchunk);
>    Calling the chunk merge on every callback of heap page pruning is an
>    overhead for vacuum, and the merge may again lead to node splits
>    because of new data.

OK, I'll check the condition for merging the chunks, to prevent too
frequent merge / split cycles.

> 10. "columner" is present in some places of the patch; correct it.

Ahh, it should be "columnar".

> 11. In the cache_scan_next function, in case the cache insert fails
>     because of shared memory, the tuple pointer is not reset and the
>     cache is NULL. Because of this, the next record fetch leads to an
>     assert, as cache != NULL.

You are right. I had to reset the scan state to the normal seqscan path,
not just set csstate->ccache to NULL. I left an older manner from trial
and error during implementation.

> 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
>     +     cs_put_ccache(ccache);
>     The cache is created with refcnt as 2, and sometimes put-cache is
>     called twice to eliminate it, and sometimes a different approach is
>     used. It is a little bit confusing; can you explain in comments why
>     2 is required and how it
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
>> 7. In the ccache_find_tuple function, Assert(i_min + 1 < cchunk->ntups)
>>    can go wrong when only one tuple is present in the block, with an
>>    item pointer equal to the one we are searching for in the forward
>>    scan direction.
>
> It shouldn't happen, because the first or second ItemPointerCompare
> will handle the condition. Please assume cchunk->ntups == 1. In this
> case, any given ctid matches one of the edge comparisons, because any
> ctid is less than, equal to, or greater than the only cached tuple;
> thus the search moves to the right or left node according to the scan
> direction.

Yes, you are correct. Sorry for the noise.

>> 8. I am not able to find a protection mechanism for insert/delete etc.
>>    of a tuple in the T-tree. As this is shared memory, it can cause
>>    problems.
>
> For design simplification, I put a giant lock per columnar cache. So,
> the routines in cscan.c acquire an exclusive lwlock prior to invocation
> of ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for concurrency. Better to
analyze it once we have the performance report.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
8. I am not able to find a protection mechanism in insert/delete and etc of a tuple in Ttree. As this is a shared memory it can cause problems. For design simplification, I put a giant lock per columnar-cache. So, routines in cscan.c acquires exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple. Correct. But this lock can be a bottleneck for the concurrency. Better to analyze the same once we have the performance report. Well, concurrent updates towards a particular table may cause lock contention due to a giant lock. On the other hands, one my headache is how to avoid dead-locking if we try to implement it using finer granularity locking. Please assume per-chunk locking. It also needs to take a lock on the neighbor nodes when a record is moved out. Concurrently, some other process may try to move another record with inverse order. That is a ticket for dead-locking. Is there idea or reference to implement concurrent tree structure updating? Anyway, it is a good idea to measure the impact of concurrent updates on cached tables, to find out the significance of lock splitting. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: Haribabu Kommi [mailto:kommi.harib...@gmail.com] Sent: Thursday, February 13, 2014 8:31 AM To: Kohei KaiGai Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?) On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: 7. In ccache_find_tuple function this Assert(i_min + 1 cchunk-ntups); can go wrong when only one tuple present in the block with the equal item pointer what we are searching in the forward scan direction. It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume the cchunk-ntups == 1. 
In this case, any given ctid shall match one of them, because any ctid is less than, equal to, or larger than the only cached tuple; thus, the search moves to the right or left node according to the scan direction.

Yes, you are correct. Sorry for the noise.

8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems.

For design simplification, I put a giant lock per columnar cache. So, the routines in cscan.c acquire an exclusive LWLock prior to invocation of ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for concurrency. Better to analyze this once we have the performance report.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

Hello,

Because of time pressure in the January commit-fest, I tried to simplify the patch for cache-only scan into three portions:
(1) add a hook on heap_page_prune for cache invalidation when a particular page is vacuumed,
(2) add a check to accept InvalidBuffer in SetHintBits,
(3) a proof-of-concept module for cache-only scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once an on-memory columnar cache is constructed, it needs to be invalidated if the heap page behind the cache is modified. In usual DML cases, an extension can get control using row-level trigger functions for invalidation; however, we currently have no way to get control when a page is vacuumed, which is usually handled by the autovacuum process. This patch adds a callback to heap_page_prune(), to allow extensions to prune dead entries in their caches, not only in heap pages. I'd also like to hear of any other scenario in which we would need to invalidate columnar cache entries, if one exists. It seems to me object_access_hook makes sense to cover the DDL and VACUUM FULL scenarios...

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
When we want to check the visibility of tuples in cache entries (which have no particular shared buffer associated) using HeapTupleSatisfiesVisibility, it internally tries to update the hint bits of the tuples. However, that does not make sense for tuples not associated with a particular shared buffer. By definition, tuple entries in the cache are not connected with a particular shared buffer, and if we had to load the whole buffer page just to set hint bits, it would be totally nonsense, because the purpose of the on-memory cache is to reduce disk accesses. This patch adds an exceptional condition to SetHintBits() to skip everything if the given buffer is InvalidBuffer. It allows tuple visibility to be checked using the regular visibility-check functions, without extensions re-inventing the wheel by themselves.
(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of concept that implements cache-only scan on top of the custom-scan interface. It tries to offer an alternative scan path on tables with row-level triggers for cache invalidation if the total width of the referenced columns is less than 30% of the total width of the table definition. Thus, it can keep a larger number of records, with a meaningful portion of them in main memory. This cache shall be invalidated according to main heap updates: the first path is the row-level trigger, the second is object_access_hook on DDL, and the third is the heap_page_prune hook. Once a column-reduced tuple gets cached, it is copied into the cache memory from the shared buffer, so it needs a feature to ignore InvalidBuffer in the visibility-check functions.

I reviewed all three patches. The first two core PostgreSQL patches are fine. I have the following comments on the third, cache-scan patch.

1. +# contrib/dbcache/Makefile
The Makefile header comment does not match the file's location.

2. + /*
+  * Estimation of average width of cached tuples - it does not make
+  * sense to construct a new cache if its average width is more than
+  * 30% of the raw data.
+  */
Move the estimation of the average width of cached tuples into the case where the cache is not found; otherwise it is an overhead in the cache-hit scenario.

3. + if (old_cache)
+     attrs_used = bms_union(attrs_used, old_cache->attrs_used);
Don't we need the check that the average width is no more than 30% here as well? During estimation it doesn't include the other existing attributes.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);
I think it should be lchunk->r_depth that is set in a clockwise rotation.

5. Can you add some comments in the code describing how a block is used?

6. In the do_insert_tuple function, I felt that moving the tuples and rearranging their addresses is a little costly. How about the following way?
Always insert the tuple from the bottom of the block, where the empty space starts, and store the corresponding reference pointers at the start of the block in an array. As new tuples are inserted, this array grows from the block start and the tuples grow from the block end. We just need to sort this array based on item pointers; there is no need to update the reference pointers themselves. In this scheme, tuple movement is required only when a tuple is moved from one block to another, and also whenever enough contiguous free space is not available to insert a new tuple. You can decide based on how frequently the sorting would happen in general.

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups); can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction.

8. I am not able to find a protection mechanism in insert/delete and etc of a tuple in Ttree. As this is a shared