Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-17 Thread Haribabu Kommi
On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 Hello,

 The attached patches are revised according to the latest custom-plan
 interface patch (v11).
 The cache-scan module was re-implemented on the newer interface. I also
 noticed the extension did not handle redirected tuples correctly, so I
 revised the logic in ccache_vacuum_page() entirely. It now synchronizes
 the cached tuples per page, not per tuple, and also tries to merge t-tree
 chunks on a per-page basis.

 Also, I split the patches again because the *demonstration* part is much larger
 than the patches to the core backend. It should help reviewing.
 * pgsql-v9.4-vacuum_page_hook.v11.patch
  - It adds a hook invoked for each page being vacuumed, so that extensions
 can synchronize the status of their in-memory caches.
 * pgsql-v9.4-mvcc_allows_cache.v11.patch
  - It allows running HeapTupleSatisfiesVisibility() on tuples held in the
 in-memory cache, not on the heap.
 * pgsql-v9.4-example-cache_scan.v11.patch
  - It demonstrates the usage of the above two patches; it allows a relation
 to be scanned without storage access where possible.

All the patches are good. The cache scan extension patch may need further
refinement in terms of performance, but that can be handled later.
So I am marking the patch as ready for committer. Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-17 Thread Kouhei Kaigai
  Also, I split the patches again because the *demonstration* part is much
  larger than the patches to the core backend. It should help reviewing.
  * pgsql-v9.4-vacuum_page_hook.v11.patch
   - It adds a hook invoked for each page being vacuumed, so that extensions
  can synchronize the status of their in-memory caches.
  * pgsql-v9.4-mvcc_allows_cache.v11.patch
   - It allows running HeapTupleSatisfiesVisibility() on tuples held in the
  in-memory cache, not on the heap.
  * pgsql-v9.4-example-cache_scan.v11.patch
   - It demonstrates the usage of the above two patches; it allows a relation
  to be scanned without storage access where possible.
 
 All the patches are good. The cache scan extension patch may need further
 refinement in terms of performance, but that can be handled later.
 So I am marking the patch as ready for committer. Thanks for the patch.

Thanks for your dedicated efforts on these patches.

The smaller portions of the above submission can be applied independently of
the custom-plan interface, and they are much smaller than the contrib/ portion
(about 30 lines in total).
It would also make sense if someone picked them up separately from the
extension portion. I intended to implement the extension as simply as I
could, for demonstration purposes rather than performance; even so, it is
about 2.5K lines. :-(
Yes, I know the time pressure towards the v9.4 final feature freeze.

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
 Sent: Tuesday, March 18, 2014 11:14 AM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Kohei KaiGai; Tom Lane; PgHacker; Robert Haas
 Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
 table scan?)
 
 On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai kai...@ak.jp.nec.com
 wrote:
  Hello,
 
  The attached patches are revised according to the latest
  custom-plan interface patch (v11).
  The cache-scan module was re-implemented on the newer interface. I also
  noticed the extension did not handle redirected tuples correctly, so I
  revised the logic in ccache_vacuum_page() entirely. It now synchronizes
  the cached tuples per page, not per tuple, and also tries to merge
  t-tree chunks on a per-page basis.
 
  Also, I split the patches again because the *demonstration* part is much
  larger than the patches to the core backend. It should help reviewing.
  * pgsql-v9.4-vacuum_page_hook.v11.patch
   - It adds a hook invoked for each page being vacuumed, so that extensions
  can synchronize the status of their in-memory caches.
  * pgsql-v9.4-mvcc_allows_cache.v11.patch
   - It allows running HeapTupleSatisfiesVisibility() on tuples held in the
  in-memory cache, not on the heap.
  * pgsql-v9.4-example-cache_scan.v11.patch
   - It demonstrates the usage of the above two patches; it allows a relation
  to be scanned without storage access where possible.
 
 All the patches are good. The cache scan extension patch may need further
 refinement in terms of performance, but that can be handled later.
 So I am marking the patch as ready for committer. Thanks for the patch.
 
 Regards,
 Hari Babu
 Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-12 Thread Kouhei Kaigai
Thanks for your efforts!
                   Head     Patched    Diff
 Select - 500K     772ms    2659ms    -200%
 Insert - 400K     3429ms   1948ms     43%  (I am not sure how it improved in this case)
 Delete - 200K     2066ms   3978ms    -92%
 Update - 200K     3915ms   5899ms    -50%
 
 This patch shows how custom scan can be used very well, but the patch as
 it stands has some performance problems that need to be investigated.
 
 I attached the test script file used for the performance test.

First of all, it seems to me your test case uses too small a data set, one
that allows all the data to be held in memory - roughly 500K records of
200 bytes each consume about 100MB. Your configuration allocates 512MB of
shared_buffers, and about 3GB of OS-level page cache is available.
(Note that Linux adaptively uses free memory as disk cache.)

This cache is designed to hide the latency of disk accesses, so this test
case does not fit its intention.
(Also, the primary purpose of this module is to demonstrate
heap_page_prune_hook for hooking vacuum, so simple code was preferred over
a more complicated implementation with better performance.)

I could reproduce the overall trend: the plain scan (without the cache) is
faster than the cached scan if the buffers are already in memory. Probably,
this comes from the cost of walking down the T-tree index by ctid on every
reference.
The performance penalty on UPDATE and DELETE likely comes from the per-row
trigger invocation.
I could observe a small performance gain on INSERT.
It's strange to me, also. :-(

On the other hand, the discussion around the custom-plan interface affects
this module, because it uses that API as its foundation.
Please allow a few days to rebase the cache_scan module onto the newer
custom-plan interface, which I submitted just a moment ago.

Also, is it really necessary to tune the performance of this example
module for heap_page_prune_hook?
Even though I have a few ideas to improve the cache performance, such as
inserting multiple rows at once or using a local chunk copy instead of the
t-tree walk-down, I'm not sure whether it is productive in the current
v9.4 timeframe. ;-(

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Haribabu Kommi
 Sent: Wednesday, March 12, 2014 1:14 PM
 To: Kohei KaiGai
 Cc: Kaigai Kouhei(海外 浩平); Tom Lane; PgHacker; Robert Haas
 Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
 table scan?)
 
 On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
  I will update you later regarding the performance test results.
 
 
 I ran the performance test on the cache scan patch and below are the readings.
 
 Configuration:
 
 Shared_buffers - 512MB
 cache_scan.num_blocks - 600
 checkpoint_segments - 255
 
 Machine:
 OS - centos - 6.4
 CPU - 4 core 2.5 GHz
 Memory - 4GB
 
                   Head     Patched    Diff
 Select - 500K     772ms    2659ms    -200%
 Insert - 400K     3429ms   1948ms     43%  (I am not sure how it improved in this case)
 Delete - 200K     2066ms   3978ms    -92%
 Update - 200K     3915ms   5899ms    -50%
 
 This patch shows how custom scan can be used very well, but the patch as
 it stands has some performance problems that need to be investigated.
 
 I attached the test script file used for the performance test.
 
 Regards,
 Hari Babu
 Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-12 Thread Haribabu Kommi
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 Thanks for your efforts!
                   Head     Patched    Diff
 Select - 500K     772ms    2659ms    -200%
 Insert - 400K     3429ms   1948ms     43%  (I am not sure how it improved in this case)
 Delete - 200K     2066ms   3978ms    -92%
 Update - 200K     3915ms   5899ms    -50%

 This patch shows how custom scan can be used very well, but the patch as
 it stands has some performance problems that need to be investigated.

 I attached the test script file used for the performance test.

 First of all, it seems to me your test case uses too small a data set, one
 that allows all the data to be held in memory - roughly 500K records of
 200 bytes each consume about 100MB. Your configuration allocates 512MB of
 shared_buffers, and about 3GB of OS-level page cache is available.
 (Note that Linux adaptively uses free memory as disk cache.)

Thanks for the information, and a small correction: the total number of
records is 5 million. The select operation selects 500K records, and the
total table size is around 1GB.

Once I get your new patch rebased on the custom-scan patch, I will test
the performance again after increasing my database size beyond the RAM
size, and I will also make sure that the memory available for disk cache
is small.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-11 Thread Haribabu Kommi
On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
 I will update you later regarding the performance test results.


I ran the performance test on the cache scan patch and below are the readings.

Configuration:

Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHz
Memory - 4GB

                  Head     Patched    Diff
Select - 500K     772ms    2659ms    -200%
Insert - 400K     3429ms   1948ms     43%  (I am not sure how it improved in this case)
Delete - 200K     2066ms   3978ms    -92%
Update - 200K     3915ms   5899ms    -50%

This patch shows how custom scan can be used very well, but the patch as it
stands has some performance problems that need to be investigated.

I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia
shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 600
checkpoint_segments = 255

\timing

--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR 
ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT 
EXECUTE PROCEDURE cache_scan_synchronizer();

vacuum analyze test;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;

--sequence scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

vacuum analyze test;
set cache_scan.enabled = off;

explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;


--cache scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR 
ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT 
EXECUTE PROCEDURE cache_scan_synchronizer();

vacuum analyze test;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;

--sequence scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

vacuum analyze test;
set cache_scan.enabled = off;

explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-06 Thread Haribabu Kommi
On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

   4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;
  
  
 The merge_chunk is called only when the heap tuples are spread
   across two cache chunks. Actually, one cache chunk can accommodate one
   or more heap pages, so it needs some other way of handling.
  
  I adjusted the logic to merge the chunks as follows:
 
  Once a tuple is vacuumed from a chunk, it also checks whether the chunk
  can be merged with its child leaves. A chunk has up to two child leaves;
  the left one has a smaller ctid than the parent, and the right one has a
  greater ctid. This means a chunk without a right child in the left
  sub-tree, or a chunk without a left child in the right sub-tree, is a
  neighbor of the chunk being vacuumed. In addition, if the vacuumed chunk
  does not have either (or both) of its children, it can be merged with
  the parent node.
  I modified ccache_vacuum_tuple() to merge chunks during the t-tree
  walk-down, if the vacuumed chunk has enough free space.
 


Patch looks good.

Regarding merging of the nodes: instead of checking whether a merge is
possible for every tuple that is vacuumed, can we put in some kind of
threshold, such as whenever a node is 50% free, try to merge it from the
leaf nodes until it is 90% full? The remaining 10% would be left for the
next inserts on the node. What do you say?

I will update you later regarding the performance test results.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-06 Thread Kohei KaiGai
2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:

 On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

   4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;
  
  
 The merge_chunk is called only when the heap tuples are spread
   across two cache chunks. Actually, one cache chunk can accommodate one
   or more heap pages, so it needs some other way of handling.
  
  I adjusted the logic to merge the chunks as follows:
 
  Once a tuple is vacuumed from a chunk, it also checks whether the chunk
  can be merged with its child leaves. A chunk has up to two child leaves;
  the left one has a smaller ctid than the parent, and the right one has a
  greater ctid. This means a chunk without a right child in the left
  sub-tree, or a chunk without a left child in the right sub-tree, is a
  neighbor of the chunk being vacuumed. In addition, if the vacuumed chunk
  does not have either (or both) of its children, it can be merged with
  the parent node.
  I modified ccache_vacuum_tuple() to merge chunks during the t-tree
  walk-down, if the vacuumed chunk has enough free space.
 


 Patch looks good.

Thanks for volunteering.

 Regarding merging of the nodes: instead of checking whether a merge is
 possible for every tuple that is vacuumed, can we put in some kind of
 threshold, such as whenever a node is 50% free, try to merge it from the
 leaf nodes until it is 90% full? The remaining 10% would be left for the
 next inserts on the node. What do you say?

Hmm. Indeed, that makes sense. How about an idea that kicks off chunk
merging only if the expected free space of the merged chunk is less than
50%? If the threshold depends on the (expected) usage of the merged chunk,
it can avoid over-merging.
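
A minimal sketch of this heuristic (ccache_chunk, usage, and
CCACHE_CHUNK_SIZE are illustrative names, not the patch's actual
identifiers):

    #define CCACHE_CHUNK_SIZE   (1 << 20)   /* illustrative 1MB chunk */

    typedef struct ccache_chunk
    {
        Size    usage;      /* bytes of cached tuples in this chunk */
    } ccache_chunk;

    /*
     * Merge two neighbor chunks only if the merged chunk would keep less
     * than 50% free space, so a merge is not immediately undone by a
     * split when new tuples arrive.
     */
    static bool
    ccache_merge_is_worthwhile(const ccache_chunk *a, const ccache_chunk *b)
    {
        Size    merged_usage = a->usage + b->usage;

        if (merged_usage > CCACHE_CHUNK_SIZE)
            return false;   /* would not fit in a single chunk at all */
        return (CCACHE_CHUNK_SIZE - merged_usage) < CCACHE_CHUNK_SIZE / 2;
    }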

 I will update you later regarding the performance test results.

Thanks,

Also, I'll rebase the patch on top of the new custom-scan interfaces
according to Tom's suggestion, even though the main logic of cache_scan
is unchanged.

Best regards,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-26 Thread Kouhei Kaigai
   Thanks for the information, I will apply the other patches also and
 start testing.
 
 
 When trying to run the pgbench test, the cache-scan plan is not chosen by
 default because of its higher cost. So I increased cpu_index_tuple_cost
 to a large value, or turned off index scans, so that the planner would
 choose cache_scan as the cheapest plan.
 
That's expected. When an index scan is available, its cost is obviously
cheaper than a cache scan, even if the cache scan issues no disk I/O.

 The configuration parameters changed during the test are,
 
 shared_buffers - 2GB, cache_scan.num_blocks - 1024 wal_buffers - 16MB,
 checkpoint_segments - 255 checkpoint_timeout - 15 min,
 cpu_index_tuple_cost - 10 or enable_indexscan=off
 
 Test procedure:
 1. Initialize the database with pgbench with a 75 scale factor.
 2. Create the triggers on pgbench_accounts.
 3. Use a select query to load all the data into cache.
 4. Run a simple update pgbench test.
 
 Plan details of pgbench simple update queries:
 
 postgres=# explain update pgbench_accounts set abalance = abalance where
 aid = 10;
 QUERY PLAN
 ---------------------------------------------------------------------------
  Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
  (cost=0.43..18.44 rows=1 width=103)
          Index Cond: (aid = 10)
  Planning time: 0.045 ms
 (4 rows)
 
 postgres=# explain select abalance from pgbench_accounts where aid =
 10;
  QUERY PLAN
 ---------------------------------------------------------------------------
  Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99
 rows=1 width=4)
Filter: (aid = 10)
  Planning time: 0.042 ms
 (3 rows)
 
 I am observing too much delay in the performance results. The performance
 test script is attached to the mail.
 
I would like you to compare two different cases: a sequential scan where
part of the buffers must be loaded from storage, versus a cache-only scan.
That will probably show a difference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-25 Thread Haribabu Kommi
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi
kommi.harib...@gmail.com wrote:

 Thanks for the information, I will apply the other patches also and start
 testing.


When trying to run the pgbench test, the cache-scan plan is not chosen by
default because of its higher cost. So I increased cpu_index_tuple_cost to
a large value, or turned off index scans, so that the planner would choose
cache_scan as the cheapest plan.

The configuration parameters changed during the test are,

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or
enable_indexscan=off

Test procedure:
1. Initialize the database with pgbench with a 75 scale factor.
2. Create the triggers on pgbench_accounts.
3. Use a select query to load all the data into cache.
4. Run a simple update pgbench test.

Plan details of pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where
aid = 10;
QUERY PLAN
---
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=0.43..18.44 rows=1 width=103)
 Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
 QUERY PLAN

 Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99 rows=1
width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing too much delay in the performance results. The performance
test script is attached to the mail.
Please let me know if you find any problem with the test.

Regards,
Hari Babu
Fujitsu Australia


run_reading.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-24 Thread Haribabu Kommi
On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Thanks for your testing,

  I am getting some compilation warnings while compiling the extension,
  and I am also unable to load the extension because of the undefined
  symbol get_restriction_qual_cost.
 
 It seems to me you applied only the part-1 portion of the custom-scan
 patches.

  The get_restriction_qual_cost() is redefined as an extern function,
  instead of a static one, in the part-2 portion (ctidscan), so that
  extensions can estimate the cost of qualifiers. Also, bms_to_string()
  and bms_from_string() are added in the part-3 portion (postgres_fdw) to
  carry a bitmapset in the copyObject manner.

  It may make sense to include the above supplemental changes in the
  cache-scan feature as well, until either of them gets upstreamed.
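
  A tiny sketch of the round trip those two functions enable (their
  signatures are assumed from this description, since they belong to the
  not-yet-committed part-3 patch):

      char       *str  = bms_to_string(attrs_used);  /* plain text, safe to copy */
      Bitmapset  *back = bms_from_string(str);       /* rebuilt after copyObject() */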


Thanks for the information, I will apply the other patches also and start
testing.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-23 Thread Haribabu Kommi
On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Hello,

 The attached patch is a revised one for the cache-only scan module
 on top of the custom-scan interface. Please check it.


Thanks for the revised patch.  Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,
+   tuple->t_data, tuple->t_len);

   For a normal tuple these two addresses are different, but in the case of
ccache it is continuous memory.
   Better to write a comment that even though it is continuous memory, it is
treated as separate.
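
A minimal sketch of the copy in question, following the usual
heap_copytuple() idiom (the helper name here is hypothetical):

    static HeapTuple
    ccache_copy_tuple(HeapTuple tuple)
    {
        /* header and body are laid out back-to-back in one allocation */
        HeapTuple   dest = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);

        memcpy(dest, tuple, HEAPTUPLESIZE);            /* fixed-size header */
        memcpy((char *) dest + HEAPTUPLESIZE,
               tuple->t_data, tuple->t_len);           /* tuple body */
        /* unlike a normal tuple, t_data points just past its own header */
        dest->t_data = (HeapTupleHeader) ((char *) dest + HEAPTUPLESIZE);
        return dest;
    }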

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

   t_len is already MAXALIGNed. There is no problem with aligning it again,
but the required-length calculation differs from function to function.
   For example, in the part of the same function below, the same t_len is
used directly. It doesn't cause a problem, but it may cause some confusion.

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
+ if (pchunk != NULL && pchunk != cchunk)
+ ccache_merge_chunk(ccache, pchunk);
+ pchunk = cchunk;

   The merge_chunk is called only when the heap tuples are spread across two
cache chunks. Actually, one cache chunk can accommodate one or more heap
pages, so it needs some other way of handling.

4. for (i=0; i < 20; i++)

   Better to replace this magic number with a meaningful macro.

5. columner is present in the sgml file. Correct it.

6. The document says the default max_cached_attnum value is 128, but in the
code it is set to 256.

I will start the regression and performance tests and will inform you of
the results once I finish.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Kohei KaiGai
2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
 I reviewed all three patches. The first two core PostgreSQL patches are
 fine, and I have comments on the third patch related to cache scan.

Thanks for volunteering.

 1. +# contrib/dbcache/Makefile

The Makefile header comment does not match the file's location.

Ahh, it's an old name from when I started the implementation.

 2.+   /*
 +   * Estimation of average width of cached tuples - it does not make
 +   * sense to construct a new cache if its average width is more than
 +   * 30% of the raw data.
 +   */

Move the estimation of the average width of cached tuples into the case
where the cache is not found; otherwise it is overhead in the cache-hit
scenario.

You are right. If and when an existing cache is found and matches, its
width is obviously less than 30% of the total width.

 3. + if (old_cache)
 + attrs_used = bms_union(attrs_used, old_cache->attrs_used);

Don't we need a check to see whether the average width is more than 30%?
During estimation it doesn't include the other existing attributes.

Indeed. It should drop some attributes from the existing cache if the total
average width grows beyond the threshold. Probably, we need a statistical
variable to track how often or how recently each attribute is referenced.

 4. + lchunk->right = cchunk;
 + lchunk->l_depth = TTREE_DEPTH(lchunk->right);

   I think it should be lchunk->r_depth that is set in a clockwise rotation.

Oops, nice catch.

 5. Can you add some comments in the code explaining how the block is used?

Sorry, I'll add it. A block is consumed from the head to store pointers to
tuples, and from the tail to store the contents of the tuples. A block can
hold multiple tuples as long as the tuple pointers growing from the head do
not cross into the area for tuple contents. Anyway, I'll put this in the
source code.
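
To illustrate the layout, a sketch with hypothetical names (not the
patch's actual declarations):

    #define CCACHE_BLOCK_SIZE   (8 * 1024)  /* illustrative block size */

    typedef struct ccache_block
    {
        uint32  ntups;      /* offsets in use, growing from the head */
        uint32  usage;      /* where tuple contents begin; shrinks from
                             * the tail as tuples are added */
        uint32  tup_offset[FLEXIBLE_ARRAY_MEMBER];
    } ccache_block;

    /* a new tuple fits while the offset array growing from the head and
     * the contents area growing from the tail do not cross each other */
    static bool
    ccache_block_has_room(const ccache_block *block, Size tuplen)
    {
        Size    head_end = offsetof(ccache_block, tup_offset) +
                           (block->ntups + 1) * sizeof(uint32);

        return head_end + MAXALIGN(tuplen) <= block->usage;
    }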

 6. In the do_insert_tuple function, I felt that moving the tuples and
 rearranging their addresses is a little bit costly. How about the
 following way?

 Always insert the tuple at the bottom of the block, where the empty space
 starts, and store the corresponding reference pointers in an array at the
 start of the block. As new tuples are inserted, this array grows from the
 block start while the tuples grow from the block end. Just sort this
 array based on item pointers; there is no need to update the reference
 pointers.

 In this case, movement is required only when a tuple is moved from one
 block to another, or when continuous free space is not available to
 insert the new tuple. You can decide based on how frequently the sorting
 would happen in general.

That seems a reasonable suggestion to me.
Probably, an easier implementation is to replace an old block containing
dead space with a new block that contains only the valid tuples, when the
dead space exceeds a threshold of block usage.

 7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups)
 can go wrong when only one tuple is present in the block, with an item
 pointer equal to the one we are searching for in the forward scan
 direction.

It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Please assume cchunk->ntups == 1. In this case, any
given ctid matches one of them, because any ctid is less than, equal to,
or greater than the only cached tuple; thus, it moves to the right or left
node according to the scan direction.
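
An illustrative fragment of those boundary checks (the helper names are
hypothetical, not the patch's actual code), showing why the single-tuple
case never reaches that Assert:

    /* outside the chunk's [min, max] ctid range? descend first */
    if (ItemPointerCompare(ctid, ccache_chunk_min_ctid(cchunk)) < 0)
        return ccache_find_tuple(cchunk->left, ctid, direction);
    if (ItemPointerCompare(ctid, ccache_chunk_max_ctid(cchunk)) > 0)
        return ccache_find_tuple(cchunk->right, ctid, direction);
    /* with ntups == 1, any ctid not equal to the single cached tuple has
     * already been routed left or right; the binary search below only
     * runs when the ctid lies within the chunk */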

 8. I am not able to find a protection mechanism for insert/delete etc. of
 a tuple in the T-tree. As this is shared memory, it can cause problems.

For design simplification, I put a giant lock per columnar cache.
So, the routines in cscan.c acquire an exclusive LWLock prior to invoking
ccache_insert_tuple / ccache_delete_tuple.
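
Sketched, with the lock field name assumed:

    LWLockAcquire(ccache->lock, LW_EXCLUSIVE);    /* one giant lock per cache */
    ccache_insert_tuple(ccache, relation, tuple);
    LWLockRelease(ccache->lock);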

 9. + /* merge chunks if this chunk has enough space to merge */
 + ccache_merge_chunk(ccache, cchunk);

   Calling the chunk merge on every heap-page-prune callback is an
 overhead for vacuum, and the merge may again lead to node splits because
 of new data.

OK, I'll check the condition for merging the chunks, to prevent overly
frequent merges and splits.

 10. columner is present in some places of the patch. Correct it.

Ahh, it should be columnar.

 11. In the cache_scan_next function, in case a cache insert fails because
 of shared memory, the tuple pointer is not reset and the cache is NULL.
 Because of this, the next record fetch leads to an assert on cache != NULL.

You are right. I had to modify the state of the scan as if it were on the
normal seqscan path, not just set csstate->ccache to NULL.
I left an older manner from trial and error during implementation.

 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
   + cs_put_ccache(ccache);

 The cache is created with refcnt 2; sometimes put-cache is called twice to
 eliminate it, and sometimes a different approach is used. It is a little
 bit confusing. Can you explain in comments why 2 is required and how it

Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Haribabu Kommi
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
  7. In the ccache_find_tuple function, this Assert(i_min + 1 <
  cchunk->ntups) can go wrong when only one tuple is present in the block,
  with an item pointer equal to the one we are searching for in the
  forward scan direction.
 
 It shouldn't happen, because the first or second ItemPointerCompare will
 handle the condition. Please assume cchunk->ntups == 1. In this case, any
 given ctid matches one of them, because any ctid is less than, equal to,
 or greater than the only cached tuple; thus, it moves to the right or
 left node according to the scan direction.


Yes, you are correct. Sorry for the noise.


   8. I am not able to find a protection mechanism for insert/delete etc.
  of a tuple in the T-tree. As this is shared memory, it can cause problems.
 
 For design simplification, I put a giant lock per columnar cache.
 So, the routines in cscan.c acquire an exclusive LWLock prior to invoking
 ccache_insert_tuple / ccache_delete_tuple.


Correct. But this lock can be a bottleneck for concurrency. Better to
analyze it once we have the performance report.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Kouhei Kaigai
8. I am not able to find a protection mechanism for insert/delete etc.
of a tuple in the T-tree. As this is shared memory, it can cause problems.
 
   For design simplification, I put a giant lock per columnar cache.
   So, the routines in cscan.c acquire an exclusive LWLock prior to
   invoking ccache_insert_tuple / ccache_delete_tuple.
 
 
 Correct. But this lock can be a bottleneck for concurrency. Better to
 analyze it once we have the performance report.

Well, concurrent updates to a particular table may cause lock contention
due to the giant lock.
On the other hand, one headache of mine is how to avoid deadlocks if we
try to implement it with finer-grained locking. Please assume per-chunk
locking: it also needs to take a lock on the neighbor nodes when a record
is moved out. Concurrently, some other process may try to move another
record in the inverse order. That is a recipe for deadlock.

Is there an idea or reference for implementing concurrent tree-structure
updates?
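
One textbook remedy, sketched here with hypothetical per-chunk LWLocks
(not something the current patch implements): impose a global lock
ordering, so two concurrent movers can never wait on each other in a cycle.

    static void
    lock_chunk_pair(ccache_chunk *a, ccache_chunk *b)
    {
        if (a > b)              /* always lock in ascending address order */
        {
            ccache_chunk *tmp = a;

            a = b;
            b = tmp;
        }
        LWLockAcquire(a->lock, LW_EXCLUSIVE);
        LWLockAcquire(b->lock, LW_EXCLUSIVE);
    }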

Anyway, it is a good idea to measure the impact of concurrent updates on
cached tables, to find out how significant lock splitting would be.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
 Sent: Thursday, February 13, 2014 8:31 AM
 To: Kohei KaiGai
 Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas
 Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
 table scan?)
 
 On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 
 
   2014-02-12 14:59 GMT+09:00 Haribabu Kommi
 kommi.harib...@gmail.com:
 
   7. In the ccache_find_tuple function, this Assert(i_min + 1 <
 cchunk->ntups) can go wrong when only one tuple is present in the block,
 with an item pointer equal to the one we are searching for in the
 forward scan direction.
  
 
   It shouldn't happen, because the first or second ItemPointerCompare
 will handle the condition. Please assume cchunk->ntups == 1. In this
 case, any given ctid matches one of them, because any ctid is less than,
 equal to, or greater than the only cached tuple; thus, it moves to the
 right or left node according to the scan direction.
 
 
  Yes, you are correct. Sorry for the noise.
 
 
   8. I am not able to find a protection mechanism for insert/delete
 etc. of a tuple in the T-tree. As this is shared memory, it can cause
 problems.
 
   For design simplification, I put a giant lock per columnar cache.
   So, the routines in cscan.c acquire an exclusive LWLock prior to
 invoking ccache_insert_tuple / ccache_delete_tuple.
 
 
  Correct. But this lock can be a bottleneck for concurrency. Better to
  analyze it once we have the performance report.
 
 Regards,
 Hari Babu
 
 Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-11 Thread Haribabu Kommi
On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Hello,

 Because of time pressure in the January commit-fest, I tried to simplify
 the patch for cache-only scan into three portions: (1) add a hook on
 heap_page_prune for cache invalidation when a particular page is
 vacuumed, (2) add a check to accept InvalidBuffer in SetHintBits, and
 (3) a proof-of-concept module for cache-only scan.

 (1) pgsql-v9.4-heap_page_prune_hook.v1.patch
 Once an on-memory columnar cache is constructed, it needs to be
 invalidated when a heap page behind the cache is modified. In usual DML
 cases, an extension can get control using row-level trigger functions
 for invalidation; however, we currently have no way to get control when
 a page is vacuumed, which is usually handled by the autovacuum process.
 This patch adds a callback to heap_page_prune(), to allow extensions to
 prune dead entries in their caches, not only in heap pages.
 I'd also like to hear of any other scenario where we need to invalidate
 columnar cache entries, if any exists. It seems to me object_access_hook
 makes sense to cover the DDL and VACUUM FULL scenarios...
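
 A sketch of how an extension would chain into the proposed hook; its
 exact signature and the ccache_invalidate_page() helper are assumptions
 drawn from this description, not quotes from the patch:

     static heap_page_prune_hook_type prev_prune_hook = NULL;

     static void
     ccache_on_page_prune(Relation relation, Buffer buffer)
     {
         if (prev_prune_hook)
             prev_prune_hook(relation, buffer);
         /* drop cached tuples that came from the pruned heap page */
         ccache_invalidate_page(relation, BufferGetBlockNumber(buffer));
     }

     void
     _PG_init(void)
     {
         prev_prune_hook = heap_page_prune_hook;
         heap_page_prune_hook = ccache_on_page_prune;
     }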


 (2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
 In cases where we want to check the visibility of tuples in cache entries
 (thus, no particular shared buffer is associated) using
 HeapTupleSatisfiesVisibility, it internally tries to update the hint bits
 of the tuples. However, that does not make sense for tuples that are not
 associated with a particular shared buffer.
 By definition, tuple entries in the cache are not connected to a
 particular shared buffer. If we needed to load the whole buffer page just
 to set hint bits, it would be totally nonsense, because the purpose of
 the on-memory cache is to reduce disk accesses.
 This patch adds an exceptional condition to SetHintBits() to skip
 anything if the given buffer is InvalidBuffer. It allows tuple visibility
 to be checked using the regular visibility-check functions, without
 reinventing the wheel.
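
 The exceptional condition is tiny; sketched from this description (not
 the verbatim patch hunk), it amounts to an early return:

     static inline void
     SetHintBits(HeapTupleHeader tuple, Buffer buffer,
                 uint16 infomask, TransactionId xid)
     {
         if (BufferIsInvalid(buffer))
             return;     /* cached tuple: no buffer page to mark dirty */

         /* ... existing hint-bit logic continues unchanged ... */
     }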


 (3) pgsql-v9.4-contrib-cache-scan.v1.patch
 Unlike (1) and (2), this patch is just a proof of concept implementing
 cache-only scan on top of the custom-scan interface.
 It tries to offer an alternative scan path on tables with row-level
 triggers for cache invalidation, if the total width of the referenced
 columns is less than 30% of the total width of the table definition.
 Thus, it can keep a larger number of records, with a meaningful portion
 of each, in main memory.
 This cache is invalidated according to main heap updates: the first path
 is the row-level trigger, the second is the object_access_hook on DDL,
 and the third is the heap_page_prune hook. Once a column-reduced tuple
 gets cached, it is copied to the cache memory from the shared buffer, so
 it needs a feature that ignores InvalidBuffer in the visibility-check
 functions.


I reviewed all three patches. The first two core PostgreSQL patches are
fine, and I have comments on the third patch related to cache scan.

1. +# contrib/dbcache/Makefile

   The Makefile header comment does not match the file's location.

2.+   /*
+   * Estimation of average width of cached tuples - it does not make
+   * sense to construct a new cache if its average width is more than
+   * 30% of the raw data.
+   */

   Move the estimation of the average width of cached tuples into the case
where the cache is not found; otherwise it is overhead in the cache-hit
scenario.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, old_cache->attrs_used);

   Don't we need a check to see whether the average width is more than 30%?
During estimation it doesn't include the other existing attributes.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

  I think it should be lchunk->r_depth that is set in a clockwise rotation.

5. Can you add some comments in the code explaining how the block is used?

6. In the do_insert_tuple function, I felt that moving the tuples and
rearranging their addresses is a little bit costly. How about the
following way? (See the sketch after this comment.)

   Always insert the tuple at the bottom of the block, where the empty
space starts, and store the corresponding reference pointers in an array
at the start of the block. As new tuples are inserted, this array grows
from the block start while the tuples grow from the block end. Just sort
this array based on item pointers; there is no need to update the
reference pointers.

   In this case, movement is required only when a tuple is moved from one
block to another, or when continuous free space is not available to
insert the new tuple. You can decide based on how frequently the sorting
would happen in general.
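
A sketch of this suggestion (ccache_block, tup_offset, and
compare_cached_ctid are hypothetical names, not from the patch): tuple
bodies are appended at the free tail of the block and never move; only the
small offset array at the head is re-sorted by ctid, using the backend's
qsort_arg() so the comparator can reach the block:

    block->usage -= MAXALIGN(tuplen);                  /* claim tail space */
    memcpy((char *) block + block->usage, tupbody, tuplen);
    block->tup_offset[block->ntups++] = block->usage;  /* record at the head */
    qsort_arg(block->tup_offset, block->ntups, sizeof(uint32),
              compare_cached_ctid, block);             /* sort offsets only */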

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups)
can go wrong when only one tuple is present in the block, with an item
pointer equal to the one we are searching for in the forward scan
direction.

8. I am not able to find a protection mechanism for insert/delete etc. of
a tuple in the T-tree. As this is a shared