Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-17 Thread Kouhei Kaigai
> > Also, I split the patches again because *demonstration* part is much
> > larger than the patches to the core backend. It will help reviewing.
> > * pgsql-v9.4-vacuum_page_hook.v11.patch
> >  -> It adds a hook for each page being vacuumed; that needs to synchronize
> > the status of in-memory cache managed by extension.
> > * pgsql-v9.4-mvcc_allows_cache.v11.patch
> >  -> It allows to run HeapTupleSatisfiesVisibility() towards the tuples
> > on the in-memory cache, not on the heap.
> > * pgsql-v9.4-example-cache_scan.v11.patch
> >  -> It demonstrates the usage of above two patches. It allows to scan
> > a relation without storage access if possible.
> 
> All the patches are good. The cache scan extension patch may need further
> refinement in terms of performance improvement but the same can be handled
> later also.
> So I am marking the patch as "ready for committer". Thanks for the patch.
>
Thanks for your dedicated efforts on these patches.

The smaller portions of the above submission can be applied independently of the
custom-plan interface, and they are much smaller than the contrib/ portion
(about 30 lines in total).
It would also make sense if someone could pick them up separately from the
extension portion. I intended to keep the extension portion as simple as possible,
for demonstration purposes rather than performance; even so, it is about 2.5K
lines. :-(
Yes, I know about the time pressure towards the v9.4 final feature freeze.
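
For reference, the mvcc_allows_cache piece is conceptually just a small guard at
the top of SetHintBits(); below is a sketch of the shape (not the patch text; the
elided body stands for the existing hint-bit logic):

    /*
     * Conceptual sketch only -- not the actual diff.  Visibility checks on
     * cache-resident tuples pass InvalidBuffer, so hint-bit maintenance is
     * simply skipped because there is no shared buffer page to dirty.
     */
    static inline void
    SetHintBits(HeapTupleHeader tuple, Buffer buffer,
                uint16 infomask, TransactionId xid)
    {
        if (!BufferIsValid(buffer))
            return;     /* tuple lives in the in-memory cache, not in a shared buffer */

        /* ... existing hint-bit logic follows here ... */
    }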

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -----Original Message-----
> From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> Sent: Tuesday, March 18, 2014 11:14 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Kohei KaiGai; Tom Lane; PgHacker; Robert Haas
> Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
> table scan?)
> 
> On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai 
> wrote:
> > Hello,
> >
> > The attached patches are revised ones according to the latest
> > custom-plan interface patch (v11).
> > The cache-scan module was re-implemented on the newer interface, and I
> > also noticed that the extension did not handle redirected tuples
> > correctly, so I revised the logic in ccache_vacuum_page() entirely. It
> > now synchronizes the cached tuples per page, not per tuple, and also
> > tries to merge t-tree chunks on a per-page basis.
> >
> > Also, I split the patches again because *demonstration* part is much
> > larger than the patches to the core backend. It will help reviewing.
> > * pgsql-v9.4-vacuum_page_hook.v11.patch
> >  -> It adds a hook for each page being vacuumed; that needs to synchronize
> > the status of in-memory cache managed by extension.
> > * pgsql-v9.4-mvcc_allows_cache.v11.patch
> >  -> It allows to run HeapTupleSatisfiesVisibility() towards the tuples
> > on the in-memory cache, not on the heap.
> > * pgsql-v9.4-example-cache_scan.v11.patch
> >  -> It demonstrates the usage of above two patches. It allows to scan
> > a relation without storage access if possible.
> 
> All the patches are good. The cache scan extension patch may need further
> refinement in terms of performance improvement but the same can be handled
> later also.
> So I am marking the patch as "ready for committer". Thanks for the patch.
> 
> Regards,
> Hari Babu
> Fujitsu Australia




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-17 Thread Haribabu Kommi
On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai  wrote:
> Hello,
>
> The attached patches are revised ones according to the latest custom-plan
> interface patch (v11).
> The cache-scan module was re-implemented on the newer interface, and I also
> noticed that the extension did not handle redirected tuples correctly, so I
> revised the logic in ccache_vacuum_page() entirely. It now synchronizes the
> cached tuples per page, not per tuple, and also tries to merge t-tree chunks
> on a per-page basis.
>
> Also, I split the patches again because *demonstration* part is much larger
> than the patches to the core backend. It will help reviewing.
> * pgsql-v9.4-vacuum_page_hook.v11.patch
>  -> It adds a hook for each page being vacuumed; that needs to synchronize
> the status of in-memory cache managed by extension.
> * pgsql-v9.4-mvcc_allows_cache.v11.patch
>  -> It allows to run HeapTupleSatisfiesVisibility() towards the tuples
> on the in-memory cache, not on the heap.
> * pgsql-v9.4-example-cache_scan.v11.patch
>  -> It demonstrates the usage of above two patches. It allows to scan
> a relation without storage access if possible.

All the patches are good. The cache scan extension patch may need
further refinement
in terms of performance improvement but the same can be handled later also.
So I am marking the patch as "ready for committer". Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-11 Thread Haribabu Kommi
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai  wrote:
> Thanks for your efforts!
>>                  Head      Patched     Diff
>> Select - 500K    772ms     2659ms     -200%
>> Insert - 400K   3429ms     1948ms       43% (I am not sure how it improved in this case)
>> Delete - 200K   2066ms     3978ms      -92%
>> Update - 200K   3915ms     5899ms      -50%
>>
>> This patch shows how the custom scan interface can be used very well, but
>> the patch itself has a performance problem that needs to be investigated.
>>
>> I attached the test script file used for the performance test.
>>
> First of all, it seems to me your test case uses a data set small enough to
> hold entirely in memory - roughly 500K records of 200 bytes each consume
> about 100MB. Your configuration allocates 512MB of shared_buffers, and about
> 3GB of OS-level page cache is available.
> (Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information, and a small correction: the total number of
records is 5 million. The select operation retrieves 500K records, and the
total table size is around 1GB.

Once I get your new patch rebased on the custom-scan patch, I will test the
performance again with a database size larger than RAM, and I will also make
sure that the memory available for the disk cache is small.

Regards,
Hari Babu
Fujitsu Australia




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-11 Thread Kouhei Kaigai
Thanks for your efforts!
>                  Head      Patched     Diff
> Select - 500K    772ms     2659ms     -200%
> Insert - 400K   3429ms     1948ms       43% (I am not sure how it improved in this case)
> Delete - 200K   2066ms     3978ms      -92%
> Update - 200K   3915ms     5899ms      -50%
> 
> This patch shows how the custom scan interface can be used very well, but
> the patch itself has a performance problem that needs to be investigated.
> 
> I attached the test script file used for the performance test.
>
First of all, it seems to me your test case uses a data set small enough to
hold entirely in memory - roughly 500K records of 200 bytes each consume
about 100MB. Your configuration allocates 512MB of shared_buffers, and about
3GB of OS-level page cache is available.
(Note that Linux uses free memory as disk cache adaptively.)

This cache is designed to hide the latency of disk accesses, so this test
case does not fit its intention.
(Also, the primary purpose of this module is to demonstrate
heap_page_prune_hook for hooking into vacuum, so simple code was preferred
over a more complicated but better-performing implementation.)

I could reproduce the overall trend: the plain scan is faster than the
cache scan when the buffers are already in memory. The difference probably
comes from the cost of walking down the T-tree index by ctid for every
reference.
The performance penalty on UPDATE and DELETE likely comes from the
per-row trigger invocation.
I could observe a small performance gain on INSERT.
That is strange to me, too. :-(

On the other hand, the discussion around the custom-plan interface
affects this module because it uses that API as its foundation.
Please wait a few days for me to rebase the cache_scan module onto
the newer custom-plan interface that I submitted just a moment ago.

Also, is it really necessary to tune performance in this example module
for heap_page_prune_hook?
Even though I have a few ideas to improve the cache performance, such as
inserting multiple rows at once or copying a local chunk instead of
walking down the t-tree, I'm not sure whether that is productive within
the current v9.4 timeframe. ;-(

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Haribabu Kommi
> Sent: Wednesday, March 12, 2014 1:14 PM
> To: Kohei KaiGai
> Cc: Kaigai Kouhei(海外 浩平); Tom Lane; PgHacker; Robert Haas
> Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
> table scan?)
> 
> On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai  wrote:
> > 2014-03-06 18:17 GMT+09:00 Haribabu Kommi :
> >> I will update you later regarding the performance test results.
> >>
> 
> I ran the performance test on the cache scan patch and below are the readings.
> 
> Configuration:
> 
> Shared_buffers - 512MB
> cache_scan.num_blocks - 600
> checkpoint_segments - 255
> 
> Machine:
> OS - centos - 6.4
> CPU - 4 core 2.5 GHZ
> Memory - 4GB
> 
>                  Head      Patched     Diff
> Select - 500K    772ms     2659ms     -200%
> Insert - 400K   3429ms     1948ms       43% (I am not sure how it improved in this case)
> Delete - 200K   2066ms     3978ms      -92%
> Update - 200K   3915ms     5899ms      -50%
> 
> This patch shows how the custom scan interface can be used very well, but
> the patch itself has a performance problem that needs to be investigated.
> 
> I attached the test script file used for the performance test.
> 
> Regards,
> Hari Babu
> Fujitsu Australia




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-11 Thread Haribabu Kommi
On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai  wrote:
> 2014-03-06 18:17 GMT+09:00 Haribabu Kommi :
>> I will update you later regarding the performance test results.
>>

I ran the performance test on the cache scan patch and below are the readings.

Configuration:

Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

                 Head      Patched     Diff
Select - 500K    772ms     2659ms     -200%
Insert - 400K   3429ms     1948ms       43% (I am not sure how it improved in this case)
Delete - 200K   2066ms     3978ms      -92%
Update - 200K   3915ms     5899ms      -50%

This patch shows how the custom scan interface can be used very well, but
the patch itself has a performance problem that needs to be investigated.

I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia
shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 600
checkpoint_segments = 255

\timing

--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR 
ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT 
EXECUTE PROCEDURE cache_scan_synchronizer();

vacuum analyze test;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;

--sequence scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

vacuum analyze test;
set cache_scan.enabled = off;

explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;


--cache scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR 
ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT 
EXECUTE PROCEDURE cache_scan_synchronizer();

vacuum analyze test;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;

--sequence scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));

truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia 
software tech pvt ltd');
checkpoint;

vacuum analyze test;
set cache_scan.enabled = off;

explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;

delete from test where f1 > 100 and f1 <= 120;

update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;

insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 
'Australia software tech pvt ltd');

drop table test cascade;



Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-06 Thread Kohei KaiGai
2014-03-06 18:17 GMT+09:00 Haribabu Kommi :
>
> On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai  wrote:
>>
>> > > 4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
>> > > + if (pchunk != NULL && pchunk != cchunk)
>> > >
>> > > + ccache_merge_chunk(ccache, pchunk);
>> > >
>> > > + pchunk = cchunk;
>> > >
>> > >
>> > >   The merge_chunk is called only when the heap tuples are spread
>> > > across two cache chunks. Actually one cache chunk can accommodate one
>> > > or more than heap pages. it needs some other way of handling.
>> > >
>> > I adjusted the logic to merge the chunks as follows:
>> >
>> > Once a tuple is vacuumed from a chunk, it also checks whether it can be
>> > merged with its child leafs. A chunk has up to two child leafs; left one
>> > has less ctid than the parent, and right one has greater ctid. It means
>> > a chunk without right child in the left sub-tree or a chunk without left
>> > child in the right sub-tree are neighbor of the chunk being vacuumed. In
>> > addition, if vacuumed chunk does not have either (or both) of children,
>> > it can be merged with parent node.
>> > I modified ccache_vacuum_tuple() to merge chunks during t-tree
>> > walk-down,
>> > if vacuumed chunk has enough free space.
>> >
>
>
> Patch looks good.
>
Thanks for your volunteering.

> Regarding merging of the nodes: instead of checking whether a merge is
> possible for every tuple that is vacuumed, can we use some kind of threshold,
> such as whenever a node is 50% free, try to merge it from the leaf nodes
> until it is 90% full? The remaining 10% would be left for the next inserts
> on the node. What do you say?
>
Hmm, indeed, that makes sense. How about an idea that kicks off chunk merging
only if the "expected" free space of the merged chunk is less than 50%?
If the threshold depends on the (expected) usage of the merged chunk, it can
avoid over-merging.
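
A minimal sketch of the check I have in mind is below; all names and the chunk
size are placeholders here, not the actual definitions in the patch:

    #include "postgres.h"

    #define CCACHE_CHUNK_SIZE   (1024 * 1024)   /* placeholder chunk size */

    typedef struct ccache_chunk_sketch
    {
        Size    usage;      /* bytes currently consumed in this chunk */
    } ccache_chunk_sketch;

    /*
     * Illustrative only: merge a vacuumed chunk with its neighbor when the
     * expected free space of the merged chunk would fall below 50%.
     */
    static bool
    ccache_merge_is_worthwhile(const ccache_chunk_sketch *a,
                               const ccache_chunk_sketch *b)
    {
        Size    expected_usage = a->usage + b->usage;

        if (expected_usage > CCACHE_CHUNK_SIZE)
            return false;   /* merged contents would not fit anyway */

        return (CCACHE_CHUNK_SIZE - expected_usage) < CCACHE_CHUNK_SIZE / 2;
    }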

> I will update you later regarding the performance test results.
>
Thanks,

Also, I'll rebase the patch on top of the new custom-scan interfaces
according to Tom's suggestion, even though the main logic of cache_scan
is unchanged.

Best regards,
-- 
KaiGai Kohei 




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-03-06 Thread Haribabu Kommi
On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai  wrote:

> > > 4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
> > > + if (pchunk != NULL && pchunk != cchunk)
> > >
> > > + ccache_merge_chunk(ccache, pchunk);
> > >
> > > + pchunk = cchunk;
> > >
> > >
> > >   The merge_chunk is called only when the heap tuples are spread
> > > across two cache chunks. Actually one cache chunk can accommodate one
> > > or more than heap pages. it needs some other way of handling.
> > >
> > I adjusted the logic to merge the chunks as follows:
> >
> > Once a tuple is vacuumed from a chunk, it also checks whether it can be
> > merged with its child leafs. A chunk has up to two child leafs; left one
> > has less ctid than the parent, and right one has greater ctid. It means
> > a chunk without right child in the left sub-tree or a chunk without left
> > child in the right sub-tree are neighbor of the chunk being vacuumed. In
> > addition, if vacuumed chunk does not have either (or both) of children,
> > it can be merged with parent node.
> > I modified ccache_vacuum_tuple() to merge chunks during t-tree walk-down,
> > if vacuumed chunk has enough free space.
> >
>

Patch looks good.

Regarding merging of the nodes: instead of checking whether a merge is
possible for every tuple that is vacuumed, can we use some kind of threshold,
such as whenever a node is 50% free, try to merge it from the leaf nodes
until it is 90% full? The remaining 10% would be left for the next inserts
on the node. What do you say?

I will update you later regarding the performance test results.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-26 Thread Kouhei Kaigai
>   Thanks for the information, I will apply other patches also and
> start testing.
> 
> 
> When trying to run the pgbench test, the cache-scan plan is not chosen by
> default because of its higher cost. So I increased cpu_index_tuple_cost to
> a very large value, or turned off enable_indexscan, so that the planner
> would choose cache_scan as the cheapest plan.
> 
That's expected. When an index scan is available, its cost is obviously
cheaper than a cache scan, even though the cache scan issues no disk I/O.

> The configuration parameters changed during the test are,
> 
> shared_buffers - 2GB, cache_scan.num_blocks - 1024 wal_buffers - 16MB,
> checkpoint_segments - 255 checkpoint_timeout - 15 min,
> cpu_index_tuple_cost - 10 or enable_indexscan=off
> 
> Test procedure:
> 1. Initialize the database with pgbench with 75 scale factor.
> 2. Create the triggers on pgbench_accounts 3. Use a select query to load
> all the data into cache.
> 4. Run  a simple update pgbench test.
> 
> Plan details of pgbench simple update queries:
> 
> postgres=# explain update pgbench_accounts set abalance = abalance where
> aid = 10;
> QUERY PLAN
> --
> -
>  Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
>->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
> (cost=0.43..18.44 rows=1 width=103)
>  Index Cond: (aid = 10)
>  Planning time: 0.045 ms
> (4 rows)
> 
> postgres=# explain select abalance from pgbench_accounts where aid =
> 10;
>  QUERY PLAN
> --
> --
>  Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99
> rows=1 width=4)
>Filter: (aid = 10)
>  Planning time: 0.042 ms
> (3 rows)
> 
> I am observing too much of a delay in the performance results. The
> performance test script is attached to the mail.
> 
I would like you to compare two different cases: a sequential scan where part
of the buffers has to be loaded from storage, versus a cache-only scan.
That comparison should show a difference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 





Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-25 Thread Haribabu Kommi
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi
wrote:

> Thanks for the information, I will apply other patches also and start
> testing.
>

When trying to run the pgbench test, the cache-scan plan is not chosen by
default because of its higher cost. So I increased cpu_index_tuple_cost to a
very large value, or turned off enable_indexscan, so that the planner would
choose cache_scan as the cheapest plan.

The configuration parameters changed during the test are,

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or
enable_indexscan=off

Test procedure:
1. Initialize the database with pgbench with 75 scale factor.
2. Create the triggers on pgbench_accounts
3. Use a select query to load all the data into cache.
4. Run  a simple update pgbench test.

Plan details of pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where
aid = 10;
QUERY PLAN
---
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=0.43..18.44 rows=1 width=103)
 Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
 QUERY PLAN

 Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99 rows=1
width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing too much of a delay in the performance results. The performance
test script is attached to this mail.
Please let me know if you find any problem in the test.

Regards,
Hari Babu
Fujitsu Australia


[Attachment: run_reading.sh - Bourne shell script]



Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-24 Thread Haribabu Kommi
On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai wrote:

> Thanks for your testing,
>
> > Getting some compilation warnings while compiling the extension and also
> > I am not able to load the extension because of undefined symbol
> > "get_restriction_qual_cost".
> >
> It seems to me you applied only part-1 portion of the custom-scan patches.
>
> The get_restriction_qual_cost() is re-defined as extern function, from
> static one, on the part-2 portion (ctidscan) to estimate cost value of the
> qualifiers in extension. Also, bms_to_string() and bms_from_string() are
> added on the part-3 portion (postgres_fdw) portion to carry a bitmap-set
> according to the copyObject manner.
>
> It may make sense to include the above supplemental changes in the cache-
> scan feature also, until either of them getting upstreamed.


Thanks for the information, I will apply other patches also and start
testing.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-23 Thread Haribabu Kommi
On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai  wrote:

> Hello,
>
> The attached patch is a revised one for cache-only scan module
> on top of custom-scan interface. Please check it.
>

Thanks for the revised patch.  Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
+ memcpy((char *)dest + HEAPTUPLESIZE,
+   tuple->t_data, tuple->t_len);

  For a normal tuple these two addresses are different, but in the case of
ccache it is one continuous memory area.
  Better to add a comment noting that even though the memory is continuous,
the header and the data are still treated as two separate parts (see the
sketch after these comments).

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

  t_len is already max-aligned, so there is no problem in aligning it again,
but the required-length calculation differs from function to function.
  For example, in a later part of the same function, the same t_len is used
directly. It doesn't cause any problem, but it may give some confusion.

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
+ if (pchunk != NULL && pchunk != cchunk)
+ ccache_merge_chunk(ccache, pchunk);
+ pchunk = cchunk;

  The merge_chunk is called only when the heap tuples are spread across two
cache chunks. Actually, one cache chunk can accommodate one or more heap
pages, so it needs some other way of handling.

4. for (i=0; i < 20; i++)

   Better to replace this magic number with a meaningful macro.

5. "columner" is present in sgml file. correct it.

6. "max_cached_attnum" value in the document saying as 128 by default but
in the code it set as 256.

I will start the regression and performance tests and will let you know once
I finish.
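
As an aside on comments 1 and 2 above, the copy sequence under review looks
roughly like the sketch below (for illustration only; the wrapper function and
the cache_slot argument are made up here, the real code is in the patch):

    #include "postgres.h"
    #include "access/htup_details.h"

    /*
     * Sketch only: copy a heap tuple into one contiguous slot of the columnar
     * cache.  Header and body land in one continuous memory area, but they
     * are still addressed as two separate parts.
     */
    static HeapTuple
    ccache_copy_tuple_sketch(HeapTuple tuple, char *cache_slot)
    {
        /* cache_slot must provide HEAPTUPLESIZE + MAXALIGN(tuple->t_len) bytes */
        HeapTuple   dest = (HeapTuple) cache_slot;

        memcpy(dest, tuple, HEAPTUPLESIZE);                 /* HeapTupleData header */
        dest->t_data = (HeapTupleHeader) ((char *) dest + HEAPTUPLESIZE);
        memcpy(dest->t_data, tuple->t_data, tuple->t_len);  /* tuple body right after */

        return dest;
    }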

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Kouhei Kaigai
>   > 8. I am not able to find a protection mechanism in insert/delete
> and etc of
>   > a tuple in Ttree. As this is a shared memory it can cause problems.
>   >
> 
>   For design simplification, I put a giant lock per columnar-cache.
>   So, routines in cscan.c acquires exclusive lwlock prior to
> invocation of
>   ccache_insert_tuple / ccache_delete_tuple.
> 
> 
> Correct. But this lock can be a bottleneck for the concurrency. Better to
> analyze the same once we have the performance report.
>
Well, concurrent updates to a particular table may cause lock contention
due to the giant lock.
On the other hand, my headache is how to avoid deadlocks if we try to
implement it using finer-grained locking. Consider per-chunk locking: it also
needs to take a lock on the neighbor nodes when a record is moved out, and
concurrently some other process may try to move another record in the inverse
order. That is a recipe for deadlock.

Is there an idea or a reference for implementing concurrent updates of a tree
structure?

Anyway, it is a good idea to measure the impact of concurrent updates on
cached tables, to find out how significant lock splitting would be.
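
For reference, the current locking discipline is roughly the following
(a simplified sketch; the lock field and the exact argument lists are
illustrative, not the actual definitions):

    /*
     * Every writer path in cscan.c takes the per-cache lock exclusively
     * before touching the t-tree.
     */
    LWLockAcquire(ccache->lock, LW_EXCLUSIVE);
    ccache_insert_tuple(ccache, relation, tuple);   /* or ccache_delete_tuple(...) */
    LWLockRelease(ccache->lock);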

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


> -----Original Message-----
> From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> Sent: Thursday, February 13, 2014 8:31 AM
> To: Kohei KaiGai
> Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas
> Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
> table scan?)
> 
> On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai  wrote:
> 
> 
>   2014-02-12 14:59 GMT+09:00 Haribabu Kommi
> :
> 
>   > 7. In ccache_find_tuple function this Assert(i_min + 1 <
> cchunk->ntups); can
> 
>   > go wrong when only one tuple present in the block
>   >with the equal item pointer what we are searching in the
> forward scan
>   > direction.
>   >
> 
>   It shouldn't happen, because the first or second ItemPointerCompare
> will
>   handle the condition. Please assume the cchunk->ntups == 1. In this
> case,
>   any given ctid shall match either of them, because any ctid is less,
> equal or
>   larger to the tuple being only cached, thus, it moves to the right
> or left node
>   according to the scan direction.
> 
> 
> yes you are correct. sorry for the noise.
> 
> 
>   > 8. I am not able to find a protection mechanism in insert/delete
> and etc of
>   > a tuple in Ttree. As this is a shared memory it can cause problems.
>   >
> 
>   For design simplification, I put a giant lock per columnar-cache.
>   So, routines in cscan.c acquires exclusive lwlock prior to
> invocation of
>   ccache_insert_tuple / ccache_delete_tuple.
> 
> 
> Correct. But this lock can be a bottleneck for the concurrency. Better to
> analyze the same once we have the performance report.
> 
> Regards,
> Hari Babu
> 
> Fujitsu Australia




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Haribabu Kommi
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai  wrote:

> 2014-02-12 14:59 GMT+09:00 Haribabu Kommi :
> > 7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups);
> can
> > go wrong when only one tuple present in the block
> >with the equal item pointer what we are searching in the forward scan
> > direction.
> >
> It shouldn't happen, because the first or second ItemPointerCompare will
> handle the condition. Please assume the cchunk->ntups == 1. In this case,
> any given ctid shall match either of them, because any ctid is less, equal
> or
> larger to the tuple being only cached, thus, it moves to the right or left
> node
> according to the scan direction.


yes you are correct. sorry for the noise.


>  > 8. I am not able to find a protection mechanism in insert/delete and
> etc of
> > a tuple in Ttree. As this is a shared memory it can cause problems.
> >
> For design simplification, I put a giant lock per columnar-cache.
> So, routines in cscan.c acquires exclusive lwlock prior to invocation of
> ccache_insert_tuple / ccache_delete_tuple.


Correct. But this lock can be a bottleneck for the concurrency. Better to
analyze the same once we have the performance report.

Regards,
Hari Babu
Fujitsu Australia


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-12 Thread Kohei KaiGai
2014-02-12 14:59 GMT+09:00 Haribabu Kommi :
> I reviewed all three patches. The first two core PostgreSQL patches are fine,
> and I have comments on the third patch related to cache scan.
>
Thanks for your volunteering.

> 1. +# contrib/dbcache/Makefile
>
>Makefile header comment is not matched with file name location.
>
Ahh, it's an old name when I started to implement.

> 2.+   /*
> +   * Estimation of average width of cached tuples - it does not make
> +   * sense to construct a new cache if its average width is more than
> +   * 30% of the raw data.
> +   */
>
>Move the estimation of average width calculation of cached tuples into
> the case where the cache is not found,
>otherwise it is an overhead for cache hit scenario.
>
You are right. If an existing cache is found and matches, its width is
obviously already less than 30% of the total width.

> 3. + if (old_cache)
> + attrs_used = bms_union(attrs_used, &old_cache->attrs_used);
>
>   don't we need a check to see whether the average width is more than 30%?
>   During estimation it doesn't include the other existing attributes.
>
Indeed. It should drop some attributes on the existing cache if total average
width grows more than the threshold. Probably, we need to have a statistical
variable to track how many times or how recently referenced.

> 4. + lchunk->right = cchunk;
> + lchunk->l_depth = TTREE_DEPTH(lchunk->right);
>
>   I think it should be lchunk->r_depth that needs to be set in a clockwise
> rotation.
>
Oops, nice catch.

> 5. can you add some comments in the code with how the block is used?
>
Sorry, I'll add it. A block is consumed from the head to store pointers to
tuples, and from the tail to store the tuple contents. A block can hold
multiple tuples as long as the pointer area growing from the head does not
cross into the area for tuple contents. Anyway, I'll put this in the source
code.
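
As a rough picture, the layout is like the sketch below (field names and the
macro are only for illustration, not the actual definitions):

    #include "postgres.h"

    /*
     * Illustration of a cache block: tuple pointers grow from the head of
     * the block, tuple bodies are placed from the tail, and the block is
     * full once the two regions would cross each other.
     */
    typedef struct ccache_block_sketch
    {
        int     ntups;      /* number of tuple pointers stored from the head */
        Size    usage;      /* bytes consumed for tuple bodies from the tail */
        char    data[1];    /* variable-length area: pointers grow up, bodies grow down */
    } ccache_block_sketch;

    /* free space remaining between the two growing regions */
    #define CCACHE_BLOCK_FREE(blk, blksz) \
        ((blksz) - offsetof(ccache_block_sketch, data) \
                 - (blk)->ntups * sizeof(void *) - (blk)->usage)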

> 6. In do_insert_tuple function I felt moving the tuples and rearranging
> their addresses is little bit costly. How about the following way?
>
>Always insert the tuple from the bottom of the block where the empty
> space is started and store their corresponding reference pointers
>in the starting of the block in an array. As and when the new tuple
> inserts this array increases from block start and tuples from block end.
>Just need to sort this array based on item pointers, no need to update
> their reference pointers.
>
>In this case the movement is required only when the tuple is moved from
> one block to another block and also whenever if the continuous
>free space is not available to insert the new tuple. you can decide based
> on how frequent the sorting will happen in general.
>
That seems a reasonable suggestion to me.
Probably, an easier implementation is to replace an old block that contains
dead space with a new block containing only the valid tuples, if and when the
dead space exceeds some threshold of block usage.

> 7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups); can
> go wrong when only one tuple present in the block
>with the equal item pointer what we are searching in the forward scan
> direction.
>
It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Suppose cchunk->ntups == 1. In this case, any given
ctid is handled by one of the comparisons, because it is less than, equal to,
or greater than the only cached tuple; thus, the walk moves to the right or
left node according to the scan direction.

> 8. I am not able to find a protection mechanism in insert/delete and etc of
> a tuple in Ttree. As this is a shared memory it can cause problems.
>
For design simplification, I put a giant lock on each columnar cache.
So, the routines in cscan.c acquire the exclusive lwlock prior to invoking
ccache_insert_tuple / ccache_delete_tuple.

> 9. + /* merge chunks if this chunk has enough space to merge */
> + ccache_merge_chunk(ccache, cchunk);
>
>   calling the merge chunks for every call back of heap page prune is a
> overhead for vacuum. After the merge which may again leads
>   to node splits because of new data.
>
OK, I'll check the condition to merge the chunks, to prevent too frequent
merge / split.

> 10. "columner" is present in some places of the patch. correct it.
>
Ahh, it should be "columnar".

> 11. In the cache_scan_next function, in case the cache insert fails because
> of shared memory, the tuple pointer is not reset and the cache is NULL.
> Because of this, the next record fetch leads to an assertion failure on
> cache != NULL.
>
You are right. I have to change the scan state to fall back to the normal
seqscan path, not just set csstate->ccache to NULL.
I left that older behaviour in from trial and error during implementation.

> 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
>   + cs_put_ccache(ccache);
>
> The cache is created with refcnt as 2 and in some times two times put
> cache is called to eliminate it and in some times with a different approach.
> It is little bit confusing, can 

Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-11 Thread Haribabu Kommi
On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai  wrote:

> Hello,
>
> Because of time pressure in the commit-fest:Jan, I tried to simplifies the
> patch
> for cache-only scan into three portions; (1) add a hook on heap_page_prune
> for cache invalidation on vacuuming a particular page. (2) add a check to
> accept
> InvalidBuffer on SetHintBits (3) a proof-of-concept module of cache-only
> scan.
>
> (1) pgsql-v9.4-heap_page_prune_hook.v1.patch
> Once on-memory columnar cache is constructed, then it needs to be
> invalidated
> if heap page on behalf of the cache is modified. In usual DML cases,
> extension
> can get control using row-level trigger functions for invalidation,
> however, we right
> now have no way to get control on a page is vacuumed, usually handled by
> autovacuum process.
> This patch adds a callback on heap_page_prune(), to allow extensions to
> prune
> dead entries on its cache, not only heap pages.
> I'd also like to see any other scenario we need to invalidate columnar
> cache
> entries, if any exist. It seems to me object_access_hook makes sense to cover
> the DDL and VACUUM FULL scenarios...
>
>
> (2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
> In case when we want to check visibility of the tuples on cache entries
> (thus
> no particular shared buffer is associated) using
> HeapTupleSatisfiesVisibility,
> it internally tries to update hint bits of tuples. However, it does
> not make sense
> onto the tuples being not associated with a particular shared buffer.
> Due to its definition, tuple entries being on cache does not connected with
> a particular shared buffer. If we need to load whole of the buffer page to
> set
> hint bits, it is totally nonsense because the purpose of on-memory cache is
> to reduce disk accesses.
> This patch adds an exceptional condition on SetHintBits() to skip anything
> if the given buffer is InvalidBuffer. It allows to check tuple
> visibility using regular
> visibility check functions, without re-invention of the wheel by
> themselves.
>
>
> (3) pgsql-v9.4-contrib-cache-scan.v1.patch
> Unlike (1) and (2), this patch is just a proof of the concept to
> implement cache-
> only scan on top of the custom-scan interface.
> It tries to offer an alternative scan path on the table with row-level
> triggers for
> cache invalidation if total width of referenced columns are less than 30%
> of the
> total width of table definition. Thus, it can keep larger number of
> records with
> meaningful portion on the main memory.
> This cache shall be invalidated according to the main heap update. One is
> row-level trigger, second is object_access_hook on DDL, and the third is
> heap_page_prune hook. Once a columns reduced tuple gets cached, it is
> copied to the cache memory from the shared buffer, so it needs a feature
> to ignore InvalidBuffer for visibility check functions.
>

I reviewed all three patches. The first two core PostgreSQL patches are fine,
and I have comments on the third patch related to cache scan.

1. +# contrib/dbcache/Makefile

   Makefile header comment is not matched with file name location.

2.+   /*
+   * Estimation of average width of cached tuples - it does not make
+   * sense to construct a new cache if its average width is more than
+   * 30% of the raw data.
+   */

   Move the estimation of average width calculation of cached tuples into
the case where the cache is not found,
   otherwise it is an overhead for cache hit scenario.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, &old_cache->attrs_used);

   don't we need a check to see whether the average width is more than 30%?
   During estimation it doesn't include the other existing attributes.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

  I think it should be lchunk->r_depth that needs to be set in a clockwise
rotation.

5. can you add some comments in the code with how the block is used?

6. In do_insert_tuple function I felt moving the tuples and rearranging
their addresses is little bit costly. How about the following way?

   Always insert the tuple from the bottom of the block where the empty
space is started and store their corresponding reference pointers
   in the starting of the block in an array. As and when the new tuple
inserts this array increases from block start and tuples from block end.
   Just need to sort this array based on item pointers, no need to update
their reference pointers.

   In this case the movement is required only when the tuple is moved from
one block to another block and also whenever if the continuous
   free space is not available to insert the new tuple. you can decide
based on how frequent the sorting will happen in general.

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups);
can go wrong when only one tuple is present in the block, with an item
pointer equal to the one we are searching for in the forward scan direction.

8. I am not able to find a protection mechanism in insert/delete 

Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Kohei KaiGai
2013/11/12 Tom Lane :
> Kohei KaiGai  writes:
>> So, are you thinking it is a feasible approach to focus on custom-scan
>> APIs during the upcoming CF3, then table-caching feature as use-case
>> of this APIs on CF4?
>
> Sure.  If you work on this extension after CF3, and it reveals that the
> custom scan stuff needs some adjustments, there would be time to do that
> in CF4.  The policy about what can be submitted in CF4 is that we don't
> want new major features that no one has seen before, not that you can't
> make fixes to previously submitted stuff.  Something like a new hook
> in vacuum wouldn't be a "major feature", anyway.
>
Thanks for this clarification.
Three days are too short to write a patch; however, two months may be
sufficient to develop a feature on top of the scheme discussed in the
previous commitfest.

Best regards,
-- 
KaiGai Kohei 




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Tom Lane
Kohei KaiGai  writes:
> So, are you thinking it is a feasible approach to focus on custom-scan
> APIs during the upcoming CF3, then table-caching feature as use-case
> of this APIs on CF4?

Sure.  If you work on this extension after CF3, and it reveals that the
custom scan stuff needs some adjustments, there would be time to do that
in CF4.  The policy about what can be submitted in CF4 is that we don't
want new major features that no one has seen before, not that you can't
make fixes to previously submitted stuff.  Something like a new hook
in vacuum wouldn't be a "major feature", anyway.

regards, tom lane




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Kohei KaiGai
2013/11/12 Tom Lane :
> Kohei KaiGai  writes:
>> 2013/11/12 Tom Lane :
>>> There's no possible way you'll finish this for 9.4.
>
>> Yes, I understand it is not possible to submit whole of the patch until
>> CF3 deadline. So, I'd like to find out a way to implement it as an
>> extension using facilities being supported or to be enhanced on v9.4.
>
> Oh!  Okay, I misunderstood the context --- you meant this as an example
> use-case for the custom plan feature, right?  Makes more sense now.
>
> I'm still dubious that it'd actually be very useful in itself, but it
> seems reasonable as a test case to make sure that a set of hooks for
> custom plans are sufficient to do something useful with.
>
Yes. I intend to build most of this table-caching feature on the custom-scan
API set, even though it may require a few additional hooks, independent of
the planner and executor structure, due to its nature.

So, do you think it is a feasible approach to focus on the custom-scan
APIs during the upcoming CF3, and then on the table-caching feature as a
use-case of these APIs in CF4?

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Kohei KaiGai
2013/11/12 Claudio Freire :
> On Tue, Nov 12, 2013 at 11:45 AM, Kohei KaiGai  wrote:
>> Hello,
>>
>> It is a brief design proposal of a feature I'd like to implement on top of
>> custom-scan APIs. Because it (probably) requires a few additional base
>> features not only custom-scan, I'd like to see feedback from the hackers.
>>
>> The cache-only table scan, being in subject line, is an alternative scan
>> logic towards sequential scan if all the referenced columns are cached.
>> It shall allow to scan a particular table without storage access, thus
>> make scan performance improved.
>> So what? Which is difference from large shared_buffers configuration?
>> This mechanism intends to cache a part of columns being referenced
>> in the query, not whole of the records. It makes sense to the workloads
>> that scan a table with many columns but qualifier references just a few
>> columns, typically used to analytic queries, because it enables to
>> reduce memory consumption to be cached, thus more number of records
>> can be cached.
>> In addition, it has another role from my standpoint. It also performs as
>> fast data supplier towards GPU/MIC devices. When we move data to
>> GPU device, the source address has to be a region marked as "page-
>> locked" that is exempted from concurrent swap out, if we want CUDA
>> or OpenCL to run asynchronous DMA transfer mode; the fastest one.
>
>
> Wouldn't a columnar heap format be a better solution to this?
>
I've implemented this using FDW; however, it requires the application to
adjust its SQL to replace "CREATE TABLE" with "CREATE FOREIGN TABLE". In
addition, it loses good features of the regular heap, such as an index scan
when its cost is smaller than a sequential columnar scan.

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Claudio Freire
On Tue, Nov 12, 2013 at 11:45 AM, Kohei KaiGai  wrote:
> Hello,
>
> It is a brief design proposal of a feature I'd like to implement on top of
> custom-scan APIs. Because it (probably) requires a few additional base
> features not only custom-scan, I'd like to see feedback from the hackers.
>
> The cache-only table scan, being in subject line, is an alternative scan
> logic towards sequential scan if all the referenced columns are cached.
> It shall allow to scan a particular table without storage access, thus
> make scan performance improved.
> So what? Which is difference from large shared_buffers configuration?
> This mechanism intends to cache a part of columns being referenced
> in the query, not whole of the records. It makes sense to the workloads
> that scan a table with many columns but qualifier references just a few
> columns, typically used to analytic queries, because it enables to
> reduce memory consumption to be cached, thus more number of records
> can be cached.
> In addition, it has another role from my standpoint. It also performs as
> fast data supplier towards GPU/MIC devices. When we move data to
> GPU device, the source address has to be a region marked as "page-
> locked" that is exempted from concurrent swap out, if we want CUDA
> or OpenCL to run asynchronous DMA transfer mode; the fastest one.


Wouldn't a columnar heap format be a better solution to this?




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Tom Lane
Kohei KaiGai  writes:
> 2013/11/12 Tom Lane :
>> There's no possible way you'll finish this for 9.4.

> Yes, I understand it is not possible to submit whole of the patch until
> CF3 deadline. So, I'd like to find out a way to implement it as an
> extension using facilities being supported or to be enhanced on v9.4.

Oh!  Okay, I misunderstood the context --- you meant this as an example
use-case for the custom plan feature, right?  Makes more sense now.

I'm still dubious that it'd actually be very useful in itself, but it
seems reasonable as a test case to make sure that a set of hooks for
custom plans are sufficient to do something useful with.

regards, tom lane




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Kohei KaiGai
2013/11/12 Robert Haas :
> On Tue, Nov 12, 2013 at 9:45 AM, Kohei KaiGai  wrote:
>> It is a brief design proposal of a feature I'd like to implement on top of
>> custom-scan APIs. Because it (probably) requires a few additional base
>> features not only custom-scan, I'd like to see feedback from the hackers.
>>
>> The cache-only table scan, being in subject line, is an alternative scan
>> logic towards sequential scan if all the referenced columns are cached.
>> It shall allow to scan a particular table without storage access, thus
>> make scan performance improved.
>> So what? Which is difference from large shared_buffers configuration?
>> This mechanism intends to cache a part of columns being referenced
>> in the query, not whole of the records. It makes sense to the workloads
>> that scan a table with many columns but qualifier references just a few
>> columns, typically used to analytic queries, because it enables to
>> reduce memory consumption to be cached, thus more number of records
>> can be cached.
>> In addition, it has another role from my standpoint. It also performs as
>> fast data supplier towards GPU/MIC devices. When we move data to
>> GPU device, the source address has to be a region marked as "page-
>> locked" that is exempted from concurrent swap out, if we want CUDA
>> or OpenCL to run asynchronous DMA transfer mode; the fastest one.
>>
>> Probably, here is no problem on construction of this table cache.
>> All we need to do is inject a custom-scan node instead of seq-scan,
>> then it can construct table cache in concurrence with regular seq-
>> scan, even though the first access become a little bit slow down.
>>
>> My concern is how to handle a case when table gets modified.
>> A straightforward idea is that each cached entries being modified
>> shall be invalidated by callback mechanism.
>> Trigger can help in case of INSERT, UPDATE, DELETE and
>> TRUNCATE. Furthermore, it's better if extension could inject
>> its own trigger definition at RelationBuildTriggers() on the fly,
>> to perform the feature transparently.
>> On the other hand, we have no way to get control around VACUUM.
>> I want to have a hook that allows extensions to get control when
>> a page got vacuumed. Once we have such a hook, it enables to
>> invalidate cached entries being indexed by tid, but already vacuumed.
>> Its best location is probably lazy_scan_heap() to call back extension
>> for each page getting vacuumed, with
>>
>> How about your opinion?
>
> I think it's hard to say in the abstract.  I'd suggest adding the
> hooks you feel like you need and see what the resulting patch looks
> like.  Then post that and we can talk about how (and whether) to do it
> better.  My personal bet is that triggers are the wrong way to do
> something like this; I'd look to do it all with hooks.  Of course,
> figuring how to position those hooks so that they are maintainable and
> don't affect performance when not used is the tricky part.
>
Yep, right now the whole idea is still in my head, so it will take additional
time to shape it into a patch.
The reason I'm inclined to use triggers is that they are already supported
at the places where I want to get control, so it lets me omit the effort of
adding and maintaining similar features within the limited time slot.
>> I'd like to find out the best way to implement this table-caching
>> mechanism within scope of v9.4 functionality set.
>> Any ideas, comments or suggestions are welcome.
>
> I think that Tom is right: there's not time to get something like this
> done for 9.4.  If you start working relatively soon, I think you could
> hope to have a good proposal in time for 9.5.
>
Yes, I agree his suggestion is realistic. It is nearly impossible to
get the whole table-cache mechanism done by the CF3 deadline this week.
So, I assume I will implement it as an extension using the facilities in v9.4.

It seems to me the last piece needed to implement this feature (custom-scan
is, of course, a prerequisite) is a hook in vacuum.
Does it make sense to propose some other worthwhile but small contrib module
that utilizes this last piece? For example, a contrib module that reports
the progress of vacuum...
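
To make the shape of that last piece concrete, the sketch below shows what such
an extension could look like. The hook type, its name and its arguments are
placeholders; the real definition would come with the heap_page_prune_hook
patch itself:

    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"

    PG_MODULE_MAGIC;

    /* placeholder hook declaration, assumed to be provided by the core patch */
    typedef void (*heap_page_prune_hook_type) (Relation relation,
                                               Buffer buffer,
                                               TransactionId OldestXmin);
    extern heap_page_prune_hook_type heap_page_prune_hook;

    static heap_page_prune_hook_type prev_prune_hook = NULL;

    static void
    my_page_pruned(Relation relation, Buffer buffer, TransactionId OldestXmin)
    {
        if (prev_prune_hook)
            prev_prune_hook(relation, buffer, OldestXmin);

        /* e.g. report progress, or drop cache entries pointing into this page */
        elog(DEBUG1, "block %u of \"%s\" was pruned",
             BufferGetBlockNumber(buffer), RelationGetRelationName(relation));
    }

    void
    _PG_init(void)
    {
        prev_prune_hook = heap_page_prune_hook;
        heap_page_prune_hook = my_page_pruned;
    }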

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Kohei KaiGai
2013/11/12 Tom Lane :
> Kohei KaiGai  writes:
>> The cache-only table scan, being in subject line, is an alternative scan
>> logic towards sequential scan if all the referenced columns are cached.
>
> This seems like a pretty dubious idea to me --- you're talking about
> expending large amounts of memory on a single-purpose cache, without
> any clear way to know if the data will ever be used before it gets
> invalidated (and the life expectancy of the cache doesn't sound like
> it'd be very high either).
>
Thanks for your comments. I assume this table cache would be applied to
tables that hold data sets for analytic workloads. Even though it may consume
a large amount of memory, it would be hit by the major workload in that
scenario.
Probably, it will have an upper limit on cache memory usage and will need a
mechanism to reclaim cache blocks starting from the oldest one; nothing
special there. Also, even though I call it a "table cache", it is similar to
an in-memory database backed by rich memory hardware.

>> I'd like to find out the best way to implement this table-caching
>> mechanism within scope of v9.4 functionality set.
>
> There's no possible way you'll finish this for 9.4.  Keep in mind that
> CF3 starts Friday, and CF4 starts 2014-01-15, and project policy is that
> major feature patches should arrive (in at least rough form) by CF3.
> Even ignoring that policy, this is more than 2 months worth of work.
>
Yes, I understand it is not possible to submit whole of the patch until
CF3 deadline. So, I'd like to find out a way to implement it as an
extension using facilities being supported or to be enhanced on v9.4.

Best regards,
-- 
KaiGai Kohei 




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Robert Haas
On Tue, Nov 12, 2013 at 9:45 AM, Kohei KaiGai  wrote:
> It is a brief design proposal of a feature I'd like to implement on top of
> custom-scan APIs. Because it (probably) requires a few additional base
> features not only custom-scan, I'd like to see feedback from the hackers.
>
> The cache-only table scan, being in subject line, is an alternative scan
> logic towards sequential scan if all the referenced columns are cached.
> It shall allow to scan a particular table without storage access, thus
> make scan performance improved.
> So what? Which is difference from large shared_buffers configuration?
> This mechanism intends to cache a part of columns being referenced
> in the query, not whole of the records. It makes sense to the workloads
> that scan a table with many columns but qualifier references just a few
> columns, typically used to analytic queries, because it enables to
> reduce memory consumption to be cached, thus more number of records
> can be cached.
> In addition, it has another role from my standpoint. It also performs as
> fast data supplier towards GPU/MIC devices. When we move data to
> GPU device, the source address has to be a region marked as "page-
> locked" that is exempted from concurrent swap out, if we want CUDA
> or OpenCL to run asynchronous DMA transfer mode; the fastest one.
>
> Probably, here is no problem on construction of this table cache.
> All we need to do is inject a custom-scan node instead of seq-scan,
> then it can construct table cache in concurrence with regular seq-
> scan, even though the first access become a little bit slow down.
>
> My concern is how to handle a case when table gets modified.
> A straightforward idea is that each cached entries being modified
> shall be invalidated by callback mechanism.
> Trigger can help in case of INSERT, UPDATE, DELETE and
> TRUNCATE. Furthermore, it's better if extension could inject
> its own trigger definition at RelationBuildTriggers() on the fly,
> to perform the feature transparently.
> On the other hand, we have no way to get control around VACUUM.
> I want to have a hook that allows extensions to get control when
> a page got vacuumed. Once we have such a hook, it enables to
> invalidate cached entries being indexed by tid, but already vacuumed.
> Its best location is probably lazy_scan_heap() to call back extension
> for each page getting vacuumed, with
>
> How about your opinion?

I think it's hard to say in the abstract.  I'd suggest adding the
hooks you feel like you need and see what the resulting patch looks
like.  Then post that and we can talk about how (and whether) to do it
better.  My personal bet is that triggers are the wrong way to do
something like this; I'd look to do it all with hooks.  Of course,
figuring how to position those hooks so that they are maintainable and
don't affect performance when not used is the tricky part.

> I'd like to find out the best way to implement this table-caching
> mechanism within scope of v9.4 functionality set.
> Any ideas, comments or suggestions are welcome.

I think that Tom is right: there's not time to get something like this
done for 9.4.  If you start working relatively soon, I think you could
hope to have a good proposal in time for 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What's needed for cache-only table scan?

2013-11-12 Thread Tom Lane
Kohei KaiGai  writes:
> The cache-only table scan, being in subject line, is an alternative scan
> logic towards sequential scan if all the referenced columns are cached.

This seems like a pretty dubious idea to me --- you're talking about
expending large amounts of memory on a single-purpose cache, without
any clear way to know if the data will ever be used before it gets
invalidated (and the life expectancy of the cache doesn't sound like
it'd be very high either).

> I'd like to find out the best way to implement this table-caching
> mechanism within scope of v9.4 functionality set.

There's no possible way you'll finish this for 9.4.  Keep in mind that
CF3 starts Friday, and CF4 starts 2014-01-15, and project policy is that
major feature patches should arrive (in at least rough form) by CF3.
Even ignoring that policy, this is more than 2 months worth of work.

regards, tom lane

