Re: Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

2021-12-23 Thread ()

Fixed a bug found during testing.


Wenjing



 -- Original Message --
From: 曾文旌(义从)
Sent: Sun Dec 12 20:51:08 2021
To: Zhihong Yu
Cc: Tomas Vondra, wjzeng, PostgreSQL Hackers, shawn wang, ggys...@gmail.com
Subject: Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?




 -- Original Message --
From: Zhihong Yu
Sent: Sun Dec 12 01:13:11 2021
To: 曾文旌(义从)
Cc: Tomas Vondra, wjzeng, PostgreSQL Hackers, shawn wang, ggys...@gmail.com
Subject: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?



On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从)  wrote:




 -- Original Message --
From: Tomas Vondra
Sent: Wed Dec 8 11:26:35 2021
To: 曾文旌(义从), shawn wang, ggys...@gmail.com, PostgreSQL Hackers
Cc: wjzeng
Subject: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers
> 
> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery qual pushdown.
> We know that sublinks are expanded at the beginning of each query
> level. At that stage, the query's conditions and equivalence classes
> have not been processed yet. But after generate_base_implied_equalities
> the conditions have been processed, which is why a qual can be pushed
> down into a subquery but not into a sublink.
> 
> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and WHERE clause. Specifically, it is
> delayed until after generate_base_implied_equalities. Thus, the
> equivalence conditions already established in the upper-level query
> can easily be obtained during sublink expansion (make_subplan).
> 
> For example, if the upper-level query has a.id = 10 and the sublink
> query has a.id = b.id, then we derive b.id = 10 and push it down into
> the sublink query. If b is a partitioned table partitioned by id, a
> large number of unrelated partitions are pruned out. This saves a
> significant amount of planning and execution time, especially when
> the partitioned table has many partitions, which is exactly the case
> I care about.
> 
> Currently, two queries fail in the regression tests because the
> sublink expansion order changed; this does not affect the query
> results.
> 
> Look forward to your suggestions on this proposal.
> 

I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several test cases to the regression tests (qual_pushdown_to_sublink.sql),
and I used a partitioned table so that the plans show whether a qual has been
pushed down into the sublink.
Hopefully this will help you understand the details of this patch. Later, I
will add more cases.
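For illustration only (the table names below are made up for this sketch, not
necessarily the exact cases in the test file), a positive and a negative case
could look like this:

create table pt (id int, val int) partition by list (id);
create table pt1 partition of pt for values in (1);
create table pt2 partition of pt for values in (2);
create table ot (id int, val int);

-- Positive case: the outer qual o.id = 1 together with the sublink qual
-- p.id = o.id implies p.id = 1, so only partition pt1 should remain in the
-- subplan.
explain (costs off)
select o.id,
       (select max(p.val) from pt p where p.id = o.id) as m
from ot o
where o.id = 1;

-- Negative case: nothing equates the partition key of pt with a constant,
-- so no qual can be pushed down and all partitions stay in the subplan.
explain (costs off)
select o.id,
       (select max(p.val) from pt p where p.val = o.val) as m
from ot o
where o.id = 1;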
2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x0077b412 in adjust_appendrel_attrs_mutator
  (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470  Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

With the patch attached to the last email, all tests under src/test/regress
passed.
As you pointed out, there was a problem with the regression tests under contrib
(in contrib/postgres_fdw).
I have now fixed it, and the current patch (V2) passes check-world.


3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any 
details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

if (ec->ec_processed)
;

   really be?

if (ec->ec_processed)
continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two
generate_base_implied_equalities calls:
1) I need the base implied equalities generated by
generate_base_implied_equalities, which are used to push quals down into
sublinks (lazy_process_sublinks).
2) Expanding a sublink may produce an equality containing parameters, such as
a = $1, which requires processing the equivalence classes again.
3) So I added ec_processed and have the equivalence class reprocessed
(generate_base_implied_equalities) after it changes
(add_eq_member/process_equivalence).

Maybe you have

Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

2021-12-12 Thread ()



 -- Original Message --
From: Zhihong Yu
Sent: Sun Dec 12 01:13:11 2021
To: 曾文旌(义从)
Cc: Tomas Vondra, wjzeng, PostgreSQL Hackers, shawn wang, ggys...@gmail.com
Subject: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?



On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从)  wrote:




 -- Original Message --
From: Tomas Vondra
Sent: Wed Dec 8 11:26:35 2021
To: 曾文旌(义从), shawn wang, ggys...@gmail.com, PostgreSQL Hackers
Cc: wjzeng
Subject: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers
> 
> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery qual pushdown.
> We know that sublinks are expanded at the beginning of each query
> level. At that stage, the query's conditions and equivalence classes
> have not been processed yet. But after generate_base_implied_equalities
> the conditions have been processed, which is why a qual can be pushed
> down into a subquery but not into a sublink.
> 
> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and WHERE clause. Specifically, it is
> delayed until after generate_base_implied_equalities. Thus, the
> equivalence conditions already established in the upper-level query
> can easily be obtained during sublink expansion (make_subplan).
> 
> For example, if the upper-level query has a.id = 10 and the sublink
> query has a.id = b.id, then we derive b.id = 10 and push it down into
> the sublink query. If b is a partitioned table partitioned by id, a
> large number of unrelated partitions are pruned out. This saves a
> significant amount of planning and execution time, especially when
> the partitioned table has many partitions, which is exactly the case
> I care about.
> 
> Currently, two queries fail in the regression tests because the
> sublink expansion order changed; this does not affect the query
> results.
> 
> Look forward to your suggestions on this proposal.
> 

I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several test cases to the regression tests (qual_pushdown_to_sublink.sql),
and I used a partitioned table so that the plans show whether a qual has been
pushed down into the sublink.
Hopefully this will help you understand the details of this patch. Later, I
will add more cases.
2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x0077b412 in adjust_appendrel_attrs_mutator
  (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470  Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

With the patch attached to the last email, all tests under src/test/regress
passed.
As you pointed out, there was a problem with the regression tests under contrib
(in contrib/postgres_fdw).
I have now fixed it, and the current patch (V2) passes check-world.


3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any 
details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

if (ec->ec_processed)
;

   really be?

if (ec->ec_processed)
continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two
generate_base_implied_equalities calls:
1) I need the base implied equalities generated by
generate_base_implied_equalities, which are used to push quals down into
sublinks (lazy_process_sublinks).
2) Expanding a sublink may produce an equality containing parameters, such as
a = $1, which requires processing the equivalence classes again.
3) So I added ec_processed and have the equivalence class reprocessed
(generate_base_implied_equalities) after it changes
(add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublink is delayed. Let me explain this:
1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, all 
laz

Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

2021-12-11 Thread ()



 -- Original Message --
From: Tomas Vondra
Sent: Wed Dec 8 11:26:35 2021
To: 曾文旌(义从), shawn wang, ggys...@gmail.com, PostgreSQL Hackers
Cc: wjzeng
Subject: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers
> 
> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery qual pushdown.
> We know that sublinks are expanded at the beginning of each query
> level. At that stage, the query's conditions and equivalence classes
> have not been processed yet. But after generate_base_implied_equalities
> the conditions have been processed, which is why a qual can be pushed
> down into a subquery but not into a sublink.
> 
> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and WHERE clause. Specifically, it is
> delayed until after generate_base_implied_equalities. Thus, the
> equivalence conditions already established in the upper-level query
> can easily be obtained during sublink expansion (make_subplan).
> 
> For example, if the upper-level query has a.id = 10 and the sublink
> query has a.id = b.id, then we derive b.id = 10 and push it down into
> the sublink query. If b is a partitioned table partitioned by id, a
> large number of unrelated partitions are pruned out. This saves a
> significant amount of planning and execution time, especially when
> the partitioned table has many partitions, which is exactly the case
> I care about.
> 
> Currently, two queries fail in the regression tests because the
> sublink expansion order changed; this does not affect the query
> results.
> 
> Look forward to your suggestions on this proposal.
> 

I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seems
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several test cases to the regression tests (qual_pushdown_to_sublink.sql),
and I used a partitioned table so that the plans show whether a qual has been
pushed down into the sublink.
Hopefully this will help you understand the details of this patch. Later, I
will add more cases.
2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x0077b412 in adjust_appendrel_attrs_mutator
  (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470  Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

With the patch attached to the last email, all tests under src/test/regress
passed.
As you pointed out, there was a problem with the regression tests under contrib
(in contrib/postgres_fdw).
I have now fixed it, and the current patch (V2) passes check-world.


3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any 
details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

if (ec->ec_processed)
;

   really be?

if (ec->ec_processed)
continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two
generate_base_implied_equalities calls:
1) I need the base implied equalities generated by
generate_base_implied_equalities, which are used to push quals down into
sublinks (lazy_process_sublinks).
2) Expanding a sublink may produce an equality containing parameters, such as
a = $1, which requires processing the equivalence classes again.
3) So I added ec_processed and have the equivalence class reprocessed
(generate_base_implied_equalities) after it changes
(add_eq_member/process_equivalence).

Maybe you have a better suggestion, please let me know.

6) So we now have lazy_process_sublink callback? Does that mean we
expand sublinks in two places - sometimes lazily, sometimes not?

Yes, not all sublinks are delayed. Let me explain this:
1) I added a GUC switch, enable_lazy_process_sublink. If it is turned off, no
lazy sublink processing happens; since qual pushdown into sublinks depends on
lazy sublink processing, no quals will be pushed down in that case.
2) Even if enable_lazy_process_sublink = true, if the query at this level
contains some complex features, the sublinks at this level will not attempt
qual pushdown. (see functio

Re: Re: Is it worth pushing conditions to sublink/subplan?

2021-12-07 Thread ()
Hi Hackers

For my previous proposal, I developed a prototype and passed regression testing.
It works similarly to subquery qual pushdown. We know that sublinks are expanded
at the beginning of each query level. At that stage, the query's conditions and
equivalence classes have not been processed yet. But after
generate_base_implied_equalities the conditions have been processed, which is
why a qual can be pushed down into a subquery but not into a sublink.

My POC implementation chose to delay the sublink expansion in the SELECT clause
(targetList) and WHERE clause. Specifically, it is delayed until after
generate_base_implied_equalities. Thus, the equivalence conditions already
established in the upper-level query can easily be obtained during sublink
expansion (make_subplan).

For example, if the upper-level query has a.id = 10 and the sublink query has
a.id = b.id, then we derive b.id = 10 and push it down into the sublink query.
If b is a partitioned table partitioned by id, a large number of unrelated
partitions are pruned out. This saves a significant amount of planning and
execution time, especially when the partitioned table has many partitions,
which is exactly the case I care about.
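In SQL terms, that scenario looks roughly like this (an illustrative schema
only, not the one from the regression tests):

-- b is partitioned by id; a is an ordinary table.
create table a (id int, x int);
create table b (id int, y int) partition by hash (id);
create table b0 partition of b for values with (modulus 4, remainder 0);
create table b1 partition of b for values with (modulus 4, remainder 1);
create table b2 partition of b for values with (modulus 4, remainder 2);
create table b3 partition of b for values with (modulus 4, remainder 3);

-- The outer qual a.id = 10 plus the sublink qual b.id = a.id lets the planner
-- derive b.id = 10 inside the sublink, so only the partition containing
-- id = 10 needs to be planned and scanned.
select a.x,
       (select b.y from b where b.id = a.id) as y
from a
where a.id = 10;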

Currently, two queries fail in the regression tests because the sublink
expansion order changed; this does not affect the query results.

Look forward to your suggestions on this proposal.

Thanks

Wenjing




 -- Original Message --
From: shawn wang
Sent: Wed Sep 1 10:54:50 2021
To: 曾文旌(义从)
Cc: PostgreSQL Hackers, wjzeng
Subject: Re: Is it worth pushing conditions to sublink/subplan?

I tested it the way you said and increased the number of partitions.
I created a hash-partitioned table with 1000 partitions.
Testing with your first query, the optimizer prunes the unnecessary partitions
well.
You can see the plan:

postgres=# explain analyze
postgres-# select temp.p1 from
postgres-# (
postgres(# select p1,p2 from test1.test1hashtable x where x.p1 = '1'
postgres(# union all
postgres(# (values('1','1'))
postgres(# ) temp,
postgres-# test1.test1hashtable y
postgres-# where y.p2 = temp.p2 and y.p1 = '1' and y.p1='1';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..25.55 rows=1 width=32) (actual time=0.004..0.004 rows=0 loops=1)
   Join Filter: (x.p2 = y.p2)
   ->  Seq Scan on test1hashtable826 y  (cost=0.00..12.75 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: (p1 = '1'::text)
   ->  Append  (cost=0.00..12.78 rows=2 width=64) (never executed)
         ->  Seq Scan on test1hashtable826 x  (cost=0.00..12.75 rows=1 width=64) (never executed)
               Filter: (p1 = '1'::text)
         ->  Result  (cost=0.00..0.01 rows=1 width=64) (never executed)
 Planning Time: 0.158 ms
 Execution Time: 0.022 ms
(10 rows)

But when the second one runs, the planning time reaches 13.942ms.
The plan:

postgres=# explain analyze
postgres-# select
postgres-# y.p1,
postgres-# (Select x.p2 from test1.test1hashtable x where y.p1 =x.p1 and 
y.p2=x.p2) as b
postgres-# from test1.test1hashtable y where p1 = '1' and p2 = '1';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on test1hashtable826 y  (cost=0.00..13318.30 rows=1 width=64) (actual time=0.004..0.047 rows=0 loops=1)
   Filter: ((p1 = '1'::text) AND (p2 = '1'::text))
   SubPlan 1
     ->  Append  (cost=0.00..13305.00 rows=1000 width=32) (never executed)
           ->  Seq Scan on test1hashtable1 x_1  (cost=0.00..13.30 rows=1 width=32) (never executed)
                 Filter: ((y.p1 = p1) AND (y.p2 = p2))
           ->  Seq Scan on test1hashtable1000 x_1000  (cost=0.00..13.30 rows=1 width=32) (never executed)
                 Filter: ((y.p1 = p1) AND (y.p2 = p2))
 Planning Time: 13.942 ms
 Execution Time: 4.899 ms
(2006 rows)

This is a very worthwhile thing to do. In relatively large business systems, a
large number of partitioned tables and high concurrency are common. If the
planning time is too long, it will greatly affect the business.

regards,
Shawn.
Wenjing wrote on Tue, Aug 17, 2021 at 10:31 AM:




On Aug 16, 2021, at 17:15, Wenjing wrote:
Hi Hackers,

Recently, an issue has been bothering me. It is about condition pushdown in SQL.
I use a case from the regression tests as an example.
I found that the condition (B = 1) can be pushed down into a subquery; however,
it cannot be pushed down into a sublink/subplan.
If a sublink/subplan contains a partitioned table, it is useful to obtain the
conditions for pruning.

So, is it worth pushing conditions to sublink/subplan?
Anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 parti

Re: [Proposal] Global temporary tables

2020-03-16 Thread ()


> On Mar 16, 2020, at 5:31 PM, Prabhat Sahu wrote:
> 
> 
> 
> On Mon, Mar 16, 2020 at 1:30 PM Konstantin Knizhnik wrote:
> 
> It seems to be expected behavior: GTT data is private to the session and 
> postgres_fdw establish its own session where content of the table is empty.
> But if you insert some data in f_gtt1, then you will be able to select this 
> data from it because of connection cache in postgres_fdw.
> 
> Thanks for the explanation.
> I am able to insert and select the value from f_gtt1.
> 
>  postgres=# insert into f_gtt1 values (1,'gtt_c21');
> INSERT 0 1
> postgres=# select * from f_gtt1;
>  c1 |   c2
> +-
>   1 | gtt_c21
> (1 row)
> 
> I have one more doubt,
> As you told above "GTT data is private to the session and postgres_fdw 
> establish its own session where content of the table is empty."
> Please check the below scenario, 
> we can select data from the "root GTT" and "foreign GTT partitioned table" 
> but we are unable to select data from "GTT partitioned table"
postgres=# select pg_backend_pid();
 pg_backend_pid 

 119135
(1 row)

postgres=# select * from pg_gtt_attached_pids;
 schemaname | tablename | relid |  pid   
+---+---+
 public | gtt2_p1   | 73845 | 119135
 public | gtt2_p1   | 73845 |  51482
(2 rows)


postgres=# select datid,datname,pid,application_name,query from 
pg_stat_activity where usename = 'wenjing';
 datid | datname  |  pid   | application_name | 
   query 
---+--++--+--
 13589 | postgres | 119135 | psql | select 
datid,datname,pid,application_name,query from pg_stat_activity where usename = 
'wenjing';
 13589 | postgres |  51482 | postgres_fdw | COMMIT TRANSACTION
(2 rows)

This can be explained: the postgres_fdw connection has not been disconnected,
and it produced data in another session.
In other words, gtt2_p1 is empty in session 119135, but not in session 51482.


> 
> postgres=# create global temporary table gtt2 (c1 integer, c2 integer) 
> partition by range(c1);
> CREATE TABLE
> postgres=# create global temporary table gtt2_p1 (c1 integer, c2 integer);
> CREATE TABLE
> postgres=# create foreign table f_gtt2_p1 (c1 integer, c2 integer) server fdw 
> options (table_name 'gtt2_p1');
> CREATE FOREIGN TABLE
> postgres=# alter table gtt2 attach partition f_gtt2_p1 for values from 
> (minvalue) to (10);
> ALTER TABLE
> postgres=# insert into gtt2 select i,i from generate_series(1,5,2)i;
> INSERT 0 3
> postgres=# select * from gtt2;
>  c1 | c2 
> +
>   1 |  1
>   3 |  3
>   5 |  5
> (3 rows)
> 
> postgres=# select * from gtt2_p1;
>  c1 | c2 
> +
> (0 rows)
> 
> postgres=# select * from f_gtt2_p1;
>  c1 | c2 
> +
>   1 |  1
>   3 |  3
>   5 |  5
> (3 rows)
> 
> Is this an expected behavior?
> 
> -- 
> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com 



Re: [Proposal] Global temporary tables

2020-03-16 Thread ()


> On Mar 16, 2020, at 5:04 PM, Pavel Stehule wrote:
> 
> 
> 
> On Mon, Mar 16, 2020 at 9:58 AM tushar wrote:
> Hi Wenjing,
> 
> I have created a global table on X session but i am not able to drop from Y 
> session ?
> 
> X session - ( connect to psql terminal )
> postgres=# create global temp table foo(n int);
> CREATE TABLE
> postgres=# select * from foo;
>  n 
> ---
> (0 rows)
> 
> 
> Y session - ( connect to psql terminal )
> postgres=# drop table foo;
> ERROR:  can not drop relation foo when other backend attached this global 
> temp table
> 
> Table has been created  so i think - user should be able to drop from another 
> session as well without exit from X session. 
> 
> By the original design GTT was not modifiable until is used by any session. 
> Now, you cannot to drop normal table when this table is used.
> 
> It is hard to say what is most correct behave and design, but for this 
> moment, I think so protecting table against drop while it is used by other 
> session is the best behave.
> 
> Maybe for next release we can introduce DROP TABLE x (FORCE) - like we have 
> for DROP DATABASE. This behave is very similar.
I agree with that.


Wenjing

> 
> Pavel
> 
> 
> regards,
> 
> On 3/16/20 1:35 PM, 曾文旌(义从) wrote:
>> 
>> 
>>> On Mar 16, 2020, at 2:23 PM, Prabhat Sahu wrote:
>>> 
>>> Hi Wenjing,
>>> Please check the below scenario, where the Foreign table on GTT not showing 
>>> records.
>>> 
>>> postgres=# create extension postgres_fdw;
>>> CREATE EXTENSION
>>> postgres=# do $d$
>>> begin
>>> execute $$create server fdw foreign data wrapper postgres_fdw 
>>> options (host 'localhost',dbname 'postgres',port 
>>> '$$||current_setting('port')||$$')$$;
>>> end;
>>> $d$;
>>> DO
>>> postgres=# create user mapping for public server fdw;
>>> CREATE USER MAPPING
>>> 
>>> postgres=# create table lt1 (c1 integer, c2 varchar(50));
>>> CREATE TABLE
>>> postgres=# insert into lt1 values (1,'c21');
>>> INSERT 0 1
>>> postgres=# create foreign table ft1 (c1 integer, c2 varchar(50)) server fdw 
>>> options (table_name 'lt1');
>>> CREATE FOREIGN TABLE
>>> postgres=# select * from ft1;
>>>  c1 | c2  
>>> +-
>>>   1 | c21
>>> (1 row)
>>> 
>>> postgres=# create global temporary table gtt1 (c1 integer, c2 varchar(50));
>>> CREATE TABLE
>>> postgres=# insert into gtt1 values (1,'gtt_c21');
>>> INSERT 0 1
>>> postgres=# create foreign table f_gtt1 (c1 integer, c2 varchar(50)) server 
>>> fdw options (table_name 'gtt1');
>>> CREATE FOREIGN TABLE
>>> 
>>> postgres=# select * from gtt1;
>>>  c1 |   c2
>>> +-
>>>   1 | gtt_c21
>>> (1 row)
>>> 
>>> postgres=# select * from f_gtt1;
>>>  c1 | c2 
>>> +
>>> (0 rows)
>>> 
>>> -- 
>> 
>> I understand that postgres_fdw works similarly to dblink.
>> Accessing a table through postgres_fdw requires a new connection.
>> The data in the GTT is empty in the newly established connection,
>> because a GTT shares structure but not data between connections.
>> 
>> Try local temp table:
>> create temporary table ltt1 (c1 integer, c2 varchar(50));
>> 
>> insert into ltt1 values (1,'gtt_c21');
>> 
>> create foreign table f_ltt1 (c1 integer, c2 varchar(50)) server fdw options 
>> (table_name 'ltt1');
>> 
>> select * from ltt1;
>>  c1 |   c2
>> +-
>>   1 | gtt_c21
>> (1 row)
>> 
>> select * from l_gtt1;
>> ERROR:  relation "l_gtt1" does not exist
>> LINE 1: select * from l_gtt1;
>> 
>> 
>> Wenjing
>> 
>> 
>>> With Regards,
>>> Prabhat Kumar Sahu
>>> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
>> 
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/ <https://www.enterprisedb.com/>
> The Enterprise PostgreSQL Company



Re: [Proposal] Global temporary tables

2020-03-16 Thread ()


> On Mar 16, 2020, at 4:58 PM, tushar wrote:
> 
> Hi Wenjing,
> 
> I have created a global table on X session but i am not able to drop from Y 
> session ?
> 
> X session - ( connect to psql terminal )
> postgres=# create global temp table foo(n int);
> CREATE TABLE
> postgres=# select * from foo;
>  n 
> ---
> (0 rows)
> 
> 
> Y session - ( connect to psql terminal )
> postgres=# drop table foo;
> ERROR:  can not drop relation foo when other backend attached this global 
> temp table
For now, if a DBA wants to drop a GTT,
he can use the view pg_gtt_attached_pids to see which backends are still using
this GTT, then terminate those sessions with pg_terminate_backend, after which
he can drop the GTT.
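As a sketch (pg_gtt_attached_pids is the view added by this patch; the pid is
just an example):

-- See which backends still have data attached to the GTT "foo".
select pid from pg_gtt_attached_pids
where schemaname = 'public' and tablename = 'foo';

-- Terminate those sessions, then the DROP can proceed.
select pg_terminate_backend(119135);
drop table foo;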

> 
> Table has been created  so i think - user should be able to drop from another 
> session as well without exit from X session. 
> 
> regards,
> 
> On 3/16/20 1:35 PM, 曾文旌(义从) wrote:
>> 
>> 
>>> On Mar 16, 2020, at 2:23 PM, Prabhat Sahu wrote:
>>> 
>>> Hi Wenjing,
>>> Please check the below scenario, where the Foreign table on GTT not showing 
>>> records.
>>> 
>>> postgres=# create extension postgres_fdw;
>>> CREATE EXTENSION
>>> postgres=# do $d$
>>> begin
>>> execute $$create server fdw foreign data wrapper postgres_fdw 
>>> options (host 'localhost',dbname 'postgres',port 
>>> '$$||current_setting('port')||$$')$$;
>>> end;
>>> $d$;
>>> DO
>>> postgres=# create user mapping for public server fdw;
>>> CREATE USER MAPPING
>>> 
>>> postgres=# create table lt1 (c1 integer, c2 varchar(50));
>>> CREATE TABLE
>>> postgres=# insert into lt1 values (1,'c21');
>>> INSERT 0 1
>>> postgres=# create foreign table ft1 (c1 integer, c2 varchar(50)) server fdw 
>>> options (table_name 'lt1');
>>> CREATE FOREIGN TABLE
>>> postgres=# select * from ft1;
>>>  c1 | c2  
>>> +-
>>>   1 | c21
>>> (1 row)
>>> 
>>> postgres=# create global temporary table gtt1 (c1 integer, c2 varchar(50));
>>> CREATE TABLE
>>> postgres=# insert into gtt1 values (1,'gtt_c21');
>>> INSERT 0 1
>>> postgres=# create foreign table f_gtt1 (c1 integer, c2 varchar(50)) server 
>>> fdw options (table_name 'gtt1');
>>> CREATE FOREIGN TABLE
>>> 
>>> postgres=# select * from gtt1;
>>>  c1 |   c2
>>> +-
>>>   1 | gtt_c21
>>> (1 row)
>>> 
>>> postgres=# select * from f_gtt1;
>>>  c1 | c2 
>>> +
>>> (0 rows)
>>> 
>>> -- 
>> 
>> I understand that postgres_fdw works similarly to dblink.
>> Accessing a table through postgres_fdw requires a new connection.
>> The data in the GTT is empty in the newly established connection,
>> because a GTT shares structure but not data between connections.
>> 
>> Try local temp table:
>> create temporary table ltt1 (c1 integer, c2 varchar(50));
>> 
>> insert into ltt1 values (1,'gtt_c21');
>> 
>> create foreign table f_ltt1 (c1 integer, c2 varchar(50)) server fdw options 
>> (table_name 'ltt1');
>> 
>> select * from ltt1;
>>  c1 |   c2
>> +-
>>   1 | gtt_c21
>> (1 row)
>> 
>> select * from l_gtt1;
>> ERROR:  relation "l_gtt1" does not exist
>> LINE 1: select * from l_gtt1;
>> 
>> 
>> Wenjing
>> 
>> 
>>> With Regards,
>>> Prabhat Kumar Sahu
>>> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
>> 
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/ <https://www.enterprisedb.com/>
> The Enterprise PostgreSQL Company



Re: [Proposal] Global temporary tables

2020-03-16 Thread ()


> On Mar 16, 2020, at 2:23 PM, Prabhat Sahu wrote:
> 
> Hi Wenjing,
> Please check the below scenario, where the Foreign table on GTT not showing 
> records.
> 
> postgres=# create extension postgres_fdw;
> CREATE EXTENSION
> postgres=# do $d$
> begin
> execute $$create server fdw foreign data wrapper postgres_fdw options 
> (host 'localhost',dbname 'postgres',port '$$||current_setting('port')||$$')$$;
> end;
> $d$;
> DO
> postgres=# create user mapping for public server fdw;
> CREATE USER MAPPING
> 
> postgres=# create table lt1 (c1 integer, c2 varchar(50));
> CREATE TABLE
> postgres=# insert into lt1 values (1,'c21');
> INSERT 0 1
> postgres=# create foreign table ft1 (c1 integer, c2 varchar(50)) server fdw 
> options (table_name 'lt1');
> CREATE FOREIGN TABLE
> postgres=# select * from ft1;
>  c1 | c2  
> +-
>   1 | c21
> (1 row)
> 
> postgres=# create global temporary table gtt1 (c1 integer, c2 varchar(50));
> CREATE TABLE
> postgres=# insert into gtt1 values (1,'gtt_c21');
> INSERT 0 1
> postgres=# create foreign table f_gtt1 (c1 integer, c2 varchar(50)) server 
> fdw options (table_name 'gtt1');
> CREATE FOREIGN TABLE
> 
> postgres=# select * from gtt1;
>  c1 |   c2
> +-
>   1 | gtt_c21
> (1 row)
> 
> postgres=# select * from f_gtt1;
>  c1 | c2 
> +
> (0 rows)
> 
> -- 

I understand that postgres_fdw works similarly to dblink.
Accessing a table through postgres_fdw requires a new connection.
The data in the GTT is empty in the newly established connection,
because a GTT shares structure but not data between connections.

Try local temp table:
create temporary table ltt1 (c1 integer, c2 varchar(50));

insert into ltt1 values (1,'gtt_c21');

create foreign table f_ltt1 (c1 integer, c2 varchar(50)) server fdw options 
(table_name 'ltt1');

select * from ltt1;
 c1 |   c2
+-
  1 | gtt_c21
(1 row)

select * from l_gtt1;
ERROR:  relation "l_gtt1" does not exist
LINE 1: select * from l_gtt1;


Wenjing


> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com 



Re: [Proposal] Global temporary tables

2020-03-12 Thread ()



> On Mar 12, 2020, at 4:12 AM, Robert Haas wrote:
> 
> On Wed, Mar 11, 2020 at 9:07 AM 曾文旌(义从)  wrote:
>> reindex need change relfilenode, but GTT is not currently supported.
> 
> In my view that'd have to be fixed somehow.
Ok , I am working on it.



> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-03-09 Thread ()



> On Mar 9, 2020, at 10:37 PM, tushar wrote:
> 
> On 3/6/20 12:35 PM, 曾文旌(义从) wrote:
>> Fixed in global_temporary_table_v17-pg13.patch
> 
> I observed that , we do support 'global temp' keyword with views
> 
> postgres=# create or replace  global temp view v1 as select 5;
> CREATE VIEW
I think we should not support global temp view.
Fixed in global_temporary_table_v18-pg13.patch.



Wenjing


> 
> but if we take the dump( using pg_dumpall) then it only display 'create view'
> 
> look like we are skipping it ?
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-03-05 Thread ()



> On Mar 5, 2020, at 10:38 PM, Robert Haas wrote:
> 
> On Thu, Mar 5, 2020 at 9:19 AM tushar  wrote:
>> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when 
>> forget
>> ERROR:  out of shared memory
>> HINT:  You might need to increase max_active_gtt.
>> 
>> also , would be great  if we can make this error message  user friendly like 
>>  - "max connection reached"  rather than memory error
> 
> That would be nice, but the bigger problem is that the WARNING there
> looks totally unacceptable. It's looks like it's complaining of some
> internal issue (i.e. a bug or corruption) and the grammar is poor,
> too.

Yes, WARNING should not exist.
This is a bug in the rollback process and I have fixed it in 
global_temporary_table_v17-pg13.patch


Wenjing


> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-03-05 Thread ()


> On Mar 5, 2020, at 10:19 PM, tushar wrote:
> 
> On 3/3/20 2:10 PM, 曾文旌(义从) wrote:
>> I fixed in global_temporary_table_v16-pg13.patch.
> Please refer this scenario -
> 
> --Connect to psql -
> 
> postgres=# alter system set max_active_global_temporary_table =1;
> ALTER SYSTEM
> 
> --restart the server (./pg_ctl -D data restart) 
> 
> --create global temp table 
> 
> postgres=# create global temp  table ccc1  (c int);
> CREATE TABLE
> 
> --Try to Create another global temp table
> 
> postgres=# create global temp  table ccc2  (c int);
> WARNING:  relfilenode 13589/1663/19063 not exist in gtt shared hash when 
> forget
> ERROR:  out of shared memory
> HINT:  You might need to increase max_active_gtt.
> 
> postgres=# show max_active_gtt;
> ERROR:  unrecognized configuration parameter "max_active_gtt"
> postgres=# 
> postgres=# show max_active_global_temporary_table ;
>  max_active_global_temporary_table 
> ---
>  1
> (1 row)
> 
> postgres=# 
> 
> I cannot find "max_active_gtt"  GUC . I think you are referring to  
> "max_active_global_temporary_table" here ? 
> 
You're right.

Fixed in global_temporary_table_v17-pg13.patch


Wenjing


> also , would be great  if we can make this error message  user friendly like  
> - "max connection reached"  rather than memory error
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/ <https://www.enterprisedb.com/>
> The Enterprise PostgreSQL Company



Re: [Proposal] Global temporary tables

2020-02-26 Thread ()


> On Feb 25, 2020, at 9:36 PM, Prabhat Sahu wrote:
> 
> Hi All,
> 
> Please check the below findings on GTT.
> -- Scenario 1:
> Under "information_schema", We are not allowed to create "temporary table", 
> whereas we can CREATE/DROP "Global Temporary Table", is it expected ?
> 
> postgres=# create temporary table information_schema.temp1(c1 int);
> ERROR:  cannot create temporary relation in non-temporary schema
> LINE 1: create temporary table information_schema.temp1(c1 int);
>^
> 
> postgres=# create global temporary table information_schema.temp1(c1 int);
> CREATE TABLE
> 
> postgres=# drop table information_schema.temp1 ;
> DROP TABLE
> 
> -- Scenario 2:
> Here I am getting the same error message in both the below cases.
> We may add a "global" keyword with GTT related error message.
> 
> postgres=# create global temporary table gtt1 (c1 int unique);
> CREATE TABLE
> postgres=# create temporary table tmp1 (c1 int unique);
> CREATE TABLE
> 
> postgres=# create temporary table tmp2 (c1 int references gtt1(c1) );
> ERROR:  constraints on temporary tables may reference only temporary tables
> 
> postgres=# create global temporary table gtt2 (c1 int references tmp1(c1) );
> ERROR:  constraints on temporary tables may reference only temporary tables
Fixed in global_temporary_table_v15-pg13.patch


Wenjing


> 
> Thanks,
> Prabhat Sahu
> 
> On Tue, Feb 25, 2020 at 2:25 PM 曾文旌(义从)  <mailto:wenjing....@alibaba-inc.com>> wrote:
> 
> 
>> On Feb 24, 2020, at 5:44 PM, Prabhat Sahu wrote:
>> 
>> On Fri, Feb 21, 2020 at 9:10 PM 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> wrote:
>> Hi,
>> I have started testing the "Global temporary table" feature,
>> That's great, I see hope.
>> from "gtt_v11-pg13.patch". Below is my findings:
>> 
>> -- session 1:
>> postgres=# create global temporary table gtt1(a int);
>> CREATE TABLE
>> 
>> -- session 2:
>> postgres=# truncate gtt1 ;
>> ERROR:  could not open file "base/13585/t3_16384": No such file or directory
>> 
>> is it expected?
>> 
>> Oh ,this is a bug, I fixed it.
>> Thanks for the patch.
>> I have verified the same, Now the issue is resolved with v12 patch.
>> 
>> Kindly confirm the below scenario:
>> 
>> postgres=# create global temporary table gtt1 (c1 int unique);
>> CREATE TABLE
>> 
>> postgres=# create global temporary table gtt2 (c1 int references gtt1(c1) );
>> ERROR:  referenced relation "gtt1" is not a global temp table
>> 
>> postgres=# create table tab2 (c1 int references gtt1(c1) );
>> ERROR:  referenced relation "gtt1" is not a global temp table
>> 
>> Thanks, 
>> Prabhat Sahu
> 
> GTT supports foreign key constraints in global_temporary_table_v13-pg13.patch
> 
> 
> Wenjing
> 
> 
> 
> 
> 
> -- 
> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>



Re: [Proposal] Global temporary tables

2020-02-26 Thread ()


> On Feb 25, 2020, at 11:31 PM, tushar wrote:
> 
> Hi,
> 
> I have created two  global temporary tables like this -
> 
> Case 1- 
> postgres=# create global  temp table foo(n int) with 
> (on_commit_delete_rows='true');
> CREATE TABLE
> 
> Case 2- 
> postgres=# create global  temp table bar1(n int) on commit delete rows;
> CREATE TABLE
> 
> 
> but   if i try to do the same having only 'temp' keyword , Case 2 is working 
> fine but getting this error  for case 1 -
> 
> postgres=# create   temp table foo1(n int) with 
> (on_commit_delete_rows='true');
> ERROR:  regular table cannot specifie on_commit_delete_rows
> postgres=# 
> 
> postgres=#  create   temp table bar1(n int) on commit delete rows;
> CREATE TABLE
> 
> i think this error message need to be more clear .
Also fixed in global_temporary_table_v14-pg13.patch

Wenjing



> 
> regards,
> tushar 
> 
On 2/25/20 7:19 PM, Pavel Stehule wrote:
>> 
>> 
>> On Tue, Feb 25, 2020 at 2:36 PM Prabhat Sahu wrote:
>> Hi All,
>> 
>> Please check the below findings on GTT.
>> -- Scenario 1:
>> Under "information_schema", We are not allowed to create "temporary table", 
>> whereas we can CREATE/DROP "Global Temporary Table", is it expected ?
>> 
>> It is ok for me. temporary tables should be created only in proprietary 
>> schema. For GTT there is not risk of collision, so it can be created in any 
>> schema where are necessary access rights.
>> 
>> Pavel
>> 
>> 
>> postgres=# create temporary table information_schema.temp1(c1 int);
>> ERROR:  cannot create temporary relation in non-temporary schema
>> LINE 1: create temporary table information_schema.temp1(c1 int);
>>^
>> 
>> postgres=# create global temporary table information_schema.temp1(c1 int);
>> CREATE TABLE
>> 
>> postgres=# drop table information_schema.temp1 ;
>> DROP TABLE
>> 
>> -- Scenario 2:
>> Here I am getting the same error message in both the below cases.
>> We may add a "global" keyword with GTT related error message.
>> 
>> postgres=# create global temporary table gtt1 (c1 int unique);
>> CREATE TABLE
>> postgres=# create temporary table tmp1 (c1 int unique);
>> CREATE TABLE
>> 
>> postgres=# create temporary table tmp2 (c1 int references gtt1(c1) );
>> ERROR:  constraints on temporary tables may reference only temporary tables
>> 
>> postgres=# create global temporary table gtt2 (c1 int references tmp1(c1) );
>> ERROR:  constraints on temporary tables may reference only temporary tables
>> 
>> Thanks,
>> Prabhat Sahu
>> 
>> On Tue, Feb 25, 2020 at 2:25 PM 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> wrote:
>> 
>> 
>>> On Feb 24, 2020, at 5:44 PM, Prabhat Sahu wrote:
>>> 
>>> On Fri, Feb 21, 2020 at 9:10 PM 曾文旌(义从) >> <mailto:wenjing@alibaba-inc.com>> wrote:
>>> Hi,
>>> I have started testing the "Global temporary table" feature,
>>> That's great, I see hope.
>>> from "gtt_v11-pg13.patch". Below is my findings:
>>> 
>>> -- session 1:
>>> postgres=# create global temporary table gtt1(a int);
>>> CREATE TABLE
>>> 
>>> -- session 2:
>>> postgres=# truncate gtt1 ;
>>> ERROR:  could not open file "base/13585/t3_16384": No such file or directory
>>> 
>>> is it expected?
>>> 
>>> Oh ,this is a bug, I fixed it.
>>> Thanks for the patch.
>>> I have verified the same, Now the issue is resolved with v12 patch.
>>> 
>>> Kindly confirm the below scenario:
>>> 
>>> postgres=# create global temporary table gtt1 (c1 int unique);
>>> CREATE TABLE
>>> 
>>> postgres=# create global temporary table gtt2 (c1 int references gtt1(c1) );
>>> ERROR:  referenced relation "gtt1" is not a global temp table
>>> 
>>> postgres=# create table tab2 (c1 int references gtt1(c1) );
>>> ERROR:  referenced relation "gtt1" is not a global temp table
>>> 
>>> Thanks, 
>>> Prabhat Sahu
>> 
>> GTT supports foreign key constraints in global_temporary_table_v13-pg13.patch
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> 
>> -- 
>> With Regards,
>> Prabhat Kumar Sahu
>> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
> 
> -- 
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/ <https://www.enterprisedb.com/>
> The Enterprise PostgreSQL Company



Re: [Proposal] Global temporary tables

2020-02-24 Thread ()


> On Feb 24, 2020, at 9:34 PM, Prabhat Sahu wrote:
> 
> Hi All,
> 
> I observe a different behavior in  "temporary table" and "global temporary 
> table".
> Not sure if it is expected?
> 
> postgres=# create global temporary table parent1(a int)  on commit delete 
> rows;
> CREATE TABLE
> postgres=# create global temporary table child1() inherits (parent1);
> CREATE TABLE
> postgres=# insert into parent1 values(1);
> INSERT 0 1
> postgres=# insert into child1 values(2);
> INSERT 0 1
> postgres=# select * from parent1;
>  a 
> ---
> (0 rows)
> 
> postgres=# select * from child1;
>  a 
> ---
> (0 rows)
Because child1 inherits its parent's ON COMMIT property.
I can make GTT behave like the local temp table here.


> 
> 
> postgres=# create temporary table parent2(a int)  on commit delete rows;
> CREATE TABLE
> postgres=# create temporary table child2() inherits (parent2);
> CREATE TABLE
> postgres=# insert into parent2 values(1);
> INSERT 0 1
> postgres=# insert into child2 values(2);
> INSERT 0 1
> postgres=# select * from parent2;
>  a 
> ---
>  2
> (1 row)
> 
> postgres=# select * from child2;
>  a 
> ---
>  2
> (1 row)
> 
> 
> Thanks,
> Prabhat Sahu
> 



Re: [Proposal] Global temporary tables

2020-02-16 Thread ()


> On Feb 15, 2020, at 6:06 PM, Pavel Stehule wrote:
> 
> 
>> postgres=# insert into foo select generate_series(1,1);
>> INSERT 0 1
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
>> (1 row)
>> 
>> postgres=# truncate foo;
>> TRUNCATE TABLE
>> postgres=# \dt+ foo
>>   List of relations
>> ┌────────┬──────┬───────┬───────┬─────────────┬───────┬─────────────┐
>> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
>> ╞════════╪══════╪═══════╪═══════╪═════════════╪═══════╪═════════════╡
>> │ public │ foo  │ table │ pavel │ session     │ 16 kB │             │
>> └────────┴──────┴───────┴───────┴─────────────┴───────┴─────────────┘
>> (1 row)
>> 
>> I expect zero size after truncate.
> Thanks for the review.
> 
> I can explain; I don't think it's a bug.
> The current implementation of TRUNCATE on a GTT retains two blocks of FSM
> pages.
> The same is true when truncating a regular table in a subtransaction.
> This implementation truncates the table without changing the table's
> relfilenode.
> 
> 
> This is not an extra important feature - it is just a little bit of a
> surprise now, because I was not inside a transaction.
> 
> Changing the relfilenode, I think, is necessary, minimally for future VACUUM
> FULL support.
Not allowing relfilenode changes is a current limitation.
I think it can be improved, but this is a bit complicated,
so I'd like to understand how necessary this improvement is.
Could you give me more details?

> 
> Regards
> 
> Pavel Stehule
>  
> 
> Wenjing
> 
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> > 
>> > -- 
>> > Robert Haas
>> > EnterpriseDB: http://www.enterprisedb.com 
>> > The Enterprise PostgreSQL Company
>> 
> 



Re: [Proposal] Global temporary tables

2020-02-15 Thread ()


> On Feb 14, 2020, at 5:19 PM, Pavel Stehule wrote:
> 
> 
> 
> On Thu, Jan 30, 2020 at 3:21 PM Pavel Stehule wrote:
> 
> 
> On Thu, Jan 30, 2020 at 3:17 PM 曾文旌(义从) wrote:
> 
> 
> > On Jan 29, 2020, at 9:48 PM, Robert Haas wrote:
> > 
> > On Tue, Jan 28, 2020 at 12:12 PM 曾文旌(义从)  > <mailto:wenjing@alibaba-inc.com>> wrote:
> >>> Opinion by Pavel
> >>> + rel->rd_islocaltemp = true;  <<<<<<< if this is valid, then the name of 
> >>> field "rd_islocaltemp" is not probably best
> >>> I renamed rd_islocaltemp
> >> 
> >> I don't see any change?
> >> 
> >> Rename rd_islocaltemp to rd_istemp  in global_temporary_table_v8-pg13.patch
> > 
> > In view of commit 6919b7e3294702adc39effd16634b2715d04f012, I think
> > that this has approximately a 0% chance of being acceptable. If you're
> > setting a field in a way that is inconsistent with the current use of
> > the field, you're probably doing it wrong, because the field has an
> > existing purpose to which new code must conform. And if you're not
> > doing that, then you don't need to rename it.
> Thank you for pointing it out.
> I've rolled back the rename.
> But I still need rd_localtemp to be true. The reasons are:
> 1. The GTT needs to support DML in read-only transactions, like a local temp
> table.
> 2. The GTT does not need to hold the lock before modifying the index buffer,
> also like a local temp table.
> 
> Please give me feedback.
> 
> maybe some like
> 
> rel->rd_globaltemp = true;
> 
> and somewhere else
> 
> if (rel->rd_localtemp || rel->rd_globaltemp)
> {
>   ...
> }
> 
> 
> I tested this patch again and I am very well satisfied with its behavior.
> 
> What still doesn't work - the TRUNCATE statement:
> 
> postgres=# insert into foo select generate_series(1,1);
> INSERT 0 1
> postgres=# \dt+ foo
>   List of relations
> ┌────────┬──────┬───────┬───────┬─────────────┬────────┬─────────────┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │  Size  │ Description │
> ╞════════╪══════╪═══════╪═══════╪═════════════╪════════╪═════════════╡
> │ public │ foo  │ table │ pavel │ session     │ 384 kB │             │
> └────────┴──────┴───────┴───────┴─────────────┴────────┴─────────────┘
> (1 row)
> 
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# \dt+ foo
>   List of relations
> ┌────────┬──────┬───────┬───────┬─────────────┬───────┬─────────────┐
> │ Schema │ Name │ Type  │ Owner │ Persistence │ Size  │ Description │
> ╞════════╪══════╪═══════╪═══════╪═════════════╪═══════╪═════════════╡
> │ public │ foo  │ table │ pavel │ session     │ 16 kB │             │
> └────────┴──────┴───────┴───────┴─────────────┴───────┴─────────────┘
> (1 row)
> 
> I expect zero size after truncate.
Thanks for the review.

I can explain; I don't think it's a bug.
The current implementation of TRUNCATE on a GTT retains two blocks of FSM pages.
The same is true when truncating a regular table in a subtransaction.
This implementation truncates the table without changing the table's
relfilenode.


Wenjing

> 
> Regards
> 
> Pavel
> 
> 
> 
> Wenjing
> 
> 
> 
> 
> > 
> > -- 
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
> > The Enterprise PostgreSQL Company
> 



Re: [Proposal] Global temporary tables

2020-02-05 Thread ()



> On Feb 5, 2020, at 10:15 PM, Robert Haas wrote:
> 
> On Wed, Feb 5, 2020 at 8:21 AM 曾文旌(义从)  wrote:
>> What do you mean by "catalog buffer"?
>> Yes, cleanup of local temp table requires deletion of correspondent entry 
>> from catalog and GTT should not do it.
>> But  I am speaking only about cleanup of data files of temp relations. It is 
>> done in the same way for local and global temp tables.
>> 
>> For native PostgreSQL, the data file of a temp table is not cleaned up
>> directly after an OOM happens,
>> because the orphaned local temp table (including catalog entries, local
>> buffers, and data files) is cleaned up by autovacuum dropping the orphaned
>> temp schema.
>> So for GTT we cannot do the same by just deleting data files. This is why
>> I handled it specifically.
> 
> After a crash restart, all temporary relfilenodes (e.g t12345_67890)
> are removed. I think GTTs should use relfilenodes of this general
> form, and then they'll be cleaned up by the existing code. For a
> regular temporary table, there is also the problem of removing the
> catalog entries, but GTTs shouldn't have this problem, because a GTT
> doesn't have any catalog entries for individual sessions, just for the
> main object, which isn't going away just because the system restarted.
> Right?
Wenjing wrote:
I have implemented this processing in global_temporary_table_v10-pg13.patch.
When an OOM happens, all backends are killed.
Then I chose to clean up these files (all of the form t12345_67890) in the
startup process.

Wenjing

> 
>> In my patch autovacuum is prohibited for GTT.
>> 
>> But vacuum GTT is not prohibited.
> 
> That sounds right to me.
Wenjing wrote:
Also implemented in global_temporary_table_v10-pg13.patch

Wenjing

> 
> This thread is getting very hard to follow because neither Konstantin
> nor Wenjing seem to be using the standard method of quoting. When I
> reply, I get the whole thing quoted with "> " but can't easily tell
> the difference between what Wenjing wrote and what Konstantin wrote,
> because both of your mailers are quoting using indentation rather than
> "> " and it gets wiped out by my mailer. Please see if you can get
> your mailer to do what is normally done on this mailing list.
> 
> Thanks,
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-02-05 Thread ()


> On Feb 5, 2020, at 12:47 AM, Konstantin Knizhnik wrote:
> 
> 
> 
> On 04.02.2020 18:01, 曾文旌(义从) wrote:
>> 
>> 
>>> 
>>> Yes, exactly.
>>> But it is still not clear to me why do we need some special handling for 
>>> GTT?
>>> Shared memory is reinitialized and storage of temporary tables is removed.
>>> It is true for both local and global temp tables.
>> Of course not. For a local temp table the entire table is cleaned up
>> (including catalog entries, buffers, and data files); for a GTT it is not.
>> 
> 
> What do you mean by "catalog buffer"?
> Yes, cleanup of local temp table requires deletion of correspondent entry 
> from catalog and GTT should not do it.
> But  I am speaking only about cleanup of data files of temp relations. It is 
> done in the same way for local and global temp tables.
For native PostgreSQL, the data file of a temp table is not cleaned up directly
after an OOM happens, because the orphaned local temp table (including catalog
entries, local buffers, and data files) is cleaned up by autovacuum dropping the
orphaned temp schema.
So for GTT we cannot do the same by just deleting data files. This is why I
handled it specifically.

> 
> 
>>> In my patch autovacuum is prohibited for GTT.
>> But vacuum GTT is not prohibited. 
>> 
> Yes, but the simplest solution is to prohibit also explicit vacuum of GTT, 
> isn't it?
> 
>>> 
>>>> IMHO forced terminated of client sessions is not acceptable solution.
>>>>> And it is not an absolutely necessary requirement.
>>>>> So from my point of view we should not add such limitations to GTT design.
>>>> This limitation makes it possible for the GTT to do all the DDL.
>>>> IMHO even oracle's GTT has similar limitations.
>>> 
>>> I have checked that Oracle is not preventing creation of index for GTT if 
>>> there are some active sessions working with this table. And this index 
>>> becomes visible for all this sessions.
>> 1. Yes, the creation of an index on a GTT has been improved in
>> global_temporary_table_v10-pg13.patch.
>> 2. But ALTER GTT, DROP GTT, and DROP INDEX on a GTT are blocked by other
>> sessions.
>> 
> Yes, you are right.
> Oracle documentation says:
> >  1) DDL operation on global temporary tables
> > It is not possible to perform a DDL operation (except TRUNCATE 
> > <https://www.oracletutorial.com/oracle-basics/oracle-truncate-table/>) on 
> > an existing global temporary table if one or more sessions are currently 
> > bound to that table.
> 
> But looks like create index is not considered as DDL operation on GTT and is 
> also supported by Oracle.

> 
> Your approach of prohibiting such access using a shared cache is certainly
> better than my attempt to prohibit such DDLs for GTT at all.
> I just want to eliminate maintenance of such a shared cache to simplify the
> patch.
> 
> But I still think that we should allow truncation of GTT and 
> creating/dropping indexes on it without any limitations. 
I think that is the goal of this work.
But the first step is to let GTT support as many features of regular tables as
possible, even with some limitations.

>>> 
>>> May be the easies solution is to prohibit explicit vacuum of GTT?
>> I think vacuum is an important part of GTT.
>> 
>> Looking back at previous emails, robert once said that vacuum GTT is pretty 
>> important.
>> https://www.postgresql.org/message-id/CA%2BTgmob%3DL1k0cpXRcipdsaE07ok%2BOn%3DtTjRiw7FtD_D2T%3DJwhg%40mail.gmail.com
>>  
>> <https://www.postgresql.org/message-id/CA+Tgmob=L1k0cpXRcipdsaE07ok+On=tTjRiw7FtD_D2T=j...@mail.gmail.com>
>> 
> 
> Well, may be I am not right.
> I never saw use cases where temp table are used not like append-only storage 
> (when temp table tuples are updated multiple times).
> But I think that if such problem actually exists then solution is to support 
> autovacuum for temp tables, rather than allow manual vacuum.
> Certainly it can not be done by another  worker because it has no access to 
> private backend's data. But it can done incrementally by backend itself.
> 
> 



Re: [Proposal] Global temporary tables

2020-02-05 Thread ()



> On Feb 5, 2020, at 4:57 AM, Robert Haas wrote:
> 
> On Sat, Feb 1, 2020 at 11:14 AM 曾文旌(义从)  wrote:
>> As global_private_temp-8.patch, think about:
>> 1 session X tale several hours doing some statistical work with the GTT A, 
>> which generated some data using transaction 100, The work is not over.
>> 2 Then session Y vacuumed A, and the GTT's relfrozenxid (in pg_class) was 
>> updated to 1000 .
>> 3 Then the aotuvacuum happened, the clog  before 1000  was cleaned up.
>> 4 The data in session A could be lost due to missing clog, The analysis task 
>> failed.
>> 
>> However This is likely to happen because you allowed the GTT do vacuum.
>> And this is not a common problem, that not happen with local temp tables.
>> I feel uneasy about leaving such a question. We can improve it.
> 
> Each session is going to need to maintain its own notion of the
> relfrozenxid and relminmxid of each GTT to which it is attached.
> Storing the values in pg_class makes no sense and is completely
> unacceptable.
Yes, I've implemented it in global_temporary_table_v10-pg13.patch

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [Proposal] Global temporary tables

2020-02-04 Thread ()


> On Feb 3, 2020, at 4:16 PM, Konstantin Knizhnik wrote:
> 
> 
> 
> On 01.02.2020 19:14, 曾文旌(义从) wrote:
>> 
>> 
>>> On Jan 27, 2020, at 5:38 PM, Konstantin Knizhnik wrote:
>>> 
>>> 
>>> 
>>> On 25.01.2020 18:15, 曾文旌(义从) wrote:
>>>> I wonder why do we need some special check for GTT here.
>>>>> From my point of view cleanup at startup of local storage of temp tables 
>>>>> should be performed in the same way for local and global temp tables.
>>>> After oom kill, In autovacuum, the Isolated local temp table will be 
>>>> cleaned like orphan temporary tables. The definition of local temp table 
>>>> is deleted with the storage file. 
>>>> But GTT can not do that. So we have the this implementation in my patch.
>>>> If you have other solutions, please let me know.
>>>> 
>>> I wonder if it is possible that autovacuum or some other Postgres process
>>> is killed by OOM and postmaster does not notice it and doesn't restart the
>>> Postgres instance?
>>> As far as I know, a crash of any process connected to Postgres shared memory
>>> (and autovacuum definitely has such a connection) causes a Postgres restart.
>> The postmaster will not restart after an OOM happens, but the startup process
>> will run. GTT data files are cleaned up in the startup process.
> 
> Yes, exactly.
> But it is still not clear to me why we need some special handling for GTT?
> Shared memory is reinitialized and storage of temporary tables is removed.
> It is true for both local and global temp tables.
Not exactly. For a local temp table the entire table is cleaned up (both the
catalog entry and the data file); for a GTT that is not possible, because only
the session-local storage may be removed while the shared definition remains.

> 
>>> 
>>> 
>>>> In my design:
>>>> 1 Because different sessions have different transaction information, I
>>>> chose to store the transaction information of GTT in MyProc, not in the
>>>> catalog.
>>>> 2 As for the XID wraparound problem, the cause is the design of temp table
>>>> storage (for both local and global temp tables), which prevents autovacuum
>>>> from vacuuming them.
>>>> It should be solved completely at the storage level.
>>>> 
>>> 
>>> My point of view is that vacuuming of temp tables is common problem for 
>>> local and global temp tables. 
>>> So it has to be addressed in the common way and so we should not try to fix 
>>> this problem only for GTT.
>> I think I agree with you on this point.
>> However, this does not mean that storing GTT transaction information in
>> pg_class is correct.
>> If you keep it that way, as in global_private_temp-8.patch, it may cause
>> data loss in GTT after autovacuum.
> 
> In my patch autovacuum is prohibited for GTT.
But manual VACUUM on a GTT is not prohibited.

> 
>> IMHO forced termination of client sessions is not an acceptable solution.
>>> And it is not an absolutely necessary requirement.
>>> So from my point of view we should not add such limitations to GTT design.
>> This limitation is what makes it possible to support all DDL on GTT.
>> IMHO even Oracle's GTT has similar limitations.
> 
> I have checked that Oracle is not preventing creation of index for GTT if 
> there are some active sessions working with this table. And this index 
> becomes visible for all this sessions.
1 Yes, the creation of an index on a GTT has been improved in
global_temporary_table_v10-pg13.patch.
2 But ALTER GTT, DROP GTT, and DROP INDEX on a GTT are blocked by other
sessions:

SQL> drop table gtt;
drop table gtt
   *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already
in use


SQL> ALTER TABLE gtt add b int ; 
ALTER TABLE gtt add b int
*
ERROR at line 1:
ORA-14450: attempt to access a transactional temp table already in use

SQL> drop index idx_gtt;
drop index idx_gtt
   *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already
in use

I'm not saying we should copy this behavior, but from an implementation
perspective we face similar issues.
A DBA who needs to change a GTT can still do it (after terminating the other
sessions that use it), so I think this restriction is acceptable.

> 
> 
>> With global_private_temp-8.patch, consider this scenario:
>> 1 Session X spends several hours doing some statistical work with GTT A,
>> and has generated some data using transaction 100; the work is not over yet.
>> 2 Then session Y vacuums A, and the GTT's relfrozenxid (in pg_class) is
>> updated to 1000.
>> 3 Then autovacuum runs, and the clog before 1000 is cleaned up.
>> 4 The data in session X could be lost due to the missing clog,

Re: [Proposal] Global temporary tables

2020-02-03 Thread ()


> 2020年2月2日 上午2:00,Pavel Stehule  写道:
> 
> 
> 
> so 1. 2. 2020 v 14:39 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> 
> 
>> 2020年1月30日 下午10:21,Pavel Stehule > <mailto:pavel.steh...@gmail.com>> 写道:
>> 
>> 
>> 
>> čt 30. 1. 2020 v 15:17 odesílatel 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> napsal:
>> 
>> 
>> > 2020年1月29日 下午9:48,Robert Haas > > <mailto:robertmh...@gmail.com>> 写道:
>> > 
>> > On Tue, Jan 28, 2020 at 12:12 PM 曾文旌(义从) > > <mailto:wenjing@alibaba-inc.com>> wrote:
>> >>> Opinion by Pavel
>> >>> + rel->rd_islocaltemp = true;  <<<<<<< if this is valid, then the name 
>> >>> of field "rd_islocaltemp" is not probably best
>> >>> I renamed rd_islocaltemp
>> >> 
>> >> I don't see any change?
>> >> 
>> >> Rename rd_islocaltemp to rd_istemp  in 
>> >> global_temporary_table_v8-pg13.patch
>> > 
>> > In view of commit 6919b7e3294702adc39effd16634b2715d04f012, I think
>> > that this has approximately a 0% chance of being acceptable. If you're
>> > setting a field in a way that is inconsistent with the current use of
>> > the field, you're probably doing it wrong, because the field has an
>> > existing purpose to which new code must conform. And if you're not
>> > doing that, then you don't need to rename it.
>> Thank you for pointing it out.
>> I've rolled back the rename.
>> But I still need rd_islocaltemp to be true. The reasons are:
>> 1 The GTT needs to support DML in read-only transactions, like a local temp
>> table.
>> 2 The GTT does not need to hold a lock before modifying an index buffer,
>> also like a local temp table.
>> 
>> Please give me feedback.
>> 
>> maybe some like
>> 
>> rel->rd_globaltemp = true;
>> 
>> and somewhere else
>> 
>> if (rel->rd_localtemp || rel->rd_globaltemp)
>> {
>>   ...
>> }
> I tried to optimize code in global_temporary_table_v10-pg13.patch
> 
> 
> Please give me feedback.
> 
> I tested this patch and I don't have any objections - from my user
> perspective it works as I expect
> 
> +#define RELATION_IS_TEMP(relation) \
> + ((relation)->rd_islocaltemp || \
> + (relation)->rd_rel->relpersistence == RELPERSISTENCE_GLOBAL_TEMP)
>  
> It looks a little bit unbalanced
> 
> maybe it is better to inject rd_isglobaltemp into the relation structure
> 
> and then
> 
> it should look like 
> 
> +#define RELATION_IS_TEMP(relation) \
> + ((relation)->rd_islocaltemp || \
> + (relation)->rd_isglobaltemp)
> 
> But I have no idea if it helps in the complex cases
In my opinion:
For a local temp table we need (relation)->rd_rel->relpersistence ==
RELPERSISTENCE_TEMP, and because one local temp table belongs to only one
session, the owning session must mark rd_islocaltemp = true while all other
sessions see rd_islocaltemp = false.

But for GTT we only need (relation)->rd_rel->relpersistence ==
RELPERSISTENCE_GLOBAL_TEMP.
One GTT can be used by every session, so there is no need for rd_isglobaltemp
at all; such a flag would be duplicated and redundant.
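
A minimal sketch of what that means in code, using the names already quoted in
this thread (RELPERSISTENCE_GLOBAL_TEMP is the persistence value added by the
patch); treat this as an illustration rather than the patch's exact
definitions:

#define RELATION_IS_GLOBAL_TEMP(relation) \
	((relation)->rd_rel->relpersistence == RELPERSISTENCE_GLOBAL_TEMP)

/*
 * "Temporary" means either this session's local temp table or a global temp
 * table.  No rd_isglobaltemp flag is needed: the persistence value alone
 * identifies a GTT, and it is the same in every session.
 */
#define RELATION_IS_TEMP(relation) \
	((relation)->rd_islocaltemp || RELATION_IS_GLOBAL_TEMP(relation))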

> 
> 
> 
> 
> 
> 
> 
> Wenjing
> 
> 
> 
>> 
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> > 
>> > -- 
>> > Robert Haas
>> > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
>> > The Enterprise PostgreSQL Company
>> 
> 



Re: [Proposal] Global temporary tables

2020-02-01 Thread ()


> 2020年1月27日 下午5:38,Konstantin Knizhnik  写道:
> 
> 
> 
> On 25.01.2020 18:15, 曾文旌(义从) wrote:
>> I wonder why do we need some special check for GTT here.
>>> From my point of view cleanup at startup of local storage of temp tables 
>>> should be performed in the same way for local and global temp tables.
>> After an OOM kill, autovacuum cleans up the isolated local temp table like
>> an orphan temporary table: the definition of the local temp table is deleted
>> together with its storage file.
>> But GTT cannot do that, so my patch has this implementation.
>> If you have other solutions, please let me know.
>> 
> I wonder if it is possible that autovacuum or some other Postgres process is
> killed by OOM while the postmaster does not notice it and so doesn't restart
> the Postgres instance?
> As far as I know, a crash of any process connected to Postgres shared memory
> (and autovacuum definitely has such a connection) causes a Postgres restart.
The postmaster itself does not restart after an OOM kill, but the startup
process does run, and GTT data files are cleaned up in the startup process.
> 
> 
>> In my design:
>> 1 Because different sessions have different transaction information, I chose
>> to store the transaction information of GTT in MyProc, not in the catalog.
>> 2 As for the XID wraparound problem, the cause is the design of temp table
>> storage (for both local and global temp tables), which prevents autovacuum
>> from vacuuming them.
>> It should be solved completely at the storage level.
>> 
> 
> My point of view is that vacuuming of temp tables is common problem for local 
> and global temp tables. 
> So it has to be addressed in the common way and so we should not try to fix 
> this problem only for GTT.
I think I agree with you on this point.
However, this does not mean that storing GTT transaction information in
pg_class is correct.
If you keep it that way, as in global_private_temp-8.patch, it may cause data
loss in GTT after autovacuum.

> 
> 
>> In fact, the DBA can still complete any DDL on the GTT.
>> I've provided a set of functions for this case.
>> If the DBA needs to modify a GTT A (or drop the GTT, or create an index on
>> it), he needs to:
>> 1 Use the pg_gtt_attached_pids view to list the pids of the sessions that
>> are using GTT A.
>> 2 Use pg_terminate_backend(pid) to terminate them, except his own session.
>> 3 Do the ALTER on GTT A.
>> 
> IMHO forced termination of client sessions is not an acceptable solution.
> And it is not an absolutely necessary requirement.
> So from my point of view we should not add such limitations to GTT design.
This limitation is what makes it possible to support all DDL on GTT.
IMHO even Oracle's GTT has similar limitations.

> 
> 
> 
>>> 
>>> What are the reasons of using RowExclusiveLock for GTT instead of 
>>> AccessExclusiveLock?
>>> Yes, GTT data is access only by one backend so no locking here seems to be 
>>> needed at all.
>>> But I wonder what are the motivations/benefits of using weaker lock level 
>>> here?
>> 1 TRUNCATE on a GTT deletes only the data in the current session, so there
>> is no need to use a high-level lock.
>> 2 I think it still needs to be blocked by DDL on the GTT, which is why I use
>> RowExclusiveLock.
> 
> Sorry, I do not understand your arguments: we do not need exclusive lock 
> because we drop only local (private) data
> but we need some kind of lock. I agree with 1) and not 2).
Yes, we don't need a lock for the private data, but the metadata does need one.
> 
>> 
>>> There should be no conflicts in any case...
>>> 
>>> +/* We allow to create index on global temp table only this session 
>>> use it */
>>> +if (is_other_backend_use_gtt(heapRelation->rd_node))
>>> +elog(ERROR, "can not create index when have other backend 
>>> attached this global temp table");
>>> +
>>> 
>>> The same argument as in the case of dropping GTT: I do not think that 
>>> prohibiting DDL operations on a GTT used by more than one backend is bad idea.
>> The idea was to give the GTT almost all the features of a regular table
>> with few code changes.
>> In the current version the DBA can still do all DDL on a GTT, as I've
>> already described.
> 
> I absolutely agree with you that GTT should be given the same features as 
> regular tables.
> The irony is that this most natural and convenient behavior is most easy to 
> implement without putting some extra restrictions.
> Just let indexes for GTT be constructed on demand. It can be done using the
> same function used for regular index creation.
The limitation on index creation have been i

Re: [Proposal] Global temporary tables

2020-01-29 Thread ()


> 2020年1月29日 上午1:54,Pavel Stehule  写道:
> 
> 
> 
> út 28. 1. 2020 v 18:13 odesílatel Pavel Stehule  <mailto:pavel.steh...@gmail.com>> napsal:
> 
> 
> út 28. 1. 2020 v 18:12 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> 
> 
>> 2020年1月29日 上午12:40,Pavel Stehule > <mailto:pavel.steh...@gmail.com>> 写道:
>> 
>> 
>> 
>> út 28. 1. 2020 v 17:01 odesílatel 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> napsal:
>> 
>> 
>>> 2020年1月24日 上午4:47,Robert Haas >> <mailto:robertmh...@gmail.com>> 写道:
>>> 
>>> On Sat, Jan 11, 2020 at 8:51 PM Tomas Vondra
>>> mailto:tomas.von...@2ndquadrant.com>> wrote:
>>>> I proposed just ignoring those new indexes because it seems much simpler
>>>> than alternative solutions that I can think of, and it's not like those
>>>> other solutions don't have other issues.
>>> 
>>> +1.
>> I completed the implementation of this feature.
>> When a session x creates an index idx_a on GTT A:
>> For session x, idx_a is valid as soon as CREATE INDEX finishes.
>> For session y, which already had data in GTT A before session x created the
>> index, idx_a is invalid.
>> For session z, which had no data in GTT A before session x created the
>> index, idx_a is valid.
>> 
>>> 
>>>> For example, I've looked at the "on demand" building as implemented in
>>>> global_private_temp-8.patch, I kinda doubt adding a bunch of index build
>>>> calls into various places in index code seems somewht suspicious.
>>> 
>>> +1. I can't imagine that's a safe or sane thing to do.
>>> 
>>> -- 
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>
>>> The Enterprise PostgreSQL Company
>> 
>> Opinion by Pavel
>> +rel->rd_islocaltemp = true;  <<<<<<< if this is valid, then the name of 
>> field "rd_islocaltemp" is not probably best
>> I renamed rd_islocaltemp
>> 
>> I don't see any change?
> Rename rd_islocaltemp to rd_istemp  in global_temporary_table_v8-pg13.patch
> 
> ok :)
> 
> I found a bug
> 
> postgres=# create global temp table x(a int);
> CREATE TABLE
> postgres=# insert into x values(1);
> INSERT 0 1
> postgres=# create index on x (a);
> CREATE INDEX
> postgres=# create index on x((a + 1));
> CREATE INDEX
> postgres=# analyze x;
> WARNING:  oid 16468 not a relation
> ANALYZE
Thanks for the review.

The expression index needs to store statistics on the index; I missed that and
will fix it later.


Wenjing

> 
> other behave looks well for me.
> 
> Regards
> 
> Pavel
> 
> 
> Pavel
> 
> 
> Wenjing
> 
> 
> 
>> 
>> 
>> 
>> Opinion by Konstantin Knizhnik
>> 1 Fixed comments
>> 2 Fixed assertion
>> 
>> 
>> Please help me review.
>> 
>> 
>> Wenjing
>> 
> 



Re: [Proposal] Global temporary tables

2020-01-25 Thread ()
Thank you for reviewing the patch.

> 2020年1月24日 下午4:20,Konstantin Knizhnik  写道:
> 
> 
> 
> On 23.01.2020 19:28, 曾文旌(义从) wrote:
>> 
>> I'm trying to improve this part of the implementation in 
>> global_temporary_table_v7-pg13.patch
>> Please check my patch and give me feedback.
>> 
>> 
>> Thanks
>> 
>> Wenjing
>> 
>> 
> 
> Below is my short review of the patch:
> 
> +/*
> + * For global temp table only
> + * use AccessExclusiveLock for ensure safety
> + */
> +{
> +{
> +"on_commit_delete_rows",
> +"global temp table on commit options",
> +RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> +ShareUpdateExclusiveLock
> +},
> +true
> +},
> 
> 
> The comment seems to be confusing: it says about AccessExclusiveLock but 
> actually uses ShareUpdateExclusiveLock.
There is a problem with the comment description, I will fix it.

> 
> -Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
> -Assert(MultiXactIdIsValid(onerel->rd_rel->relminmxid));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relfrozenxid 
> == InvalidTransactionId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && 
> TransactionIdIsNormal(onerel->rd_rel->relfrozenxid)));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relminmxid == 
> InvalidMultiXactId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && 
> MultiXactIdIsValid(onerel->rd_rel->relminmxid)));
>  
> It is actually equivalent to:
> 
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ 
> TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ 
> MultiXactIdIsValid(onerel->rd_rel->relminmxid));
Yes, thank you for pointing that out; it's simpler.

> 
> +/* clean temp relation files */
> +if (max_active_gtt > 0)
> +RemovePgTempFiles();
> +
>  /*
>  
> I wonder why do we need some special check for GTT here.
> From my point of view cleanup at startup of local storage of temp tables 
> should be performed in the same way for local and global temp tables.
After an OOM kill, autovacuum cleans up the isolated local temp table like an
orphan temporary table: the definition of the local temp table is deleted
together with its storage file.
But GTT cannot do that, so my patch has this implementation.
If you have other solutions, please let me know.

> 
> 
> -new_rel_reltup->relfrozenxid = relfrozenxid;
> -new_rel_reltup->relminmxid = relminmxid;
> +/* global temp table not remember transaction info in catalog */
> +if (relpersistence == RELPERSISTENCE_GLOBAL_TEMP)
> +{
> +new_rel_reltup->relfrozenxid = InvalidTransactionId;
> +new_rel_reltup->relminmxid = InvalidMultiXactId;
> +}
> +else
> +{
> +new_rel_reltup->relfrozenxid = relfrozenxid;
> +new_rel_reltup->relminmxid = relminmxid;
> +}
> +
> 
> 
> Why do we need to do it for GTT?
> Did you check that there will be no problems with GTT in case of XID 
> wraparound?
> Right now if you create temp table and keep session open, then it will block 
> XID wraparound.
In my design:
1 Because different sessions have different transaction information, I chose to
store the transaction information of GTT in MyProc, not in the catalog.
2 As for the XID wraparound problem, the cause is the design of temp table
storage (for both local and global temp tables), which prevents autovacuum from
vacuuming them.
It should be solved completely at the storage level.
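
As an illustration of point 1, here is a rough sketch of keeping the
per-session frozenxid outside the catalog; the gttFrozenXid field and the
helper name are assumptions made up for this sketch, not the identifiers used
in the patch:

/*
 * Per-backend bookkeeping: remember the oldest unfrozen XID among all GTTs
 * this session has written, in this backend's PGPROC entry instead of
 * pg_class, so that each session has its own value.
 */
static void
gtt_remember_relfrozenxid(TransactionId frozenxid)
{
	/* MyProc->gttFrozenXid is an assumed field added for GTT bookkeeping */
	if (!TransactionIdIsValid(MyProc->gttFrozenXid) ||
		TransactionIdPrecedes(frozenxid, MyProc->gttFrozenXid))
		MyProc->gttFrozenXid = frozenxid;
}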

> 
> +/* We allow to drop global temp table only this session use it */
> +if (RELATION_IS_GLOBAL_TEMP(rel))
> +{
> +if (is_other_backend_use_gtt(rel->rd_node))
> +elog(ERROR, "can not drop relation when other backend attached 
> this global temp table");
> +}
> +
> 
> Here we once again introduce incompatibility with normal (permanent) tables.
> Assume that DBA or programmer need to change format of GTT. But there are 
> some active sessions which have used this GTT sometime in the past.
> We will not be able to drop this GTT until all this sessions are terminated.
> I do not think that it is acceptable behaviour.
In fact, the DBA can still complete any DDL on the GTT.
I've provided a set of functions for this case.
If the DBA needs to modify a GTT A (or drop the GTT, or create an index on it),
he needs to:
1 Use the pg_gtt_attached_pids view to list the pids of the sessions that are
using GTT A.
2 Use pg_terminate_backend(pid) to terminate them, except his own session.

Re: [Proposal] Global temporary tables

2020-01-21 Thread ()


> 2020年1月22日 上午2:51,Pavel Stehule  写道:
> 
> 
> 
> út 21. 1. 2020 v 9:46 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> 
> 
>> 2020年1月12日 上午4:27,Pavel Stehule > <mailto:pavel.steh...@gmail.com>> 写道:
>> 
>> Hi
>> 
>> so 11. 1. 2020 v 15:00 odesílatel 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> napsal:
>> Hi all
>> 
>> This is the latest patch
>> 
>> The updates are as follows:
>> 1. Support global temp Inherit table global temp partition table
>> 2. Support serial column in GTT
>> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
>> 4. Provide view pg_gtt_attached_pids to manage GTT
>> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
>> 6. Alter GTT or rename GTT is allowed under some conditions
>> 
>> 
>> Please give me feedback.
>> 
>> I tested the functionality
>> 
>> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local 
>> temp tables).
> 
> ON COMMIT PRESERVE ROWS is default mode now.
> 
> Thank you
> 
> * I tried to create global temp table with index. When I tried to drop this 
> table (and this table was used by second instance), then I got error message
> 
> postgres=# drop table foo;
> ERROR:  can not drop index when other backend attached this global temp table
> 
> It is expected, but it is not too much user friendly. Is better to check if 
> you can drop table, then lock it, and then drop all objects.
I don't understand what needs to be improved. Could you describe it in detail?

> 
> * tab complete can be nice for CREATE GLOBAL TEMP table
Yes, I will improve it.
> 
> \dt+ \di+ doesn't work correctly, or maybe I don't understand to the 
> implementation.
> 

postgres=# create table t(a int primary key);
CREATE TABLE
postgres=# create global temp table gt(a int primary key);
CREATE TABLE
postgres=# insert into t values(generate_series(1,1));
INSERT 0 1
postgres=# insert into gt values(generate_series(1,1));
INSERT 0 1

postgres=# \dt+
                               List of relations
 Schema | Name | Type  |    Owner    | Persistence |  Size  | Description 
--------+------+-------+-------------+-------------+--------+-------------
 public | gt   | table | wenjing.zwj | session     | 384 kB | 
 public | t    | table | wenjing.zwj | permanent   | 384 kB | 
(2 rows)

postgres=# \di+
                                    List of relations
 Schema |  Name   | Type  |    Owner    | Table | Persistence |  Size  | Description 
--------+---------+-------+-------------+-------+-------------+--------+-------------
 public | gt_pkey | index | wenjing.zwj | gt    | session     | 240 kB | 
 public | t_pkey  | index | wenjing.zwj | t     | permanent   | 240 kB | 
(2 rows)


> I see same size in all sessions. Global temp tables shares same files?
No, each session uses its own files.
But \dt+ and \di+ report the total file size across all sessions for each GTT.



Wenjing

> 
> Regards
> 
> Pavel
> 
> 
> 
> 
> 
> Wenjing
> 
> 
> 
>> 
>> I tested some simple scripts 
>> 
>> test01.sql
>> 
>> CREATE TEMP TABLE foo(a int, b int);
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DROP TABLE foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, the table pg_attribute has 3.2MB
>> and 64 tps, 6446 transaction
>> 
>> test02.sql
>> 
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DELETE FROM foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, 1688 tps, 168830 transactions
>> 
>> So performance is absolutely different as we expected.
>> 
>> From my perspective, this functionality is great.
>> 
>> Todo:
>> 
>> pg_table_size function doesn't work
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> 
>>> 2020年1月6日 上午4:06,Tomas Vondra >> <mailto:tomas.von...@2ndquadrant.com>> 写道:
>>> 
>>> Hi,
>>> 
>>> I think we need to do something with having two patches aiming to add
>>> global temporary tables:
>>> 
>>> [1] https://commitfest.postgresql.org/26/2349/ 
>>> <https://commitfest.postgresql.org/26/2349/>
>>> 
>>> [2] https://commitfest.postgresql.org/26/2233/ 
>>> <https://commitfest.postgresql.org/26/2233/>
>>> 
>>> As a reviewer I ha

Re: [Proposal] Global temporary tables

2020-01-21 Thread ()


> 2020年1月21日 下午1:43,Pavel Stehule  写道:
> 
> Hi
> 
> I have a free time this evening, so I will check this patch
> 
> I have a one question
> 
> + /* global temp table get relstats from localhash */
> + if (RELATION_IS_GLOBAL_TEMP(rel))
> + {
> + get_gtt_relstats(RelationGetRelid(rel),
> + &relpages, &reltuples, &relallvisible,
> + NULL, NULL);
> + }
> + else
> + {
> + /* coerce values in pg_class to more desirable types */
> + relpages = (BlockNumber) rel->rd_rel->relpages;
> + reltuples = (double) rel->rd_rel->reltuples;
> + relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
> + }
> 
> Isn't it possible to fill the rd_rel structure too, so this branching can be
> reduced?
I'll make some improvements to optimize this part of the code.
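
One possible shape of that optimization, sketched here only as an illustration:
copy this session's local statistics into rd_rel when the relcache entry for a
GTT is built, so later code can read rd_rel unconditionally. get_gtt_relstats()
is the function quoted above; calling it from relcache build is an assumption,
not what the patch currently does.

/* Overwrite the (meaningless for a GTT) pg_class values in the relcache
 * entry with this session's local statistics. */
static void
gtt_fill_relstats(Relation rel)
{
	BlockNumber relpages = 0;
	double      reltuples = 0;
	BlockNumber relallvisible = 0;

	if (!RELATION_IS_GLOBAL_TEMP(rel))
		return;

	get_gtt_relstats(RelationGetRelid(rel),
					 &relpages, &reltuples, &relallvisible,
					 NULL, NULL);

	rel->rd_rel->relpages = (int32) relpages;
	rel->rd_rel->reltuples = (float4) reltuples;
	rel->rd_rel->relallvisible = (int32) relallvisible;
}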

> 
> Regards
> 
> Pavel
> 
> po 20. 1. 2020 v 17:27 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> 
> 
> > 2020年1月20日 上午1:32,Erik Rijkers mailto:e...@xs4all.nl>> 写道:
> > 
> > On 2020-01-19 18:04, 曾文旌(义从) wrote:
> >>> 2020年1月14日 下午9:20,Pavel Stehule  >>> <mailto:pavel.steh...@gmail.com>> 写道:
> >>> út 14. 1. 2020 v 14:09 odesílatel 曾文旌(义从)  >>> <mailto:wenjing@alibaba-inc.com> <mailto:wenjing@alibaba-inc.com 
> >>> <mailto:wenjing@alibaba-inc.com>>> napsal:
> > 
> >>> [global_temporary_table_v4-pg13.patch ]
> > 
> > Hi,
> > 
> > This patch doesn't quiet apply for me:
> > 
> > patching file src/backend/access/common/reloptions.c
> > patching file src/backend/access/gist/gistutil.c
> > patching file src/backend/access/hash/hash.c
> > Hunk #1 succeeded at 149 (offset 3 lines).
> > patching file src/backend/access/heap/heapam_handler.c
> > patching file src/backend/access/heap/vacuumlazy.c
> > patching file src/backend/access/nbtree/nbtpage.c
> > patching file src/backend/access/table/tableam.c
> > patching file src/backend/access/transam/xlog.c
> > patching file src/backend/catalog/Makefile
> > Hunk #1 FAILED at 44.
> > 1 out of 1 hunk FAILED -- saving rejects to file 
> > src/backend/catalog/Makefile.rej
> > [...]
> >   (The rest applies without errors)
> > 
> > src/backend/catalog/Makefile.rej contains:
> > 
> > 
> > --- src/backend/catalog/Makefile
> > +++ src/backend/catalog/Makefile
> > @@ -44,6 +44,8 @@ OBJS = \
> >   storage.o \
> >   toasting.o
> > 
> > +OBJS += storage_gtt.o
> > +
> > BKIFILES = postgres.bki postgres.description postgres.shdescription
> > 
> > include $(top_srcdir)/src/backend/common.mk <http://common.mk/>
> > 
> > 
> > Can you have a look?
> I updated the code and remade the patch.
> Please give me feedback if you have any more questions.
> 
> 
> 
> 
> > 
> > 
> > thanks,
> > 
> > Erik Rijkers
> > 
> > 
> > 
> > 
> > 
> 



Re: [Proposal] Global temporary tables

2020-01-15 Thread ()



> 2020年1月13日 下午4:08,Konstantin Knizhnik  写道:
> 
> 
> 
> On 12.01.2020 4:51, Tomas Vondra wrote:
>> On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>>> 
>>> 
>>> On 09.01.2020 19:48, Tomas Vondra wrote:
 
> The most complex and challenged task is to support GTT for all kind of 
> indexes. Unfortunately I can not proposed some good universal solution 
> for it.
> Just patching all existed indexes implementation seems to be the only 
> choice.
> 
 
 I haven't looked at the indexing issue closely, but IMO we need to
 ensure that every session sees/uses only indexes on GTT that were
 defined before the seesion started using the table.
>>> 
>>> Why? It contradicts with behavior of normal tables.
>>> Assume that you have active clients and at some point of time DBA 
>>> recognizes that them are spending to much time in scanning some GTT.
>>> It cab create index for this GTT but if existed client will not be able to 
>>> use this index, then we need somehow make this clients to restart their 
>>> sessions?
>>> In my patch I have implemented building indexes for GTT on demand: if 
>>> accessed index on GTT is not yet initialized, then it is filled with local 
>>> data.
>> 
>> Yes, I know the behavior would be different from behavior for regular
>> tables. And yes, it would not allow fixing slow queries in sessions
>> without interrupting those sessions.
>> 
>> I proposed just ignoring those new indexes because it seems much simpler
>> than alternative solutions that I can think of, and it's not like those
>> other solutions don't have other issues.
> 
> Quite the opposite: prohibiting sessions from seeing indexes created after
> the session started to use the GTT requires more effort. We need to somehow
> maintain and check the GTT first-access time.
> 
>> 
>> For example, I've looked at the "on demand" building as implemented in
>> global_private_temp-8.patch, I kinda doubt adding a bunch of index build
>> calls into various places in index code seems somewht suspicious.
> 
> We in any case have to initialize GTT indexes on demand, even if we prohibit
> use of indexes created after the session's first access to the GTT.
> So the difference is only in one thing: should we just initialize an empty
> index, or populate it with local data (if the rules for index usability are
> the same for GTT as for normal tables)?
> From the implementation point of view there is no big difference. Actually
> building the index in the standard way is even simpler than constructing an
> empty index. Originally I implemented
> the first approach (I just forgot to consider the case when the GTT was
> already used by a session). Then I rewrote it using the second approach and
> the patch even became simpler.
> 
>> 
>> * brinbuild is added to brinRevmapInitialize, which is meant to
>>   initialize state for scanning. It seems wrong to build the index we're
>>   scanning from this function (layering and all that).
>> 
>> * btbuild is called from _bt_getbuf. That seems a bit ... suspicious?
> 
> 
> As I already mentioned - support of indexes for GTT is one of the most 
> challenged things in my patch.
> I didn't find good and universal solution. So I agreed that call of btbuild 
> from _bt_getbuf may be considered as suspicious.
> I will be pleased if you or somebody else can propose a better alternative,
> and not only for B-Tree but for all other indexes.
> 
> But as I already wrote above, prohibiting session to used indexes created 
> after first access to GTT doesn't solve the problem.
> For normal tables (and for local temp tables) indexes are initialized at the 
> time of their creation.
> With GTT it doesn't work, because each session has its own local data of GTT.
> We should either initialize/build the index on demand (when it is first
> accessed), or initialize the indexes of all existing GTTs at session start.
> The last option seems much worse to me: there may be a huge number of GTTs
> and the session may not need to access any GTT at all.
>> 
>> ... and so on for other index types. Also, what about custom indexes
>> implemented in extensions? It seems a bit strange each of them has to
>> support this separately.
> 
> I have already complained about it: my patch supports GTT for all built-in
> indexes, but custom indexes have to handle it themselves.
> It looks like providing a generic solution would require extending the index
> API with two different operations: creation and initialization.
> But extending the index API is a very critical change... And it also doesn't
> solve the problem for all existing extensions: they would in any case have
> to be rewritten to implement the new API version in order to support GTT.
>> 
>> IMHO if this really is the right solution, we need to make it work for
>> existing indexes without having to tweak them individually. Why don't we
>> track a flag whether an index on GTT was initialized in a given session,
>> and if it was not then call the build function before calling 

Re: [Proposal] Global temporary tables

2020-01-14 Thread ()


> 2020年1月14日 下午9:20,Pavel Stehule  写道:
> 
> 
> 
> út 14. 1. 2020 v 14:09 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> Thank you for review my patch.
> 
> 
>> 2020年1月12日 上午4:27,Pavel Stehule > <mailto:pavel.steh...@gmail.com>> 写道:
>> 
>> Hi
>> 
>> so 11. 1. 2020 v 15:00 odesílatel 曾文旌(义从) > <mailto:wenjing@alibaba-inc.com>> napsal:
>> Hi all
>> 
>> This is the latest patch
>> 
>> The updates are as follows:
>> 1. Support global temp Inherit table global temp partition table
>> 2. Support serial column in GTT
>> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
>> 4. Provide view pg_gtt_attached_pids to manage GTT
>> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
>> 6. Alter GTT or rename GTT is allowed under some conditions
>> 
>> 
>> Please give me feedback.
>> 
>> I tested the functionality
>> 
>> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local 
>> temp tables).
> makes sense, I will fix it.
> 
>> 
>> I tested some simple scripts 
>> 
>> test01.sql
>> 
>> CREATE TEMP TABLE foo(a int, b int);
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DROP TABLE foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, the table pg_attribute has 3.2MB
>> and 64 tps, 6446 transaction
>> 
>> test02.sql
>> 
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DELETE FROM foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, 1688 tps, 168830 transactions
>> 
>> So performance is absolutely different as we expected.
>> 
>> From my perspective, this functionality is great.
> Yes, frequent ddl causes catalog bloat, GTT avoids this problem.
> 
>> 
>> Todo:
>> 
>> pg_table_size function doesn't work
> Do you mean that pg_table_size() should report the storage space used by one
> GTT across the entire database (including all sessions)?
> 
> It's a question how similar GTT tables should be to classic tables. But the
> reporting in psql should work: \dt+, \l+, \di+
Got it, I will fix it.
> 
> 
> 
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> 
>>> 2020年1月6日 上午4:06,Tomas Vondra >> <mailto:tomas.von...@2ndquadrant.com>> 写道:
>>> 
>>> Hi,
>>> 
>>> I think we need to do something with having two patches aiming to add
>>> global temporary tables:
>>> 
>>> [1] https://commitfest.postgresql.org/26/2349/ 
>>> <https://commitfest.postgresql.org/26/2349/>
>>> 
>>> [2] https://commitfest.postgresql.org/26/2233/ 
>>> <https://commitfest.postgresql.org/26/2233/>
>>> 
>>> As a reviewer I have no idea which of the threads to look at - certainly
>>> not without reading both threads, which I doubt anyone will really do.
>>> The reviews and discussions are somewhat intermixed between those two
>>> threads, which makes it even more confusing.
>>> 
>>> I think we should agree on a minimal patch combining the necessary/good
>>> bits from the various patches, and terminate one of the threads (i.e.
>>> mark it as rejected or RWF). And we need to do that now, otherwise
>>> there's about 0% chance of getting this into v13.
>>> 
>>> In general, I agree with the sentiment Rober expressed in [1] - the
>>> patch needs to be as small as possible, not adding "nice to have"
>>> features (like support for parallel queries - I very much doubt just
>>> using shared instead of local buffers is enough to make it work.)
>>> 
>>> regards
>>> 
>>> -- 
>>> Tomas Vondra  http://www.2ndQuadrant.com 
>>> <http://www.2ndquadrant.com/>
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> 
> 



Re: [Proposal] Global temporary tables

2020-01-14 Thread ()



> 2020年1月12日 上午9:14,Tomas Vondra  写道:
> 
> On Fri, Jan 10, 2020 at 03:24:34PM +0300, Konstantin Knizhnik wrote:
>> 
>> 
>> On 09.01.2020 19:30, Tomas Vondra wrote:
>> 
>> 
>>> 
 
> 
>> 3 Still no one commented on GTT's transaction information processing, 
>> they include
>> 3.1 Should gtt's frozenxid need to be care?
>> 3.2 gtt’s clog clean
>> 3.3 How to deal with "too old" gtt data
>> 
> 
> No idea what to do about this.
> 
 
 I wonder what is the specific of GTT here?
 The same problem takes place for normal (local) temp tables, doesn't it?
 
>>> 
>>> Not sure. TBH I'm not sure I understand what the issue actually is.
>> 
>> Just open a session, create a temporary table and insert some data into it.
>> Then in another session run 2^31 transactions (on my desktop it takes about
>> 2 hours).
>> Since temp tables are not processed by vacuum, the database is stalled:
>> 
>>  ERROR:  database is not accepting commands to avoid wraparound data loss in 
>> database "postgres"
>> 
>> It seems to be quite dubious behavior and it is strange to me that nobody 
>> complains about it.
>> We discuss  many issues related with temp tables (statistic, parallel 
>> queries,...) which seems to be less critical.
>> 
>> But this problem is not specific to GTT - it can be reproduced with normal 
>> (local) temp tables.
>> This is why I wonder why do we need to solve it in GTT patch.
>> 
> 
> Yeah, I think that's out of scope for GTT patch. Once we solve it for
> plain temporary tables, we'll solve it for GTT too.
1. The core problem is that the data contains transaction information (XIDs),
which must be vacuumed (frozen) regularly to avoid running out of XIDs.
Autovacuum handles regular tables but not local temp tables, and it does not
handle GTT either.

2. However, the difference between a local temp table and a global temp table
(GTT) is that
a) for a local temp table there is one copy of the data, and the frozenxid of
the table is stored in the catalog (pg_class), whereas
b) for a GTT each session has a separate copy of the data, so one GTT may carry
up to MaxBackends different frozenxids,
and I don't think it's a good idea to keep the frozenxids of a GTT in the
catalog (pg_class).
This raises the question: how should GTT transaction information be handled?

I agree that problem 1 should be solved completely by some separate feature,
such as local transactions. It is definitely not included in the GTT patch.

But I think we do need to ensure the durability of GTT data. For example, data
in a GTT must not be lost because the clog it needs has been cleaned up. That
belongs to problem 2.
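
To make problem 2 concrete, this is the kind of guard I have in mind, as a
sketch only: before the clog truncation point is advanced, clamp it to the
oldest frozenxid still advertised for any session-local copy of a GTT (this is
the information pg_list_gtt_relfrozenxids() exposes; gtt_oldest_relfrozenxid()
is an assumed helper standing in for that scan):

/* Never truncate commit-log segments that session-local GTT data may still
 * need to check tuple visibility. */
static TransactionId
clamp_clog_cutoff_for_gtt(TransactionId cutoff)
{
	TransactionId oldest_gtt = gtt_oldest_relfrozenxid();	/* assumed helper */

	if (TransactionIdIsValid(oldest_gtt) &&
		TransactionIdPrecedes(oldest_gtt, cutoff))
		cutoff = oldest_gtt;

	return cutoff;
}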



Wenjing


> 
> regards
> 
> -- 
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [Proposal] Global temporary tables

2020-01-14 Thread ()
Thank you for reviewing my patch.


> 2020年1月12日 上午4:27,Pavel Stehule  写道:
> 
> Hi
> 
> so 11. 1. 2020 v 15:00 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> Hi all
> 
> This is the latest patch
> 
> The updates are as follows:
> 1. Support global temp Inherit table global temp partition table
> 2. Support serial column in GTT
> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
> 4. Provide view pg_gtt_attached_pids to manage GTT
> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
> 6. Alter GTT or rename GTT is allowed under some conditions
> 
> 
> Please give me feedback.
> 
> I tested the functionality
> 
> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local 
> temp tables).
makes sense, I will fix it.

> 
> I tested some simple scripts 
> 
> test01.sql
> 
> CREATE TEMP TABLE foo(a int, b int);
> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DROP TABLE foo; -- simulate disconnect
> 
> 
> after 100 sec, the table pg_attribute has 3.2MB
> and 64 tps, 6446 transaction
> 
> test02.sql
> 
> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DELETE FROM foo; -- simulate disconnect
> 
> 
> after 100 sec, 1688 tps, 168830 transactions
> 
> So performance is absolutely different as we expected.
> 
> From my perspective, this functionality is great.
Yes, frequent DDL causes catalog bloat; GTT avoids this problem.

> 
> Todo:
> 
> pg_table_size function doesn't work
Do you mean that pg_table_size() should report the storage space used by one
GTT across the entire database (including all sessions)?

> 
> Regards
> 
> Pavel
> 
> 
> Wenjing
> 
> 
> 
> 
> 
>> 2020年1月6日 上午4:06,Tomas Vondra > <mailto:tomas.von...@2ndquadrant.com>> 写道:
>> 
>> Hi,
>> 
>> I think we need to do something with having two patches aiming to add
>> global temporary tables:
>> 
>> [1] https://commitfest.postgresql.org/26/2349/ 
>> <https://commitfest.postgresql.org/26/2349/>
>> 
>> [2] https://commitfest.postgresql.org/26/2233/ 
>> <https://commitfest.postgresql.org/26/2233/>
>> 
>> As a reviewer I have no idea which of the threads to look at - certainly
>> not without reading both threads, which I doubt anyone will really do.
>> The reviews and discussions are somewhat intermixed between those two
>> threads, which makes it even more confusing.
>> 
>> I think we should agree on a minimal patch combining the necessary/good
>> bits from the various patches, and terminate one of the threads (i.e.
>> mark it as rejected or RWF). And we need to do that now, otherwise
>> there's about 0% chance of getting this into v13.
>> 
>> In general, I agree with the sentiment Rober expressed in [1] - the
>> patch needs to be as small as possible, not adding "nice to have"
>> features (like support for parallel queries - I very much doubt just
>> using shared instead of local buffers is enough to make it work.)
>> 
>> regards
>> 
>> -- 
>> Tomas Vondra  http://www.2ndQuadrant.com 
>> <http://www.2ndquadrant.com/>
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 



Re: [Proposal] Global temporary tables

2020-01-07 Thread ()


> 2020年1月6日 下午8:17,Dean Rasheed  写道:
> 
> On Mon, 6 Jan 2020 at 11:01, Tomas Vondra  
> wrote:
>> 
>> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>> 
>>> 2 We feel that gtt needs to maintain statistics, but there is no
>>> agreement on what it will be done.
>>> 
>> 
>> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
>> to poor query plans.
> 
> +1
> 
>> AFAIK the current patch stores the info in a hash
>> table in a backend private memory, and I don't see how else to do that
>> (e.g. storing it in a catalog would cause catalog bloat).
>> 
> 
> It sounds like it needs a pair of system GTTs to hold the table and
> column statistics for other GTTs. One would probably have the same
> columns as pg_statistic, and the other just the relevant columns from
> pg_class. I can see it being useful for the user to be able to see
> these stats, so perhaps they could be UNIONed into the existing stats
> view.
The current patch provides several functions in an extension (pg_gtt) for
reading GTT statistics.
As a next step I can move them into core so that the pg_stats view can also
show a GTT's statistics.
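
For reference, a minimal sketch of the backend-private lookup described above:
a local hash keyed by relation OID and attribute number that replaces the
pg_statistic lookup for GTT columns. The type and function names here are
illustrative, not the pg_gtt extension's actual API.

typedef struct GttStatsKey
{
	Oid         reloid;
	AttrNumber  attnum;
} GttStatsKey;

typedef struct GttStatsEntry
{
	GttStatsKey key;            /* hash key */
	HeapTuple   statsTuple;     /* pg_statistic-shaped tuple, session-local */
} GttStatsEntry;

static HTAB *gtt_stats_htab = NULL;     /* lives in backend-private memory */

/* Return this session's statistics for one GTT column, or NULL if ANALYZE
 * has not been run on the GTT in this session. */
static HeapTuple
gtt_get_att_stats(Oid reloid, AttrNumber attnum)
{
	GttStatsKey     key;
	GttStatsEntry  *entry;

	if (gtt_stats_htab == NULL)
		return NULL;

	memset(&key, 0, sizeof(key));   /* zero padding so the hash is stable */
	key.reloid = reloid;
	key.attnum = attnum;
	entry = (GttStatsEntry *) hash_search(gtt_stats_htab, &key,
										  HASH_FIND, NULL);
	return entry ? entry->statsTuple : NULL;
}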


> Regards,
> Dean



Re: [Proposal] Global temporary tables

2020-01-05 Thread ()
In our previous communication:

1 We agreed on the general direction:
1.1 GTT uses local (private) buffers
1.2 no replica access in the first version

2 We agreed that GTT needs to maintain statistics, but there is no agreement on
how that should be done.

3 Still no one has commented on GTT's transaction-information processing, which
includes:
3.1 Does GTT's frozenxid need to be taken care of?
3.2 cleaning up GTT's clog
3.3 how to deal with "too old" GTT data

I suggest we discuss further, reach an agreement, and merge the two patches
into one.


Wenjing


> 2020年1月6日 上午4:06,Tomas Vondra  写道:
> 
> Hi,
> 
> I think we need to do something with having two patches aiming to add
> global temporary tables:
> 
> [1] https://commitfest.postgresql.org/26/2349/
> 
> [2] https://commitfest.postgresql.org/26/2233/
> 
> As a reviewer I have no idea which of the threads to look at - certainly
> not without reading both threads, which I doubt anyone will really do.
> The reviews and discussions are somewhat intermixed between those two
> threads, which makes it even more confusing.
> 
> I think we should agree on a minimal patch combining the necessary/good
> bits from the various patches, and terminate one of the threads (i.e.
> mark it as rejected or RWF). And we need to do that now, otherwise
> there's about 0% chance of getting this into v13.
> 
> In general, I agree with the sentiment Rober expressed in [1] - the
> patch needs to be as small as possible, not adding "nice to have"
> features (like support for parallel queries - I very much doubt just
> using shared instead of local buffers is enough to make it work.)
> 
> regards
> 
> -- 
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [Proposal] Global temporary tables

2019-11-08 Thread ()
My comments for global_private_temp-4.patch

good side:
1 Lots of index types on GTT. I think we need to support all kinds of indexes.
2 serial column on GTT.
3 INHERITS GTT.
4 PARTITION GTT.

I didn't choose to support them in the first release, but you did.

Other side:
1 Case: create global temp table gtt2(a int primary key, b text) on commit
delete rows;
I think you've lost the meaning of the ON COMMIT DELETE ROWS clause.
After the GTT is created, the other sessions see it as an ON COMMIT PRESERVE
ROWS GTT.

2 Truncate GTT: maybe this is a bug in DropRelFileNodeBuffers.
The GTT's local buffers are not released.
Case:
postgres=# insert into gtt2 values(1,'xx');
INSERT 0 1
postgres=# truncate gtt2;
TRUNCATE TABLE
postgres=# insert into gtt2 values(1,'xx');
ERROR:  unexpected data beyond EOF in block 0 of relation base/13579/t3_16384
HINT:  This has been seen to occur with buggy kernels; consider updating your 
system.

3 Lock level of TRUNCATE on a GTT.
I don't think it's a good idea to take a heavyweight lock for truncating a GTT,
because it only needs to process the session's private data.

4 GTT DDL: DDL that needs to rewrite data files needs attention.
We discussed this in a previous email; this is why I used a shared hash to
track the GTT files.


5 There will be problems with DDL that changes the relfilenode, such as CLUSTER
on a GTT or VACUUM FULL on a GTT.
If one session completes VACUUM FULL gtt(a), other sessions will immediately
start reading and writing the new storage file, and their existing data is
lost.
I disable these commands in my current version.

6 Drop GTT.
I think DROP on a GTT should clean up all storage files and the definition.
What do you think?

7 MVCC visibility and clog cleanup.
GTT data follows the same visibility rules as regular tables, so GTT also needs
the clog.
We need to prevent clog that a GTT still needs from being cleaned up.
At the same time, GTT is not autovacuumed, and retaining "too old" data would
lead to wraparound data loss.
I have given a solution for this in my design.


Zeng Wenjing

> 2019年11月1日 下午11:15,Konstantin Knizhnik  写道:
> 
> 
> 
> On 25.10.2019 20:00, Pavel Stehule wrote:
>> 
>> >
>> >> So except the limitation mentioned above (which I do not consider as 
>> >> critical) there is only one problem which was not addressed: maintaining 
>> >> statistics for GTT.
>> >> If all of the following conditions are true:
>> >>
>> >> 1) GTT are used in joins
>> >> 2) There are indexes defined for GTT
>> >> 3) Size and histogram of GTT in different backends can significantly vary.
>> >> 4) ANALYZE was explicitly called for GTT
>> >>
>> >> then query execution plan built in one backend will be also used for 
>> >> other backends where it can be inefficient.
>> >> I also do not consider this problem as "show stopper" for adding GTT to 
>> >> Postgres.
>> > I think that's *definitely* a show stopper.
>> Well, if both you and Pavel think that it is really "show stopper", then 
>> this problem really has to be addressed.
>> I slightly confused about this opinion, because Pavel has told me 
>> himself that 99% of users never create indexes for temp tables
>> or run "analyze" for them. And without it, this problem is not a problem 
>> at all.
>> 
>> 
>> Users don't run ANALYZE on temp tables in 99% of cases. That's true. But the
>> second fact is that those users then have lots of problems. It's very
>> similar to wrong statistics on persistent tables. When the data is small, it
>> is not a problem for users, although from my perspective it's not optimal.
>> When the data is not small, the problem can be brutal. Temporary tables are
>> not an exception. And users and developers are people - we only hear about
>> fatal problems. There are lots of unoptimized queries, but because the
>> problem is not fatal, it is not reported. And a lot of people have no idea
>> how fast the database could be. The knowledge of users and app developers is
>> a sad book.
>> 
>> Pavel
> 
> It seems to me that I have found a quite elegant solution for per-backend
> statistics for GTT: I just insert them into the backend's catalog cache, but
> not into the pg_statistic table itself.
> To do this I had to add InsertSysCache/InsertCatCache functions which insert
> a pinned entry into the corresponding cache.
> I wonder if there are any pitfalls in such an approach?
> 
> New patch for GTT is attached.
> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com 
> 
> The Russian Postgres Company 
> 



Re: [Proposal] Global temporary tables

2019-11-07 Thread ()


> 2019年11月7日 下午5:40,Pavel Stehule  写道:
> 
> 
> 
> čt 7. 11. 2019 v 10:30 odesílatel 曾文旌(义从)  <mailto:wenjing@alibaba-inc.com>> napsal:
> 
> 
> > 2019年11月7日 上午12:08,Konstantin Knizhnik  > <mailto:k.knizh...@postgrespro.ru>> 写道:
> > 
> > 
> > 
> > On 06.11.2019 16:24, 曾文旌(义从) wrote:
> >> Dear Hackers
> >> 
> >> 
> >> I attached the patch of GTT implementationI base on PG12.
> >> The GTT design came from my first email.
> >> Some limitations in patch will be eliminated in later versions.
> >> 
> >> Later, I will comment on Konstantin's patch and make some proposals for 
> >> cooperation.
> >> Looking forward to your feedback.
> >> 
> >> Thanks.
> >> 
> >> Zeng Wenjing
> >> 
> > 
> > Thank you for this patch.
> > My first comments:
> > 
> > 1.  I have ported you patch to the latest Postgres version (my patch is 
> > attached).
> > 2. You patch is supporting only B-Tree index for GTT. All other indexes 
> > (hash, gin, gist, brin,...) are not currently supported.
> Currently I only support btree index.
> I noticed that your patch supports more index types, which is where I'd like 
> to work with you.
> 
> > 3. I do not understand the reason for the following limitation:
> > "We allow to create index on global temp table only this session use it"
> > 
> > First of all it seems to significantly reduce usage of global temp tables.
> > Why do we need GTT at all? Mostly because we need to access temporary data 
> > in more than one backend. Otherwise we can just use normal table.
> > If temp table is expected to be larger enough, so that we need to create 
> > index for it, then it is hard to believe that it will be needed only in one 
> > backend.
> > 
> > May be the assumption is that all indexes has to be created before GTT 
> > start to be used.
> Yes, Currently, GTT's index is only supported and created in an empty table 
> state, and other sessions are not using it.
> There has two improvements pointer:
> 1 Index can create on GTT(A) when the GTT(A)  in the current session is not 
> empty, requiring the GTT table to be empty in the other session.
> Index_build needs to be done in the current session just like a normal table. 
> This improvement is relatively easy.
> 
> 2 Index can create on GTT(A)  when more than one session are using this 
> GTT(A).
> Because when I'm done creating an index of the GTT in this session and 
> setting it to be an valid index, it's not true for the GTT in other sessions.
> Indexes on gtt in other sessions require "rebuild_index" before using it. 
> I don't have a better solution right now, maybe you have some suggestions.
> 
> I think so DDL operations can be implemented in some reduced form - so DDL 
> are active only for one session, and for other sessions are invisible. 
> Important is state of GTT object on session start. 
> 
> For example ALTER TABLE DROP COLUMN can has very fatal impact on other 
> sessions. So I think the best of GTT can be pattern - the structure of GTT 
> table is immutable for any session that doesn't do DDL operations.
Yes, DDL that needs to rewrite data files has this problem.
This is why I disabled ALTER on GTT in the current version.
It can be improved; for example, ALTER on a GTT could be allowed when only the
current session is using it.
Users can also choose to kick off the other sessions that are using the GTT and
then run the ALTER.
I provide a function (pg_gtt_attached_pid(relation, schema)) to query which
sessions are using a given GTT.
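
For illustration, this is the guard it leads to in the DDL paths;
is_other_backend_use_gtt() is the check already quoted from the patch elsewhere
in this thread, while the wrapper and the error wording are just a sketch:

/* Before ALTER/DROP on a GTT, verify that no other session still has local
 * data attached to it; the DBA can find and terminate such sessions via the
 * pg_gtt_attached_pids view. */
static void
gtt_check_not_in_use_by_others(Relation rel)
{
	if (RELATION_IS_GLOBAL_TEMP(rel) &&
		is_other_backend_use_gtt(rel->rd_node))
		elog(ERROR, "global temp table is in use by another session");
}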

> 
> 
> 
> > But right now this check is not working correctly in any case - if you 
> > insert some data into the table, then
> > you can not create index any more:
> > 
> > postgres=# create global temp table gtt(x integer primary key, y integer);
> > CREATE TABLE
> > postgres=# insert into gtt values (generate_series(1,10), 
> > generate_series(1,10));
> > INSERT 0 10
> > postgres=# create index on gtt(y);
> > ERROR:  can not create index when have one or more backend attached this 
> > global temp table
> > 
> > I wonder why do you need such restriction?
> > 
> > 
> > -- 
> > Konstantin Knizhnik
> > Postgres Professional: http://www.postgrespro.com 
> > <http://www.postgrespro.com/>
> > The Russian Postgres Company
> > 
> > 
> 



Re: [Proposal] Global temporary tables

2019-11-07 Thread ()



> 2019年11月7日 上午12:08,Konstantin Knizhnik  写道:
> 
> 
> 
> On 06.11.2019 16:24, 曾文旌(义从) wrote:
>> Dear Hackers
>> 
>> 
>> I attached the patch of GTT implementationI base on PG12.
>> The GTT design came from my first email.
>> Some limitations in patch will be eliminated in later versions.
>> 
>> Later, I will comment on Konstantin's patch and make some proposals for 
>> cooperation.
>> Looking forward to your feedback.
>> 
>> Thanks.
>> 
>> Zeng Wenjing
>> 
> 
> Thank you for this patch.
> My first comments:
> 
> 1.  I have ported you patch to the latest Postgres version (my patch is 
> attached).
> 2. You patch is supporting only B-Tree index for GTT. All other indexes 
> (hash, gin, gist, brin,...) are not currently supported.
Currently I only support btree indexes.
I noticed that your patch supports more index types, which is an area where I'd
like to work with you.

> 3. I do not understand the reason for the following limitation:
> "We allow to create index on global temp table only this session use it"
> 
> First of all it seems to significantly reduce usage of global temp tables.
> Why do we need GTT at all? Mostly because we need to access temporary data in 
> more than one backend. Otherwise we can just use normal table.
> If temp table is expected to be larger enough, so that we need to create 
> index for it, then it is hard to believe that it will be needed only in one 
> backend.
> 
> May be the assumption is that all indexes has to be created before GTT start 
> to be used.
Yes. Currently an index on a GTT can only be created while the table is empty
and no other session is using it.
There are two possible improvements:
1 An index can be created on GTT A when A is not empty in the current session,
as long as A is empty in every other session.
index_build then needs to run in the current session just as for a normal
table. This improvement is relatively easy.

2 An index can be created on GTT A while more than one session is using A.
The problem is that when I finish creating the index in this session and mark
it valid, that is not true for the copies of A in the other sessions.
Indexes on the GTT in other sessions would require a rebuild before they can be
used (see the sketch below).
I don't have a better solution right now; maybe you have some suggestions.
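
A rough sketch of the per-session rule implied by case 2; both helper functions
are hypothetical names used only to illustrate the idea, they are not in the
patch:

/* An index on a GTT can be trusted by this session only if the session's
 * local copy of the GTT was empty when the index was created, or if the
 * session has since rebuilt the index over its own data. */
static bool
gtt_index_usable_in_session(Oid indexoid, Oid heapoid)
{
	/* This session holds no rows in the GTT: the (empty) index is usable. */
	if (!gtt_session_has_local_data(heapoid))   /* hypothetical helper */
		return true;

	/* Otherwise require that the index was built locally in this session. */
	return gtt_index_built_locally(indexoid);   /* hypothetical helper */
}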


> But right now this check is not working correctly in any case - if you insert 
> some data into the table, then
> you can not create index any more:
> 
> postgres=# create global temp table gtt(x integer primary key, y integer);
> CREATE TABLE
> postgres=# insert into gtt values (generate_series(1,10), 
> generate_series(1,10));
> INSERT 0 10
> postgres=# create index on gtt(y);
> ERROR:  can not create index when have one or more backend attached this 
> global temp table
> 
> I wonder why do you need such restriction?
> 
> 
> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
> 
> 





Re: [Proposal] Global temporary tables

2019-10-28 Thread ()



> 2019年10月26日 上午12:22,Konstantin Knizhnik  写道:
> 
> 
> 
> On 25.10.2019 18:01, Robert Haas wrote:
>> On Fri, Oct 11, 2019 at 9:50 AM Konstantin Knizhnik
>>  wrote:
>>> Just to clarify.
>>> I have now proposed several different solutions for GTT:
>>> 
>>> Shared vs. private buffers for GTT:
>>> 1. Private buffers. This is least invasive patch, requiring no changes in 
>>> relfilenodes.
>>> 2. Shared buffers. Requires changing relfilenode but supports parallel 
>>> query execution for GTT.
>> I vote for #1. I think parallel query for temp objects may be a
>> desirable feature, but I don't think it should be the job of a patch
>> implementing GTTs to make it happen. In fact, I think it would be an
>> actively bad idea, because I suspect that if we do eventually support
>> temp relations for parallel query, we're going to want a solution that
>> is shared between regular temp tables and global temp tables, not
>> separate solutions for each.
> 
> Sorry, maybe I do not understand you.
> It seems to me that there is only one thing preventing usage of temporary 
> tables in parallel plans: private buffers.
> If global temporary tables are accessed as normal tables though shared 
> buffers then them can be used in parallel queries
> and no extra support is required for it.
> At least I have checked that parallel queries are correctly worked for my 
> implementation of GTT with shared buffers.
> So I do not understand about which "separate solutions" you are talking about.
> 
> I can agree that private buffers may be  good starting point for GTT 
> implementation, because it is less invasive and GTT access speed is exactly 
> the same as of normal temp tables.
> But I do not understand your argument why it is "actively bad idea".
> 
>>> Access to GTT at replica:
>>> 1. Access is prohibited (as for original temp tables). No changes at all.
>>> 2. Tuples of temp tables are marked with forzen XID.  Minimal changes, 
>>> rollbacks are not possible.
>>> 3. Providing special XIDs for GTT at replica. No changes in CLOG are 
>>> required, but special MVCC visibility rules are used for GTT. Current 
>>> limitation: number of transactions accessing GTT at replica is limited by 
>>> 2^32
>>> and bitmap of correspondent size has to be maintained (tuples of GTT are 
>>> not proceeded by vacuum and not frozen, so XID horizon never moved).
>> I again vote for #1. A GTT is defined to allow data to be visible only
>> within one session -- so what does it even mean for the data to be
>> accessible on a replica?
> 
> There are sessions at replica (in case of hot standby), aren't there?
> 
>> 
>>> So except the limitation mentioned above (which I do not consider as 
>>> critical) there is only one problem which was not addressed: maintaining 
>>> statistics for GTT.
>>> If all of the following conditions are true:
>>> 
>>> 1) GTT are used in joins
>>> 2) There are indexes defined for GTT
>>> 3) Size and histogram of GTT in different backends can significantly vary.
>>> 4) ANALYZE was explicitly called for GTT
>>> 
>>> then query execution plan built in one backend will be also used for other 
>>> backends where it can be inefficient.
>>> I also do not consider this problem as "show stopper" for adding GTT to 
>>> Postgres.
>> I think that's *definitely* a show stopper.
> Well, if both you and Pavel think that it is really "show stopper", then this 
> problem really has to be addressed.
> I slightly confused about this opinion, because Pavel has told me himself 
> that 99% of users never create indexes for temp tables
> or run "analyze" for them. And without it, this problem is not a problem at 
> all.
> 
>>> I still do not understand the opinion of community which functionality of 
>>> GTT is considered to be most important.
>>> But the patch with local buffers and no replica support is small enough to 
>>> become good starting point.
>> Well, it seems we now have two patches for this feature. I guess we
>> need to figure out which one is better, and whether it's possible for
>> the two efforts to be merged, rather than having two different teams
>> hacking on separate code bases.
> 
> I am open for cooperations.
> Source code of all my patches is available.
We are also willing to cooperate to complete this feature.
Let me prepare the code (merge it onto PG 12) and post it to the community; then we can 
see how to work together.

> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
> 
> 
> 





Re: [Proposal] Global temporary tables

2019-10-17 Thread ()


> 2019年10月11日 下午9:50,Konstantin Knizhnik  写道:
> 
> 
> 
> On 11.10.2019 15:15, 曾文旌(义从) wrote:
>> Dear Hackers,
>> 
>> This propose a way to develop global temporary tables in PostgreSQL.
>> 
>> I noticed that there is an "Allow temporary tables to exist as empty by 
>> default in all sessions" in the postgresql todolist.
>> https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>
>> 
>> In recent years, PG community had many discussions about global temp table 
>> (GTT) support. Previous discussion covered the following topics: 
>> (1)  The main benefit or function: GTT offers features like “persistent 
>> schema, ephemeral data”, which avoids catalog bloat and reduces catalog 
>> vacuum. 
>> (2)  Whether follows ANSI concept of temporary tables
>> (3)  How to deal with statistics, single copy of schema definition, relcache
>> (4)  More can be seen in 
>> https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
>>  
>> <https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru>
>> (5)  A recent implementation and design from Konstantin Knizhnik covered 
>> many functions of GTT: 
>> https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch
>>  
>> <https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>
>> 
>> However, as pointed by Konstantin himself, the implementation still needs 
>> functions related to CLOG, vacuum, and MVCC visibility.
>> 
> 
> Just to clarify.
> I have now proposed several different solutions for GTT:
> 
> Shared vs. private buffers for GTT:
> 1. Private buffers. This is least invasive patch, requiring no changes in 
> relfilenodes.
> 2. Shared buffers. Requires changing relfilenode but supports parallel query 
> execution for GTT.
> 
> Access to GTT at replica:
> 1. Access is prohibited (as for original temp tables). No changes at all.
> 2. Tuples of temp tables are marked with frozen XID.  Minimal changes, 
> rollbacks are not possible.
> 3. Providing special XIDs for GTT at replica. No changes in CLOG are 
> required, but special MVCC visibility rules are used for GTT. Current 
> limitation: number of transactions accessing GTT at replica is limited by 2^32
> and bitmap of correspondent size has to be maintained (tuples of GTT are not 
> proceeded by vacuum and not frozen, so XID horizon never moved).
> 
> So except the limitation mentioned above (which I do not consider as 
> critical) there is only one problem which was not addressed: maintaining 
> statistics for GTT. 
> If all of the following conditions are true:
> 
> 1) GTT are used in joins
> 2) There are indexes defined for GTT
> 3) Size and histogram of GTT in different backends can significantly vary. 
> 4) ANALYZE was explicitly called for GTT
> 
> then query execution plan built in one backend will be also used for other 
> backends where it can be inefficient.
> I also do not consider this problem as "show stopper" for adding GTT to 
> Postgres.
Suppose session A writes 1000 rows of data to GTT X while session B is using X at 
the same time with 100 rows of different data. If B runs ANALYZE, the statistics 
describing its (much smaller) data set are written to the catalog.
Session A will then get an inaccurate query plan based on misaligned statistics 
when planning queries on X. Session A may conclude that table X is too small to be 
worth an index scan, when in fact it is not.
Each session needs statistics for its own data in order to build a good query 
plan.
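To make this concrete, here is a minimal psql sketch of how shared statistics would
mislead the planner (table names, row counts and the plan choice are illustrative only):

-- session A
postgres=# create global temp table gtt_x(id integer primary key, payload integer);
postgres=# insert into gtt_x select g, g from generate_series(1, 1000) g;

-- session B holds a much smaller, different data set in the same GTT
postgres=# insert into gtt_x select g, g from generate_series(1, 100) g;
postgres=# analyze gtt_x;   -- with shared statistics this would overwrite reltuples
                            -- and the histogram in the catalog for every session

-- back in session A
postgres=# explain select * from gtt_x where id = 42;
-- Planning against B's statistics, session A may choose a plan sized for ~100
-- rows (for example a sequential scan), even though its own copy of gtt_x
-- holds 1000 rows and would benefit from the primary key index.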


> I still do not understand the opinion of community which functionality of GTT 
> is considered to be most important.
> But the patch with local buffers and no replica support is small enough to 
> become good starting point.
Yes. As a first step, we focus on completing the basic functions of GTT: DML, DDL, 
indexes on GTT (with the MVCC visibility rules), and storage.
Inaccurate statistics can cause problems with index selection on a GTT, so indexes on 
GTT together with accurate statistics are necessary.


> 
> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com 
> <http://www.postgrespro.com/>
> The Russian Postgres Company 



Re: [Proposal] Global temporary tables

2019-10-16 Thread ()


> 2019年10月12日 下午1:16,Pavel Stehule  写道:
> 
> 
> 
> pá 11. 10. 2019 v 15:50 odesílatel Konstantin Knizhnik 
> mailto:k.knizh...@postgrespro.ru>> napsal:
> 
> 
> On 11.10.2019 15:15, 曾文旌(义从) wrote:
>> Dear Hackers,
>> 
>> This propose a way to develop global temporary tables in PostgreSQL.
>> 
>> I noticed that there is an "Allow temporary tables to exist as empty by 
>> default in all sessions" in the postgresql todolist.
>> https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>
>> 
>> In recent years, PG community had many discussions about global temp table 
>> (GTT) support. Previous discussion covered the following topics: 
>> (1)  The main benefit or function: GTT offers features like “persistent 
>> schema, ephemeral data”, which avoids catalog bloat and reduces catalog 
>> vacuum. 
>> (2)  Whether follows ANSI concept of temporary tables
>> (3)  How to deal with statistics, single copy of schema definition, relcache
>> (4)  More can be seen in 
>> https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
>>  
>> <https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru>
>> (5)  A recent implementation and design from Konstantin Knizhnik covered 
>> many functions of GTT: 
>> https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch
>>  
>> <https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>
>> 
>> However, as pointed by Konstantin himself, the implementation still needs 
>> functions related to CLOG, vacuum, and MVCC visibility.
>> 
> 
> Just to clarify.
> I have now proposed several different solutions for GTT:
> 
> Shared vs. private buffers for GTT:
> 1. Private buffers. This is least invasive patch, requiring no changes in 
> relfilenodes.
> 2. Shared buffers. Requires changing relfilenode but supports parallel query 
> execution for GTT.
> 
> This is important argument for using share buffers. Maybe the best is mix of 
> both - store files in temporal tablespace, but using share buffers. More, it 
> can be accessible for autovacuum.
> 
> Access to GTT at replica:
> 1. Access is prohibited (as for original temp tables). No changes at all.
> 2. Tuples of temp tables are marked with frozen XID.  Minimal changes, 
> rollbacks are not possible.
> 3. Providing special XIDs for GTT at replica. No changes in CLOG are 
> required, but special MVCC visibility rules are used for GTT. Current 
> limitation: number of transactions accessing GTT at replica is limited by 2^32
> and bitmap of correspondent size has to be maintained (tuples of GTT are not 
> proceeded by vacuum and not frozen, so XID horizon never moved).
> 
> So except the limitation mentioned above (which I do not consider as 
> critical) there is only one problem which was not addressed: maintaining 
> statistics for GTT. 
> If all of the following conditions are true:
> 
> 1) GTT are used in joins
> 2) There are indexes defined for GTT
> 3) Size and histogram of GTT in different backends can significantly vary. 
> 4) ANALYZE was explicitly called for GTT
> 
> then query execution plan built in one backend will be also used for other 
> backends where it can be inefficient.
> I also do not consider this problem as "show stopper" for adding GTT to 
> Postgres.
> 
> The last issue is show stopper in my mind. It really depends on usage. There 
> are situation where shared statistics are ok (and maybe good solution), and 
> other situation, where shared statistics are just unusable.
This proposal calculates and stores independent statistics (relpages, reltuples 
and the histogram of the GTT) for the GTT data within each session, ensuring that the 
optimizer gets accurate, per-session statistics.
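Here is a minimal sketch of the intended behavior under this proposal (psql, illustrative;
table name and plan outcomes are examples, plan output omitted):

-- session A
postgres=# create global temp table gtt_x(id integer primary key, payload integer);
postgres=# insert into gtt_x select g, g from generate_series(1, 1000000) g;
postgres=# analyze gtt_x;   -- relpages/reltuples/histogram stored in A's session memory
postgres=# explain select * from gtt_x where id = 42;
                            -- planned with A's own statistics: an index scan is expected

-- session B
postgres=# insert into gtt_x select g, g from generate_series(1, 10) g;
postgres=# analyze gtt_x;   -- B's statistics are stored in B's own session memory
postgres=# explain select * from gtt_x where id = 42;
                            -- planned with B's statistics only; A's ANALYZE has no effect here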


> Regards
> 
> Pavel
> 
> 
> 
> I still do not understand the opinion of community which functionality of GTT 
> is considered to be most important.
> But the patch with local buffers and no replica support is small enough to 
> become good starting point.
> 
> 
> -- 
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com 
> <http://www.postgrespro.com/>
> The Russian Postgres Company 



[Proposal] Global temporary tables

2019-10-11 Thread ()
Dear Hackers,

This proposes a way to develop global temporary tables in PostgreSQL.

I noticed that there is an "Allow temporary tables to exist as empty by default 
in all sessions" item in the PostgreSQL todo list.
https://wiki.postgresql.org/wiki/Todo 

In recent years, the PG community has had many discussions about global temp table 
(GTT) support. Previous discussions covered the following topics: 
(1) The main benefit or function: GTT offers features like “persistent 
schema, ephemeral data”, which avoids catalog bloat and reduces catalog vacuum. 
(2) Whether it follows the ANSI concept of temporary tables
(3) How to deal with statistics, single copy of schema definition, relcache
(4) More can be seen in 
https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
(5) A recent implementation and design from Konstantin Knizhnik covered 
many functions of GTT: 
https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch
 


However, as pointed out by Konstantin himself, the implementation still needs 
functions related to CLOG, vacuum, and MVCC visibility.

We developed GTT based on PG 11 and included most of the needed features, such as how 
to deal with concurrent DDL and DML operations, how to handle vacuum and too-old 
relfrozenxids, and how to store and access GTT statistics. 

This design follows many suggestions from previous discussions in the community. 
Here are some examples:
- “have a separate 'relpersistence' setting for global temp tables … by 
having the backend id in all filename …”   -- from Andres Freund
- Use a session memory context to store information related to GTT.   -- from 
Pavel Stehule
- “extend the relfilenode mapper to support a backend-local 
non-persistent relfilenode map that's used to track temp table and index 
relfilenodes …”   -- from Craig Ringer

Our implementation creates one record in pg_class for the GTT’s schema definition. 
When rows are first inserted into the GTT in a session, a session-specific file 
is created to store the GTT’s data. Those files are removed when the session 
ends. We maintain the GTT’s statistics in session-local memory. DDL operations, 
such as DROP TABLE or CREATE INDEX, can be executed on a GTT by one session only 
while no other session has inserted any data into the GTT, or after that data has 
already been truncated. This also avoids concurrency between DML and DDL operations 
on a GTT. We maintain a session-level oldest relfrozenxid for GTT. This way, 
autovacuum or vacuum can truncate the CLOG and advance the global relfrozenxid based 
on all tables’ relfrozenxids, including the GTT’s. 
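The effect of the single pg_class entry plus per-session data files can be sketched as
follows (psql, illustrative; table name and output are examples):

-- session 1: the schema is defined once, in a single pg_class record
postgres=# create global temp table gtt_demo(a int);
CREATE TABLE
postgres=# insert into gtt_demo values (1), (2);
INSERT 0 2

-- session 2 sees the same definition, but its own contents start empty;
-- its session-specific data file is created on its first insert
postgres=# select count(*) from gtt_demo;
 count
-------
     0
(1 row)

-- When session 1 disconnects, only its session-specific data file is removed;
-- the definition remains available to every other session.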
The following summarizes the main design and implementation: 
- Syntax: ON COMMIT PRESERVE ROWS and ON COMMIT DELETE ROWS.
- Data storage and buffering follow the same scheme as local temp tables, 
with a relfilenode that includes the session id.
- A hash table (A) in shared memory is used to track sessions and their 
usage of GTTs and to serialize DDL and DML operations. 
- Another hash table (B) in session memory is introduced to record the storage 
files for GTTs and their indexes. When a session ends, those files are removed. 
- The same hash table (B) in session memory is used to record the 
relfrozenxid of each GTT. The oldest one is stored in MyProc so that 
autovacuum and vacuum may use it to determine the global oldest relfrozenxid and 
truncate the CLOG. 
- The same hash table (B) in session memory also stores the GTT’s session-level 
statistics. They are generated during vacuum and analyze, and are used by the SQL 
optimizer to create execution plans. 
- Some utility functions are added for DBAs to manage GTTs. 
- The TRUNCATE command on a GTT behaves differently from that on a normal 
table. It deletes the data immediately but keeps the relfilenode, using a 
lower-level table lock, RowExclusiveLock, instead of AccessExclusiveLock 
(see the sketch after this list). 
- Main limits of this version, or future improvements (suggestions 
from the community are welcome): 
  1. VACUUM FULL and CLUSTER are not supported; any operation 
     which may change the relfilenode is disabled for GTT.
  2. Sequence columns are not supported in GTT for now.
  3. User-defined statistics are not supported.
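A short sketch of the TRUNCATE behavior described above (psql, illustrative; the lock
names follow the design, but the exact behavior depends on the patch):

-- session 1
postgres=# create global temp table gtt_t(a int) on commit preserve rows;
postgres=# insert into gtt_t select g from generate_series(1, 1000) g;

-- TRUNCATE removes only this session's rows, keeps the relfilenode, and takes
-- RowExclusiveLock rather than AccessExclusiveLock, so other sessions working
-- on their own copies of gtt_t are not blocked.
postgres=# truncate gtt_t;
TRUNCATE TABLE
postgres=# select count(*) from gtt_t;
 count
-------
     0
(1 row)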


Details:

Requirements
The feature list for global temp tables (a short sketch of the ON COMMIT behavior 
follows this list):
1. global temp table (when the ON COMMIT clause is omitted, SQL specifies that 
the default behavior is ON COMMIT DELETE ROWS)
2. support ON COMMIT DELETE ROWS
3. support ON COMMIT PRESERVE ROWS
4. ON COMMIT DROP is not supported
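For instance, with the default ON COMMIT DELETE ROWS behavior (a minimal psql sketch;
table name and output are illustrative):

postgres=# create global temp table gtt_tx(a int) on commit delete rows;
CREATE TABLE
postgres=# begin;
postgres=# insert into gtt_tx values (1);
postgres=# select count(*) from gtt_tx;   -- returns 1 inside the transaction
postgres=# commit;
postgres=# select count(*) from gtt_tx;   -- returns 0: the rows are removed at commit

-- With ON COMMIT PRESERVE ROWS the rows would instead survive until the
-- session ends (or the table is truncated).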

Feature description
Global temp tables are defined just once and automatically exist (starting with 
empty contents) in every session that needs them.
With a global temp table, each session uses local buffers and reads or writes its own 
independent data files.
Use on commit DELETE ROWS for a transaction-specific global temp