Re: Using non-grouping-keys at HAVING clause

2023-09-08 Thread Kohei KaiGai
On Fri, Sep 8, 2023 at 19:07, Vik Fearing wrote:
>
> On 9/8/23 09:42, Kohei KaiGai wrote:
> > Hello,
> >
> > I got a trouble report here:
> > https://github.com/heterodb/pg-strom/issues/636
> >
> > It says that PG-Strom raised an error when the HAVING clause used
> > non-grouping keys, even though vanilla PostgreSQL successfully
> > processed the query.
> >
> > SELECT MAX(c0) FROM t0 GROUP BY t0.c1 HAVING t0.c0 >
> > However, I'm not certain what the right behavior is here.
> > The "c0" column does not appear in the GROUP BY clause, so we cannot
> > know its individual values after the group-by stage, right?
>
> Wrong.  c1 is the primary key and so c0 is functionally dependent on it.
>   Grouping by the PK is equivalent to grouping by all of the columns in
> the table.
>
Wow! Thanks, I see the point now. Indeed, grouping by the primary key is
equivalent to grouping by all the columns.
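
For anyone following along, here is a minimal sketch of the rule Vik
describes (the table definition is assumed, since the issue report does not
show it):

```sql
-- Because c1 is the PRIMARY KEY, c0 is functionally dependent on it, so the
-- parser allows bare references to c0 after grouping by c1 (the SQL-standard
-- functional-dependency rule, supported by PostgreSQL).
CREATE TABLE t0 (c1 int PRIMARY KEY, c0 int);

SELECT MAX(c0)
  FROM t0
 GROUP BY t0.c1       -- grouping by the PK: each group is a single row
HAVING t0.c0 > 0;     -- bare c0 is accepted here for the same reason
```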

-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Using non-grouping-keys at HAVING clause

2023-09-08 Thread Kohei KaiGai
Hello,

I got a trouble report here:
https://github.com/heterodb/pg-strom/issues/636

It says that PG-Strom raised an error when the HAVING clause used
non-grouping keys, even though vanilla PostgreSQL successfully processed
the query.

SELECT MAX(c0) FROM t0 GROUP BY t0.c1 HAVING t0.c0




Re: Asynchronous execution support for Custom Scan

2022-12-01 Thread Kohei KaiGai
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
>
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
>
I made the ctidscan extension when we developed the CustomScan API
towards v9.5 or v9.6, IIRC. It would still make sense purely as an example
of the CustomScan API (e.g., the PG-Strom code is too large and complicated
to learn this API from); however, it makes no sense from the standpoint of
functionality.
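
For reference, the in-core feature mentioned above can be seen with a quick
sketch (the table name is made up; TID Range Scan is available since
PostgreSQL 14):

```sql
CREATE TABLE demo AS
    SELECT g AS id FROM generate_series(1, 100000) AS g;

-- A range predicate on ctid is now handled by the built-in TID Range Scan
-- node (rather than a sequential scan), which is essentially what the
-- ctidscan extension demonstrated via the CustomScan API:
EXPLAIN (COSTS OFF)
SELECT * FROM demo
 WHERE ctid >= '(0,1)' AND ctid < '(10,1)';
```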

Best regards,

On Thu, Dec 1, 2022 at 22:27, Ronan Dunklau wrote:
>
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
>
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
>
> Best regards,
>
> --
> Ronan Dunklau
>
>
>
>


-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Asynchronous execution support for Custom Scan

2022-08-23 Thread Kohei KaiGai
I internally suggested that he expand the ctidscan module for PoC purposes.
https://github.com/kaigai/ctidscan

Even though it does not actually have asynchronous capability, it is
suitable for verifying that the API works, and it is small enough to review.

Best regards,

On Mon, Aug 22, 2022 at 17:55, Etsuro Fujita wrote:
>
> Hi Onishi-san,
>
> On Sun, Aug 14, 2022 at 12:59 PM Kazutaka Onishi  wrote:
> > The v1 patch causes gcc warnings; I fixed them.
>
> Thanks for working on this!
>
> I'd like to review this (though I'm not sure I can find time for it
> in the next commitfest), but I don't think we can review this without
> an example.  Could you provide one?  I think a simple example is
> better for ease of review.
>
> Best regards,
> Etsuro Fujita



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-13 Thread Kohei KaiGai
On Wed, Apr 14, 2021 at 0:00, Fujii Masao wrote:
>
> On 2021/04/13 23:25, Kohei KaiGai wrote:
> > On Tue, Apr 13, 2021 at 21:03, Bharath Rupireddy wrote:
> >> Yeah, ONLY clause is not pushed to the remote server in case of SELECT
> >> commands. This is also true for DELETE and UPDATE commands on foreign
> >> tables.
>
> This sounds like a reasonable reason why ONLY should be ignored in
> TRUNCATE on foreign tables, for now. If there is an existing rule about
> how to treat the ONLY clause for foreign tables, TRUNCATE should
> basically follow it at this stage. Maybe we can change the rule, but
> that's an item for v15 or later?
>
>
> >> I'm not sure if it wasn't thought necessary or if there is an
> >> issue to push it or I may be missing something here.
>
> I could not find any past discussion about foreign tables and the ONLY
> clause. I guess that ONLY is ignored in the SELECT-on-foreign-tables case
> because ONLY is interpreted outside the executor, and it's not easy to
> change the executor so that ONLY is passed to the FDW. Maybe...
>
>
> >> I think we can
> >> start a separate thread to see other hackers' opinions on this.
> >>
> >> I'm not sure whether all the clauses that are possible for
> >> SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote
> >> server by postgres_fdw.
> >>
> >> Well, now foreign TRUNCATE pushes the ONLY clause to the remote server
> >> which is inconsistent when compared to SELECT/UPDATE/DELETE commands.
> >> If we were to keep it consistent across all foreign commands that
> >> ONLY clause is not pushed to remote server, then we can restrict for
> >> TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified,
> >> just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I
> >> don't see any real problem in pushing the ONLY clause, at least in
> >> case of TRUNCATE.
> >>
> > If ONLY-clause would be pushed down to the remote query of postgres_fdw,
> > what does the foreign-table represent in the local system?
> >
> > In my understanding, a local foreign table by postgres_fdw is a
> > representation of
> > entire tree of the remote parent table and its children.
>
> If so, I'm still wondering why CASCADE/RESTRICT (i.e., DropBehavior) needs to
> be passed to FDW. IOW, if a foreign table is an abstraction of an external
> data source, ISTM that postgres_fdw should always issue TRUNCATE with
> CASCADE. Why do we need to allow RESTRICT to be specified for a foreign table
> even though it's an abstraction of an external data source?
>
Please assume that the internal heap data is managed by the PostgreSQL core,
and that an external data source is managed by postgres_fdw (or another FDW
driver). The TRUNCATE command requires these object managers to eliminate
the data behind the foreign tables that were picked up.

Even when an object manager tries to eliminate the managed data, it may be
restricted for some reason; for example, FK constraints in the case of
PostgreSQL's internal data. In this case, the CASCADE/RESTRICT option tells
the object manager how to handle the target data.

The ONLY clause controls whose data shall be eliminated.
On the other hand, CASCADE/RESTRICT and CONTINUE/RESTART IDENTITY control
how the data shall be eliminated. That is a fundamental difference.
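
To illustrate the distinction (a sketch against a hypothetical table; the
syntax is stock PostgreSQL):

```sql
-- ONLY selects WHOSE data is eliminated; the remaining options say HOW.
TRUNCATE ONLY f_parent;                        -- parent only, children untouched
TRUNCATE f_parent CASCADE;                     -- also empty FK-referencing tables
TRUNCATE f_parent RESTART IDENTITY RESTRICT;   -- reset owned sequences;
                                               -- fail if FK references exist
```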

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-13 Thread Kohei KaiGai
On Tue, Apr 13, 2021 at 21:03, Bharath Rupireddy wrote:
>
> On Tue, Apr 13, 2021 at 2:37 PM Kohei KaiGai  wrote:
> > Here are two points to discuss.
> >
> > Regarding the FDW APIs: yes, nobody can deny that someone may want to
> > implement their own FDW module that adds special handling when its
> > foreign table is specified with the ONLY clause, even if we usually
> > ignore it.
> >
> > On the other hand, when we consider a foreign table as an abstraction of
> > an external data source, the current postgres_fdw behavior is, at least,
> > not consistent.
> >
> > Suppose a foreign table defined by postgres_fdw maps a remote parent
> > table and also has a local child table.
> >
> > This command shows all the rows from both the local and remote sides:
> >
> > postgres=# select * from f_table ;
> >  id |  v
> > +-
> >   1 | remote table t_parent id=1
> >   2 | remote table t_parent id=2
> >   3 | remote table t_parent id=3
> >  10 | remote table t_child1 id=10
> >  11 | remote table t_child1 id=11
> >  12 | remote table t_child1 id=12
> >  20 | remote table t_child2 id=20
> >  21 | remote table t_child2 id=21
> >  22 | remote table t_child2 id=22
> >  50 | it is l_child id=50
> >  51 | it is l_child id=51
> >  52 | it is l_child id=52
> >  53 | it is l_child id=53
> > (13 rows)
> >
> > If f_table is specified with ONLY, it picks up only the parent table
> > (f_table); however, the ONLY clause is not pushed down to the remote
> > side.
> >
> > postgres=# select * from only f_table ;
> >  id |  v
> > +-
> >   1 | remote table t_parent id=1
> >   2 | remote table t_parent id=2
> >   3 | remote table t_parent id=3
> >  10 | remote table t_child1 id=10
> >  11 | remote table t_child1 id=11
> >  12 | remote table t_child1 id=12
> >  20 | remote table t_child2 id=20
> >  21 | remote table t_child2 id=21
> >  22 | remote table t_child2 id=22
> > (9 rows)
> >
> > On the other hand, TRUNCATE ONLY f_table works as follows...
> >
> > postgres=# truncate only f_table;
> > TRUNCATE TABLE
> > postgres=# select * from f_table ;
> >  id |  v
> > +-
> >  10 | remote table t_child1 id=10
> >  11 | remote table t_child1 id=11
> >  12 | remote table t_child1 id=12
> >  20 | remote table t_child2 id=20
> >  21 | remote table t_child2 id=21
> >  22 | remote table t_child2 id=22
> >  50 | it is l_child id=50
> >  51 | it is l_child id=51
> >  52 | it is l_child id=52
> >  53 | it is l_child id=53
> > (10 rows)
> >
> > It eliminates the rows only from the remote parent table, although that
> > is just one part of the foreign table.
> >
> > My expectation was that the query after the above command would show
> > only the rows from the local child table (id=50...53).
>
> Yeah, ONLY clause is not pushed to the remote server in case of SELECT
> commands. This is also true for DELETE and UPDATE commands on foreign
> tables. I'm not sure if it wasn't thought necessary or if there is an
> issue to push it or I may be missing something here. I think we can
> start a separate thread to see other hackers' opinions on this.
>
> I'm not sure whether all the clauses that are possible for
> SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote
> server by postgres_fdw.
>
> Well, now foreign TRUNCATE pushes the ONLY clause to the remote server
> which is inconsistent when compared to SELECT/UPDATE/DELETE commands.
> If we were to keep it consistent across all foreign commands that
> ONLY clause is not pushed to remote server, then we can restrict for
> TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified,
> just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I
> don't see any real problem in pushing the ONLY clause, at least in
> case of TRUNCATE.
>
If the ONLY clause were pushed down to the remote query by postgres_fdw,
what would the foreign table represent in the local system?

In my understanding, a local foreign table defined by postgres_fdw is a
representation of the entire tree of the remote parent table and its
children. Thus, we have assumed that a DML command fetches rows from the
remote parent table without the ONLY clause, once PostgreSQL has picked up
the foreign table as a scan target.
I don't think we need to adjust the definition of the role of a foreign
table, even if it represents a non-RDBMS data source.

If a foreign table defined by postgres_fdw supported a special table option
indicating that the ONLY clause should be added whenever a remote query
references the remote table, it would be suitable to add the ONLY clause to
the remote TRUNCATE command as well, not only to SELECT/INSERT/UPDATE/DELETE.
In other words, if a foreign table represents only the remote parent table,
it is suitable to truncate only the remote parent table.
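
As a sketch of that idea (note: the `use_only` option below does not exist
in stock postgres_fdw; it is purely hypothetical, to illustrate the
consistent behavior described above):

```sql
-- Hypothetical option asking the FDW to deparse "ONLY t_parent" in every
-- remote statement it builds (SELECT/INSERT/UPDATE/DELETE and TRUNCATE alike):
CREATE FOREIGN TABLE f_parent (id int, v text)
    SERVER remote_pg
    OPTIONS (table_name 't_parent', use_only 'true');
```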

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-13 Thread Kohei KaiGai
On Tue, Apr 13, 2021 at 16:17, Fujii Masao wrote:
>
> On 2021/04/13 14:22, Kohei KaiGai wrote:
> > Let me revisit the discussion at the design level.
> >
> > If postgres_fdw (and other FDW drivers) need to consider whether the
> > ONLY clause is given on their foreign tables, what does a foreign table
> > represent in the PostgreSQL system?
> >
> > My assumption is that a foreign table provides a view of external data,
> > as if it behaved like a table.
> > The TRUNCATE command eliminates all the segment files, even if a table
> > consists of multiple underlying files; it never eliminates them
> > partially.
> > If a foreign table is equivalent to a table at the SQL-operation level,
> > then, indeed, the ONLY clause controls which tables are picked up by the
> > TRUNCATE command, but it never controls which portion of the data shall
> > be eliminated. So I conclude that ExecForeignTruncate() shall eliminate
> > the entire external data behind a foreign table, regardless of the ONLY
> > clause.
> >
> > I think it is important to clarify this prior to the implementation
> > details. What are your opinions?
>
> I'm still thinking that it's better to pass all information about the
> TRUNCATE command, including the ONLY clause, to the FDW and leave it to
> the FDW to determine how to use that information. How postgres_fdw should
> use the information about ONLY is debatable. But for now, IMO, users who
> explicitly specify the ONLY clause for foreign tables understand the
> structure of the remote tables and want ONLY to be used in the TRUNCATE
> command issued by postgres_fdw. But my opinion might be in the minority,
> so I'd like to hear more opinions about this from other developers.
>
Here are two points to discuss.

Regarding the FDW APIs: yes, nobody can deny that someone may want to
implement their own FDW module that adds special handling when its foreign
table is specified with the ONLY clause, even if we usually ignore it.

On the other hand, when we consider a foreign table as an abstraction of an
external data source, the current postgres_fdw behavior is, at least, not
consistent.

Suppose a foreign table defined by postgres_fdw maps a remote parent table
and also has a local child table.

This command shows all the rows from both the local and remote sides:

postgres=# select * from f_table ;
 id |  v
+-
  1 | remote table t_parent id=1
  2 | remote table t_parent id=2
  3 | remote table t_parent id=3
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
 50 | it is l_child id=50
 51 | it is l_child id=51
 52 | it is l_child id=52
 53 | it is l_child id=53
(13 rows)

If f_table is specified with ONLY, it picks up only the parent table
(f_table); however, the ONLY clause is not pushed down to the remote side.

postgres=# select * from only f_table ;
 id |  v
+-
  1 | remote table t_parent id=1
  2 | remote table t_parent id=2
  3 | remote table t_parent id=3
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
(9 rows)

On the other hand, TRUNCATE ONLY f_table works as follows...

postgres=# truncate only f_table;
TRUNCATE TABLE
postgres=# select * from f_table ;
 id |  v
+-
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
 50 | it is l_child id=50
 51 | it is l_child id=51
 52 | it is l_child id=52
 53 | it is l_child id=53
(10 rows)

It eliminates the rows only from the remote parent table, although that is
just one part of the foreign table.

My expectation was that the query after the above command would show only
the rows from the local child table (id=50...53).

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-12 Thread Kohei KaiGai
On Fri, Apr 9, 2021 at 23:49, Kohei KaiGai wrote:
>
> On Fri, Apr 9, 2021 at 22:51, Fujii Masao wrote:
> >
> > On 2021/04/09 12:33, Kohei KaiGai wrote:
> > > On Thu, Apr 8, 2021 at 22:14, Fujii Masao wrote:
> > >>
> > >> On 2021/04/08 22:02, Kohei KaiGai wrote:
> > >>>> Anyway, attached is the updated version of the patch. This is still 
> > >>>> based on the latest Kazutaka-san's patch. That is, extra list for ONLY 
> > >>>> is still passed to FDW. What about committing this version at first? 
> > >>>> Then we can continue the discussion and change the behavior later if 
> > >>>> necessary.
> > >>
> > >> Pushed! Thank all involved in this development!!
> > >> For record, I attached the final patch I committed.
> > >>
> > >>
> > >>> OK, it's fair enough for me.
> > >>>
> > >>> I'll try to sort out my thought, then raise a follow-up discussion if 
> > >>> necessary.
> > >>
> > >> Thanks!
> > >>
> > >> The followings are the open items and discussion points that I'm 
> > >> thinking of.
> > >>
> > >> 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, 
> > >> TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a 
> > >> foreign table was specified as the target to truncate in TRUNCATE 
> > >> command is collected and passed to FDW. Does this really need to be 
> > >> passed to FDW? Seems Stephen, Michael and I think that's necessary. But 
> > >> Kaigai-san does not. I also think that TRUNCATE_REL_CONTEXT_CASCADING 
> > >> can be removed because there seems no use case for that maybe.
> > >>
> > >> 2. Currently when the same foreign table is specified multiple times in 
> > >> the command, the extra information only for the foreign table found 
> > >> first is collected. For example, when "TRUNCATE ft, ONLY ft" is 
> > >> executed, TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored 
> > >> because "ft" is found first. Is this OK? Or we should collect all, e.g., 
> > >> both _NORMAL and _ONLY should be collected in that example? I think that 
> > >> the current approach (i.e., collect the extra info about table found 
> > >> first if the same table is specified multiple times) is good because 
> > >> even local tables are also treated the same way. But Kaigai-san does not.
> > >>
> > >> 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that 
> > >> it constructs. That is, if the foreign table is specified with ONLY, 
> > >> postgres_fdw also issues the TRUNCATE command for the corresponding 
> > >> remote table with ONLY to the remote server. Then only root table is 
> > >> truncated in remote server side, and the tables inheriting that are not 
> > >> truncated. Is this behavior desirable? Seems Michael and I think this 
> > >> behavior is OK. But Kaigai-san does not.
> > >>
> > > Prior to the discussion of 1-3, I like to clarify the role of 
> > > foreign-tables.
> > > (Likely, it will lead a natural conclusion for the above open items.)
> > >
> > > As literal of SQL/MED (Management of External Data), a foreign table
> > > is a representation of external data in PostgreSQL.
> > > It allows to read and (optionally) write the external data wrapped by
> > > FDW drivers, as if we usually read / write heap tables.
> > > By the FDW-APIs, the core PostgreSQL does not care about the
> > > structure, location, volume and other characteristics of
> > > the external data itself. It expects FDW-APIs invocation will perform
> > > as if we access a regular heap table.
> > >
> > > On the other hands, we can say local tables are representation of
> > > "internal" data in PostgreSQL.
> > > A heap table is consists of one or more files (per BLCKSZ *
> > > RELSEG_SIZE), and table-am intermediates
> > > the on-disk data to/from on-memory structure (TupleTableSlot).
> > > Here are no big differences in the concept. Ok?
> > >
> > > As you know, ONLY clause controls whether TRUNCATE command shall run
> > > on child-tables also, not only the parent.
> > > If "ONLY parent_table" is given, its child tables are not picked up by
> > > ExecuteTruncate(), unless child tables are not
> > > listed up individually.
> > >

Re: TRUNCATE on foreign table

2021-04-09 Thread Kohei KaiGai
On Fri, Apr 9, 2021 at 22:51, Fujii Masao wrote:
>
> On 2021/04/09 12:33, Kohei KaiGai wrote:
> > On Thu, Apr 8, 2021 at 22:14, Fujii Masao wrote:
> >>
> >> On 2021/04/08 22:02, Kohei KaiGai wrote:
> >>>> Anyway, attached is the updated version of the patch. This is still 
> >>>> based on the latest Kazutaka-san's patch. That is, extra list for ONLY 
> >>>> is still passed to FDW. What about committing this version at first? 
> >>>> Then we can continue the discussion and change the behavior later if 
> >>>> necessary.
> >>
> >> Pushed! Thank all involved in this development!!
> >> For record, I attached the final patch I committed.
> >>
> >>
> >>> OK, it's fair enough for me.
> >>>
> >>> I'll try to sort out my thought, then raise a follow-up discussion if 
> >>> necessary.
> >>
> >> Thanks!
> >>
> >> The followings are the open items and discussion points that I'm thinking 
> >> of.
> >>
> >> 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, 
> >> TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a 
> >> foreign table was specified as the target to truncate in TRUNCATE command 
> >> is collected and passed to FDW. Does this really need to be passed to FDW? 
> >> Seems Stephen, Michael and I think that's necessary. But Kaigai-san does 
> >> not. I also think that TRUNCATE_REL_CONTEXT_CASCADING can be removed 
> >> because there seems no use case for that maybe.
> >>
> >> 2. Currently when the same foreign table is specified multiple times in 
> >> the command, the extra information only for the foreign table found first 
> >> is collected. For example, when "TRUNCATE ft, ONLY ft" is executed, 
> >> TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored because "ft" 
> >> is found first. Is this OK? Or we should collect all, e.g., both _NORMAL 
> >> and _ONLY should be collected in that example? I think that the current 
> >> approach (i.e., collect the extra info about table found first if the same 
> >> table is specified multiple times) is good because even local tables are 
> >> also treated the same way. But Kaigai-san does not.
> >>
> >> 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that 
> >> it constructs. That is, if the foreign table is specified with ONLY, 
> >> postgres_fdw also issues the TRUNCATE command for the corresponding remote 
> >> table with ONLY to the remote server. Then only root table is truncated in 
> >> remote server side, and the tables inheriting that are not truncated. Is 
> >> this behavior desirable? Seems Michael and I think this behavior is OK. 
> >> But Kaigai-san does not.
> >>
> > Prior to the discussion of 1-3, I like to clarify the role of 
> > foreign-tables.
> > (Likely, it will lead a natural conclusion for the above open items.)
> >
> > As literal of SQL/MED (Management of External Data), a foreign table
> > is a representation of external data in PostgreSQL.
> > It allows to read and (optionally) write the external data wrapped by
> > FDW drivers, as if we usually read / write heap tables.
> > By the FDW-APIs, the core PostgreSQL does not care about the
> > structure, location, volume and other characteristics of
> > the external data itself. It expects FDW-APIs invocation will perform
> > as if we access a regular heap table.
> >
> > On the other hands, we can say local tables are representation of
> > "internal" data in PostgreSQL.
> > A heap table is consists of one or more files (per BLCKSZ *
> > RELSEG_SIZE), and table-am intermediates
> > the on-disk data to/from on-memory structure (TupleTableSlot).
> > Here are no big differences in the concept. Ok?
> >
> > As you know, ONLY clause controls whether TRUNCATE command shall run
> > on child-tables also, not only the parent.
> > If "ONLY parent_table" is given, its child tables are not picked up by
> > ExecuteTruncate(), unless child tables are not
> > listed up individually.
> > Then, once ExecuteTruncate() picked up the relations, it makes the
> > relations empty using table-am
> > (relation_set_new_filenode), and the callee
> > (heapam_relation_set_new_filenode) does not care about whether the
> > table is specified with ONLY, or not. It just makes the data
> > represented by the table empty (in transactional way).
> >
> > So, how foreign tables shal

Re: TRUNCATE on foreign table

2021-04-08 Thread Kohei KaiGai
On Thu, Apr 8, 2021 at 22:14, Fujii Masao wrote:
>
> On 2021/04/08 22:02, Kohei KaiGai wrote:
> >> Anyway, attached is the updated version of the patch. This is still based 
> >> on the latest Kazutaka-san's patch. That is, extra list for ONLY is still 
> >> passed to FDW. What about committing this version at first? Then we can 
> >> continue the discussion and change the behavior later if necessary.
>
> Pushed! Thank all involved in this development!!
> For record, I attached the final patch I committed.
>
>
> > OK, it's fair enough for me.
> >
> > I'll try to sort out my thought, then raise a follow-up discussion if 
> > necessary.
>
> Thanks!
>
> The following are the open items and discussion points that I'm thinking of.
>
> 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, 
> TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a 
> foreign table was specified as the target to truncate in TRUNCATE command is 
> collected and passed to FDW. Does this really need to be passed to FDW? Seems 
> Stephen, Michael and I think that's necessary. But Kaigai-san does not. I 
> also think that TRUNCATE_REL_CONTEXT_CASCADING can be removed because there 
> seems no use case for that maybe.
>
> 2. Currently when the same foreign table is specified multiple times in the 
> command, the extra information only for the foreign table found first is 
> collected. For example, when "TRUNCATE ft, ONLY ft" is executed, 
> TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored because "ft" is 
> found first. Is this OK? Or we should collect all, e.g., both _NORMAL and 
> _ONLY should be collected in that example? I think that the current approach 
> (i.e., collect the extra info about table found first if the same table is 
> specified multiple times) is good because even local tables are also treated 
> the same way. But Kaigai-san does not.
>
> 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that it 
> constructs. That is, if the foreign table is specified with ONLY, 
> postgres_fdw also issues the TRUNCATE command for the corresponding remote 
> table with ONLY to the remote server. Then only root table is truncated in 
> remote server side, and the tables inheriting that are not truncated. Is this 
> behavior desirable? Seems Michael and I think this behavior is OK. But 
> Kaigai-san does not.
>
Prior to discussing items 1-3, I would like to clarify the role of foreign
tables. (Likely, it will lead to a natural conclusion for the above open items.)

As the name SQL/MED (Management of External Data) suggests, a foreign table
is a representation of external data in PostgreSQL.
It allows us to read and (optionally) write the external data wrapped by an
FDW driver, just as we usually read and write heap tables.
Through the FDW APIs, the PostgreSQL core does not care about the structure,
location, volume, or other characteristics of the external data itself. It
expects that FDW API invocations will perform as if we were accessing a
regular heap table.

On the other hand, we can say that local tables are a representation of
"internal" data in PostgreSQL.
A heap table consists of one or more files (one per BLCKSZ * RELSEG_SIZE
bytes), and the table AM mediates between the on-disk data and the in-memory
structure (TupleTableSlot). There is no big conceptual difference, OK?

As you know, the ONLY clause controls whether the TRUNCATE command shall run
on the child tables as well, not only on the parent.
If "ONLY parent_table" is given, its child tables are not picked up by
ExecuteTruncate(), unless the child tables are listed individually.
Then, once ExecuteTruncate() has picked up the relations, it empties them
using the table AM (relation_set_new_filenode), and the callee
(heapam_relation_set_new_filenode) does not care whether the table was
specified with ONLY or not. It just empties the data represented by the
table (in a transactional way).

So, how shall foreign tables behave?

Once ExecuteTruncate() has picked up a foreign table according to the ONLY
clause, shall the FDW driver consider the context in which the foreign table
was specified? And what behavior is consistent?
I think the FDW driver shall empty the external data represented by the
foreign table, regardless of its structure, location, volume, and so on.

Therefore, if we follow the above assumption, we don't need to pass along
the context in which the foreign tables were picked up
(TRUNCATE_REL_CONTEXT_*), so postgres_fdw shall not adjust the remote
TRUNCATE query according to those flags. It always truncates the entire
tables (if there are multiple) behind the foreign tables.

As an aside, if postgres_fdw mapped a remote table with an "ONLY" clause, it
would be exactly the situation where we should add the "ONLY" clause to the
remote TRUNCATE command, because the foreign table would then be a
representation of the remote "ONLY parent_table".
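
For reference, the callback shape being discussed can be sketched roughly as
follows (modeled on the ExecForeignTruncate API as committed for PostgreSQL
14; the function name and body are illustrative, not an actual driver):

```c
/* Sketch of an FDW truncate callback along the lines argued above:
 * empty everything behind each picked-up foreign table, and let
 * "behavior"/"restart_seqs" control only HOW the data is eliminated. */
#include "postgres.h"
#include "foreign/fdwapi.h"
#include "nodes/pg_list.h"

static void
myExecForeignTruncate(List *rels,            /* foreign Relations picked up */
                      DropBehavior behavior, /* DROP_RESTRICT or DROP_CASCADE */
                      bool restart_seqs)     /* was RESTART IDENTITY given? */
{
    ListCell   *lc;

    foreach(lc, rels)
    {
        Relation    rel = (Relation) lfirst(lc);

        /* Eliminate the entire external data represented by "rel",
         * regardless of whether ONLY was written for it locally. */
    }
}
```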

What do you think?
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-08 Thread Kohei KaiGai
On Thu, Apr 8, 2021 at 18:25, Fujii Masao wrote:
>
> On 2021/04/08 15:48, Kohei KaiGai wrote:
> > On Thu, Apr 8, 2021 at 15:04, Fujii Masao wrote:
> >>
> >> On 2021/04/08 13:43, Kohei KaiGai wrote:
> >>> In case a local table (with no children) has the same contents, the
> >>> TRUNCATE command will remove the entire table contents.
> >>
> >> But if there are local child tables that inherit the local parent table, 
> >> and TRUNCATE ONLY  is executed, only the contents in the 
> >> parent will be truncated. I was thinking that this behavior should be 
> >> applied to the foreign table whose remote (parent) table have remote child 
> >> tables.
> >>
> >> So what we need to reach the consensus is; how far ONLY option affects. 
> >> Please imagine the case where we have
> >>
> >> (1) local parent table, also foreign table of remote parent table
> >> (2) local child table, inherits local parent table
> >> (3) remote parent table
> >> (4) remote child table, inherits remote parent table
> >>
> >> I think that we agree all (1), (2), (3) and (4) should be truncated if 
> >> local parent table (1) is specified without ONLY in TRUNCATE command. 
> >> OTOH, if ONLY is specified, we agree that at least local child table (2) 
> >> should NOT be truncated.
> >>
> > My understanding of a foreign table is a representation of external
> > data, including remote RDBMS but not only RDBMS,
> > regardless of the parent-child relationship at the local side.
> > So, once a local foreign table wraps entire tables tree (a parent and
> > relevant children) at the remote side, at least, it shall
> > be considered as a unified data chunk from the standpoint of the local side.
>
> At least for me it's not intuitive to truncate the remote table and its all 
> dependent tables even though users explicitly specify ONLY for the foreign 
> table. As far as I read the past discussion, some people was thinking the 
> same.
>
> >
> > Please assume if file_fdw could map 3 different CSV files, then
> > truncate on the foreign table may eliminate just 1 of 3 files.
> > Is it an expected / preferable behavior?
>
> I think that's up to each FDW. That is, IMO the information about whether 
> ONLY is specified or not for each table should be passed to FDW. Then FDW 
> itself should determine how to handle that information.
>
> Anyway, attached is the updated version of the patch. This is still based on 
> the latest Kazutaka-san's patch. That is, extra list for ONLY is still passed 
> to FDW. What about committing this version at first? Then we can continue the 
> discussion and change the behavior later if necessary.
>
OK, it's fair enough for me.

I'll try to sort out my thoughts, then raise a follow-up discussion if necessary.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-08 Thread Kohei KaiGai
On Thu, Apr 8, 2021 at 15:04, Fujii Masao wrote:
>
> On 2021/04/08 13:43, Kohei KaiGai wrote:
> > In case a local table (with no children) has the same contents, the
> > TRUNCATE command will remove the entire table contents.
>
> But if there are local child tables that inherit the local parent table, and 
> TRUNCATE ONLY  is executed, only the contents in the parent 
> will be truncated. I was thinking that this behavior should be applied to the 
> foreign table whose remote (parent) table have remote child tables.
>
> So what we need to reach the consensus is; how far ONLY option affects. 
> Please imagine the case where we have
>
> (1) local parent table, also foreign table of remote parent table
> (2) local child table, inherits local parent table
> (3) remote parent table
> (4) remote child table, inherits remote parent table
>
> I think that we agree all (1), (2), (3) and (4) should be truncated if local 
> parent table (1) is specified without ONLY in TRUNCATE command. OTOH, if ONLY 
> is specified, we agree that at least local child table (2) should NOT be 
> truncated.
>
My understanding of a foreign table is that it is a representation of external
data (including, but not limited to, a remote RDBMS),
regardless of the parent-child relationship on the local side.
So, once a local foreign table wraps an entire table tree (a parent and
its relevant children) on the remote side, it shall, at least,
be considered a unified data chunk from the standpoint of the local side.

Please assume file_fdw could map 3 different CSV files; then
truncate on the foreign table might eliminate just 1 of the 3 files.
Is that an expected / preferable behavior?
Basically, we don't assume any characteristics of the data on behalf
of the FDW driver, even if it is a PostgreSQL server.
Thus, I think the new API should expect to eliminate all the rows on
behalf of the foreign table, regardless of the ONLY clause,
because the clause already controls which foreign tables shall be picked up,
but does not control which part of a foreign table
shall be eliminated.

> So the remaining point is; remote tables (3) and (4) should be truncated or 
> not when ONLY is specified? You seem to argue that both should be truncated 
> by removing extra list. I was thinking that only remote parent table (3) 
> should be truncated. That is, IMO we should treat the truncation on foreign 
> table as the same as that on its forein data source.
>
> Other people might think neither (3) nor (4) should be truncated in that case 
> because ONLY should affect only the table directly specified in TRUNCATE 
> command, i.e., local parent table (1). For now this also looks good to me.
>
In the case when the local foreign table is a parent, the entire remote
table shall be truncated if ONLY is given.
In the case when the local foreign table is a child, nothing shall
happen (the API is not called) if ONLY is given.

IMO, that is a stable and simple definition, even if the FDW driver wraps
a non-RDBMS data source that has no notion
of table inheritance.
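
A minimal SQL sketch of those two cases (all object names below are invented
for illustration, and this assumes the proposed behavior rather than being a
definitive description of the patch):

```sql
-- Case 1: the foreign table itself is the TRUNCATE target.
-- ExecForeignTruncate() shall be invoked, and the entire remote
-- table (including any remote children) shall be emptied.
CREATE FOREIGN TABLE ft_standalone (id int, v text)
    SERVER remote_srv OPTIONS (table_name 'remote_parent');
TRUNCATE ONLY ft_standalone;

-- Case 2: the foreign table is an inheritance child of a local parent.
-- With ONLY, the API shall not be called for ft_child at all.
CREATE TABLE local_parent (id int, v text);
CREATE FOREIGN TABLE ft_child (id int, v text)
    INHERITS (local_parent)
    SERVER remote_srv OPTIONS (table_name 'remote_child');
TRUNCATE ONLY local_parent;   -- only local_parent is truncated
```

Either way, the FDW never has to interpret ONLY itself; it only decides
whether its callback is invoked.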

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-07 Thread Kohei KaiGai
2021年4月8日(木) 11:44 Fujii Masao :
>
> On 2021/04/08 10:56, Kohei KaiGai wrote:
> > 2021年4月8日(木) 4:19 Fujii Masao :
> >>
> >> On 2021/04/06 21:06, Kazutaka Onishi wrote:
> >>> Thank you for checking v13, and here is v14 patch.
> >>>
> >>>> 1) Are we using all of these macros? I see that we are setting them
> >>>> but we only use TRUNCATE_REL_CONTEXT_ONLY. If not used, can we remove
> >>>> them?
> >>>
> >>> These may be needed for the foreign data handler other than postgres_fdw.
> >>
> >> Could you tell me how such FDWs use TRUNCATE_REL_CONTEXT_CASCADING and 
> >> _NORMAL? I'm still not sure if TRUNCATE_REL_CONTEXT_CASCADING is really 
> >> required.
> >>
> > https://www.postgresql.org/message-id/20200102144644.GM3195%40tamriel.snowman.net
> >
> > This is the suggestion when I added the flag to inform cascading.
> >
> > |  Instead, I'd suggest we have the core code build
> > | up a list of tables to truncate, for each server, based just on the list
> > | passed in by the user, and then also pass in if CASCADE was included or
> > | not, and then let the FDW handle that in whatever way makes sense for
> > | the foreign server (which, for a PG system, would probably be just
> > | building up the TRUNCATE command and running it with or without the
> > | CASCADE option, but it might be different on other systems).
> > |
> > Indeed, it is not a strong technical reason at this moment.
> > (And I also don't have an idea for distinguishing these differences in my
> > module.)
>
> CASCADE option mentioned in the above seems the CASCADE clause specified in 
> TRUNCATE command. No? So the above doesn't seem to suggest to include the 
> information about how each table to truncate is picked up. Am I missing 
> something?
>
It might have come up in a bit different context.

> >
> >> With the patch, both inherited and referencing relations are marked as 
> >> TRUNCATE_REL_CONTEXT_CASCADING? Is this ok for that use? Or we should 
> >> distinguish them?
> >>
> > In addition, even though my prior implementation distinguished and delivered
> > the status of whether the truncate command was issued as NORMAL or ONLY,
> > does the remote query by postgres_fdw need to follow that manner?
> >
> > Please assume the case where a foreign table "ft" maps a remote table
> > with some child relations.
> > If we run TRUNCATE ONLY ft at the local server, postgres_fdw sets up
> > a remote truncate command with the "ONLY" qualifier, then the remote
> > PostgreSQL server truncates only the parent table on the remote side.
> > Next, a "SELECT * FROM ft" command returns some valid rows from the
> > child tables on the remote side, even just after the TRUNCATE command.
> > Is that an intuitive behavior for users?
>
> Yes, because that's the same behavior as for the local tables. No?
>
No. ;-p

When we define a foreign-table as follows,

postgres=# CREATE FOREIGN TABLE ft (id int, v text)
   SERVER loopback OPTIONS (table_name 't_parent',
truncatable 'true');
postgres=# select * from ft;
 id | v
----+--------------------
  1 | 1 in the parent
  2 | 2 in the parent
  3 | 3 in the parent
  4 | 4 in the parent
 11 | 11 in the child_1
 12 | 12 in the child_1
 13 | 13 in the child_1
 21 | 21 in the child_2
 22 | 22 in the child_2
 23 | 23 in the child_2
(10 rows)

TRUNCATE ONLY eliminates only the rows that come from the parent table on the
remote side, even though this foreign table has no parent-child relationship
on the local side.

postgres=# begin;
BEGIN
postgres=# truncate only ft;
TRUNCATE TABLE
postgres=# select * from ft;
 id | v
----+--------------------
 11 | 11 in the child_1
 12 | 12 in the child_1
 13 | 13 in the child_1
 21 | 21 in the child_2
 22 | 22 in the child_2
 23 | 23 in the child_2
(6 rows)

postgres=# abort;
ROLLBACK

In the case when a local table (with no children) has the same contents,
the TRUNCATE command
will remove the entire table contents.

postgres=# select * INTO tt FROM ft;
SELECT 10
postgres=# select * from tt;
 id | v
----+--------------------
  1 | 1 in the parent
  2 | 2 in the parent
  3 | 3 in the parent
  4 | 4 in the parent
 11 | 11 in the child_1
 12 | 12 in the child_1
 13 | 13 in the child_1
 21 | 21 in the child_2
 22 | 22 in the child_2
 23 | 23 in the child_2
(10 rows)

postgres=# truncate only tt;
TRUNCATE TABLE
postgres=# select * from tt;
 id | v
----+--------------------
(0 rows)

> If this understanding is true, the following note that the patch added is 
> also intuitive, and not necessary? At least "partition leafs" part should be 
> removed because TRUNCATE ONLY fails if the remot

Re: TRUNCATE on foreign table

2021-04-07 Thread Kohei KaiGai
2021年4月8日(木) 4:19 Fujii Masao :
>
> On 2021/04/06 21:06, Kazutaka Onishi wrote:
> > Thank you for checking v13, and here is v14 patch.
> >
> >> 1) Are we using all of these macros? I see that we are setting them
> >> but we only use TRUNCATE_REL_CONTEXT_ONLY. If not used, can we remove
> >> them?
> >
> > These may be needed for the foreign data handler other than postgres_fdw.
>
> Could you tell me how such FDWs use TRUNCATE_REL_CONTEXT_CASCADING and 
> _NORMAL? I'm still not sure if TRUNCATE_REL_CONTEXT_CASCADING is really 
> required.
>
https://www.postgresql.org/message-id/20200102144644.GM3195%40tamriel.snowman.net

This is the suggestion when I added the flag to inform cascading.

|  Instead, I'd suggest we have the core code build
| up a list of tables to truncate, for each server, based just on the list
| passed in by the user, and then also pass in if CASCADE was included or
| not, and then let the FDW handle that in whatever way makes sense for
| the foreign server (which, for a PG system, would probably be just
| building up the TRUNCATE command and running it with or without the
| CASCADE option, but it might be different on other systems).
|
Indeed, it is not a strong technical reason at this moment.
(And I also don't have an idea for distinguishing these differences in my module.)

> With the patch, both inherited and referencing relations are marked as 
> TRUNCATE_REL_CONTEXT_CASCADING? Is this ok for that use? Or we should 
> distinguish them?
>
In addition, even though my prior implementation distinguished and delivered
the status of whether the truncate command was issued as NORMAL or ONLY,
does the remote query by postgres_fdw need to follow that manner?

Please assume the case where a foreign table "ft" maps a remote table
with some child relations.
If we run TRUNCATE ONLY ft at the local server, postgres_fdw sets up
a remote truncate command with the "ONLY" qualifier, then the remote
PostgreSQL server truncates only the parent table on the remote side.
Next, a "SELECT * FROM ft" command returns some valid rows from the
child tables on the remote side, even just after the TRUNCATE command.
Is that an intuitive behavior for users?

Even though we have discussed the flags and the expected behavior of
foreign truncate, stripping relids_extra may be the most straightforward
API design.
In other words, the API would require the FDW driver to make the entire data
represented by the foreign table empty in ExecForeignTruncate().
It is probably more consistent to look at DropBehavior only when listing up
the target relations on the local side.

What are your thoughts?

If we stand on the above design, ExecForeignTruncate() doesn't need the
frels_extra and behavior arguments.

> +#define TRUNCATE_REL_CONTEXT_NORMAL   0x01
> +#define TRUNCATE_REL_CONTEXT_ONLY 0x02
> +#define TRUNCATE_REL_CONTEXT_CASCADING 0x04
>
> With the patch, these are defined as flag bits. But ExecuteTruncate() seems 
> to always set the entry in relids_extra to either of them, not the 
> combination of them. So we can define them as enum?
>
Regardless of my above comment, it's a bug.
When list_member_oid(relids, myrelid) == true, we have to set the proper flag
on the relevant frels_extra member, not just ignore it.

Best regards,


Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-03 Thread Kohei KaiGai
2021年4月4日(日) 13:07 Bharath Rupireddy :
>
> On Sat, Apr 3, 2021 at 8:31 PM Zhihong Yu  wrote:
> > w.r.t. Bharath's question on using hash table, I think the reason is that 
> > the search would be more efficient:
>
> Generally, sequential search would be slower if there are many entries
> in a list. Here, the use case is to store all the foreign table ids
> associated with each foreign server and I'm not sure how many foreign
> tables will be provided in a single truncate command that belong to
> different foreign servers. I strongly feel the count will be less and
> using a list would be easier than to have a hash table. Others may
> have better opinions.
>
https://www.postgresql.org/message-id/20200115081126.gk2...@paquier.xyz

It was originally implemented using a simple list, then modified according to
the comment by Michael.
I think it is just a matter of preference.

> > Should the hash table be released at the end of ExecuteTruncateGuts() ?
>
> If we go with a hash table and think that the frequency of "TRUNCATE"
> commands on foreign tables is heavy in a local session, then it does
> make sense to not destroy the hash, otherwise destroy the hash.
>
In most cases, TRUNCATE is not a frequently executed command.
So, exactly, it is just a matter of preference.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-04-01 Thread Kohei KaiGai
2021年4月1日(木) 18:53 Fujii Masao :
>
> On 2021/04/01 0:09, Kohei KaiGai wrote:
> > What does the "permission checks" mean in this context?
> > The permission checks on the foreign tables involved are already checked
> > at truncate_check_rel(), by PostgreSQL's standard access control.
>
> I meant that's the permission check that happens in the remote server side.
> For example, when the foreign table is defined on postgres_fdw and truncated,
> TRUNCATE command is issued to the remote server via postgres_fdw and
> it checks the permission of the table before performing actual truncation.
>
>
> > Please assume an extreme example below:
> > 1. I defined a foreign table with file_fdw onto a local CSV file.
> > 2. Someone tries to scan the foreign table, and ACL allows it.
> > 3. I disallow the read permission of the CSV using chmod, after the above 
> > step,
> >  but prior to the query execution.
> > 4. Someone's query moved to the execution stage, then IterateForeignScan()
> >  raises an error due to OS level permission checks.
> >
> > FDW is a mechanism to convert from/to external data sources to/from 
> > PostgreSQL's
> > structured data, as literal. Once we checked the permissions of
> > foreign-tables by
> > database ACLs, any other permission checks handled by FDW driver are a part 
> > of
> > execution (like, OS permission check when file_fdw read(2) the
> > underlying CSV files).
> > And, we have no reliable way to check entire permissions preliminary,
> > like OS file
> > permission check or database permission checks by remote server. Even
> > if a checker
> > routine returned a "hint", it may be changed at the execution time.
> > Somebody might
> > change the "truncate" permission at the remote server.
> >
> > How about your opinions?
>
> I agree that something like checker routine might not be so useful and
> also be overkill. I was thinking that it's better to truncate the foreign 
> tables first
> and the local ones later. Otherwise it's a waste of cycles to truncate
> the local tables if the truncation on foreign table causes an permission error
> on the remote server.
>
> But ISTM that the order of tables to truncate that the current patch provides
> doesn't cause any actual bugs. So if you think the current order is better,
> I'm ok with that for now. In this case, the comments for ExecuteTruncate()
> should be updated.
>
It is fair enough for me to reverse the order of actual truncation.

How about the updated comments below?

This is a multi-relation truncate.  We first open and grab exclusive
lock on all relations involved, checking permissions (local database
ACLs even if relations are foreign-tables) and otherwise verifying
that the relation is OK for truncation. In CASCADE mode, ...(snip)...
Finally all the relations are truncated and reindexed. If any foreign
tables are involved, their callbacks shall be invoked prior to the truncation
of regular tables.

> BTW, the latest patch doesn't seem to be applied cleanly to the master
> because of commit 27e1f14563. Could you rebase it?
>
Onishi-san, go ahead. :-)

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-03-31 Thread Kohei KaiGai
2021年3月30日(火) 2:53 Fujii Masao :
>
> On 2021/03/28 2:37, Kazutaka Onishi wrote:
> > Fujii-san,
> >
> > Thank you for your review!
> > Now I prepare v5 patch and I'll answer to your each comment. please
> > check this again.
>
> Thanks a lot!
>
> > 5. For example, we can easily do that by truncate foreign tables
> > before local ones. Thought?
> >
> > Umm... yeah, I feel it's better procedure, but not so required because
> > TRUNCATE is NOT called frequently.
> > Certainly, we already have postgresIsForeignUpdatable() to check
> > whether the foreign table is updatable or not.
> > Following this way, we have to add postgresIsForeignTruncatable() to check.
> > However, Unlike UPDATE, TRUNCATE is NOT called frequently. Current
> > procedure is inefficient but works correctly.
> > Thus, I feel postgresIsForeignTruncatable() is not needed.
>
> I'm concerned about the case where permission errors at the remote servers
> rather than that truncatable option is disabled. The comments of
> ExecuteTruncate() explains its design as follows. But the patch seems to break
> this because it truncates the local tables before checking the permission on
> foreign tables (i.e., the local tables in remote servers)... No?
>
>  We first open and grab exclusive
>  lock on all relations involved, checking permissions and otherwise
>  verifying that the relation is OK for truncation
>  Finally all the relations are truncated and reindexed.
>
Fujii-san,

What do the "permission checks" mean in this context?
The permissions on the foreign tables involved are already checked
at truncate_check_rel(), by PostgreSQL's standard access control.

Please assume an extreme example below:
1. I defined a foreign table with file_fdw onto a local CSV file.
2. Someone tries to scan the foreign table, and ACL allows it.
3. I disallow the read permission of the CSV using chmod, after the above step,
but prior to the query execution.
4. Someone's query moved to the execution stage, then IterateForeignScan()
raises an error due to OS level permission checks.
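
The scenario above can be sketched like this (the server and file names are
invented, and this is only a sketch of the argument; the final query fails at
execution time, not at the database ACL check):

```sql
-- 1. Define a foreign table with file_fdw onto a local CSV file.
CREATE EXTENSION file_fdw;
CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE csv_ft (id int, v text)
    SERVER csv_srv OPTIONS (filename '/tmp/data.csv', format 'csv');

-- 2. Database ACL allows another user to scan it.
GRANT SELECT ON csv_ft TO alice;

-- 3. OS-level read permission is revoked afterwards:
--      $ chmod 0000 /tmp/data.csv

-- 4. The scan passes the database ACL check, but
--    IterateForeignScan() fails when file_fdw cannot read(2) the file:
--    (as alice)  SELECT * FROM csv_ft;   -- error raised at execution time
```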

FDW is literally a mechanism to convert external data sources to/from
PostgreSQL's structured data. Once we have checked the permissions of
foreign tables by database ACLs, any other permission checks handled by
the FDW driver are a part of execution (like the OS permission check when
file_fdw read(2)s the underlying CSV files).
And we have no reliable way to check all the permissions preliminarily,
like OS file permission checks or database permission checks by the
remote server. Even if a checker routine returned a "hint", it might
change by execution time. Somebody might change the "truncate"
permission at the remote server.

How about your opinions?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-03-29 Thread Kohei KaiGai
2021年3月30日(火) 3:45 Fujii Masao :
>
> On 2021/03/28 2:37, Kazutaka Onishi wrote:
> > Fujii-san,
> >
> > Thank you for your review!
> > Now I prepare v5 patch and I'll answer to your each comment. please
> > check this again.
> > m(_ _)m
> >
> > 1. In postgres-fdw.sgml, "and truncatable" should be appended into the
> > above first description?
> > 2. truncate.sgml should be updated because, for example, it contains
> > the above descriptions.
> >
> > Yeah, you're right. I've fixed it.
> >
> >
> >
> > 3.  Don't we need to document the detail information about frels_extra?
> >
> > I've written about frels_extra into fdwhander.sgml.
> >
> >
> >
> > 4. postgres_fdw determines whether to specify ONLY or not by checking
> > whether the passed extra value is zero or not.
> >
> > Please refer this:
> > https://www.postgresql.org/message-id/CAOP8fzb-t3WVNLjGMC%2B4sV4AZa9S%3DMAQ7Q6pQoADMCf_1jp4ew%40mail.gmail.com
> >> Negative value means that foreign-tables are not specified in the TRUNCATE
> >> command, but truncated due to dependency (like partition's child leaf).
> >
> > I've added this information into fdwhandler.sgml.
>
> Even when a foreign table is specified explicitly in TRUNCATE command,
> its extra value can be negative if it's found as an inherited children firstly
> (i.e., in the case where the partitioned table having that foreign table as
> its partition is specified explicitly in TRUNCATE command).
> Isn't this a problem?
>
> Please imagine the following example;
>
> --
> create extension postgres_fdw;
> create server loopback foreign data wrapper postgres_fdw;
> create user mapping for public server loopback;
>
> create table t (i int, j int) partition by hash (j);
> create table t0 partition of t for values with (modulus 2, remainder 0);
> create table t1 partition of t for values with (modulus 2, remainder 1);
>
> create table test (i int, j int) partition by hash (i);
> create table test0 partition of test for values with (modulus 2, remainder 0);
> create foreign table ft partition of test for values with (modulus 2, 
> remainder 1) server loopback options (table_name 't');
> --
>
> In this example, "truncate ft, test" works fine, but "truncate test, ft" 
> causes
> an error though they should work in the same way basically.
>
(Although it was originally designed by me...)
If frels_extra were a bit-masked value, we could avoid the problem.

Please assume the three labels below:
#define TRUNCATE_REL_CONTEXT__NORMAL 0x01
#define TRUNCATE_REL_CONTEXT__ONLY   0x02
#define TRUNCATE_REL_CONTEXT__CASCADED 0x04

Then, assign these labels to the extra flag according to the context in which
the foreign table appeared in the truncate command.
Even if a table is specified multiple times in different contexts, the FDW
extension can choose the best option according to the flags.

> In this example, "truncate ft, test" works fine, but "truncate test, ft" 
> causes

In both cases, ExecForeignTruncate shall be invoked for "ft" with
(NORMAL | CASCADED);
thus, postgres_fdw can determine that the remote truncate command shall be
executed without the "ONLY" clause.

How about the idea?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-03-29 Thread Kohei KaiGai
2021年3月30日(火) 2:54 Fujii Masao :
>
> On 2021/03/29 13:55, Michael Paquier wrote:
> > On Mon, Mar 29, 2021 at 10:53:14AM +0900, Fujii Masao wrote:
> >> I understand the motivation of this. But the other DMLs like UPDATE also
> >> do the same thing for foreign tables? That is, when those DML commands
> >> are executed on foreign tables, their changes are WAL-logged in a 
> >> publisher side,
> >> e.g., for logical replication? If not, it seems strange to allow only 
> >> TRUNCATE
> >> on foreign tables to be WAL-logged in a publisher side...
> >
> > Executing DMLs on foreign tables does not generate any WAL AFAIK with
> > the backend core code, even with wal_level = logical, as the DML is
> > executed within the FDW callback (see just ExecUpdate() or
> > ExecInsert() in nodeModifyTable.c), and foreign tables don't have an
> > AM set as they have no physical storage.  A FDW may decide to generate
> > some WAL records by itself though when doing the opeation, using the
> > generic WAL interface but that's rather limited.
> >
> > Generating WAL for the truncation of foreign tables sounds also like a
> > strange concept to me.  I think that you should just make the patch
> > work so as the truncation is passed down to the FDW that decides what
> > it needs to do with it, and do nothing more than that.
>
> Agreed.
>
Ok, it's fair enough.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-03-28 Thread Kohei KaiGai
Fujii-san,

> XLOG_HEAP_TRUNCATE record is written even for the truncation of
> a foreign table. Why is this necessary?
>
Foreign tables are often used to access local data structures, like
columnar data files
on the filesystem, not only remote data sources like postgres_fdw.
In case we want to implement logical replication on this kind of
foreign table,
the truncate command must be delivered to the subscriber node - to truncate
its local data.

In the case of remote-access FDW drivers, a truncate command on the subscriber
side is probably a waste of cycles; however, only the FDW driver and the DBA
who configured the foreign table know whether it is necessary or not.

What are your opinions?

Best regards,

2021年3月25日(木) 3:47 Fujii Masao :
>
>
>
> On 2021/03/13 18:57, Kazutaka Onishi wrote:
> > I have fixed the patch to pass check-world test. :D
>
> Thanks for updating the patch! Here are some review comments from me.
>
>
>By default all foreign tables using postgres_fdw 
> are assumed
>to be updatable.  This may be overridden using the following option:
>
> In postgres-fdw.sgml, "and truncatable" should be appended into
> the above first description? Also "option" in the second description
> should be a plural form "options"?
>
>
>   TRUNCATE is not currently supported for foreign 
> tables.
>   This implies that if a specified table has any descendant tables that 
> are
>   foreign, the command will fail.
>
> truncate.sgml should be updated because, for example, it contains
> the above descriptions.
>
>
> + frels_extra is same length with
> + frels_list, that delivers extra information of
> + the context where the foreign-tables are truncated.
> +
>
> Don't we need to document the detail information about frels_extra?
> Otherwise the developers of FDW would fail to understand how to
> handle the frels_extra when trying to make their FDWs support TRUNCATE.
>
>
> +   relids_extra = lappend_int(relids_extra, (recurse ? 0 : 1));
> +   relids_extra = lappend_int(relids_extra, -1);
>
> postgres_fdw determines whether to specify ONLY or not by checking
> whether the passed extra value is zero or not. That is, for example,
> using only 0 and 1 for extra values is enough for the purpose. But
> ExecuteTruncate() sets three values 0, -1 and 1 as extra ones. Why are
> these three values necessary?
>
>
> With the patch, if both local and foreign tables are specified as
> the target tables to truncate, TRUNCATE command tries to truncate
> foreign tables after truncating local ones. That is, if "truncatable"
> option is set to false or enough permission to truncate is not granted
> yet in the foreign server, an error will be thrown after the local tables
> are truncated. I don't think this is good order of processings. IMO,
> instead, we should check whether foreign tables can be truncated
> before any actual truncation operations. For example, we can easily
> do that by truncate foreign tables before local ones. Thought?
>
>
> XLOG_HEAP_TRUNCATE record is written even for the truncation of
> a foreign table. Why is this necessary?
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION

-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign table

2021-03-28 Thread Kohei KaiGai
Onishi-san,

The v5 patch contains the full contents of "src/backend/commands/tablecmds.c.orig".
Please check it.

2021年3月28日(日) 2:37 Kazutaka Onishi :
>
> Fujii-san,
>
> Thank you for your review!
> Now I prepare v5 patch and I'll answer to your each comment. please
> check this again.
> m(_ _)m
>
> 1. In postgres-fdw.sgml, "and truncatable" should be appended into the
> above first description?
> 2. truncate.sgml should be updated because, for example, it contains
> the above descriptions.
>
> Yeah, you're right. I've fixed it.
>
>
>
> 3.  Don't we need to document the detail information about frels_extra?
>
> I've written about frels_extra into fdwhander.sgml.
>
>
>
> 4. postgres_fdw determines whether to specify ONLY or not by checking
> whether the passed extra value is zero or not.
>
> Please refer this:
> https://www.postgresql.org/message-id/CAOP8fzb-t3WVNLjGMC%2B4sV4AZa9S%3DMAQ7Q6pQoADMCf_1jp4ew%40mail.gmail.com
> > Negative value means that foreign-tables are not specified in the TRUNCATE
> > command, but truncated due to dependency (like partition's child leaf).
>
> I've added this information into fdwhandler.sgml.
>
>
>
> 5. For example, we can easily do that by truncate foreign tables
> before local ones. Thought?
>
> Umm... yeah, I feel it's better procedure, but not so required because
> TRUNCATE is NOT called frequently.
> Certainly, we already have postgresIsForeignUpdatable() to check
> whether the foreign table is updatable or not.
> Following this way, we have to add postgresIsForeignTruncatable() to check.
> However, Unlike UPDATE, TRUNCATE is NOT called frequently. Current
> procedure is inefficient but works correctly.
> Thus, I feel postgresIsForeignTruncatable() is not needed.
>
>
> 6. XLOG_HEAP_TRUNCATE record is written even for the truncation of a
> foreign table. Why is this necessary?
>
> Please give us more time to investigate this.
>
> 2021年3月25日(木) 3:47 Fujii Masao :
> >
> >
> >
> > On 2021/03/13 18:57, Kazutaka Onishi wrote:
> > > I have fixed the patch to pass check-world test. :D
> >
> > Thanks for updating the patch! Here are some review comments from me.
> >
> >
> >By default all foreign tables using 
> > postgres_fdw are assumed
> >to be updatable.  This may be overridden using the following option:
> >
> > In postgres-fdw.sgml, "and truncatable" should be appended into
> > the above first description? Also "option" in the second description
> > should be a plural form "options"?
> >
> >
> >   TRUNCATE is not currently supported for foreign 
> > tables.
> >   This implies that if a specified table has any descendant tables that 
> > are
> >   foreign, the command will fail.
> >
> > truncate.sgml should be updated because, for example, it contains
> > the above descriptions.
> >
> >
> > + frels_extra is same length with
> > + frels_list, that delivers extra information of
> > + the context where the foreign-tables are truncated.
> > +
> >
> > Don't we need to document the detail information about frels_extra?
> > Otherwise the developers of FDW would fail to understand how to
> > handle the frels_extra when trying to make their FDWs support TRUNCATE.
> >
> >
> > +   relids_extra = lappend_int(relids_extra, (recurse ? 0 : 1));
> > +   relids_extra = lappend_int(relids_extra, 
> > -1);
> >
> > postgres_fdw determines whether to specify ONLY or not by checking
> > whether the passed extra value is zero or not. That is, for example,
> > using only 0 and 1 for extra values is enough for the purpose. But
> > ExecuteTruncate() sets three values 0, -1 and 1 as extra ones. Why are
> > these three values necessary?
> >
> >
> > With the patch, if both local and foreign tables are specified as
> > the target tables to truncate, TRUNCATE command tries to truncate
> > foreign tables after truncating local ones. That is, if "truncatable"
> > option is set to false or enough permission to truncate is not granted
> > yet in the foreign server, an error will be thrown after the local tables
> > are truncated. I don't think this is good order of processings. IMO,
> > instead, we should check whether foreign tables can be truncated
> > before any actual truncation operations. For example, we can easily
> > do that by truncate foreign tables before local ones. Thought?
> >
> >
> > XLOG_HEAP_TRUNCATE record is written even for the truncation of
> > a foreign table. Why is this necessary?
> >
> > Regards,
> >
> > --
> > Fujii Masao
> > Advanced Computing Technology Center
> > Research and Development Headquarters
> > NTT DATA CORPORATION



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: How to retain lesser paths at add_path()?

2021-03-06 Thread Kohei KaiGai
2020年11月6日(金) 0:40 Dmitry Dolgov <9erthali...@gmail.com>:
>
> > On Tue, Jan 14, 2020 at 12:46:02AM +0900, Kohei KaiGai wrote:
> > The v2 patch is attached.
> >
> > This adds two dedicated lists on the RelOptInfo to preserve lesser paths
> > if extension required to retain the path-node to be removed in usual manner.
> > These lesser paths are kept in the separated list, so it never expand the 
> > length
> > of pathlist and partial_pathlist. That was the arguable point in the 
> > discussion
> > at the last October.
> >
> > The new hook is called just before the path-node removal operation, and
> > gives extension a chance for extra decision.
> > If extension considers the path-node to be removed can be used in the upper
> > path construction stage, they can return 'true' as a signal to preserve this
> > lesser path-node.
> > In case when same kind of path-node already exists in the preserved_pathlist
> > and the supplied lesser path-node is cheaper than the old one, extension can
> > remove the worse one arbitrarily to keep the length of preserved_pathlist.
> > (E.g, PG-Strom may need one GpuJoin path-node either pathlist or preserved-
> > pathlist for further opportunity of combined usage with GpuPreAgg path-node.
> > It just needs "the best GpuJoin path-node" somewhere, not two or more.)
> >
> > Because PostgreSQL core has no information which preserved path-node can
> > be removed, extensions that uses path_removal_decision_hook() has 
> > responsibility
> > to keep the length of preserved_(partial_)pathlist reasonable.
>
> Hi,
>
> Thanks for the patch! I had a quick look at it and have a few questions:
>
Sorry for the very late response; that was my oversight.

> * What would be the exact point/hook at which an extension can use
>   preserved pathlists? I guess it's important, since I can imagine it's
>   important for one of the issues mentioned in the thread about such an
>   extension have to re-do significant part of the calculations from
>   add_path.
>
set_join_pathlist_hook and create_upper_paths_hook

For example, even if GpuPreAgg could generate a cheaper path on top of a
GpuJoin result, make_one_rel() may drop the GpuJoin path due to
its own cost estimation. In that case, if the lesser GpuJoin path were
preserved somewhere, the extension invoked by create_upper_paths_hook
could build a GpuPreAgg path with the GpuJoin as its sub-path, which reduces
data transfer between CPU and GPU.

> * Do you have any benchmark results with some extension using this
>   hook? The idea with another pathlist of "discarded" paths sounds like
>   a lightweight solution, and indeed I've performed few tests with two
>   workloads (simple queries, queries with joins of 10 tables) and the
>   difference between master and patched versions is rather small (no
>   stable difference for the former, couple of percent for the latter).
>   But it's of course with an empty hook, so it would be good to see
>   other benchmarks as well.
>
Not yet. That said, an empty hook should not have much impact.

Even when an extension uses the hook, it should still be more lightweight than
the alternative the extension would otherwise implement itself. In the case of
PG-Strom, we also save GPU-related paths in our own hash table, and then
consult that hash table to find opportunities to merge multiple GPU
invocations into a single invocation.

> * Does it make sense to something similar with add_path_precheck,
>   which also in some situations excluding paths?
>
That logic allows skipping the creation of paths that will obviously be
expensive; its decision is based purely on cost estimation.

The path_removal_decision_hook also gives extensions a chance to
preserve already-built paths that can be reused later even if their cost is
not the best, so the decision is not based on cost alone. My expectation is
that it allows us to preserve the best path among the GPU-related ones.

> * This part sounds dangerous for me:
>
> > Because PostgreSQL core has no information which preserved path-node can
> > be removed, extensions that uses path_removal_decision_hook() has 
> > responsibility
> > to keep the length of preserved_(partial_)pathlist reasonable.
>
>   since an extension can keep limited number of paths in the list, but
>   then the same hook could be reused by another extension which will
>   also try to limit such paths, but together they'll explode.
>
If the Path node had a flag to indicate whether it is referenced by any other
upper node, we could simplify the check for whether it is safe to release.
In the current implementation, the lesser paths, except for IndexPath, are
released at the end of add_path().

On the other hand, I'm not sure the pfree(new_path) at the tail
of add_path() still makes sense on modern hardware, because it recycles
only a small amount of memory, and the optimizer's entire memory consumption
is released by the MemoryContext mechanism anyway.
If add_path() did not release path nodes, the portion above would be much simpler.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
Sat, Mar 6, 2021 at 11:21 Tom Lane :
>
> Kohei KaiGai  writes:
> > Sat, Mar 6, 2021 at 1:41 Tom Lane :
> >> Scanning the code, I have a couple of gripes.  I'm not sure it's
> >> a good plan to just send the "header" field raw like that ---
> >> would breaking it into a dimension field and a point bool be
> >> better?  In any case, the receive function has to be more careful
> >> than this about accepting only valid header values.
> >>
> > I have a different opinion here.
> > Do we never reinterpret the unused header fields (bits 8-30) for another 
> > purpose
> > in the future version?
>
> Right, that's what to be concerned about.
>
> The best way might be to send the header as-is, as you've done,
> but for cube_recv to throw error if the reserved bits aren't
> all zero.  That way we can't get into a situation where we
> aren't sure what's in stored values.  If we do expand the header
> in future, values should be forward compatible.
>
OK, the attached v4 sends the raw header as-is, and cube_recv
validates the header.
If the number of dimensions is larger than CUBE_MAX_DIM, then obviously
either the unused bits (8-30) are set or the value is out of range.

It also changes the manner of offsetof() as you suggested.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-cube-binary-inout-handler.v4.patch
Description: Binary data


Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
Sat, Mar 6, 2021 at 1:41 Tom Lane :
>
> Kohei KaiGai  writes:
> > Thanks, the attached patch add cube--1.5 for binary send/recv functions and
> > modification of cube type using the new ALTER TYPE.
>
> Hm, that was already superseded by events (112d411fb).
> As long as we get this done for v14, we can just treat it
> as an add-on to cube 1.5, so here's a quick rebase onto HEAD.
>
Thanks for this revising.

> Scanning the code, I have a couple of gripes.  I'm not sure it's
> a good plan to just send the "header" field raw like that ---
> would breaking it into a dimension field and a point bool be
> better?  In any case, the receive function has to be more careful
> than this about accepting only valid header values.
>
I have a different opinion here.
Might we not reinterpret the unused header bits (8-30) for another purpose
in a future version?
If an application saves the raw header field as-is, it can at least keep
the header field without information loss.
On the other hand, if cube_send() sent the number of dimensions and the
point flag individually, an application built for the current version would
drop the bit fields that are currently unused, even though a newer server
version might reinterpret those fields as something.

Of course, it's better to have more careful validation in cube_recv()
when it receives the header field.

> Also, I don't think "offsetof(NDBOX, x[nitems])" is per project
> style.  It at least used to be true that MSVC couldn't cope
> with that, so we prefer
>
> offsetof(NDBOX, x) + nitems * sizeof(whatever)
>
OK, I'll fix it in the next patch.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: contrib/cube - binary input/output handlers

2021-03-05 Thread Kohei KaiGai
Thanks. The attached patch adds cube--1.5 with binary send/recv functions and
a modification of the cube type using the new ALTER TYPE.

Best regards,

Thu, Mar 4, 2021 at 0:45 Tom Lane :
>
> I wrote:
> > You can add the I/O functions with ALTER TYPE nowadays.
>
> To be concrete, see 949a9f043eb70a4986041b47513579f9a13d6a33
>
> regards, tom lane



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-cube-binary-inout-handler.v2.patch
Description: Binary data


Re: contrib/cube - binary input/output handlers

2021-03-03 Thread Kohei KaiGai
Wed, Mar 3, 2021 at 23:33 Peter Eisentraut :
>
> On 24.02.21 04:18, Kohei KaiGai wrote:
> > This patch adds cube_send / cube_recv handlers on the contrib/cube data 
> > type.
> > Once this patch was applied to, the libpq client can obtain the table
> > data using binary mode.
>
> Seems reasonable.  But you need to write an extension upgrade script and
> bump the extension version.
>
Thanks for your review.

One thing that is not straightforward: a new definition of the cube type
needs to drop the old definition first, and that leads to cascaded deletion
of the objects that depend on the "cube" type declared in cube--1.2.sql.
Do you have any good ideas?

Idea-1) Modify the system catalog carefully with UPDATE pg_type.
  This can avoid the cascaded deletion.

Idea-2) Copy all the declarations after CREATE TYPE in cube--1.2.sql into
  the new script, then create those objects again.

Best regards,





--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




contrib/cube - binary input/output handlers

2021-02-23 Thread Kohei KaiGai
Hello,

I noticed that the contrib/cube data type does not support binary
input/output handlers when I tried to dump a table with cube columns,
using a tool [*1] that transfers binary data over libpq.

$ pg2arrow -d postgres -t my_table
../utils/pgsql_client.c:351  SQL execution failed: ERROR:  no binary
output function available for type cube

This patch adds cube_send / cube_recv handlers to the contrib/cube data type.
Once this patch is applied, a libpq client can obtain the table
data in binary mode.

$ pg2arrow -d postgres -t my_table
NOTICE: -o, --output=FILENAME option was not given,
so a temporary file '/tmp/CdC68Q.arrow' was built instead.

The internal layout of cube, a kind of varlena, is a leading 32-bit
header followed by a float8 array (the array size is embedded in the header
field). So, cube_send simply emits the data stream according to the internal
layout, and cube_recv reconstructs the values in reverse.

Best regards,

[*1] pg2arrow - a utility to convert PostgreSQL table to Apache Arrow
http://heterodb.github.io/pg-strom/arrow_fdw/#using-pg2arrow
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-cube-binary-inout-handler.v1.patch
Description: Binary data


Re: TRUNCATE on foreign table

2021-02-10 Thread Kohei KaiGai
Wed, Feb 10, 2021 at 13:55 Ashutosh Bapat :
>
> On Tue, Feb 9, 2021 at 7:45 PM Kazutaka Onishi  wrote:
> >
> > > IIUC, "truncatable" would be set to "false" for relations which do not
> > > have physical storage e.g. views but will be true for regular tables.
> >
> > "truncatable" option is just for the foreign table and it's not related 
> > with whether it's on a physical storage or not.
> > "postgres_fdw" already has "updatable" option to make the table read-only.
> > However, "updatable" is for DML, and it's not suitable for TRUNCATE.
> > Therefore new options "truncatable" was added.
> >
> > Please refer to this message for details.
> > https://www.postgresql.org/message-id/20200128040346.GC1552%40paquier.xyz
> >
> > > DELETE is very different from TRUNCATE. Application may want to DELETE
> > > based on a join with a local table and hence it can not be executed on
> > > a foreign server. That's not true with TRUNCATE.
> >
> > Yeah, As you say, Applications doesn't need  TRUNCATE.
> > We're focusing for analytical use, namely operating huge data.
> > TRUNCATE can erase rows faster than DELETE.
>
> The question is why can't that truncate be run on the foreign server
> itself rather than local server?
>
At least, PostgreSQL applies different access permissions to TRUNCATE.
If an unconditional DELETE were implicitly promoted to TRUNCATE, the DB
administrator would also have to grant TRUNCATE permission on the remote table.

Also, TRUNCATE acquires a stronger lock than DELETE.
DELETE still allows concurrent access to the table, whereas TRUNCATE
takes an AccessExclusive lock; thus, the FDW driver would have to control
concurrent access by itself if we had no dedicated TRUNCATE interface.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

2021-02-04 Thread Kohei KaiGai
Thu, Feb 4, 2021 at 23:45 Tom Lane :
>
> Amit Langote  writes:
> > On Thu, Feb 4, 2021 at 4:24 PM Kohei KaiGai  wrote:
> >> I think that MaxHeapAttributeNumber is a senseless restriction for foreign-
> >> tables. How about your opinions?
>
> > My first reaction to this was a suspicion that the
> > MaxHeapAttributeNumber limit would be too ingrained in PostgreSQL's
> > architecture to consider this matter lightly, but actually browsing
> > the code, that may not really be the case.
>
> You neglected to search for MaxTupleAttributeNumber...
>
> I'm quite skeptical of trying to raise this limit significantly.
>
> In the first place, you'd have to worry about the 2^15 limit on
> int16 AttrNumbers --- and keep in mind that that has to be enough
> for reasonable-size joins, not only an individual table.  If you
> join a dozen or so max-width tables, you're already most of the way
> to that limit.
>
free_parsestate() also prevents the use of target lists with more than
MaxTupleAttributeNumber entries.
(But that is a reasonable restriction, because we cannot guarantee that
HeapTupleTableSlot is not used during query execution.)

> In the second place, as noted by the comment you quoted, there are
> algorithms in various places that are O(N^2) (or maybe even worse?)
> in the number of columns they're dealing with.
>
Only at table creation time, isn't it?
If N is not small (probably >100), we can use a temporary HTAB to ensure
no duplicate column name is supplied.

> In the third place, I've yet to see a use-case that didn't represent
> crummy table design.  Pushing the table off to a remote server doesn't
> make it less crummy design.
>
I hit this limitation when trying to create a foreign table that maps an
Apache Arrow file containing ~2,500 attributes of scientific observation
data.
Apache Arrow uses a columnar format internally, and queries on this data set
reference only 10-15 columns on average. So, such a mapping would make query
execution much more efficient.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

2021-02-03 Thread Kohei KaiGai
Hello,

I noticed that CheckAttributeNamesTypes() prevents creating a table that has
more than MaxHeapAttributeNumber (1600) columns, even for foreign tables.
IIUC, this magic number comes from the requirement that the length of the
null bitmap be covered by t_hoff in HeapTupleHeaderData.
For heap tables, it seems to me a reasonable restriction to prevent overrun
of the null bitmap. On the other hand, do we have a proper reason to apply
the same restriction to foreign tables?

Foreign tables have their own internal data structures instead of
PostgreSQL's heap table, and some foreign data can have thousands of
attributes in its structured form.
I think that MaxHeapAttributeNumber is a senseless restriction for foreign
tables. What are your opinions?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-03-02 Thread Kohei KaiGai
Fujita-san,

> Unfortunately, I didn't have time to work on that (and won't in the
> development cycle for PG13.)
>
> > Indeed, it is not an ideal query plan to execute for each updated rows...
> >
> > postgres=# explain select * from rtable_parent where tableoid = 126397
> > and ctid = '(0,11)'::tid;
> >QUERY PLAN
> > -
> >  Append  (cost=0.00..5.18 rows=2 width=50)
> >->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
> >  Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
> >->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
> >  TID Cond: (ctid = '(0,11)'::tid)
> >  Filter: (tableoid = '126397'::oid)
> > (6 rows)
>
> IIRC, I think one of Tom's concerns about the solution I proposed was
> that it added the tableoid restriction clause to the remote
> UPDATE/DELETE query even if the remote table is not an inheritance
> set.  To add the clause only if the remote table is an inheritance
> set, what I have in mind is to 1) introduce a new postgres_fdw table
> option to indicate whether the remote table is an inheritance set or
> not, and 2) determine whether to add the clause or not, using the
> option.
>
I don't think a new option in postgres_fdw is a good solution, because the
remote table structure can change regardless of the local configuration in
the foreign-table options. People may add inherited child tables after the
foreign table is declared, which can create a configuration mismatch.
Even if we always add the tableoid=OID restriction to the remote query,
it will be evaluated after the TidScan fetches the row pointed to by ctid,
so its additional cost is limited.

And one potential benefit: the tableoid=OID restriction can be used to prune
unrelated partition leaves / inherited children at the planner stage.
That is probably a separate piece of groundwork from postgres_fdw:
once the planner supports the built-in rule for this kind of optimization,
the enhancement in postgres_fdw will be quite simple, I guess.

What do you think?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-02-29 Thread Kohei KaiGai
Hi Amit,

Thanks, I didn't check the thread.

It looks to me like the latest patch was submitted by Fujita-san in Oct 2018.
Then, Tom pointed out that this simple approach has the problem of an
inefficient remote query plan, because it has no knowledge of the structure
of the remote tables mapped by postgres_fdw. After that, the patch has been
left for a year.

Indeed, it is not an ideal query plan to execute for each updated row...

postgres=# explain select * from rtable_parent where tableoid = 126397
and ctid = '(0,11)'::tid;
   QUERY PLAN
-
 Append  (cost=0.00..5.18 rows=2 width=50)
   ->  Seq Scan on rtable_parent  (cost=0.00..1.15 rows=1 width=31)
 Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid))
   ->  Tid Scan on rtable_child  (cost=0.00..4.02 rows=1 width=68)
 TID Cond: (ctid = '(0,11)'::tid)
 Filter: (tableoid = '126397'::oid)
(6 rows)

Rather than refactoring postgres_fdw, is it possible to have a built-in
partition pruning rule when "tableoid = " is supplied?
If the partition mechanism had that feature, this should not be a
complicated problem.

Best regards,

Sun, Mar 1, 2020 at 12:39 Amit Langote :
>
> Hi,
>
> On Sun, Mar 1, 2020 at 12:00 PM Kohei KaiGai  wrote:
> >
> > Hello,
> >
> > I noticed the following scenario under the development of truncate
> > support on FDW.
> >
> > In case when 'ftable' maps a remote table that has inherited children,...
> >
> > postgres=# create table rtable_parent (id int, label text, x text);
> > CREATE TABLE
> > postgres=# create table rtable_child () inherits (rtable_parent);
> > CREATE TABLE
> > postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
> > from generate_series(1,10) x);
> > INSERT 0 10
> > postgres=# insert into rtable_child (select x, 'child', md5(x::text)
> > from generate_series(6,15) x);
> > INSERT 0 10
> > postgres=# create foreign table ftable (id int, label text, x text)
> >server loopback options (table_name 'rtable_parent');
> > CREATE FOREIGN TABLE
> >
> > The 'ftable' shows the results from both of the parent and children.
> > postgres=# select * from ftable;
> >  id | label  |x
> > ++--
> >   1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >   2 | parent | c81e728d9d4c2f636f067f89cc14862c
> >   3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
> >   4 | parent | a87ff679a2f3e71d9181a67b7542122c
> >   5 | parent | e4da3b7fbbce2345d7772b0674a318d5
> >   6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | parent | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | parent | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | parent | d3d9446802a44259755d38e6d163e820
> >   6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
> >   7 | child  | 8f14e45fceea167a5a36dedd4bea2543
> >   8 | child  | c9f0f895fb98ab9159f51fd0297e236d
> >   9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
> >  10 | child  | d3d9446802a44259755d38e6d163e820
> >  11 | child  | 6512bd43d9caa6e02c990b0a82652dca
> >  12 | child  | c20ad4d76fe97759aa27a0c99bff6710
> >  13 | child  | c51ce410c124a10e0db5e4b97fc2af39
> >  14 | child  | aab3238922bcc25a6f606eb525ffdc56
> >  15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
> > (20 rows)
> >
> > When we try to update the foreign-table without DirectUpdate mode,
> > remote query tries to update the rows specified by "ctid" system column.
> > However, it was not a unique key in this case.
> >
> > postgres=# explain update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> >  QUERY PLAN
> > -
> >  Update on ftable  (cost=100.00..133.80 rows=414 width=74)
> >->  Result  (cost=100.00..133.80 rows=414 width=74)
> >  One-Time Filter: (pg_backend_pid() > 0)
> >  ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
> > (4 rows)
> >
> > [*] Note that pg_backend_pid() prevent direct update.
> >
> > postgres=# update ftable set x = 'updated' where id > 10 and
> > pg_backend_pid() > 0;
> > UPDATE 5
> > postgres=# select ctid,* from ftable;
> >   ctid  | id | label  |x
> > +++--
> >  (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
> >  (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc

[BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.

2020-02-29 Thread Kohei KaiGai
Hello,

I noticed the following scenario under the development of truncate
support on FDW.

In case when 'ftable' maps a remote table that has inherited children,...

postgres=# create table rtable_parent (id int, label text, x text);
CREATE TABLE
postgres=# create table rtable_child () inherits (rtable_parent);
CREATE TABLE
postgres=# insert into rtable_parent (select x, 'parent', md5(x::text)
from generate_series(1,10) x);
INSERT 0 10
postgres=# insert into rtable_child (select x, 'child', md5(x::text)
from generate_series(6,15) x);
INSERT 0 10
postgres=# create foreign table ftable (id int, label text, x text)
   server loopback options (table_name 'rtable_parent');
CREATE FOREIGN TABLE

The 'ftable' shows the results from both of the parent and children.
postgres=# select * from ftable;
 id | label  |x
++--
  1 | parent | c4ca4238a0b923820dcc509a6f75849b
  2 | parent | c81e728d9d4c2f636f067f89cc14862c
  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | parent | a87ff679a2f3e71d9181a67b7542122c
  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
  6 | parent | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | parent | 8f14e45fceea167a5a36dedd4bea2543
  8 | parent | c9f0f895fb98ab9159f51fd0297e236d
  9 | parent | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | parent | d3d9446802a44259755d38e6d163e820
  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 10 | child  | d3d9446802a44259755d38e6d163e820
 11 | child  | 6512bd43d9caa6e02c990b0a82652dca
 12 | child  | c20ad4d76fe97759aa27a0c99bff6710
 13 | child  | c51ce410c124a10e0db5e4b97fc2af39
 14 | child  | aab3238922bcc25a6f606eb525ffdc56
 15 | child  | 9bf31c7ff062936a96d3c8bd1f8f2ff3
(20 rows)

When we try to update the foreign table without direct-update mode, the
remote query updates the rows specified by the "ctid" system column.
However, ctid is not a unique key in this case.

postgres=# explain update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
 QUERY PLAN
-
 Update on ftable  (cost=100.00..133.80 rows=414 width=74)
   ->  Result  (cost=100.00..133.80 rows=414 width=74)
 One-Time Filter: (pg_backend_pid() > 0)
 ->  Foreign Scan on ftable  (cost=100.00..133.80 rows=414 width=42)
(4 rows)

[*] Note that pg_backend_pid() prevent direct update.

postgres=# update ftable set x = 'updated' where id > 10 and
pg_backend_pid() > 0;
UPDATE 5
postgres=# select ctid,* from ftable;
  ctid  | id | label  |x
+++--
 (0,1)  |  1 | parent | c4ca4238a0b923820dcc509a6f75849b
 (0,2)  |  2 | parent | c81e728d9d4c2f636f067f89cc14862c
 (0,3)  |  3 | parent | eccbc87e4b5ce2fe28308fd9f2a7baf3
 (0,4)  |  4 | parent | a87ff679a2f3e71d9181a67b7542122c
 (0,5)  |  5 | parent | e4da3b7fbbce2345d7772b0674a318d5
 (0,11) |  6 | parent | updated
 (0,12) |  7 | parent | updated
 (0,13) |  8 | parent | updated
 (0,14) |  9 | parent | updated
 (0,15) | 10 | parent | updated
 (0,1)  |  6 | child  | 1679091c5a880faf6fb5e6087eb1b2dc
 (0,2)  |  7 | child  | 8f14e45fceea167a5a36dedd4bea2543
 (0,3)  |  8 | child  | c9f0f895fb98ab9159f51fd0297e236d
 (0,4)  |  9 | child  | 45c48cce2e2d7fbdea1afc51c7c6ad26
 (0,5)  | 10 | child  | d3d9446802a44259755d38e6d163e820
 (0,11) | 11 | child  | updated
 (0,12) | 12 | child  | updated
 (0,13) | 13 | child  | updated
 (0,14) | 14 | child  | updated
 (0,15) | 15 | child  | updated
(20 rows)

The WHERE clause (id > 10) should affect only the child table.
However, the update also hit the rows in the parent table that have the
same ctid.

What do you think?
Probably, we need to fetch the pair of tableoid and ctid to identify
the remote row exactly in the non-direct-update case.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-02-29 Thread Kohei KaiGai
Hello,

The attached is revised version.

> > If callback is invoked with a foreign-relation that is specified by TRUNCATE
> > command with ONLY, it seems to me reasonable that remote TRUNCATE
> > command specifies the relation on behalf of the foreign table with ONLY.
> >
> > So, if ExecForeignTruncate() has another list to inform the context for each
> > relation, postgres_fdw can build proper remote query that may specify the
> > remote tables with ONLY-clause.
>
> Yeah, TRUNCATE can specify ONLY on a per-table basis, so having a
> second list makes sense.  Then in the FDW, just make sure to
> elog(ERROR) if the lengths do no match, and then use forboth() to loop
> over them.  One thing that you need to be careful about is that tables
> which are added to the list because of inheritance should not be
> marked with ONLY when generating the command to the remote.
>
The v5 patch adds a separate list for the FDW callback, to inform it of the
context in which each relation was specified in the TRUNCATE command. The
frels_extra argument is a list of integers: 0 means the relevant foreign
table was specified without the "ONLY" clause, and a positive value means it
was specified with the "ONLY" clause. A negative value means the foreign
table was not specified in the TRUNCATE command itself, but is truncated due
to a dependency (like a partition's child leaf).

The remote SQL builds the TRUNCATE command according to the above "extra"
information, so "TRUNCATE ONLY ftable" generates a remote query with
"TRUNCATE ONLY mapped_remote_table".
On the other hand, this can produce strange results, although it is a
corner case. The example below shows the result of TRUNCATE ONLY on a
foreign table that maps a remote table with an inherited child.
The rows with id < 10 belong to the parent table, so TRUNCATE ONLY
tru_ftable eliminated the remote parent; however, tru_ftable appears to
still contain rows after the TRUNCATE command.

I wonder whether this behavior is tangible for users. Of course, the "ONLY"
clause controls the local hierarchy of partitioned / inherited tables, but
I'm not certain whether that concept should be extended to the structure of
remote tables.

+SELECT * FROM tru_ftable;
+ id |x
++--
+  5 | e4da3b7fbbce2345d7772b0674a318d5
+  6 | 1679091c5a880faf6fb5e6087eb1b2dc
+  7 | 8f14e45fceea167a5a36dedd4bea2543
+  8 | c9f0f895fb98ab9159f51fd0297e236d
+  9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(10 rows)
+
+TRUNCATE ONLY tru_ftable;  -- truncate only parent portion
+SELECT * FROM tru_ftable;
+ id |x
++--
+ 10 | d3d9446802a44259755d38e6d163e820
+ 11 | 6512bd43d9caa6e02c990b0a82652dca
+ 12 | c20ad4d76fe97759aa27a0c99bff6710
+ 13 | c51ce410c124a10e0db5e4b97fc2af39
+ 14 | aab3238922bcc25a6f606eb525ffdc56
+(5 rows)

> > Regarding to the other comments, it's all Ok for me. I'll update the patch.
> > And, I forgot "updatable" option at postgres_fdw. It should be checked on
> > the truncate also, right?
>
> Hmm.  Good point.  Being able to filter that silently through a
> configuration parameter is kind of interesting.  Now I think that this
> should be a separate option because updatable applies to DMLs.  Like,
> truncatable?
>
OK, the "truncatable" option has been added.
Please check the regression test and documentation updates.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-truncate-on-foreign-table.v5.patch
Description: Binary data


PL/Python - lifetime of variables?

2020-02-18 Thread Kohei KaiGai
Hello,

I noticed that variables in PL/Python are not released at the end of a
procedure. Is this expected behavior?

See this example below:
https://github.com/heterodb/pg-strom/blob/master/test/input/arrow_python.source#L53

This PL/Python function maps a GPU buffer as a cupy.ndarray object via
cupy_strom.ipc_import() at L59, and the result is stored in X. I expected X
to be released at the end of the procedure invocation, but that does not
happen.

The object X internally holds an IpcMemory instance,
  https://github.com/heterodb/pg-strom/blob/master/python/cupy_strom.pyx
and that class has a destructor routine that unmaps the above GPU buffer
using the CUDA API.
  https://github.com/heterodb/pg-strom/blob/master/python/cupy_ipcmem.c#L242
Because of a restriction of the CUDA API, we cannot map a particular GPU
buffer twice in the same process address space. So I noticed that the second
invocation of the PL/Python procedure in the same session fails.
L103 explicitly resets X, by X = 0, to invoke the destructor manually.
I wonder whether this is expected behavior, or an oversight somewhere.
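The lifetime question can be reproduced in plain CPython. The snippet below
is a hypothetical simplification — a persistent namespace dict standing in
for whatever PL/Python actually keeps alive between calls, which is exactly
the open question here — showing why the explicit X = 0 releases the buffer
while merely returning from the call does not:

```python
class IpcMemory:
    """Stand-in for cupy_strom's IPC handle; __del__ is what unmaps the
    GPU buffer, so *when* it runs matters."""
    unmapped = 0

    def __del__(self):
        IpcMemory.unmapped += 1

# hypothetical: the procedure body runs in a namespace that outlives the call
proc_namespace = {"IpcMemory": IpcMemory}

def invoke_procedure():
    exec("X = IpcMemory()", proc_namespace)   # maps the buffer into X

invoke_procedure()
assert IpcMemory.unmapped == 0   # X survived the call: buffer still mapped

proc_namespace["X"] = 0          # the explicit "X = 0" workaround
assert IpcMemory.unmapped == 1   # destructor runs, buffer finally unmapped
```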

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-02-10 Thread Kohei KaiGai
Mon, Feb 10, 2020 at 13:53 Craig Ringer :
>
> On Thu, 6 Feb 2020 at 11:09, Andres Freund  wrote:
>
> > I wasn't advocating for making plannodes.h etc frontend usable. I think
> > that's a fairly different discussion than making enum NodeTag,
> > pg_list.h, memutils.h available.  I don't see them having access to the
> > numerical value of node tag for backend structs as something actually
> > problematic (I'm pretty sure you can do that today already if you really
> > want to - but why would you?).
> >
> > I don't buy that having a separate magic number for various types that
> > we may want to use both frontend and backend is better than largely just
> > having one set of such magic type identifiers.
>
> Simply using MemoryContext as the NodeTag seems very sensible based on
> the above discussion.
>
> But rather than adding a const char * name to point to some constant
> for the implementation name as was proposed earlier, I think the
> existing pointer MemoryContextData->methods is sufficient to identify
> the context type. We could add a NameData field to
> MemoryContextMethods that the initializer sets to the implementation
> name for convenience.
>
> It's trivial to see when debugging with a   p ctx->methods->name   .
> We keep the MemoryContextData size down and we lose nothing. Though
> gdb is smart enough to annotate a pointer to the symbol
> AllocSetMethods as such when it sees it in a debuginfo build there's
> no harm in having a single static string in the const-data segment per
> memory context type.
>
> I'd also like to add a
>
>bool (*instanceof)(MemoryContext context, MemoryContextMethods 
> context_type);
>
> to MemoryContextMethods . Then replace all use of   IsA(ctx,
> AllocSetContext)   etc with a test like:
>
> #define Insanceof_AllocSetContext(ctx) \
> (ctx->methods == AllocSetMethods || ctx->is_a(AllocSetMethods));
>
AllocSetMethods is statically defined in utils/mmgr/aset.c, so that macro
would be usable only within that source file.

Isn't it sufficient to have the macro below?

#define Insanceof_AllocSetContext(ctx) \
  (IsA(ctx, MemoryContext) && \
   strcmp(((MemoryContext)(ctx))->methods->name, "AllocSetMethods") == 0)

As long as no insane extension defines a different memory context type
with the same name, it will work.


> In other words, we ask the target object what it is.
>
> This would make it possible to create wrapper implementations of
> existing contexts that do extra memory accounting or other limited
> sorts of extensions. The short-circuit testing for the known concrete
> AllocSetMethods should make it pretty much identical in performance
> terms, which is of course rather important.
>
> The OO-alike naming is not a coincidence.
>
> I can't help but notice that we're facing some of the same issues
> faced by early OO patterns. Not too surprising given that Pg uses a
> lot of pseudo-OO - some level of structural inheritance and
> behavioural inheritance, but no encapsulation, no messaging model, no
> code-to-data binding. I'm no OO purist, I don't care much so long as
> it works and is consistent.
>
> In OO terms what we seem to be facing is difficulty with extending
> existing object types into new subtypes without modifying all the code
> that knows how to work with the parent types. MemoryContext is one
> example of this, Node is another. The underlying issue is similar.
>
> Being able to do this is something I'm much more interested in being
> able to do for plan and parse nodes etc than for MemoryContext tbh,
> but the same principles apply.
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  2ndQuadrant - PostgreSQL Solutions for the Enterprise



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
2020年1月29日(水) 11:06 Thomas Munro :
>
> On Wed, Jan 29, 2020 at 2:49 PM Kohei KaiGai  wrote:
> > 2020年1月29日(水) 4:27 Thomas Munro :
> > > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> > > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
> > > exactly the same problem while making nearly exactly the same kind of
> > > thing (namely, a MemoryContext backed by space in the main shm area,
> > > in this case reusing the dsa.c allocator).
> > >
> > Thanks, I had looked at "Shared Memory Context" title on pgsql-hackers a few
> > months ago, however, missed the thread right now.
> >
> > The main point of the differences from this approach is portability of 
> > pointers
> > to shared memory chunks. (If I understand correctly)
> > PG-Strom preserves logical address space, but no physical pages, on startup
> > time, then maps shared memory segment on the fixed address on demand.
> > So, pointers are portable to all the backend processes, thus, suitable to 
> > build
> > tree structure on shared memory also.
>
> Very interesting.  PostgreSQL's DSM segments could potentially have
> been implemented that way (whereas today they are not mapped with
> MAP_FIXED), but I suppose people were worried about portability
> problems and ASLR.  The Windows port struggles with that stuff.
>
Yes. I'm not certain whether Windows can support behavior equivalent
to mmap(..., PROT_NONE) and SIGSEGV/SIGBUS handling.
It is also a reason why PG-Strom (which is Linux-only) wants to have its
own shared memory management logic, at least right now.
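For what it's worth, the reserve-then-commit pattern described here can be sketched with plain POSIX calls; this is a simplified illustration with invented helper names and segment size, not PG-Strom's actual code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define SEGMENT_SIZE (16UL * 1024 * 1024)   /* invented size for the sketch */

/*
 * Reserve a contiguous range of virtual address space without allocating
 * physical pages (Linux/POSIX only; this is what makes commit-on-demand
 * possible).
 */
static void *
reserve_address_space(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return (base == MAP_FAILED) ? NULL : base;
}

/*
 * Commit (make accessible) one chunk of the reserved range.  In the scheme
 * described above this would happen inside a SIGSEGV/SIGBUS handler; here
 * the caller commits explicitly.
 */
static int
commit_chunk(void *base, size_t offset, size_t length)
{
    return mprotect((char *) base + offset, length, PROT_READ | PROT_WRITE);
}
```

On Windows there is no direct analogue of handling SIGSEGV to commit pages under a PROT_NONE mapping, which matches the portability concern above.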

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
2020年1月29日(水) 4:27 Thomas Munro :
>
> On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> > I noticed MemoryContextIsValid() called by various kinds of memory context
> > routines checks its node-tag as follows:
> >
> > #define MemoryContextIsValid(context) \
> > ((context) != NULL && \
> >  (IsA((context), AllocSetContext) || \
> >   IsA((context), SlabContext) || \
> >   IsA((context), GenerationContext)))
> >
> > It allows only "known" memory context methods, even though the memory 
> > context
> > mechanism enables to implement custom memory allocator by extensions.
> > Here is a node tag nobody used: T_MemoryContext.
> > It looks to me T_MemoryContext is a neutral naming for custom memory 
> > context,
> > and here is no reason why memory context functions prevents custom methods.
> >
> >
> > https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> > I recently implemented a custom memory context for shared memory allocation
> > with portable pointers. It shall be used for cache of pre-built gpu
> > binary code and
> > metadata cache of apache arrow files.
> > However, the assertion check above requires extension to set a fake node-tag
> > to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, 
> > but
> > feel a bit bad.
>
> FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
> exactly the same problem while making nearly exactly the same kind of
> thing (namely, a MemoryContext backed by space in the main shm area,
> in this case reusing the dsa.c allocator).
>
Thanks. I had seen the "Shared Memory Context" thread on pgsql-hackers a few
months ago, but missed it just now.

The main difference from that approach is the portability of pointers
to shared memory chunks (if I understand correctly).
PG-Strom reserves logical address space, but no physical pages, at startup
time, then maps shared memory segments at fixed addresses on demand.
So pointers are portable across all backend processes, and thus also suitable
for building tree structures in shared memory.
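As an aside, an alternative way to make such pointers portable without fixed-address mapping is to store byte offsets from the segment base instead of raw addresses; a minimal sketch (the type and helper names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * "Portable pointers" stored as byte offsets from the segment base.
 * Offset 0 is reserved as the NULL equivalent, so in practice the first
 * usable byte of a segment starts at some non-zero offset.
 */
typedef uint64_t portable_ptr;

static portable_ptr
to_portable(const void *base, const void *addr)
{
    return (addr == NULL) ? 0
        : (uint64_t) ((const char *) addr - (const char *) base);
}

static void *
from_portable(void *base, portable_ptr p)
{
    return (p == 0) ? NULL : (char *) base + p;
}
```

The trade-off is that every dereference needs the segment base at hand, which is why mapping the segment at the same address in every process can be attractive instead.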

The article below introduces how our "shared memory context" works.
It was originally written in Japanese, and Google Translate may generate
unnatural English; however, its figures should explain how it works.
https://translate.google.co.jp/translate?hl=ja=auto=en=http%3A%2F%2Fkaigai.hatenablog.com%2Fentry%2F2016%2F06%2F19%2F095127

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
2020年1月29日(水) 2:18 Robert Haas :
>
> On Tue, Jan 28, 2020 at 11:24 AM Tom Lane  wrote:
> > I strongly object to having the subtype field be just "char".
> > I want it to be declared "MemoryContextType", so that gdb will
> > still be telling me explicitly what struct type this really is.
> > I realize that you probably did that for alignment reasons, but
> > maybe we could shave the magic number down to 2 bytes, and then
> > rearrange the field order?  Or just not sweat so much about
> > wasting a longword here.  Having those bools up at the front
> > is pretty ugly anyway.
>
> I kind of dislike having the magic number not be the first thing in
> the struct on aesthetic grounds, and possibly on the grounds that
> somebody might be examining the initial bytes manually to try to
> figure out what they've got, and having the magic number there makes
> it easier. Regarding space consumption, I guess this structure is
> already over 64 bytes and not close to 128 bytes, so adding another 8
> bytes probably isn't very meaningful, but I don't love it. One thing
> that concerns me a bit is that if somebody adds their own type of
> memory context, then MemoryContextType will have a value that is not
> actually in the enum. If compilers are optimizing the code on the
> assumption that this cannot occur, do we need to worry about undefined
> behavior?
>
> Actually, I have what I think is a better idea. I notice that the
> "type" field is actually completely unused. We initialize it, and then
> nothing in the code ever checks it or relies on it for any purpose.
> So, we could have a bug in the code that initializes that field with
> the wrong value, for a new context type or in general, and only a
> developer with a debugger would ever notice. That's because the thing
> that really matters is the 'methods' array. So what I propose is that
> we nuke the type field from orbit. If a developer wants to figure out
> what type of context they've got, they should print
> context->methods[0]; gdb will tell you the function names stored
> there, and then you'll know *for real* what type of context you've
> got.
>
> Here's a v2 that approaches the problem that way.
>
How about having a "const char *name" field in MemoryContextMethods?
It is more human-readable for debugging than raw function pointers.
We already have a similar field to identify the methods in CustomScanMethods.
(It is also used for EXPLAIN, not only for debugging...)

I like the idea of identifying the memory-context type with a single
identifier rather than two. If we had a sub-field ID and the memory-context
methods separately, we would probably need an Assert() to check their
consistency, wouldn't we?
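The consistency concern with two identifiers can be illustrated with a toy validity check; the enum values and struct layouts are invented for this sketch and are not the actual backend definitions:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MCTX_ALLOCSET, MCTX_SLAB, MCTX_GENERATION } MemoryContextType;

typedef struct MemoryContextMethods
{
    const char *name;
    /* alloc/free/reset callbacks elided */
} MemoryContextMethods;

typedef struct MemoryContextData
{
    MemoryContextType            type;     /* redundant sub-field id ... */
    const MemoryContextMethods  *methods;  /* ... and the methods table */
} MemoryContextData;

static const MemoryContextMethods AllocSetMethods = { "AllocSetMethods" };
static const MemoryContextMethods SlabMethods = { "SlabMethods" };

/*
 * If both the id and the methods table exist, they can drift apart, so
 * every context-creation path would need a check like this one.
 */
static int
context_is_consistent(const MemoryContextData *ctx)
{
    switch (ctx->type)
    {
        case MCTX_ALLOCSET:
            return ctx->methods == &AllocSetMethods;
        case MCTX_SLAB:
            return ctx->methods == &SlabMethods;
        default:
            return 0;
    }
}
```

In the single-identifier design discussed above, the methods pointer alone is the identity, and this whole class of check disappears.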

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
2020年1月29日(水) 0:56 Tomas Vondra :
>
> On Tue, Jan 28, 2020 at 11:32:49PM +0900, Kohei KaiGai wrote:
> >2020年1月28日(火) 23:09 Tomas Vondra :
> >>
> >> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >> >Hello,
> >> >
> >> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >> >routines checks its node-tag as follows:
> >> >
> >> >#define MemoryContextIsValid(context) \
> >> >((context) != NULL && \
> >> > (IsA((context), AllocSetContext) || \
> >> >  IsA((context), SlabContext) || \
> >> >  IsA((context), GenerationContext)))
> >> >
> >> >It allows only "known" memory context methods, even though the memory 
> >> >context
> >> >mechanism enables to implement custom memory allocator by extensions.
> >> >Here is a node tag nobody used: T_MemoryContext.
> >> >It looks to me T_MemoryContext is a neutral naming for custom memory 
> >> >context,
> >> >and here is no reason why memory context functions prevents custom 
> >> >methods.
> >> >
> >>
> >> Good question. I don't think there's an explicit reason not to allow
> >> extensions to define custom memory contexts, and using T_MemoryContext
> >> seems like a possible solution. It's a bit weird though, because all the
> >> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> >> T_CustomMemoryContext would be a better choice, but that only works in
> >> master, of course.
> >>
> >From the standpoint of extension author, reuse of T_MemoryContext and
> >back patching the change on MemoryContextIsValid() makes us happy. :)
> >However, even if we add a new node-tag here, the custom memory-context
> >can leave to use fake node-tag a few years later. It's better than nothing.
> >
>
> Oh, right. I forgot we still need to backpatch this bit. But that seems
> like a fairly small amount of code, so it should work.
>
> I think we can't backpatch the addition of T_CustomMemoryContext anyway
> as it essentially breaks ABI, as it changes the values assigned to T_
> constants.
>
> >> Also, it won't work if we need to add memory contexts to equalfuncs.c
> >> etc. but maybe won't need that - it's more a theoretical issue.
> >>
> >Right now, none of nodes/XXXfuncs.c support these class of nodes related to
> >MemoryContext. It shall not be a matter.
> >
>
> Yes. I did not really mean it as argument against the patch, it was
> meant more like "This could be an issue, but it actually is not." Sorry
> if that wasn't clear.
>
> >> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >> >I recently implemented a custom memory context for shared memory 
> >> >allocation
> >> >with portable pointers. It shall be used for cache of pre-built gpu
> >> >binary code and
> >> >metadata cache of apache arrow files.
> >> >However, the assertion check above requires extension to set a fake 
> >> >node-tag
> >> >to avoid backend crash. Right now, it is harmless to set 
> >> >T_AllocSetContext, but
> >> >feel a bit bad.
> >> >
> >>
> >> Interesting. Does that mean the shared memory contexts are part of the
> >> same hierarchy as "normal" contexts? That would be a bit confusing, I
> >> think.
> >>
> >If this shared memory context is a child of normal context, likely, it 
> >should be
> >reset or deleted as usual. However, if this shared memory context performs
> >as a parent of normal context, it makes a problem when different process 
> >tries
> >to delete this context, because its child (normal context) exists at the 
> >creator
> >process only. So, it needs to be used carefully.
> >
>
> Yeah, handling life cycle of a mix of those contexts may be quite tricky.
>
> But my concern was a bit more general - is it a good idea to hide the
> nature of the memory context behind the same API. If you call palloc()
> shouldn't you really know whether you're allocating the stuff in regular
> or shared memory context?
>
> Maybe it makes perfect sense, but my initial impression is that those
> seem rather different, so maybe we should keep them separate (in
> separate hierarchies or something). But I admit I don't have much
> experience with use cases that would require such shmem contexts.
>
Yeah, as you mentioned, we have no way to distinguish whether a particular
memory chunk is private or shared memory right now.
It is the responsibility of software developers, and I assume shared-memory
chunks are used only in limited cases where there is a specific reason why
the memory should be shared.
On the other hand, the situation is the same even for private memory:
we must pay attention to which memory context a chunk is allocated from.
For example, a state object that must stay valid during query execution has
to be allocated from estate->es_query_cxt. If someone allocates it from
CurrentMemoryContext and it is implicitly released, we would consider that
a bug.
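The same lifetime discipline can be shown with a toy region allocator (a simplified stand-in for MemoryContext; all names are invented for the sketch):

```c
#include <assert.h>
#include <stddef.h>

/*
 * A toy region ("memory context"): bump allocation; resetting the region
 * invalidates everything allocated from it at once.  This mirrors the
 * short-lived vs. query-lifetime context distinction discussed above.
 */
typedef struct
{
    char   buf[4096];
    size_t used;
} Region;

static void *
region_alloc(Region *r, size_t size)
{
    if (r->used + size > sizeof(r->buf))
        return NULL;                /* out of space in this toy region */
    void *p = r->buf + r->used;
    r->used += size;
    return p;
}

static void
region_reset(Region *r)
{
    r->used = 0;    /* every chunk handed out so far is now invalid */
}
```

State that must outlive a reset cycle goes in the long-lived region; allocating it from the short-lived one is exactly the bug described above.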

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
2020年1月28日(火) 23:09 Tomas Vondra :
>
> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >Hello,
> >
> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >routines checks its node-tag as follows:
> >
> >#define MemoryContextIsValid(context) \
> >((context) != NULL && \
> > (IsA((context), AllocSetContext) || \
> >  IsA((context), SlabContext) || \
> >  IsA((context), GenerationContext)))
> >
> >It allows only "known" memory context methods, even though the memory context
> >mechanism enables to implement custom memory allocator by extensions.
> >Here is a node tag nobody used: T_MemoryContext.
> >It looks to me T_MemoryContext is a neutral naming for custom memory context,
> >and here is no reason why memory context functions prevents custom methods.
> >
>
> Good question. I don't think there's an explicit reason not to allow
> extensions to define custom memory contexts, and using T_MemoryContext
> seems like a possible solution. It's a bit weird though, because all the
> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> T_CustomMemoryContext would be a better choice, but that only works in
> master, of course.
>
From the standpoint of an extension author, reusing T_MemoryContext and
back-patching the change to MemoryContextIsValid() would make us happy. :)
However, even if we add a new node tag here, custom memory contexts would
still have to use a fake node tag on older branches for a few years.
It's better than nothing.

> Also, it won't work if we need to add memory contexts to equalfuncs.c
> etc. but maybe won't need that - it's more a theoretical issue.
>
Right now, none of the nodes/XXXfuncs.c files support this class of nodes
related to MemoryContext, so it should not be an issue.

> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >I recently implemented a custom memory context for shared memory allocation
> >with portable pointers. It shall be used for cache of pre-built gpu
> >binary code and
> >metadata cache of apache arrow files.
> >However, the assertion check above requires extension to set a fake node-tag
> >to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, 
> >but
> >feel a bit bad.
> >
>
> Interesting. Does that mean the shared memory contexts are part of the
> same hierarchy as "normal" contexts? That would be a bit confusing, I
> think.
>
If this shared memory context is a child of a normal context, it can likely
be reset or deleted as usual. However, if this shared memory context acts as
a parent of a normal context, there is a problem when a different process
tries to delete it, because its child (the normal context) exists only in
the creator process. So, it needs to be used carefully.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Is custom MemoryContext prohibited?

2020-01-27 Thread Kohei KaiGai
Hello,

I noticed MemoryContextIsValid() called by various kinds of memory context
routines checks its node-tag as follows:

#define MemoryContextIsValid(context) \
((context) != NULL && \
 (IsA((context), AllocSetContext) || \
  IsA((context), SlabContext) || \
  IsA((context), GenerationContext)))

It allows only "known" memory context methods, even though the memory context
mechanism enables extensions to implement custom memory allocators.
There is a node tag nobody uses: T_MemoryContext.
It looks to me like T_MemoryContext is a neutral name for a custom memory
context, and there is no reason why the memory context functions should
prevent custom methods.


https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
I recently implemented a custom memory context for shared memory allocation
with portable pointers. It will be used as a cache for pre-built GPU binary
code and as a metadata cache for Apache Arrow files.
However, the assertion check above requires the extension to set a fake node
tag to avoid a backend crash. Right now, it is harmless to set
T_AllocSetContext, but it feels a bit bad.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-27 Thread Kohei KaiGai
2020年1月21日(火) 15:38 Michael Paquier :
>
> On Mon, Jan 20, 2020 at 10:50:21PM +0900, Michael Paquier wrote:
> > I have spent a good amount of time polishing 0001, tweaking the docs,
> > comments, error messages and a bit its logic.  I am getting
> > comfortable with it, but it still needs an extra lookup, an indent run
> > which has some noise and I lacked of time today.  0002 has some of its
> > issues fixed and I have not reviewed it fully yet.  There are still
> > some places not adapted in it (why do you use "Bug?" in all your
> > elog() messages by the way?), so the postgres_fdw part needs more
> > attention.  Could you think about some docs for it by the way?
>
> I have more comments about the postgres_fdw that need to be
> addressed.
>
> +   if (!OidIsValid(server_id))
> +   {
> +   server_id = GetForeignServerIdByRelId(frel_oid);
> +   user = GetUserMapping(GetUserId(), server_id);
> +   conn = GetConnection(user, false);
> +   }
> +   else if (server_id != GetForeignServerIdByRelId(frel_oid))
> +   elog(ERROR, "Bug? inconsistent Server-IDs were supplied");
> I agree here that an elog() looks more adapted than an assert.
> However I would change the error message to be more like "incorrect
> server OID supplied by the TRUNCATE callback" or something similar.
> The server OID has to be valid anyway, so don't you just bypass any
> errors if it is not set?
>
> +-- truncate two tables at a command
> What does this sentence mean?  Isn't that "truncate two tables in one
> single command"?
>
> The table names in the tests are rather hard to parse.  I think that
> it would be better to avoid underscores at the beginning of the
> relation names.
>
> It would be nice to have some coverage with inheritance, and also
> track down in the tests what we expect when ONLY is specified in that
> case (with and without ONLY, both parent and child relations).
>
> Anyway, attached is the polished version for 0001 that I would be fine
> to commit, except for one point: are there objections if we do not
> have extra handling for ONLY when it comes to foreign tables with
> inheritance?  As the patch stands, the list of relations is first
> built, with an inheritance recursive lookup done depending on if ONLY
> is used or not.  Hence, if using "TRUNCATE ONLY foreign_tab, ONLY
> foreign_tab2", then only those two tables would be passed down to the
> FDW.  If ONLY is removed, both tables as well as their children are
> added to the lists of relations split by server OID.  One problem is
> that this could be confusing for some users I guess?  For example,
> with a 1:1 mapping in the schema of the local and remote servers, a
> user asking for TRUNCATE ONLY foreign_tab would pass down to the
> remote just the equivalent of "TRUNCATE foreign_tab" using
> postgres_fdw, causing the full inheritance tree to be truncated on the
> remote side.  The concept of ONLY mixed with inherited foreign tables
> is rather blurry (point raised by Stephen upthread).
>
Hmm. Do we need to deliver another list to inform the FDW why these relations
are truncated?
If the callback is invoked with a foreign relation that was specified in the
TRUNCATE command with ONLY, it seems reasonable to me that the remote
TRUNCATE command should also specify the relation, on behalf of the foreign
table, with ONLY.

Foreign tables can be truncated because ...
1. it is specified by the user with the ONLY clause.
2. it is specified by the user without the ONLY clause.
3. it is an inherited child of the relations specified at 2.
4. it depends on the relations picked up at 1-3.

So, if ExecForeignTruncate() receives another list that informs it of the
context for each relation, postgres_fdw can build a proper remote query that
specifies the remote tables with the ONLY clause.

Regarding the other comments, they are all OK for me. I'll update the patch.
Also, I forgot the "updatable" option of postgres_fdw. It should be checked
on truncate as well, right?
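To sketch the idea, a per-relation context list would let the deparser emit ONLY selectively; this is a simplified illustration with invented names, not the actual postgres_fdw code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { DROP_RESTRICT, DROP_CASCADE } DropBehavior;

/* Why each foreign table ended up in the TRUNCATE list (cases 1-4 above). */
typedef enum
{
    TRUNCATE_REL_CONTEXT_ONLY,       /* named with ONLY */
    TRUNCATE_REL_CONTEXT_NORMAL,     /* named without ONLY */
    TRUNCATE_REL_CONTEXT_CASCADING   /* picked up via inheritance or deps */
} TruncateRelContext;

/*
 * Build the remote command for one server's tables.  Assumes buf is large
 * enough; a real implementation would use StringInfo.
 */
static void
deparse_truncate(char *buf, size_t bufsz,
                 const char **relnames, const TruncateRelContext *ctx,
                 int nrels, DropBehavior behavior, int restart_seqs)
{
    int off = snprintf(buf, bufsz, "TRUNCATE");

    for (int i = 0; i < nrels; i++)
        off += snprintf(buf + off, bufsz - off, "%s %s%s",
                        (i == 0) ? "" : ",",
                        (ctx[i] == TRUNCATE_REL_CONTEXT_ONLY) ? "ONLY " : "",
                        relnames[i]);
    snprintf(buf + off, bufsz - off, " %s IDENTITY %s",
             restart_seqs ? "RESTART" : "CONTINUE",
             (behavior == DROP_CASCADE) ? "CASCADE" : "RESTRICT");
}
```

With the extra context list, a table specified with ONLY keeps its ONLY on the remote side, which addresses the inheritance-blurriness concern above.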

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-19 Thread Kohei KaiGai
2020年1月20日(月) 11:09 Kyotaro Horiguchi :
>
> Hello.
>
> At Fri, 17 Jan 2020 22:49:41 +0900, Kohei KaiGai  wrote 
> in
> > The v3 patch updated the points below:
>
> Did you attached it?
>
Sorry, it was a midnight job on Friday.
Please check the attached patch.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-truncate-on-foreign-table.v3.patch
Description: Binary data


Re: TRUNCATE on foreign tables

2020-01-17 Thread Kohei KaiGai
Hi,

The v3 patch updates the following points:
- The 2nd argument of ExecForeignTruncate was changed from bool to DropBehavior.
- ExecuteTruncateGuts() uses a local hash table to track pairs of a server ID
  and the list of foreign tables managed by that server.
- The error message in truncate_check_rel() was revised as follows:
   "foreign data wrapper \"%s\" on behalf of the foreign table \"%s\"
does not support TRUNCATE"
- deparseTruncateSql() of postgres_fdw generates the entire remote SQL,
  like the other commands.
- The SGML documentation was updated.

Best regards,

2020年1月16日(木) 14:40 Michael Paquier :
>
> On Wed, Jan 15, 2020 at 11:33:07PM +0900, Kohei KaiGai wrote:
> > 2020年1月15日(水) 17:11 Michael Paquier :
> >> I have done a quick read through the patch.  You have modified the
> >> patch to pass down to the callback a list of relation OIDs to execute
> >> one command for all, and there are tests for FKs so that coverage
> >> looks fine.
> >>
> >> Regression tests are failing with this patch:
> >>  -- TRUNCATE doesn't work on foreign tables, either directly or
> >>  recursively
> >>  TRUNCATE ft2;  -- ERROR
> >> -ERROR:  "ft2" is not a table
> >> +ERROR:  foreign-data wrapper "dummy" has no handler
> >> You visibly just need to update the output because no handlers are
> >> available for truncate in this case.
> >>
> > What error message is better in this case? It does not print "ft2" anywhere,
> > so user may not notice that "ft2" is the source of the error.
> > How about 'foreign table "ft2" does not support truncate' ?
>
> It sounds to me that this message is kind of right.  This FDW "dummy"
> does not have any FDW handler at all, so it complains about it.
> Having no support for TRUNCATE is something that may happen after
> that.  Actually, this error message from your patch used for a FDW
> which has a  handler but no TRUNCATE support could be reworked:
> +   if (!fdwroutine->ExecForeignTruncate)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +errmsg("\"%s\" is not a supported foreign table",
> +   relname)));
> Something like "Foreign-data wrapper \"%s\" does not support
> TRUNCATE"?
>
> >> + frels_list is a list of foreign tables that are
> >> + connected to a particular foreign server; thus, these foreign tables
> >> + should have identical foreign server ID
> >> The list is built by the backend code, so that has to be true.
> >>
> >> +   foreach (lc, frels_list)
> >> +   {
> >> +   Relationfrel = lfirst(lc);
> >> +   Oid frel_oid = RelationGetRelid(frel);
> >> +
> >> +   if (server_id == GetForeignServerIdByRelId(frel_oid))
> >> +   {
> >> +   frels_list = foreach_delete_current(frels_list, lc);
> >> +   curr_frels = lappend(curr_frels, frel);
> >> +   }
> >> +   }
> >> Wouldn't it be better to fill in a hash table for each server with a
> >> list of relations?
> >
> > It's just a matter of preference. A temporary hash-table with server-id
> > and list of foreign-tables is an idea. Let me try to revise.
>
> Thanks.  It would not matter much for relations without inheritance
> children, but if truncating a partition tree with many foreign tables
> using various FDWs that could matter performance-wise.
> --
> Michael



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-15 Thread Kohei KaiGai
2020年1月15日(水) 17:11 Michael Paquier :
>
> On Tue, Jan 14, 2020 at 06:16:17PM +0900, Kohei KaiGai wrote:
> > The "frels_list" is a list of foreign tables that are connected to a 
> > particular
> > foreign server, thus, the server-id pulled out by foreign tables id should 
> > be
> > identical for all the relations in the list.
> > Due to the API design, this callback shall be invoked for each foreign 
> > server
> > involved in the TRUNCATE command, not per table basis.
> >
> > The 2nd and 3rd arguments also informs FDW driver other options of the
> > command. If FDW has a concept of "cascaded truncate" or "restart sequence",
> > it can adjust its remote query. In postgres_fdw, it follows the manner of
> > usual TRUNCATE command.
>
> I have done a quick read through the patch.  You have modified the
> patch to pass down to the callback a list of relation OIDs to execute
> one command for all, and there are tests for FKs so that coverage
> looks fine.
>
> Regression tests are failing with this patch:
>  -- TRUNCATE doesn't work on foreign tables, either directly or
>  recursively
>  TRUNCATE ft2;  -- ERROR
> -ERROR:  "ft2" is not a table
> +ERROR:  foreign-data wrapper "dummy" has no handler
> You visibly just need to update the output because no handlers are
> available for truncate in this case.
>
What error message would be better in this case? It does not print "ft2"
anywhere, so the user may not notice that "ft2" is the source of the error.
How about 'foreign table "ft2" does not support truncate'?

> +void
> +deparseTruncateSql(StringInfo buf, Relation rel)
> +{
> +   deparseRelation(buf, rel);
> +}
> Don't see much point in having this routine.
>
deparseRelation() is a static function in postgres_fdw/deparse.c.
On the other hand, it may be better to move the entire logic for constructing
the remote TRUNCATE command into deparse.c, like the other commands.

> + If FDW does not provide this callback, PostgreSQL considers
> + TRUNCATE is not supported on the foreign table.
> +
> This sentence is weird.  Perhaps you meant "as not supported"?
>
Yes.
If the FDW does not provide this callback, PostgreSQL behaves as if TRUNCATE
is not supported on the foreign table.

> + frels_list is a list of foreign tables that are
> + connected to a particular foreign server; thus, these foreign tables
> + should have identical foreign server ID
> The list is built by the backend code, so that has to be true.
>
> +   foreach (lc, frels_list)
> +   {
> +   Relationfrel = lfirst(lc);
> +   Oid frel_oid = RelationGetRelid(frel);
> +
> +   if (server_id == GetForeignServerIdByRelId(frel_oid))
> +   {
> +   frels_list = foreach_delete_current(frels_list, lc);
> +   curr_frels = lappend(curr_frels, frel);
> +   }
> +   }
> Wouldn't it be better to fill in a hash table for each server with a
> list of relations?
>
It's just a matter of preference. A temporary hash table keyed by server ID
with a list of foreign tables is an option. Let me try to revise.
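A sketch of such grouping with a tiny fixed-size hash table (all names and sizes are invented for illustration; the real backend code would use dynahash and grow as needed):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

#define MAX_RELS 32
#define NBUCKETS 8

typedef struct
{
    Oid  server_id;             /* hash key */
    Oid  rel_oids[MAX_RELS];    /* foreign tables on that server */
    int  nrels;
    int  used;
} ServerGroup;

/* Minimal open-addressing hash keyed by server OID. */
static ServerGroup *
lookup_group(ServerGroup *table, Oid server_id)
{
    for (int i = 0; i < NBUCKETS; i++)
    {
        ServerGroup *g = &table[(server_id + i) % NBUCKETS];

        if (!g->used)
        {
            g->used = 1;
            g->server_id = server_id;
            return g;
        }
        if (g->server_id == server_id)
            return g;
    }
    return NULL;                /* table full; real code would grow it */
}

/* One pass over the relation list; one group (one callback) per server. */
static void
group_by_server(ServerGroup *table, const Oid *rel_oids,
                const Oid *server_ids, int n)
{
    for (int i = 0; i < n; i++)
    {
        ServerGroup *g = lookup_group(table, server_ids[i]);
        g->rel_oids[g->nrels++] = rel_oids[i];
    }
}
```

Compared with repeatedly scanning and pruning frels_list, this is a single pass over the relations regardless of how many servers are involved.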

> +typedef void (*ExecForeignTruncate_function) (List *frels_list,
> + bool is_cascade,
> + bool restart_seqs);
> I would recommend to pass down directly DropBehavior instead of a
> boolean to the callback.  That's more extensible.
>
Ok,

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-14 Thread Kohei KaiGai
Hello,

The attached patch is the revised version of TRUNCATE on foreign tables.

Definition of the callback is revised as follows:

typedef void (*ExecForeignTruncate_function) (List *frels_list,
 bool is_cascade,
 bool restart_seqs);

The "frels_list" is a list of foreign tables that are connected to a particular
foreign server; thus, the server ID pulled out from each foreign table's OID
should be identical for all the relations in the list.
Due to the API design, this callback shall be invoked once for each foreign
server involved in the TRUNCATE command, not on a per-table basis.

The 2nd and 3rd arguments also inform the FDW driver of other options of the
command. If the FDW has a concept of "cascaded truncate" or "restart sequence",
it can adjust its remote query accordingly. In postgres_fdw, it follows the
behavior of the usual TRUNCATE command.

Best regards,

2020年1月8日(水) 1:08 Kohei KaiGai :
>
> 2020年1月7日(火) 16:03 Michael Paquier :
> >
> > On Mon, Jan 06, 2020 at 04:32:39PM -0500, Stephen Frost wrote:
> > > RESTRICT, yes.  I don't know about ONLY being sensible as we don't
> > > really deal with inheritance and foreign tables very cleanly today, as I
> > > said up-thread, so I'm not sure if we really want to put in the effort
> > > that it'd require to figure out how to make ONLY make sense.
> >
> > True enough.
> >
> > > The question about how to handle IDENTITY is a good one.  I suppose
> > > we could just pass that down and let the FDW sort it out..?
> >
> > Looking at the code, ExecuteTruncateGuts() passes down restart_seqs,
> > so it sounds sensible to me to just pass down that to the FDW
> > callback and the callback decide what to do.
> >
> It looks to me the local sequences owned by a foreign table shall be restarted
> by the core, regardless of relkind of the owner relation. So, even if FDW 
> driver
> is buggy, consistency of the local database is kept, right?
> Indeed, "restart_seqs" flag is needed to propagate the behavior, however,
> it shall be processed on the remote side via the secondary "TRUNCATE" command.
> Is it so sensitive?
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei 



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-truncate-on-foreign-table.v2.patch
Description: Binary data


Re: How to retain lesser paths at add_path()?

2020-01-13 Thread Kohei KaiGai
The v2 patch is attached.

This adds two dedicated lists to RelOptInfo to preserve lesser paths when an
extension requires retaining a path node that would otherwise be removed in
the usual manner. These lesser paths are kept in separate lists, so they never
expand the length of pathlist or partial_pathlist. That was the contentious
point in the discussion last October.

The new hook is called just before the path-node removal operation, and gives
the extension a chance to make an extra decision.
If the extension considers that the path node about to be removed can still be
used in the upper path construction stage, it can return 'true' as a signal to
preserve this lesser path node.
If a path node of the same kind already exists in the preserved_pathlist and
the supplied lesser path node is cheaper than the old one, the extension can
remove the worse one arbitrarily to keep the length of preserved_pathlist
bounded.
(E.g., PG-Strom may need one GpuJoin path node in either pathlist or
preserved_pathlist for a further opportunity of combined usage with a
GpuPreAgg path node. It just needs "the best GpuJoin path node" somewhere,
not two or more.)

Because PostgreSQL core has no information which preserved path-node can
be removed, extensions that uses path_removal_decision_hook() has responsibility
to keep the length of preserved_(partial_)pathlist reasonable.


BTW, add_path() now removes the lesser path-node by pfree(), not only detach
from the path-list. (IndexPath is an exception)
Does it really make sense? It only releases the path-node itself, so may not
release entire objects. So, efficiency of memory usage is limited. And
ForeignScan
/ CustomScan may references the path-node to be removed. It seems to me here
is no guarantee lesser path-nodes except for IndexPath nodes are safe
to release.

Best regards,

2020年1月11日(土) 21:27 Tomas Vondra :
>
> On Sat, Jan 11, 2020 at 05:07:11PM +0900, Kohei KaiGai wrote:
> >Hi,
> >
> >The proposition I posted at 10th-Oct proposed to have a separate list to 
> >retain
> >lesser paths not to expand the path_list length, but here are no comments by
> >others at that time.
> >Indeed, the latest patch has not been updated yet.
> >Please wait for a few days. I'll refresh the patch again.
> >
>
> OK, thanks for the update. I've marked the patch as "waiting on author".
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-path_removal_decision_hook.v2.patch
Description: Binary data


Re: How to retain lesser paths at add_path()?

2020-01-11 Thread Kohei KaiGai
Hi,

The proposal I posted on Oct 10th suggested having a separate list to retain
lesser paths so as not to expand the path_list length, but there were no
comments from others at that time.
Indeed, the latest patch has not been updated yet.
Please wait for a few days; I'll refresh the patch again.

Thanks,

2020年1月11日(土) 11:01 Tomas Vondra :
>
> Hi,
>
> I wonder what is the status of this patch/thread. There was quite a bit
> of discussion about possible approaches, but we currently don't have any
> patch for review, AFAICS. Not sure what's the plan?
>
> So "needs review" status seems wrong, and considering we haven't seen
> any patch since August (so in the past two CFs) I propose marking it as
> returned with feedback. Any objections?
>
> FWIW I think we may well need a more elaborate logic which paths to
> keep, but I'd prefer re-adding it back to the CF when we actually have a
> new patch.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-07 Thread Kohei KaiGai
2020年1月7日(火) 16:03 Michael Paquier :
>
> On Mon, Jan 06, 2020 at 04:32:39PM -0500, Stephen Frost wrote:
> > RESTRICT, yes.  I don't know about ONLY being sensible as we don't
> > really deal with inheritance and foreign tables very cleanly today, as I
> > said up-thread, so I'm not sure if we really want to put in the effort
> > that it'd require to figure out how to make ONLY make sense.
>
> True enough.
>
> > The question about how to handle IDENTITY is a good one.  I suppose
> > we could just pass that down and let the FDW sort it out..?
>
> Looking at the code, ExecuteTruncateGuts() passes down restart_seqs,
> so it sounds sensible to me to just pass down that to the FDW
> callback and the callback decide what to do.
>
It looks to me like the local sequences owned by a foreign table will be
restarted by the core, regardless of the relkind of the owning relation.
So, even if an FDW driver is buggy, the consistency of the local database
is kept, right?
Indeed, the "restart_seqs" flag is needed to propagate the behavior; however,
it would be processed on the remote side via a secondary "TRUNCATE" command.
Is it so sensitive?
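As a sketch of that division of labor (table names are hypothetical), the
local command and the secondary statement a driver such as postgres_fdw
might forward could look like:

```sql
-- Local side: the core restarts any local sequences owned by the
-- foreign table, even if the FDW driver mishandles restart_seqs.
TRUNCATE ft_orders RESTART IDENTITY;

-- Remote side: the driver propagates the flag in the command it issues.
TRUNCATE orders RESTART IDENTITY;
```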

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-02 Thread Kohei KaiGai
2020年1月2日(木) 20:56 Alvaro Herrera :
>
> On 2020-Jan-02, Kohei KaiGai wrote:
>
> > 2020年1月2日(木) 12:16 Alvaro Herrera :
> > >
> > > I think this would need to preserve the notion of multi-table truncates.
> > > Otherwise it won't be possible to truncate tables linked by FKs.  I
> > > think this means the new entrypoint needs to receive a list of rels to
> > > truncate, not just one.  (Maybe an alternative is to make it "please
> > > truncate rel X, and be aware that relations Y,Z are also being
> > > truncated at the same time".)
> >
> > Please check at ExecuteTruncateGuts(). It makes a list of relations to be
> > truncated, including the relations that references the specified table by 
> > FK,
> > prior to invocation of the new FDW callback.
> > So, if multiple foreign tables are involved in a single TRUNCATE command,
> > this callback can be invoked multiple times.
>
> Yeah, that's my concern: if you have postgres_fdw tables linked by FKs
> in the remote server, the truncate will fail because it'll try to
> truncate them in separate commands instead of using a multi-table
> truncate.
>
Ah, that makes sense.
Probably the backend can make a sub-list of the foreign tables to be
truncated for each pair of FDW and server, then invoke the FDW callback
only once with this list.
The FDW driver can then issue a multi-table TRUNCATE on all the supplied
foreign tables, with nothing difficult to do.
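For illustration (remote table names are hypothetical), a driver handed
the whole per-server sub-list could then emit one statement, letting the
remote server resolve any FK links among the tables itself:

```sql
-- One command for all foreign tables of the same server, so the
-- remote side can handle FK-linked tables in a single truncation.
TRUNCATE remote_t1, remote_t2, remote_t3;
```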

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2020-01-01 Thread Kohei KaiGai
2020年1月2日(木) 12:16 Alvaro Herrera :
>
> On 2020-Jan-01, Kohei KaiGai wrote:
>
> > Hello,
> >
> > The attached patch adds TRUNCATE support on foreign table.
> >
> > This patch adds an optional callback ExecForeignTruncate(Relation rel)
> > to FdwRoutine.
> > It is invoked during ExecuteTruncateGuts, then FDW driver hands over
> > the jobs related to complete "truncate on the foreign table".
>
> I think this would need to preserve the notion of multi-table truncates.
> Otherwise it won't be possible to truncate tables linked by FKs.  I
> think this means the new entrypoint needs to receive a list of rels to
> truncate, not just one.  (Maybe an alternative is to make it "please
> truncate rel X, and be aware that relations Y,Z are also being
> truncated at the same time".)
>
Please check ExecuteTruncateGuts(). It makes a list of relations to be
truncated, including the relations that reference the specified table by FK,
prior to the invocation of the new FDW callback.
So, if multiple foreign tables are involved in a single TRUNCATE command,
this callback can be invoked multiple times.

> Looking at apache arrow documentation, it doesn't appear that it has
> anything like FK constraints.
>
Yes. It is just a bunch of columnar data.
In Apache Arrow, no constraints are defined except for "NOT NULL".
(In case Field::nullable == false, all the values are considered
valid data.)

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: TRUNCATE on foreign tables

2019-12-31 Thread Kohei KaiGai
Hello,

The attached patch adds TRUNCATE support for foreign tables.

This patch adds an optional callback, ExecForeignTruncate(Relation rel),
to FdwRoutine.
It is invoked during ExecuteTruncateGuts(); the FDW driver then takes
over the work needed to complete the truncate on the foreign table.
Of course, on some FDW drivers it is not clear how to define the concept
of "truncate".
In that case, the TRUNCATE command refuses to apply to these foreign tables.

2019 is not over everywhere on Earth yet, so I believe it is OK to add
this patch to CF 2020-01.

Best regards,

2020年1月1日(水) 11:46 Kohei KaiGai :
>
> Hello,
>
> We right now don't support TRUNCATE on foreign tables.
> It may be a strange missing piece and restriction of operations.
> For example, if a partitioned table contains some foreign tables in its leaf,
> user cannot use TRUNCATE command to clean up the partitioned table.
>
> Probably, API design is not complicated. We add a new callback for truncate
> on the FdwRoutine, and ExecuteTruncateGuts() calls it if relation is foreign-
> table. In case of postgres_fdw, it also issues "TRUNCATE" command on the
> remote side in the transaction block [*1].
>
> [*1] But I hope oracle_fdw does not follow this implementation as is. :-)
>
> How about your thought?
>
> I noticed this restriction when I'm working on Arrow_Fdw enhancement for
> "writable" capability. Because Apache Arrow [*2] is a columnar file format,
> it is not designed for UPDATE/DELETE, but capable to bulk-INSERT.
> It is straightforward idea to support only INSERT, and clear data by TRUNCATE.
>
> [*2] Apache Arrow - https://arrow.apache.org/docs/format/Columnar.html
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei 



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql13-truncate-on-foreign-table.v1.patch
Description: Binary data


TRUNCATE on foreign tables

2019-12-31 Thread Kohei KaiGai
Hello,

Right now we don't support TRUNCATE on foreign tables.
It seems like a strange missing piece and an odd restriction.
For example, if a partitioned table contains some foreign tables among its
leaves, users cannot use the TRUNCATE command to clean up the partitioned
table.

Probably, the API design is not complicated. We add a new callback for
truncate to FdwRoutine, and ExecuteTruncateGuts() calls it if the relation
is a foreign table. In the case of postgres_fdw, it also issues a "TRUNCATE"
command on the remote side within the transaction block [*1].

[*1] But I hope oracle_fdw does not follow this implementation as is. :-)

What are your thoughts?

I noticed this restriction while working on an Arrow_Fdw enhancement for
"writable" capability. Because Apache Arrow [*2] is a columnar file format,
it is not designed for UPDATE/DELETE, but it is capable of bulk INSERT.
It is a straightforward idea to support only INSERT, and to clear data by
TRUNCATE.

[*2] Apache Arrow - https://arrow.apache.org/docs/format/Columnar.html
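With such TRUNCATE support, the intended write workflow for an Arrow
foreign table could be sketched as below (arrow_ftable and staging are
hypothetical names):

```sql
BEGIN;
-- Discard the old columnar file contents; UPDATE/DELETE are unsupported.
TRUNCATE arrow_ftable;
-- Re-load by bulk INSERT, the only write path that fits the Arrow format.
INSERT INTO arrow_ftable SELECT * FROM staging;
COMMIT;
```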

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Asymmetric partition-wise JOIN

2019-12-26 Thread Kohei KaiGai
Hello,

This crash was reproduced in our environment also.
It looks to me like adjust_child_relids_multilevel() didn't expect a case
where the supplied 'relids' (partially) refer to a plain, non-partitioned
relation.
It tries to build a new 'parent_relids' that is the set of
appinfo->parent_relid values related to the supplied 'child_relids'.
However, the bits in child_relids that indicate plain relations are
unintentionally dropped here. Then adjust_child_relids_multilevel()
goes into infinite recursion until it hits the stack limit.

The attached v2 fixes the problem, and the regression tests finish correctly.

Best regards,

2019年12月1日(日) 12:24 Michael Paquier :
>
> On Sat, Aug 24, 2019 at 05:33:01PM +0900, Kohei KaiGai wrote:
> > On the other hands, it eventually consumes almost equivalent amount
> > of memory to load the inner relations, if no leafs are pruned, and if we
> > could extend the Hash-node to share the hash-table with sibling
> > join-nodess.
>
> The patch crashes when running the regression tests, per the report of
> the automatic patch tester.  Could you look at that?  I have moved the
> patch to nexf CF, waiting on author.
> --
> Michael



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 
#12912 0x006f9471 in adjust_child_relids_multilevel (
root=root@entry=0x2115240, relids=relids@entry=0x2106df8,
child_relids=child_relids@entry=0x0,
top_parent_relids=top_parent_relids@entry=0x20afd88) at appendinfo.c:602
#12913 0x006f9471 in adjust_child_relids_multilevel (
root=root@entry=0x2115240, relids=relids@entry=0x2106df8,
child_relids=child_relids@entry=0x0,
top_parent_relids=top_parent_relids@entry=0x20afd88) at appendinfo.c:602
#12914 0x006f9471 in adjust_child_relids_multilevel (
root=root@entry=0x2115240, relids=relids@entry=0x2106df8,
child_relids=child_relids@entry=0x0,
top_parent_relids=top_parent_relids@entry=0x20afd88) at appendinfo.c:602
#12915 0x006f9471 in adjust_child_relids_multilevel (
root=root@entry=0x2115240, relids=relids@entry=0x2106df8,
child_relids=child_relids@entry=0x20b0bb8,
top_parent_relids=top_parent_relids@entry=0x20afd88) at appendinfo.c:602
#12916 0x006f9471 in adjust_child_relids_multilevel (
root=root@entry=0x2115240, relids=relids@entry=0x2106df8,
child_relids=child_relids@entry=0x20af9d8,
top_parent_relids=top_parent_relids@entry=0x20afd88) at appendinfo.c:602
#12917 0x006c9805 in add_child_join_rel_equivalences (
root=root@entry=0x2115240, nappinfos=1, appinfos=appinfos@entry=0x20afdb0,
parent_joinrel=parent_joinrel@entry=0x20ae420,
child_joinrel=child_joinrel@entry=0x20aef10) at equivclass.c:2458
#12918 0x0070bc07 in build_child_join_rel (root=root@entry=0x2115240,
outer_rel=outer_rel@entry=0x210a318, inner_rel=inner_rel@entry=0x21051a0,
parent_joinrel=parent_joinrel@entry=0x20ae420,
restrictlist=restrictlist@entry=0x20afcd8, sjinfo=sjinfo@entry=0x20afa00,
jointype=JOIN_INNER) at relnode.c:896
#12919 0x006d2dcf in extract_asymmetric_partitionwise_subjoin (
append_path=0x20a4000, extra=0x7ffea461c710, extra=0x7ffea461c710,
jointype=JOIN_INNER, inner_rel=0x21051a0, joinrel=0x20ae420,
root=0x2115240) at joinrels.c:1573
#12920 try_asymmetric_partitionwise_join (root=root@entry=0x2115240,
joinrel=joinrel@entry=0x20ae420, outer_rel=outer_rel@entry=0x207cce0,
inner_rel=inner_rel@entry=0x21051a0, jointype=jointype@entry=JOIN_INNER,
extra=extra@entry=0x7ffea461c710) at joinrels.c:1641
#12921 0x006cf296 in add_paths_to_joinrel (root=root@entry=0x2115240,
joinrel=joinrel@entry=0x20ae420, outerrel=outerrel@entry=0x207cce0,
innerrel=innerrel@entry=0x21051a0, jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x7ffea461c8c0, restrictlist=0x20ae898)
at joinpath.c:333
#12922 0x006d1a3a in populate_joinrel_with_paths (root=0x2115240,
rel1=0x207cce0, rel2=0x21051a0, joinrel=0x20ae420, sjinfo=0x7ffea461c8c0,
restrictlist=0x20ae898) at joinrels.c:804
#12923 0x006d2505 in make_join_rel (root=root@entry=0x2115240,
rel1=rel1@entry=0x207cce0, rel2=rel2@entry=0x21051a0) at joinrels.c:757
#12924 0x006d278d in make_rels_by_clause_joins (
other_rels=, other_rels_list=0x20ae390, old_rel=0x207cce0,
root=0x2115240) at joinrels.c:309
#12925 join_search_one_level (root=root@entry=0x2115240, level=level@entry=2)
at joinrels.c:120
#12926 0x006bdd4b in standard_join_search (root=0x2115240,
levels_needed=3, initial_rels=) at allpaths.c:2879
#12927 0x006be1c4 in make_one_rel (root=root@entry=0x2115240,
joinlist=joinlist@entry=0x2106b80) at allpaths.c:227
#12928 0x006e1cdb in query_planner (root=root@entry=0x2115240,
qp_callback=qp_callback@entry=0x6e22b0 ,
qp_extra=qp_extra@entry=0x7ffea461cb60) at planmain.c:269
#12929 0x006e6962 in grouping_planner (root=,
inhe

Re: How to retain lesser paths at add_path()?

2019-10-09 Thread Kohei KaiGai
2019年10月8日(火) 1:56 Tom Lane :
>
> Kohei KaiGai  writes:
> > 2019年10月7日(月) 23:44 Robert Haas :
> >> But if we want to stick with the ad-hoc method, we could also just
> >> have four possible return values: dominates, dominated-by, both, or
> >> neither.
>
> > It seems to me this is a bit different from the purpose of this hook.
> > I never intend to overwrite existing cost-based decision by this hook.
> > The cheapest path at a particular level is the cheapest one regardless
> > of the result of this hook. However, this hook enables to prevent
> > immediate elimination of a particular path that we (extension) want to
> > use it later and may have potentially cheaper cost (e.g; a pair of
> > custom GpuAgg + GpuJoin by reduction of DMA cost).
>
> I do not think this will work for the purpose you wish, for the reason
> Tomas already pointed out: if you don't also modify the accept_new
> decision, there's no guarantee that the path you want will get into
> the relation's path list in the first place.
>
Ah, that is right, indeed. We may need a variation of add_path()
that guarantees to preserve a newly added path.
We might also utilize the callback to ask the extension whether it allows
the new path to be dominated by the existing cheapest path.

> Another problem with trying to manage this only by preventing removals
> is that it is likely to lead to keeping extra paths and thereby wasting
> planning effort.  What if there's more than one path having the property
> you want to keep?
>
My assumption is that the upper-path construction walks the path-list, not
only the cheapest path, to build upper paths from lesser paths if they are
capable of special optimization.
Of course, it is not a cheap approach; however, the majority of path-nodes
are not interested in the lesser paths as their sub-paths. Only a limited
number of extensions would walk the lesser paths as well.
A separate list is one idea for keeping the lesser paths. It would not be
referenced in the usual path processing; however, an extension could walk
the list to look for further opportunities beyond the cheapest path.
In that case, the length of the path_list is not changed.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: How to retain lesser paths at add_path()?

2019-10-07 Thread Kohei KaiGai
2019年10月7日(月) 23:44 Robert Haas :
>
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane  wrote:
> > We could imagine, maybe, that a hook for the purpose of allowing an
> > additional dimension to be considered would be essentially a path
> > comparison function, returning -1, +1, or 0 depending on whether
> > path A is dominated by path B (on this new dimension), dominates
> > path B, or neither.  However, I do not see how multiple extensions
> > could usefully share use of such a hook.
>
> Typically, we support hook-sharing mostly by ad-hoc methods: when
> installing a hook, you remember the previous value of the function
> pointer, and arrange to call that function yourself.  That's not a
> great method.  One problem with it is that you can't reasonably
> uninstall a hook function, which would be a nice thing to be able to
> do. We could do better by reusing the technique from on_shmem_exit or
> RegisterXactCallbacks: keep an array of pointers, and call them in
> order. I wish we'd retrofit all of our hooks to work more like that;
> being able to unload shared libraries would be a good feature.
>
> But if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.
>
It seems to me this is a bit different from the purpose of this hook.
I never intend to override the existing cost-based decision with this hook.
The cheapest path at a particular level is the cheapest one regardless
of the result of this hook. However, this hook makes it possible to
prevent the immediate elimination of a particular path that we (the
extension) want to use later and that may have a potentially cheaper
cost (e.g., a pair of custom GpuAgg + GpuJoin, through a reduction of
the DMA cost).

So, I think the expected behavior when multiple extensions use this
hook is clear. If any caller in the hook's call-chain wants to preserve
the path, it should be kept on the pathlist. (Of course, it is not the
cheapest one.)

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.
>
> I wonder whether we might get a nicer solution to this problem if our
> method of managing paths looked less something invented by a LISP
> hacker, but I don't have a specific proposal in mind.
>
One other design in my mind is to add a callback function pointer to the
Path structure. Only if the Path structure has a valid (non-NULL) pointer
does add_path() call the extension's own logic to determine whether the
Path can be eliminated now.
This design may minimize the number of callback invocations.

One potential downside of this approach is that a function pointer makes
it hard to implement readfuncs for Path nodes, even though we have no
read handlers for them right now.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: add a MAC check for TRUNCATE

2019-09-02 Thread Kohei KaiGai
Hello Yuli,

2019年7月25日(木) 3:52 Yuli Khodorkovskiy :
> Since all DAC checks should have corresponding MAC, this patch adds a
> hook to allow extensions to implement a MAC check on TRUNCATE. I have
> also implemented this access check in the sepgsql extension.
>
> One important thing to note is that refpolicy [1] and Redhat based
> distributions do not have the SELinux permission for db_table {truncate}
> implemented.
>
How is the db_table:{delete} permission different from truncate?
From the standpoint of data access, TRUNCATE is equivalent to DELETE
without WHERE, isn't it?
Of course, there are some differences between them: TRUNCATE takes
exclusive locks and eliminates the underlying data blocks, while DELETE
removes rows in an MVCC manner. However, both of them eventually remove
data from the target table.

I would like to recommend reusing the "db_table:{delete}" permission for
TRUNCATE. What is your opinion?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Asymmetric partition-wise JOIN

2019-08-24 Thread Kohei KaiGai
2019年8月24日(土) 7:02 Thomas Munro :
>
> On Fri, Aug 23, 2019 at 4:05 AM Kohei KaiGai  wrote:
> > We can consider the table join ptable X t1 above is equivalent to:
> >   (ptable_p0 + ptable_p1 + ptable_p2) X t1
> > = (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
> > It returns an equivalent result, however, rows are already reduced by 
> > HashJoin
> > in the individual leaf of Append, so CPU-cycles consumed by Append node can
> > be cheaper.
> >
> > On the other hands, it has a downside because t1 must be read 3 times and
> > hash table also must be built 3 times. It increases the expected cost,
> > so planner
> > may not choose the asymmetric partition-wise join plan.
>
> What if you include the partition constraint as a filter on t1?  So you get:
>
> ptable X t1 =
>   (ptable_p0 X (σ hash(dist)%4=0 (t1))) +
>   (ptable_p1 X (σ hash(dist)%4=1 (t1))) +
>   (ptable_p2 X (σ hash(dist)%4=2 (t1))) +
>   (ptable_p3 X (σ hash(dist)%4=3 (t1)))
>
> Pros:
> 1.  The hash tables will not contain unnecessary junk.
> 2.  You'll get the right answer if t1 is on the outer side of an outer join.
> 3.  If this runs underneath a Parallel Append and t1 is big enough
> then workers will hopefully cooperate and do a synchronised scan of
> t1.
> 4.  The filter might enable a selective and efficient plan like an index scan.
>
> Cons:
> 1.  The filter might not enable a selective and efficient plan, and
> therefore cause extra work.
>
> (It's a little weird in this example because don't usually see hash
> functions in WHERE clauses, but that could just as easily be dist
> BETWEEN 1 AND 99 or any other partition constraint.)
>
It requires that the join key include the partition key and also be an
equality join, doesn't it?
If ptable and t1 are joined using ptable.dist = t1.foo, we can indeed
distribute t1 to each leaf table with "WHERE hash(foo)%4 = xxx" according
to the partition bounds.
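As a sketch of one leaf of such a rewrite (reusing the ptable/t1 example
from upthread, with the hypothetical join key t1.foo matching the partition
key), the per-leaf filter could be spelled with satisfies_hash_partition(),
which applies the same hashing as the leaf's partition constraint; here
3 and 0 are ptable_p0's modulus and remainder:

```sql
SELECT *
  FROM ptable_p0 p
  JOIN (SELECT * FROM t1
         WHERE satisfies_hash_partition('ptable'::regclass, 3, 0, foo)) f
    ON p.dist = f.foo;
-- ... UNION ALL the corresponding queries for ptable_p1 (remainder 1)
-- and ptable_p2 (remainder 2)
```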

In cases where some of the partition leaves are pruned, this is clearly
beneficial, because rows that would only be referenced by the pruned
child relations are a waste of memory.

On the other hand, it eventually consumes an almost equivalent amount
of memory to load the inner relation if no leaves are pruned and if we
could extend the Hash node to share the hash table with sibling join nodes.

> > One idea I have is, sibling HashJoin shares a hash table that was built once
> > by any of the sibling Hash plan. Right now, it is not implemented yet.
>
> Yeah, I've thought a little bit about that in the context of Parallel
> Repartition.  I'm interested in combining intra-node partitioning
> (where a single plan node repartitions data among workers on the fly)
> with inter-node partitioning (like PWJ, where partitions are handled
> by different parts of the plan, considered at planning time); you
> finish up needing to have nodes in the plan that 'receive' tuples for
> each partition, to match up with the PWJ plan structure.  That's not
> entirely unlike CTE references, and not entirely unlike your idea of
> somehow sharing the same hash table.  I ran into a number of problems
> while thinking about that, which I should write about in another
> thread.
>
Hmm. Do you mean that the inner path may behave differently according
to the partition-bounds definition of the outer path to be joined?
Let me investigate its pros & cons.

The reasons why I think the idea of sharing the same hash table is
reasonable in this scenario are:
1. We can easily extend the idea for parallel optimization. A hash table
on a DSM segment, once built, can be shared by all the siblings in all
the parallel workers.
2. We can save memory regardless of the join keys and partition keys,
even if these are not involved in the query.

On the other hand, below are the downsides. Potentially, combined use of
your idea may help these cases:
3. A distributed inner relation cannot be on the outer side of an
XXX OUTER JOIN.
4. The hash table contains rows referenced only by pruned partition leaves.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Asymmetric partition-wise JOIN

2019-08-22 Thread Kohei KaiGai
Hello,

Even though nobody has responded to the thread, I tried to make a
prototype of the asymmetric partition-wise join support.
This feature tries to join a non-partitioned relation with a partitioned
relation's leaves before the Append.

See the example below:

create table ptable (dist int, a int, b int) partition by hash (dist);
create table ptable_p0 partition of ptable for values with (modulus 3,
remainder 0);
create table ptable_p1 partition of ptable for values with (modulus 3,
remainder 1);
create table ptable_p2 partition of ptable for values with (modulus 3,
remainder 2);
create table t1 (aid int, label text);
create table t2 (bid int, label text);
insert into ptable (select x, (1000*random())::int,
(1000*random())::int from generate_series(1,100) x);
insert into t1 (select x, md5(x::text) from generate_series(1,50) x);
insert into t2 (select x, md5(x::text) from generate_series(1,50) x);
vacuum analyze ptable;
vacuum analyze t1;
vacuum analyze t2;

ptable.a has values between 0 and 1000, and t1.aid has values between 1
and 50. Therefore, a join between ptable and t1 on a = aid can eliminate
almost 95% of the rows.
On the other hand, t1 is not partitioned and the join keys are not
partition keys. So, Append must process a million rows first; HashJoin
then processes the rows read from the partitioned table, and 95% of them
are eventually dropped.
In other words, 95% of the work done by Append is a waste of time and
CPU cycles.

postgres=# explain select * from ptable, t1 where a = aid;
  QUERY PLAN
--
 Hash Join  (cost=2.12..24658.62 rows=49950 width=49)
   Hash Cond: (ptable_p0.a = t1.aid)
   ->  Append  (cost=0.00..20407.00 rows=100 width=12)
 ->  Seq Scan on ptable_p0  (cost=0.00..5134.63 rows=333263 width=12)
 ->  Seq Scan on ptable_p1  (cost=0.00..5137.97 rows=333497 width=12)
 ->  Seq Scan on ptable_p2  (cost=0.00..5134.40 rows=333240 width=12)
   ->  Hash  (cost=1.50..1.50 rows=50 width=37)
 ->  Seq Scan on t1  (cost=0.00..1.50 rows=50 width=37)
(8 rows)

The asymmetric partitionwise join allows to join non-partitioned tables and
partitioned tables prior to Append.

postgres=# set enable_partitionwise_join = on;
SET
postgres=# explain select * from ptable, t1 where a = aid;
  QUERY PLAN
--
 Append  (cost=2.12..19912.62 rows=49950 width=49)
   ->  Hash Join  (cost=2.12..6552.96 rows=16647 width=49)
 Hash Cond: (ptable_p0.a = t1.aid)
 ->  Seq Scan on ptable_p0  (cost=0.00..5134.63 rows=333263 width=12)
 ->  Hash  (cost=1.50..1.50 rows=50 width=37)
   ->  Seq Scan on t1  (cost=0.00..1.50 rows=50 width=37)
   ->  Hash Join  (cost=2.12..6557.29 rows=16658 width=49)
 Hash Cond: (ptable_p1.a = t1.aid)
 ->  Seq Scan on ptable_p1  (cost=0.00..5137.97 rows=333497 width=12)
 ->  Hash  (cost=1.50..1.50 rows=50 width=37)
   ->  Seq Scan on t1  (cost=0.00..1.50 rows=50 width=37)
   ->  Hash Join  (cost=2.12..6552.62 rows=16645 width=49)
 Hash Cond: (ptable_p2.a = t1.aid)
 ->  Seq Scan on ptable_p2  (cost=0.00..5134.40 rows=333240 width=12)
 ->  Hash  (cost=1.50..1.50 rows=50 width=37)
   ->  Seq Scan on t1  (cost=0.00..1.50 rows=50 width=37)
(16 rows)

We can consider that the table join ptable X t1 above is equivalent to:
  (ptable_p0 + ptable_p1 + ptable_p2) X t1
= (ptable_p0 X t1) + (ptable_p1 X t1) + (ptable_p2 X t1)
It returns an equivalent result; however, the rows are already reduced by
the HashJoin in each individual leaf of the Append, so the CPU cycles
consumed by the Append node can be cheaper.

On the other hand, it has a downside: t1 must be read 3 times and the
hash table must also be built 3 times. That increases the expected cost,
so the planner may not choose the asymmetric partition-wise join plan.

One idea I have is for sibling HashJoins to share a hash table that was
built once by any of the sibling Hash plans. Right now, that is not
implemented yet.

What are your thoughts on this feature?

Best regards,

2019年8月12日(月) 15:03 Kohei KaiGai :
>
> Hello,
>
> PostgreSQL optimizer right now considers join pairs on only
> non-partition - non-partition or
> partition-leaf - partition-leaf relations. On the other hands, it is
> harmless and makes sense to
> consider a join pair on non-partition - partition-leaf.
>
> See the example below. ptable is partitioned by hash, and contains 10M
> rows. ftable is not
> partitioned and contains 50 rows. Most of ptable::fkey shall not have
> matched rows in this
> join.
>
> create table ptable (fkey int, dist text) partition by hash (dist);
> create table ptable_p0 partition of ptable for values with (modulus 3,
> remainder 0);
> create table ptable_p1 partitio

Asymmetric partition-wise JOIN

2019-08-12 Thread Kohei KaiGai
Hello,

The PostgreSQL optimizer right now considers join pairs only between two
non-partitioned relations or between two partition leaves. On the other
hand, it is harmless and makes sense to consider a join pair between a
non-partitioned relation and a partition leaf.

See the example below. ptable is partitioned by hash and contains 10M
rows. ftable is not partitioned and contains 50 rows. Most values of
ptable::fkey will have no matching rows in this join.

create table ptable (fkey int, dist text) partition by hash (dist);
create table ptable_p0 partition of ptable for values with (modulus 3,
remainder 0);
create table ptable_p1 partition of ptable for values with (modulus 3,
remainder 1);
create table ptable_p2 partition of ptable for values with (modulus 3,
remainder 2);
insert into ptable (select x % 1, md5(x::text) from
generate_series(1,1000) x);

create table ftable (pkey int primary key, memo text);
insert into ftable (select x, 'ftable__#' || x::text from
generate_series(1,50) x);
vacuum analyze;

postgres=# explain analyze select count(*) from ptable p, ftable f
where p.fkey = f.pkey;
QUERY PLAN
---
 Aggregate  (cost=266393.38..266393.39 rows=1 width=8) (actual
time=2333.193..2333.194 rows=1 loops=1)
   ->  Hash Join  (cost=2.12..260143.38 rows=250 width=0) (actual
time=0.056..2330.079 rows=5 loops=1)
 Hash Cond: (p.fkey = f.pkey)
 ->  Append  (cost=0.00..25.00 rows=1000 width=4)
(actual time=0.012..1617.268 rows=1000 loops=1)
   ->  Seq Scan on ptable_p0 p  (cost=0.00..61101.96
rows=3332796 width=4) (actual time=0.011..351.137 rows=3332796
loops=1)
   ->  Seq Scan on ptable_p1 p_1  (cost=0.00..61106.25
rows=025 width=4) (actual time=0.005..272.925 rows=025
loops=1)
   ->  Seq Scan on ptable_p2 p_2  (cost=0.00..61126.79
rows=3334179 width=4) (actual time=0.006..416.141 rows=3334179
loops=1)
 ->  Hash  (cost=1.50..1.50 rows=50 width=4) (actual
time=0.033..0.034 rows=50 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 10kB
   ->  Seq Scan on ftable f  (cost=0.00..1.50 rows=50
width=4) (actual time=0.004..0.017 rows=50 loops=1)
 Planning Time: 0.286 ms
 Execution Time: 2333.264 ms
(12 rows)

We can manually rewrite this query as follows:

postgres=# explain analyze select count(*) from (
  select * from ptable_p0 p, ftable f where p.fkey =
f.pkey union all
  select * from ptable_p1 p, ftable f where p.fkey =
f.pkey union all
  select * from ptable_p2 p, ftable f where p.fkey = f.pkey) subqry;

Because Append no longer has to process tuples that have no matched
tuples in ftable,
this query has a cheaper cost and a shorter execution time.
(2333ms --> 1396ms)

postgres=# explain analyze select count(*) from (
  select * from ptable_p0 p, ftable f where p.fkey =
f.pkey union all
  select * from ptable_p1 p, ftable f where p.fkey =
f.pkey union all
  select * from ptable_p2 p, ftable f where p.fkey = f.pkey) subqry;
   QUERY PLAN
-
 Aggregate  (cost=210478.25..210478.26 rows=1 width=8) (actual
time=1396.024..1396.024 rows=1 loops=1)
   ->  Append  (cost=2.12..210353.14 rows=50042 width=0) (actual
time=0.058..1393.008 rows=5 loops=1)
 ->  Subquery Scan on "*SELECT* 1"  (cost=2.12..70023.66
rows=16726 width=0) (actual time=0.057..573.197 rows=16789 loops=1)
   ->  Hash Join  (cost=2.12..69856.40 rows=16726
width=72) (actual time=0.056..571.718 rows=16789 loops=1)
 Hash Cond: (p.fkey = f.pkey)
 ->  Seq Scan on ptable_p0 p  (cost=0.00..61101.96
rows=3332796 width=4) (actual time=0.009..255.791 rows=3332796
loops=1)
 ->  Hash  (cost=1.50..1.50 rows=50 width=4)
(actual time=0.034..0.035 rows=50 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 10kB
   ->  Seq Scan on ftable f  (cost=0.00..1.50
rows=50 width=4) (actual time=0.004..0.019 rows=50 loops=1)
 ->  Subquery Scan on "*SELECT* 2"  (cost=2.12..70027.43
rows=16617 width=0) (actual time=0.036..409.712 rows=16578 loops=1)
   ->  Hash Join  (cost=2.12..69861.26 rows=16617
width=72) (actual time=0.036..408.626 rows=16578 loops=1)
 Hash Cond: (p_1.fkey = f_1.pkey)
 ->  Seq Scan on ptable_p1 p_1
(cost=0.00..61106.25 rows=025 width=4) (actual time=0.005..181.422
rows=025 loops=1)
 ->  Hash  (cost=1.50..1.50 rows=50 width=4)
(actual time=0.020..0.020 rows=50 loops=1)

Re: How to retain lesser paths at add_path()?

2019-08-11 Thread Kohei KaiGai
Hello,

For implementation of the concept, this patch puts a hook on add_path
/ add_partial_path
to let extensions override the path-removal decision according to
their own viewpoint.
We don't add any metrics other than the path's cost and pathkeys, so
set_cheapest() still picks
up paths based on their cost at each depth.
As we currently do, extensions (FDW / CSP) are responsible for
constructing and adding
paths with reasonable cost values, and the PostgreSQL optimizer then
chooses the "best" path
according to the (self-reported) cost. On the other hand, an
expensive path at a particular
level is not always expensive from the upper level's viewpoint, if the
combination of path-A and path-B
enables a special optimization, like a reduction of DMA transfers between
host and device, or the omission
of network transfers between local and remote nodes.
In these cases, the extension gets control to override the decision
of whether an old_path
dominated by new_path shall be removed. If old_path
survives, the extension can
re-use it at the upper level to construct a special path.
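To illustrate the control flow, here is a toy Python sketch of add_path() with such a removal-override hook. Everything here is hypothetical (simplified dict-based "paths", an invented `keep_dominated_hook` callback); the real patch adds a C hook, and real paths carry startup/total costs, parallel flags, and more:

```python
# Toy model of add_path() with an extension hook that may veto the
# removal of a dominated path.

def add_path(pathlist, new_path, keep_dominated_hook=None):
    kept = []
    for old in pathlist:
        # Simplified dominance test: same pathkeys and not cheaper.
        dominated = (old['pathkeys'] == new_path['pathkeys']
                     and old['cost'] >= new_path['cost'])
        if dominated and keep_dominated_hook is not None \
                and keep_dominated_hook(old, new_path):
            dominated = False  # the extension vetoes the removal
        if not dominated:
            kept.append(old)
    kept.append(new_path)
    return kept

# An extension keeps its GPU paths alive even when dominated, because
# they may combine more cheaply with an upper GPU node later on.
def keep_gpu_paths(old_path, new_path):
    return old_path.get('provider') == 'gpu'

paths = [{'cost': 120, 'pathkeys': None, 'provider': 'gpu'}]
paths = add_path(paths, {'cost': 100, 'pathkeys': None}, keep_gpu_paths)
# Without the hook the GPU path would be dropped; with it, both survive.
```

set_cheapest() would still pick the cost-100 path at this level, but the surviving GPU path remains available for an upper node to combine with.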

Best regards,

On Thu, Aug 1, 2019 at 21:14, Kohei KaiGai wrote:
>
> 2019年8月1日(木) 19:24 Tomas Vondra :
> >
> > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> > >2019年8月1日(木) 16:19 Richard Guo :
> > >>
> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai  wrote:
> > >>>
> > >>> 2019年8月1日(木) 1:41 Tom Lane :
> > >>> >
> > >>> > Robert Haas  writes:
> > >>> > > Yeah, but I have to admit that this whole design makes me kinda
> > >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> > >>> > > merit, it increases not only the number of paths retained but also 
> > >>> > > the
> > >>> > > cost of comparing two paths to possibly reject one of them. A few
> > >>> > > years ago, you came up with the (good) idea of rejecting some join
> > >>> > > paths before actually creating the paths, and I wonder if we ought 
> > >>> > > to
> > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, 
> > >>> > > has
> > >>> > > been saying, we ought to think about planning top-down with
> > >>> > > memoization instead of bottom up (yeah, I know that's a huge 
> > >>> > > change).
> > >>> > > It just feels like the whole idea of a list of paths ordered by cost
> > >>> > > breaks down when there are so many ways that a not-cheapest path can
> > >>> > > still be worth keeping. Not sure exactly what would be better, 
> > >>> > > though.
> > >>> >
> > >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> > >>> > know what to do instead though.  Changing to a top-down design
> > >>> > sounds like it would solve some problems while introducing others
> > >>> > (not to mention the amount of work and breakage involved).
> > >>> >
> > >>> Hmm... It looks the problem we ought to revise about path construction
> > >>> is much larger than my expectation, and uncertain for me how much works
> > >>> are needed.
> > >>>
> > >>> Although it might be a workaround until fundamental reconstruction,
> > >>> how about to have a margin of estimated cost to reject paths?
> > >>> Current add_path() immediately rejects lesser paths if its cost is
> > >>> even a little more expensive than the compared one. One the other hands,
> > >>
> > >>
> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> > >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> > >> user-controllable.
> > >>
> > >Ah, sorry, I oversight this logic...
> > >
> >
> > FWIW I doubt a larger "fuzz factor" would be a reliable
> > solution, because how would you know what value is the right one? Why would
> > 10% be the right threshold, for example? In my experience these
> > hard-coded coefficients imply behavior that's difficult to predict and
> > explain to users.
> >
> Ah... That's exactly hard task to explain to users.
>
> > >>> I understand it is not an essential re-design of path-construction 
> > >>> logic, and
> > >>> may have limitation. However, amount of works are reasonable and no 
> > >>> side-

Re: How to retain lesser paths at add_path()?

2019-08-01 Thread Kohei KaiGai
On Thu, Aug 1, 2019 at 19:24, Tomas Vondra wrote:
>
> On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> >2019年8月1日(木) 16:19 Richard Guo :
> >>
> >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai  wrote:
> >>>
> >>> 2019年8月1日(木) 1:41 Tom Lane :
> >>> >
> >>> > Robert Haas  writes:
> >>> > > Yeah, but I have to admit that this whole design makes me kinda
> >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> >>> > > merit, it increases not only the number of paths retained but also the
> >>> > > cost of comparing two paths to possibly reject one of them. A few
> >>> > > years ago, you came up with the (good) idea of rejecting some join
> >>> > > paths before actually creating the paths, and I wonder if we ought to
> >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> >>> > > been saying, we ought to think about planning top-down with
> >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> >>> > > It just feels like the whole idea of a list of paths ordered by cost
> >>> > > breaks down when there are so many ways that a not-cheapest path can
> >>> > > still be worth keeping. Not sure exactly what would be better, though.
> >>> >
> >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> >>> > know what to do instead though.  Changing to a top-down design
> >>> > sounds like it would solve some problems while introducing others
> >>> > (not to mention the amount of work and breakage involved).
> >>> >
> >>> Hmm... It looks the problem we ought to revise about path construction
> >>> is much larger than my expectation, and uncertain for me how much works
> >>> are needed.
> >>>
> >>> Although it might be a workaround until fundamental reconstruction,
> >>> how about to have a margin of estimated cost to reject paths?
> >>> Current add_path() immediately rejects lesser paths if its cost is
> >>> even a little more expensive than the compared one. One the other hands,
> >>
> >>
> >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> >> user-controllable.
> >>
> >Ah, sorry, I oversight this logic...
> >
>
> FWIW I doubt a larger "fuzz factor" would be a reliable
> solution, because how would you know what value is the right one? Why would
> 10% be the right threshold, for example? In my experience these
> hard-coded coefficients imply behavior that's difficult to predict and
> explain to users.
>
Ah... That's exactly the hard part to explain to users.

> >>> I understand it is not an essential re-design of path-construction logic, 
> >>> and
> >>> may have limitation. However, amount of works are reasonable and no side-
> >>> effect. (current behavior = 0% threshold).
> >>> How about your opinions?
> >>>
> >>
> >> How's about Tom's suggestion on adding another dimension in add_path()
> >> to be considered, just like how it considers paths of better sort order
> >> or parallel-safe?
> >>
> >Robert also mentioned it makes comparison operation more complicated.
> >If we try to have another dimension here, a callback function in Path node
> >may be able to tell the core optimizer whether "dominated path" shall be
> >dropped or not, without further complexity. It is just an idea.
> >
>
> I think adding a hook to add_path() allowing to override the decidion
> should be OK. The chance of getting that committed in the near future
> seems much higher than for a patch that completely reworks add_path().
>
> There's one caveat, though - AFAICS various places in the planner use
> things like cheapest_total_path, cheapest_startup_path and even
> get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> considers startup/total cost. It might happen that even after keeping
> additional paths, the planner still won't use them :-(
>
Even if existing code looks only at cheapest_xxx_path, I don't think that is
problematic behavior, because these paths are exactly the cheapest at their level,
even though they may use more expensive paths at a deeper level.
If a hook can prevent dropping a non-cheapest path at a particular level,
this path shall not appear on the cheapest_

Re: How to retain lesser paths at add_path()?

2019-08-01 Thread Kohei KaiGai
On Thu, Aug 1, 2019 at 16:19, Richard Guo wrote:
>
> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai  wrote:
>>
>> 2019年8月1日(木) 1:41 Tom Lane :
>> >
>> > Robert Haas  writes:
>> > > Yeah, but I have to admit that this whole design makes me kinda
>> > > uncomfortable.  Every time somebody comes up with a new figure of
>> > > merit, it increases not only the number of paths retained but also the
>> > > cost of comparing two paths to possibly reject one of them. A few
>> > > years ago, you came up with the (good) idea of rejecting some join
>> > > paths before actually creating the paths, and I wonder if we ought to
>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>> > > been saying, we ought to think about planning top-down with
>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>> > > It just feels like the whole idea of a list of paths ordered by cost
>> > > breaks down when there are so many ways that a not-cheapest path can
>> > > still be worth keeping. Not sure exactly what would be better, though.
>> >
>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>> > know what to do instead though.  Changing to a top-down design
>> > sounds like it would solve some problems while introducing others
>> > (not to mention the amount of work and breakage involved).
>> >
>> Hmm... It looks the problem we ought to revise about path construction
>> is much larger than my expectation, and uncertain for me how much works
>> are needed.
>>
>> Although it might be a workaround until fundamental reconstruction,
>> how about to have a margin of estimated cost to reject paths?
>> Current add_path() immediately rejects lesser paths if its cost is
>> even a little more expensive than the compared one. One the other hands,
>
>
> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> costs of two paths, although the fuzz factor (1%) is hard coded and not
> user-controllable.
>
Ah, sorry, I overlooked this logic...

>> I understand it is not an essential re-design of path-construction logic, and
>> may have limitation. However, amount of works are reasonable and no side-
>> effect. (current behavior = 0% threshold).
>> How about your opinions?
>>
>
> How's about Tom's suggestion on adding another dimension in add_path()
> to be considered, just like how it considers paths of better sort order
> or parallel-safe?
>
Robert also mentioned that it makes the comparison operation more complicated.
If we try to have another dimension here, a callback function in the Path node
may be able to tell the core optimizer whether a "dominated" path shall be
dropped, without further complexity. It is just an idea.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: How to retain lesser paths at add_path()?

2019-08-01 Thread Kohei KaiGai
On Thu, Aug 1, 2019 at 1:41, Tom Lane wrote:
>
> Robert Haas  writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks like the problem of revising path construction
is much larger than I expected, and it is uncertain to me how much work
is needed.

Although it might be only a workaround until a fundamental reconstruction,
how about having a margin of estimated cost for rejecting paths?
The current add_path() immediately rejects lesser paths if their cost is
even a little more expensive than the compared one. On the other hand,
a small cost difference may flip the final optimizer decision in some cases.
For example, if we retain lesser paths that are less than 10% more expensive
than the current cheapest path at the same level, we may be able to
bound the number of lesser paths retained and still use a simple cost comparison
for the path-survival decision.

I understand it is not an essential re-design of the path-construction logic, and
it may have limitations. However, the amount of work is reasonable with no side
effect (the current behavior equals a 0% threshold).
How about your opinions?
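The proposal can be illustrated with a toy version of the fuzzy cost comparison. This is a Python sketch, not the actual compare_path_costs_fuzzily() code; in the real C code the fuzz factor is hard-coded at roughly 1%, and this sketch just makes it a parameter:

```python
# Toy fuzzy cost comparison with a configurable fuzz margin.

def clearly_cheaper(cost1, cost2, fuzz_factor=1.01):
    """True if cost1 beats cost2 by more than the fuzz margin."""
    return cost1 * fuzz_factor < cost2

# With the ~1% fuzz, a path costing 108 loses against 100 and would be
# rejected; with a 10% margin, it would be retained for later use.
rejected_at_1pct = clearly_cheaper(100, 108)         # -> True (rejected)
rejected_at_10pct = clearly_cheaper(100, 108, 1.10)  # -> False (retained)
```

The trade-off, as noted downthread, is that any hard-coded threshold (1% or 10%) is difficult to justify and to explain to users.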

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




How to retain lesser paths at add_path()?

2019-07-30 Thread Kohei KaiGai
Hello,

When we add a new path using add_path(), it checks the estimated cost and pathkeys,
and also removes dominated paths, if any.
Do we have a reasonable way to retain these "dominated" paths? Even if a path
is considered lesser
at one level, it may yield a cheaper combined cost
with an upper path node.

In my case, PG-Strom adds a CustomPath to support JOIN/GROUP BY
workloads that utilize
GPU and NVME storage. If GpuPreAgg and GpuJoin are executed
continuously, the output buffer
of GpuJoin simultaneously serves as the input buffer of GpuPreAgg in GPU
device memory.
So, it avoids needless DMA transfers between GPU and CPU/RAM.
This behavior
affects cost estimation during the path construction steps - GpuPreAgg
discounts the DMA cost if its
input path is GpuJoin.
On the other hand, it looks to me that add_path() does not consider upper-
level optimizations
other than sort pathkeys. As long as we can keep these "lesser"
path nodes that have a
further optimization chance, it will help to build a more reasonable query plan.
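The effect can be sketched numerically. The figures below are hypothetical, not real PG-Strom costs; the point is only that a path which loses at the join level can still win once the upper node's DMA discount is applied:

```python
# Toy cost figures: GpuPreAgg skips the GPU->CPU transfer when it is
# fed directly by GpuJoin, so the DMA cost depends on the input path.

join_paths = {'cpu_join': 100, 'gpu_join': 120}  # gpu_join loses here

def preagg_total(join_name, join_cost):
    dma_cost = 10 if join_name == 'gpu_join' else 50  # discounted DMA
    return join_cost + dma_cost

totals = {name: preagg_total(name, cost)
          for name, cost in join_paths.items()}
# totals == {'cpu_join': 150, 'gpu_join': 130}: gpu_join wins overall,
# but plain add_path() already discards it at the join level.
```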

Do we have any reasonable way to retain such paths at add_path() even
if they are dominated
by other paths? Any idea is welcome.

Best regards,

[*] GpuJoin + GpuPreAgg combined GPU kernel
https://www.slideshare.net/kaigai/20181016pgconfeussd2gpumulti/13
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: How to know referenced sub-fields of a composite type?

2019-05-30 Thread Kohei KaiGai
On 2019/05/30 16:33, Amit Langote wrote:

>> On 2019/05/29 15:50, Kohei KaiGai wrote:
>> 2019年5月29日(水) 13:26 Amit Langote :
>>>> It means we can skip to load the sub-fields unreferenced, if
>>>> query-planner can handle
>>>> referenced and unreferenced sub-fields correctly.
>>>> On the other hands, it looks to me RelOptInfo or other optimizer
>>>> related structure don't have
>>>> this kind of information. RelOptInfo->attr_needed tells extension
>>>> which attributes are referenced
>>>> by other relation, however, its granularity is not sufficient for 
>>>> sub-fields.
>>> 
>>> Isn't that true for some other cases as well, like when a query accesses
>>> only some sub-fields of a json(b) column?  In that case too, planner
>>> itself can't optimize away access to other sub-fields.  What it can do
>>> though is match a suitable index to the operator used to access the
>>> individual sub-fields, so that the index (if one is matched and chosen)
>>> can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
>>> that the optimizer leaves it up to the indexes (and plan nodes) to further
>>> optimize access to within a field.  How is this case any different?
>> 
>> I think it is a little bit different scenario.
>> Even if an index on sub-fields can indicate the tuples to be fetched,
>> the fetched tuple contains all the sub-fields because heaptuple is
>> row-oriented data.
>> 
>> For example, if WHERE-clause checks a sub-field: "x" then aggregate
>> function references other sub-field "y", Scan/Join node has to return
>> a tuple that contains both "x" and "y". IndexScan also pops up a tuple
>> with a full composite type, so here is no problem if we cannot know
>> which sub-fields are referenced in the later stage.
>> Maybe, if IndexOnlyScan supports to return a partial composite type,
>> it needs similar infrastructure that can be used for a better composite
>> type support on columnar storage.
> 
> Ah indeed.  I think I had misunderstood your intent.  Indexes have to do
> with optimizing the "filtering" of complex/nested type (json, Arrow
> Struct, etc.) values, where unnecessary sub-fields need not be read before
> filtering, whereas you're interested in optimizing "projections" of
> complex types, where sub-fields that are not used anywhere in the query
> need not be read from the stored values.
> 
>>>> Probably, all we can do right now is walk-on the RelOptInfo list to
>>>> lookup FieldSelect node
>>>> to see the referenced sub-fields. Do we have a good idea instead of
>>>> this expensive way?
>>>> # Right now, PG-Strom loads all the sub-fields of Struct column from
>>>> arrow_fdw foreign-table
>>>> # regardless of referenced / unreferenced sub-fields. Just a second best.
>>> 
>>> I'm missing something, but if PG-Strom/arrow_fdw does look at the
>>> FieldSelect nodes to see which sub-fields are referenced, why doesn't it
>>> generate a plan that will only access those sub-fields or why can't it?
>>> 
>> Likely, it is not a technical problem but not a smart implementation.
>> If I missed some existing infrastructure we can apply, it may be more 
>> suitable
>> than query/expression tree walking.
> 
> There is no infrastructure for this as far as I know.  Maybe, some will be
> built in the future now that storage format is pluggable.

If we design a common infrastructure for both built-in and extension 
features, it makes sense for these kinds of storage systems.
IndexOnlyScan is one built-in feature that would benefit from projection 
information. Currently, we never choose IndexOnlyScan if 
the index is on a sub-field of a composite type.

 Best regards,






Re: How to know referenced sub-fields of a composite type?

2019-05-29 Thread Kohei KaiGai
Hi Amit,

On Wed, May 29, 2019 at 13:26, Amit Langote wrote:
>
> Kaigai-san,
>
> On 2019/05/29 12:13, Kohei KaiGai wrote:
> > One interesting data type in Apache Arrow is "Struct" data type. It is
> > equivalent to composite
> > type in PostgreSQL. The "Struct" type has sub-fields, and individual
> > sub-fields have its own
> > values array for each.
> >
> > It means we can skip to load the sub-fields unreferenced, if
> > query-planner can handle
> > referenced and unreferenced sub-fields correctly.
> > On the other hands, it looks to me RelOptInfo or other optimizer
> > related structure don't have
> > this kind of information. RelOptInfo->attr_needed tells extension
> > which attributes are referenced
> > by other relation, however, its granularity is not sufficient for 
> > sub-fields.
>
> Isn't that true for some other cases as well, like when a query accesses
> only some sub-fields of a json(b) column?  In that case too, planner
> itself can't optimize away access to other sub-fields.  What it can do
> though is match a suitable index to the operator used to access the
> individual sub-fields, so that the index (if one is matched and chosen)
> can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
> that the optimizer leaves it up to the indexes (and plan nodes) to further
> optimize access to within a field.  How is this case any different?
>
I think it is a slightly different scenario.
Even if an index on sub-fields can indicate the tuples to be fetched,
the fetched tuple contains all the sub-fields, because a heap tuple is
row-oriented data.
For example, if a WHERE-clause checks a sub-field "x" and then an aggregate
function references another sub-field "y", the Scan/Join node has to return
a tuple that contains both "x" and "y". IndexScan also pops up a tuple
with the full composite type, so there is no problem even if we cannot know
which sub-fields are referenced at a later stage.
Maybe, if IndexOnlyScan supported returning a partial composite type,
it would need infrastructure similar to what a better composite-type
support on columnar storage requires.

> > Probably, all we can do right now is walk-on the RelOptInfo list to
> > lookup FieldSelect node
> > to see the referenced sub-fields. Do we have a good idea instead of
> > this expensive way?
> > # Right now, PG-Strom loads all the sub-fields of Struct column from
> > arrow_fdw foreign-table
> > # regardless of referenced / unreferenced sub-fields. Just a second best.
>
> I'm missing something, but if PG-Strom/arrow_fdw does look at the
> FieldSelect nodes to see which sub-fields are referenced, why doesn't it
> generate a plan that will only access those sub-fields or why can't it?
>
Likely, it is not a technical problem, just not a smart implementation.
If I have missed some existing infrastructure we could apply, it may be more
suitable than query/expression-tree walking.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




How to know referenced sub-fields of a composite type?

2019-05-28 Thread Kohei KaiGai
Hello,

A recent revision of PG-Strom provides columnar storage using Apache
Arrow format files on the
FDW infrastructure. Because of the columnar nature, it can load only
the values which are
referenced by the query, thus maximizing the efficiency of the storage bandwidth.
http://heterodb.github.io/pg-strom/arrow_fdw/

Apache Arrow defines various primitive types that can be mapped to
PostgreSQL data types.
For example, FloatingPoint (precision=Single) in Arrow is equivalent
to PostgreSQL's float4.
One interesting data type in Apache Arrow is "Struct". It is
equivalent to a composite
type in PostgreSQL. The "Struct" type has sub-fields, and each
sub-field has its own
values array.

It means we can skip loading unreferenced sub-fields, if the
query planner can handle
referenced and unreferenced sub-fields correctly.
On the other hand, it looks to me that RelOptInfo and the other optimizer-
related structures don't have
this kind of information. RelOptInfo->attr_needed tells an extension
which attributes are referenced
by other relations; however, its granularity is not sufficient for sub-fields.

Probably, all we can do right now is walk the expression trees in RelOptInfo to
look up FieldSelect nodes
and see the referenced sub-fields. Do we have a better idea than
this expensive way?
# Right now, PG-Strom loads all the sub-fields of a Struct column from an
arrow_fdw foreign table
# regardless of referenced / unreferenced sub-fields. Just a second best.
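The "expensive way" is essentially a recursive expression-tree walk. A stdlib-only Python sketch follows; the dict-based node shapes are invented for illustration, and PostgreSQL's actual Node structs and expression_tree_walker() differ:

```python
# Recursively walk expression trees and collect sub-fields referenced
# via FieldSelect nodes (toy node representation).

def collect_subfields(node, found):
    if isinstance(node, dict):
        if node.get('tag') == 'FieldSelect':
            found.add((node['column'], node['field']))
        for value in node.values():
            collect_subfields(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_subfields(item, found)

# WHERE (c).x > 0, plus an aggregate over (c).y:
exprs = [
    {'tag': 'OpExpr', 'args': [
        {'tag': 'FieldSelect', 'column': 'c', 'field': 'x'},
        {'tag': 'Const', 'value': 0}]},
    {'tag': 'Aggref', 'args': [
        {'tag': 'FieldSelect', 'column': 'c', 'field': 'y'}]},
]
referenced = set()
for expr in exprs:
    collect_subfields(expr, referenced)
# referenced == {('c', 'x'), ('c', 'y')}: only these sub-fields of the
# Struct column would need to be loaded from the Arrow file.
```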

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: add_partial_path() may remove dominated path but still in use

2019-02-05 Thread Kohei KaiGai
Hello,

Let me ping this thread again.
I have been waiting for the fix to be committed for a month...

On Tue, Jan 22, 2019 at 20:50, Kohei KaiGai wrote:
>
> Let me remind the thread.
> If no more comments, objections, or better ideas, please commit this fix.
>
> Thanks,
>
> 2019年1月17日(木) 18:29 Kyotaro HORIGUCHI :
> >
> > Hello, sorry for the absence.
> >
> > At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas  
> > wrote in 
> > 
> > > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai  wrote:
> > > > 2019年1月11日(金) 5:52 Robert Haas :
> > > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  
> > > > > wrote:
> > > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > > > front of the generate_gather_paths?
> > > > > > If we have no use case for the second hook, here is little necessity
> > > > > > to have the post_rel_pathlist_hook() here.
> > > > > > (At least, PG-Strom will use the first hook only.)
> > > > >
> > > > > +1.  That seems like the best way to be consistent with the principle
> > > > > that we need to have all the partial paths before generating any
> > > > > Gather paths.
> > > > >
> > > > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > > > Please check it.
> > >
> > > Seems reasonable to me.
> >
> > Also seems reasonable to me.  The extension can call
> > generate_gather_paths redundantly as is but it almost doesn't
> > harm, so it is acceptable even in a minor release.
> >
> > regards.
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> >
>
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei 

-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch
Description: Binary data


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch
Description: Binary data


[ANN] pg2arrow

2019-01-27 Thread Kohei KaiGai
Hello,

I made a utility program to dump a PostgreSQL database in Apache Arrow format.

Apache Arrow is a columnar format for structured
data, actively developed
by the Spark and broader data-analytics communities.
It is a suitable data representation for static, read-only data with a large
number of rows.
Many data-analytics tools support Apache Arrow as a common data-
exchange format.
See https://arrow.apache.org/

* pg2arrow
https://github.com/heterodb/pg2arrow

usage:
$ ./pg2arrow -h localhost postgres -c 'SELECT * FROM hogehoge LIMIT
1' -o /tmp/hogehoge.arrow
--> fetch the results of the query, then write them out to "/tmp/hogehoge"
$ ./pg2arrow --dump /tmp/hogehoge
--> shows schema definition of the "/tmp/hogehoge"

$ python
>>> import pyarrow as pa
>>> X = pa.RecordBatchFileReader("/tmp/hogehoge").read_all()
>>> X.schema
id: int32
a: int64
b: double
c: struct
  child 0, x: int32
  child 1, y: double
  child 2, z: decimal(30, 11)
  child 3, memo: string
d: string
e: double
ymd: date32[day]

--> read the Apache Arrow file using PyArrow, then shows its schema definition.


It is also groundwork for my current development - arrow_fdw - which
allows scanning
the configured Apache Arrow file(s) like a regular PostgreSQL table.
I expect the integration of arrow_fdw with PG-Strom's SSD2GPU Direct SQL
can pull out the maximum capability of the latest hardware (NVME and GPU).
Likely, it is an ideal configuration for processing log data generated
by many sensors.

Please check it.
Comments, ideas, bug-reports, and other feedbacks are welcome.

As an aside, NVIDIA announced their RAPIDS framework to exchange data frames
on GPUs among multiple ML/analytics solutions. It also uses Apache
Arrow as a common
format for data exchange, so this is groundwork for them as well.
https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: add_partial_path() may remove dominated path but still in use

2019-01-22 Thread Kohei KaiGai
Let me ping the thread.
If there are no more comments, objections, or better ideas, please commit this fix.

Thanks,

On Thu, Jan 17, 2019 at 18:29, Kyotaro HORIGUCHI wrote:
>
> Hello, sorry for the absence.
>
> At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas  wrote 
> in 
> > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai  wrote:
> > > 2019年1月11日(金) 5:52 Robert Haas :
> > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  
> > > > wrote:
> > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > > front of the generate_gather_paths?
> > > > > If we have no use case for the second hook, here is little necessity
> > > > > to have the post_rel_pathlist_hook() here.
> > > > > (At least, PG-Strom will use the first hook only.)
> > > >
> > > > +1.  That seems like the best way to be consistent with the principle
> > > > that we need to have all the partial paths before generating any
> > > > Gather paths.
> > > >
> > > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > > Please check it.
> >
> > Seems reasonable to me.
>
> Also seems reasonable to me.  The extension can call
> generate_gather_paths redundantly as is but it almost doesn't
> harm, so it is acceptable even in a minor release.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: add_partial_path() may remove dominated path but still in use

2019-01-10 Thread Kohei KaiGai
On Fri, Jan 11, 2019 at 5:52, Robert Haas wrote:
>
> On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  wrote:
> > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > front of the generate_gather_paths?
> > If we have no use case for the second hook, here is little necessity
> > to have the post_rel_pathlist_hook() here.
> > (At least, PG-Strom will use the first hook only.)
>
> +1.  That seems like the best way to be consistent with the principle
> that we need to have all the partial paths before generating any
> Gather paths.
>
Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch
Description: Binary data


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch
Description: Binary data


Re: add_partial_path() may remove dominated path but still in use

2019-01-08 Thread Kohei KaiGai
On Wed, Jan 9, 2019 at 13:18, Kyotaro HORIGUCHI wrote:
>
> At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai  wrote 
> in 
> > 2018年12月30日(日) 4:12 Tom Lane :
> > On the other hands, the later hook must be dedicated to add regular paths,
> > and also provides a chance for extensions to manipulate pre-built path-list
> > including Gather-path.
> > As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
> > a particular path-node, including Gather-node, by manipulation of the cost
> > value. Horiguchi-san, is it right?
>
> Mmm. I haven't expected that it is mentioned here.
>
> Actually in the hook, it changes enable_* planner variables, or
> directly manipulates path costs, or even can clear and
> regenerate the path list and gather paths for the parallel
> case. It would be nice if we had a chance to manipulate partial
> paths before generating gather paths.
>
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of generate_gather_paths?
If we have no use case for the second hook, there is little need
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: add_partial_path() may remove dominated path but still in use

2019-01-03 Thread Kohei KaiGai
I tried to make a patch that provides dual hooks at set_rel_pathlist(), and
adjusted PG-Strom for the new design. It stops creating a GatherPath
by itself and just adds a partial path for the base relation.
It successfully made a plan using a parallel custom-scan node, without
a system crash.

As I mentioned above, it does not use the new "post_rel_pathlist_hook"
because we can add both partial and regular path nodes at the first hook
with no particular problems.

Thanks,

dbt3=# explain select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
and l_discount between 0.08 - 0.01 and 0.08 + 0.01
and l_quantity < 24 limit 1;
  QUERY PLAN

 Limit  (cost=144332.62..144332.63 rows=1 width=4)
   ->  Aggregate  (cost=144332.62..144332.63 rows=1 width=4)
 ->  Gather  (cost=144285.70..144329.56 rows=408 width=4)
   Workers Planned: 2
   ->  Parallel Custom Scan (GpuPreAgg) on lineitem
(cost=143285.70..143288.76 rows=204 width=4)
 Reduction: NoGroup
 Outer Scan: lineitem  (cost=1666.67..143246.16
rows=63254 width=8)
 Outer Scan Filter: ((l_discount >= '0.07'::double
precision) AND
   (l_discount <=
'0.09'::double precision) AND
   (l_quantity <
'24'::double precision) AND
   (l_shipdate <
'1995-01-01'::date) AND
   (l_shipdate >=
'1994-01-01'::date))
(8 rows)

Thanks,

2019年1月2日(水) 22:34 Kohei KaiGai :
>
> 2018年12月31日(月) 22:25 Amit Kapila :
> >
> > On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai  wrote:
> > >
> > > 2018年12月31日(月) 13:10 Amit Kapila :
> > > >
> > > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai  
> > > > wrote:
> > > > > 2018年12月30日(日) 4:12 Tom Lane :
> > > > > >
> > > > > > Kohei KaiGai  writes:
> > > > > > > 2018年12月29日(土) 1:44 Tom Lane :
> > > > > > >> However, first I'd like to know why this situation is arising in 
> > > > > > >> the first
> > > > > > >> place.  To have the situation you're describing, we'd have to 
> > > > > > >> have
> > > > > > >> attempted to make some Gather paths before we have all the 
> > > > > > >> partial paths
> > > > > > >> for the relation they're for.  Why is that a good thing to do?  
> > > > > > >> It seems
> > > > > > >> like such Gathers are necessarily being made with incomplete 
> > > > > > >> information,
> > > > > > >> and we'd be better off to fix things so that none are made till 
> > > > > > >> later.
> > > > > >
> > > > > > > Because of the hook location, Gather-node shall be constructed 
> > > > > > > with built-in
> > > > > > > and foreign partial scan node first, then extension gets a chance 
> > > > > > > to add its
> > > > > > > custom paths (partial and full).
> > > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked 
> > > > > > > next to the
> > > > > > > generate_gather_paths().
> > > > > >
> > > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > > in which extensions are allowed to add partial paths, and that
> > > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > > >
> > > > +1.  This idea sounds sensible to me.
> > > >
> > > > > >
> > > > > I have almost same opinion, but the first hook does not need to be
> > > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we 
> > > > > can
> > > > > add both of partial and regular paths here, then 
> > > > > generate_gather_paths()
> > > > > may generate a Gather-path on top of the best partial-path.
> > > > >
> > > >
> > > > Won't it be confusing for users if we allow both partial and full
> > > > paths in first hook and only full paths 

Re: add_partial_path() may remove dominated path but still in use

2019-01-02 Thread Kohei KaiGai
2018年12月31日(月) 22:25 Amit Kapila :
>
> On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai  wrote:
> >
> > 2018年12月31日(月) 13:10 Amit Kapila :
> > >
> > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai  wrote:
> > > > 2018年12月30日(日) 4:12 Tom Lane :
> > > > >
> > > > > Kohei KaiGai  writes:
> > > > > > 2018年12月29日(土) 1:44 Tom Lane :
> > > > > >> However, first I'd like to know why this situation is arising in 
> > > > > >> the first
> > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > >> attempted to make some Gather paths before we have all the partial 
> > > > > >> paths
> > > > > >> for the relation they're for.  Why is that a good thing to do?  It 
> > > > > >> seems
> > > > > >> like such Gathers are necessarily being made with incomplete 
> > > > > >> information,
> > > > > >> and we'd be better off to fix things so that none are made till 
> > > > > >> later.
> > > > >
> > > > > > Because of the hook location, Gather-node shall be constructed with 
> > > > > > built-in
> > > > > > and foreign partial scan node first, then extension gets a chance 
> > > > > > to add its
> > > > > > custom paths (partial and full).
> > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next 
> > > > > > to the
> > > > > > generate_gather_paths().
> > > > >
> > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > in which extensions are allowed to add partial paths, and that
> > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > >
> > > +1.  This idea sounds sensible to me.
> > >
> > > > >
> > > > I have almost same opinion, but the first hook does not need to be
> > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we 
> > > > can
> > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > may generate a Gather-path on top of the best partial-path.
> > > >
> > >
> > > Won't it be confusing for users if we allow both partial and full
> > > paths in first hook and only full paths in the second hook?
> > > Basically, in many cases, the second hook won't be of much use.  What
> > > advantage you are seeing in allowing both partial and full paths in
> > > the first hook?
> > >
> > Two advantages. The first one is, it follows same manner of
> > set_foreign_pathlist()
> > which allows to add both of full and partial path if FDW supports 
> > parallel-scan.
> > The second one is practical. During the path construction, extension needs 
> > to
> > check availability to run (e.g, whether operators in WHERE-clause is 
> > supported
> > on GPU device...), calculate its estimated cost and so on. Not a small
> > portion of
> > them are common for both of full and partial path. So, if the first
> > hook accepts to
> > add both kind of paths at once, extension can share the common properties.
> >
>
> You have a point, though I am not sure how much difference it can
> create for cost computation as ideally, both will have different
> costing model.  I understand there are some savings by avoiding some
> common work, is there any way to cache the required information?
>
I have no idea for a clean way.
We may be able to have an opaque pointer for extension usage; however,
it may be problematic if multiple extensions use the hook.

> > Probably, the second hook is only used for a few corner case where an 
> > extension
> > wants to manipulate path-list already built, like pg_hint_plan.
> >
>
> Okay, but it could be some work for extension authors who are using
> the current hook, not sure they would like to divide the work between
> first and second hook.
>
I guess they won't divide their code, but will choose either of them.
In the case of PG-Strom, even if there are two hooks around this point, it will
use the first hook only, as long as it is not prohibited from calling add_path()
there. However, some adjustments are required. Its current implementation makes
a GatherPath node with a partial CustomScanPath because set_rel_pathlist_hook()
is called after generate_gather_paths().
Once we can choose the first hook, there is no need to make a GatherPath by
itself, because PostgreSQL core will make that path if the partial custom path
has a reasonable enough cost. This adjustment is likely the preferable one.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: add_partial_path() may remove dominated path but still in use

2018-12-29 Thread Kohei KaiGai
2018年12月30日(日) 4:12 Tom Lane :
>
> Kohei KaiGai  writes:
> > 2018年12月29日(土) 1:44 Tom Lane :
> >> However, first I'd like to know why this situation is arising in the first
> >> place.  To have the situation you're describing, we'd have to have
> >> attempted to make some Gather paths before we have all the partial paths
> >> for the relation they're for.  Why is that a good thing to do?  It seems
> >> like such Gathers are necessarily being made with incomplete information,
> >> and we'd be better off to fix things so that none are made till later.
>
> > Because of the hook location, Gather-node shall be constructed with built-in
> > and foreign partial scan node first, then extension gets a chance to add its
> > custom paths (partial and full).
> > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > generate_gather_paths().
>
> Hmm.  I'm inclined to think that we should have a separate hook
> in which extensions are allowed to add partial paths, and that
> set_rel_pathlist_hook should only be allowed to add regular paths.
>
I have almost the same opinion, but the first hook does not need to be
dedicated to partial paths. As set_foreign_pathlist() does, we can
add both partial and regular paths here, and then generate_gather_paths()
may generate a Gather path on top of the best partial path.

On the other hand, the later hook must be dedicated to adding regular paths,
and it also provides a chance for extensions to manipulate the pre-built
path list, including the Gather path.
As far as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path node, including a Gather node, by manipulating the cost
values. Horiguchi-san, is that right?
Likely, this kind of extension needs to use the later hook.

I expect these hooks are located as follows:

set_rel_pathlist(...)
{
:

:
/* for partial / regular paths */
if (set_rel_pathlist_hook)
  (*set_rel_pathlist_hook) (root, rel, rti, rte);
/* generate Gather-node */
if (rel->reloptkind == RELOPT_BASEREL)
generate_gather_paths(root, rel);
/* for regular paths and manipulation */
if (post_rel_pathlist_hook)
  (*post_rel_pathlist_hook) (root, rel, rti, rte);

set_cheapest();
}

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: add_partial_path() may remove dominated path but still in use

2018-12-28 Thread Kohei KaiGai
2018年12月29日(土) 1:44 Tom Lane :
>
> Kohei KaiGai  writes:
> > I've investigated a crash report of PG-Strom for a few days, then I doubt
> > add_partial_path() can unexpectedly release dominated old partial path
> > but still referenced by other Gather node, and it leads unexpected system
> > crash.
>
> Hm.  This seems comparable to the special case in plain add_path, where it
> doesn't attempt to free IndexPaths because of the risk that they're still
> referenced.  So maybe we should just drop the pfree here.
>
> However, first I'd like to know why this situation is arising in the first
> place.  To have the situation you're describing, we'd have to have
> attempted to make some Gather paths before we have all the partial paths
> for the relation they're for.  Why is that a good thing to do?  It seems
> like such Gathers are necessarily being made with incomplete information,
> and we'd be better off to fix things so that none are made till later.
>
Because of the hook location, Gather nodes are constructed over the built-in
and foreign partial scan nodes first, and then the extension gets a chance to
add its custom paths (partial and full).
At set_rel_pathlist(), set_rel_pathlist_hook() is invoked right after
generate_gather_paths(). Even if the extension adds some partial paths later,
the first generate_gather_paths() has to generate Gather nodes based on
incomplete information.
If we could ensure that Gather nodes are made only after all the partial paths
are added, that may be a solution for the problem.

Of course, relocation of the hook may have a side effect. Someone may
expect the pathlist to contain all the path nodes, including Gather nodes,
for manipulation of the pathlist.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



add_partial_path() may remove dominated path but still in use

2018-12-27 Thread Kohei KaiGai
Hello,

I've investigated a crash report of PG-Strom for a few days, and now I suspect
add_partial_path() can unexpectedly release a dominated old partial path
that is still referenced by another Gather node, which leads to an unexpected
system crash.

Please check at the gpuscan.c:373
https://github.com/heterodb/pg-strom/blob/master/src/gpuscan.c#L373

The create_gpuscan_path() constructs a partial custom path, which is then
added to the partial_pathlist of the baserel.
If both the old and new partial paths have no pathkeys and the old path has
a larger cost, add_partial_path detaches the old path from the list, then
calls pfree() to release the old path itself.

On the other hand, the baserel may have a GatherPath which references
the partial path on its pathlist. There is no check for whether the old
partial paths are referenced by others or not.

To verify my assumption, I injected elog() before/after the call to
add_partial_path() and just before the pfree(old_path) in add_partial_path().


dbt3=# explain select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
and l_discount between 0.08 - 0.01 and 0.08 + 0.01
and l_quantity < 24 limit 1;
INFO:  GpuScan:389  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{PATH :pathtype 18 :parent_relids (b 1) :required_outer (b)
:parallel_aware true :parallel_safe true :parallel_workers 2 :rows
63254 :startup_cost 0.00 :total_cost 325579.73 :pathkeys <>}
:single_copy false :num_workers 2}
INFO:  add_partial_path:830 old_path(0x28f3f88) is removed
WARNING:  could not dump unrecognized node type: 2139062143
INFO:  GpuScan:401  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{(HOGE)} :single_copy false :num_workers 2}


At L389, the GatherPath in baserel->pathlist is healthy; its
subpath (0x28f3f88) is a valid T_Scan path node.
Then gpuscan.c adds a cheaper path node, so add_partial_path()
considers the above subpath (0x28f3f88) dominated by the new custom path,
and releases it.
So the elog() at L401 says the subpath has unrecognized node type 2139062143
== 0x7f7f7f7f,
which implies the memory region was already released by pfree().

A reference counter or some other mechanism to track referenced paths may be
an idea to avoid unintentional release of path nodes.
On the other hand, it seems to me the pfree() at add_path /
add_partial_path is not serious memory management, because other objects
referenced by the path node are not released there.
It is sufficient if we just detach the dominated path node from the pathlist /
partial_pathlist.
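For illustration only (toy types, not the real List/Path API), the "detach
but do not pfree" idea looks like this; any outside reference to the
dominated path stays valid because only the list cell is dropped:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a Path node and a pathlist cell, just to
 * illustrate "detach without pfree" -- not PostgreSQL's actual API. */
typedef struct FakePath { int cost; } FakePath;
typedef struct FakeCell { FakePath *path; struct FakeCell *next; } FakeCell;

/* Unlink the cell holding 'victim' from the list; the FakePath itself
 * is left untouched, so references held elsewhere remain valid. */
static FakeCell *detach_path(FakeCell *head, FakePath *victim)
{
    FakeCell **prev = &head;

    for (FakeCell *lc = head; lc != NULL; lc = lc->next)
    {
        if (lc->path == victim)
        {
            *prev = lc->next;   /* drop the list cell only */
            break;
        }
        prev = &lc->next;
    }
    return head;
}
```

This mirrors what plain add_path already does for IndexPaths: the node
itself is left for the planning context to clean up later.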

How about your opinions?

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: lbound1 default in buildint2vector/buildoidvector

2018-11-15 Thread Kohei KaiGai
2018年11月15日(木) 12:41 Tom Lane :
>
> Kohei KaiGai  writes:
> > I noticed buildint2vector / buildoidvector assigns lbound1=0 as default
> > value, but array type shall have lbound1=1 in the default.
> > Is there some reasons for the difference?
>
> Backwards compatibility.
>
Ah, yes, I got the point.

-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



lbound1 default in buildint2vector/buildoidvector

2018-11-14 Thread Kohei KaiGai
Hello,

I noticed buildint2vector / buildoidvector assign lbound1=0 as the default
value, but array types have lbound1=1 by default.
Is there some reason for the difference?

I noticed this when I made a simple C function that returns an int2vector
carrying the attribute numbers of its argument.

postgres=# select attnums_of('t0','{aid,cid,bid}');
  attnums_of
---
 [0:2]={3,5,4}
(1 row)

Once lbound1=1 is assigned manually,

postgres=# select attnums_of('t0','{aid,cid,bid}');
 attnums_of

 {3,5,4}
(1 row)

Maybe the latter one is more natural.
Of course, these APIs are mostly internal, so the lbound1 setting is not
very significant.
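For what it's worth, the display difference comes only from the lower bound:
PostgreSQL's array output prints a dimension prefix whenever lbound != 1.
A tiny standalone sketch of that formatting rule (illustration only, not the
backend's array_out):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an int vector the way PostgreSQL's array output does:
 * a "[lb:ub]=" prefix appears only when the lower bound is not 1. */
static void format_vector(char *buf, size_t len,
                          const int *vals, int n, int lbound)
{
    size_t off = 0;

    if (lbound != 1)
        off += snprintf(buf + off, len - off, "[%d:%d]=",
                        lbound, lbound + n - 1);
    off += snprintf(buf + off, len - off, "{");
    for (int i = 0; i < n; i++)
        off += snprintf(buf + off, len - off, i ? ",%d" : "%d", vals[i]);
    snprintf(buf + off, len - off, "}");
}
```

With lbound=0 this yields "[0:2]={3,5,4}", and with lbound=1 just "{3,5,4}",
matching the two outputs above.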

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: [report] memory leaks in COPY FROM on partitioned table

2018-08-05 Thread Kohei KaiGai
2018-08-06 1:50 GMT+09:00 Alvaro Herrera :
>> Now, it consumed about 60MB rss at the beginning of COPY FROM, and it
>> grows up very slowly during the COPY FROM execution, then grew up to
>> 250MB before completion.
>> We may have another memory blocks which are not released during
>> execution, however,
>> I could not identify whether it is really memory leaking or not, and
>> who's jobs.
>
> Most likely, this is a different memory leak.
>
> I suggest that one way to track this down is first figure out *which*
> context is the one growing, which you can see by running
> MemoryContextStats a few times and noting for meaningful differences.
> Then we can try to narrow down what is allocating stuff in that context.
>
Yes, but the hardest part is identifying which memory context is growing
over time. Once we identify it, MemoryContextStats() tells us
the name and details of the problematic context.

Of course, my observation above is based only on the rss usage of postgresql.
Physical page allocation can also increase through page faults on virtual
address space that was allocated correctly.

>> It may be an idea to put a debug code that raises a notice when
>> MemoryContext allocates more than the threshold.
>
> I don't think this is really practical, because whatever the threshold
> we set, there would be some corner-case scenario where the threshold is
> legitimately crossed.  And some memory leak scenarios that don't cross
> any thresholds.
>
I assume this threshold would be configurable by GUC, and disabled by default.
Once a user finds a suspicious memory usage increase, they can set a threshold
value. In the above case, we might see something around a 120MB threshold.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: [report] memory leaks in COPY FROM on partitioned table

2018-08-04 Thread Kohei KaiGai
2018-08-03 12:38 GMT+09:00 Alvaro Herrera :
> On 2018-Aug-03, Kohei KaiGai wrote:
>
>> 2018-08-02 5:38 GMT+09:00 Alvaro Herrera :
>> > On 2018-Aug-01, Alvaro Herrera wrote:
>> >
>> >> Right, makes sense.  Pushed that way.
>> >
>> > KaiGai, if you can please confirm that the pushed change fixes your test
>> > case, I'd appreciate it.
>>
>> Can you wait for a few days? I can drop the test dataset and reuse the 
>> storage
>> once benchmark test is over
>
> Of course -- take your time.
>
I could load the same data (544GB csv, 789GB heap) using COPY FROM successfully.
When I reported the problem, the rss usage of the postgresql process increased
at about a 10MB/s rate,
and the OOM killer eliminated it after a few hours.

Now it consumed about 60MB rss at the beginning of COPY FROM, grew up
very slowly during the COPY FROM execution, and reached 250MB
before completion.
We may have other memory blocks which are not released during
execution; however,
I could not identify whether it is really a memory leak or not, and whose job it is.

It may be an idea to put in debug code that raises a notice when a
MemoryContext allocates
more than a threshold.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: [report] memory leaks in COPY FROM on partitioned table

2018-08-02 Thread Kohei KaiGai
2018-08-02 5:38 GMT+09:00 Alvaro Herrera :
> On 2018-Aug-01, Alvaro Herrera wrote:
>
>> Right, makes sense.  Pushed that way.
>
> KaiGai, if you can please confirm that the pushed change fixes your test
> case, I'd appreciate it.
>
Can you wait for a few days? I can drop the test dataset and reuse the storage
once the benchmark test is over.
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: [report] memory leaks in COPY FROM on partitioned table

2018-07-23 Thread Kohei KaiGai
Here is the further investigation I did:

CopyFrom() calls ExecFindPartition() to identify the destination child
table of the partitioned table.
It internally calls get_partition_for_tuple() to get the partition
index according to the key value.
This invocation is not under the per-tuple context.

In the case of hash partitioning, get_partition_for_tuple() calls the
hash function of the key data type, which is hash_numeric in my case.
hash_numeric fetches the key value using PG_GETARG_NUMERIC(0), which
internally calls pg_detoast_datum(); that may allocate new memory if the
varlena datum is not in the uncompressed long (32bit) format.

With this patch applied, the PostgreSQL backend process has been working
with about 130MB of memory consumption for 20 minutes so far (not finished
yet...).
Before the patch was applied, its memory consumption grew at about
10MB/sec, and the process was terminated a few hours later.

P.S.
I think do_convert_tuple() in ExecFindPartition() and
ConvertPartitionTupleSlot() may also allocate memory outside the
per-tuple context; however, I could not confirm it yet, because my test
case has TupleConversionMap == NULL.
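The leak pattern itself is easy to model with a toy per-tuple arena:
whatever is allocated inside the context is reclaimed by one reset per row,
while an allocation made outside it (as the detoast above was) accumulates
for the whole COPY. A minimal sketch, not the real MemoryContext API:

```c
#include <assert.h>
#include <stddef.h>

/* A toy bump allocator standing in for a per-tuple memory context.
 * No alignment handling; illustration only. */
typedef struct { char buf[4096]; size_t used; } TupleCtx;

static void *ctx_alloc(TupleCtx *ctx, size_t size)
{
    if (ctx->used + size > sizeof(ctx->buf))
        return NULL;            /* arena exhausted */
    void *p = ctx->buf + ctx->used;
    ctx->used += size;
    return p;
}

/* One cheap reset releases every per-row allocation at once. */
static void ctx_reset(TupleCtx *ctx)
{
    ctx->used = 0;
}
```

Allocating each row's scratch memory here and resetting once per row keeps
usage flat; skipping the reset (or allocating elsewhere) is exactly the
unbounded growth seen in the report.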

Thanks,

2018-07-24 10:43 GMT+09:00 Michael Paquier :
> On Tue, Jul 24, 2018 at 09:41:52AM +0900, Kohei KaiGai wrote:
>> In PG11beta2, my backend process gets terminated during COPY FROM of
>> large text file (544GB) on partitioned table.
>> The kernel log says OOM Killer send SIGKILL due to memory pressure.
>> In fact, 63GB of system physical 64GB was consumed by the PostgreSQL
>> backend just before the termination.
>
> Hmm..  That's not nice.  Let's add an open item.
> --
> Michael



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


fix-leak-copyfrom.patch
Description: Binary data


[report] memory leaks in COPY FROM on partitioned table

2018-07-23 Thread Kohei KaiGai
Hello,

In PG11beta2, my backend process got terminated during a COPY FROM of a
large text file (544GB) into a partitioned table.
The kernel log says the OOM Killer sent SIGKILL due to memory pressure.
In fact, 63GB of the system's 64GB of physical memory was consumed by the
PostgreSQL backend just before the termination.

OOM Killer says:
[69267.687791] Out of memory: Kill process 23881 (postgres) score 977
or sacrifice child
[69267.687860] Killed process 23881 (postgres) total-vm:105210868kB,
anon-rss:63372320kB, file-rss:0kB, shmem-rss:126144kB

The configuration is below:
The 'lineorder' partitioned table has three child tables, by
hash-partitioning on lo_orderkey (numeric).
Each child table has its own tablespace; 'lineorder_p0' is built on
the tablespace 'nvme0', backed by an independent SSD device, for
instance.
The query I ran is:
copy lineorder from '/opt/nvme0/ssbm/lineorder.tbl' delimiter '|';

So its contents are distributed to the individual child tables,
based on in-database evaluation of the hash keys.
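For readers unfamiliar with the setup: each incoming row is routed by hashing
the partition key and reducing it over the partition count. The real code
combines per-column hashes against per-partition modulus/remainder bounds;
this is just the shape of the computation (hypothetical helper, not
PostgreSQL code):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified routing for a hash-partitioned table with nparts children:
 * the row goes to partition (hash of key) mod nparts. */
static int route_partition(uint64_t key_hash, int nparts)
{
    return (int) (key_hash % (uint64_t) nparts);
}
```

The point for this report is that the hash function runs once per input row,
so any memory it allocates outside a per-row context accumulates.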

To track which memory context consumes more memory than usually
expected, I put in an elog() to report a dying message prior to the OOM
Killer.
See "aset-track.patch". It raises a warning message when memory
consumption per memory context crosses a watermark.

It says 'PortalContext' consumed 25GB at 04:26, then grew up to
34GB at 05:21, and the process was terminated at 05:51.
It looks to me like somewhere allocates memory outside the per-tuple
memory context, but I'm still investigating.

Any ideas?

2018-07-25 04:26:54.096 JST [23881] WARNING:  memory context
'PortalContext' grows up 25769803784 bytes
2018-07-25 04:26:54.096 JST [23881] CONTEXT:  COPY lineorder, line
1610626836: 
"1610614887|1|18487099|541334|1474684|19980523|3-MEDIUM|0|30|4125930|30528526|2|4043411|82518|8|19980..."
WARNING:  memory context 'PortalContext' grows up 25769803784 bytes
2018-07-25 04:27:07.202 JST [23865] LOG:  checkpoints are occurring
too frequently (25 seconds apart)
:
  
:
2018-07-25 05:21:22.423 JST [23881] WARNING:  memory context
'PortalContext' grows up 34359738384 bytes
2018-07-25 05:21:22.423 JST [23881] CONTEXT:  COPY lineorder, line
2147497762: 
"2147498439|7|910553|962168|773580|19971006|1-URGENT|0|46|5658552|38894795|1|5601966|73807|2|19971201..."
:
  
:
2018-07-25 05:51:07.264 JST [23837] LOG:  server process (PID 23881)
was terminated by signal 9: Killed
2018-07-25 05:51:07.264 JST [23837] DETAIL:  Failed process was
running: copy lineorder from '/opt/nvme0/ssbm/lineorder.tbl' delimiter
'|';


-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


aset-track.patch
Description: Binary data


OOMKiller.log
Description: Binary data


[ANN] PG-Strom v2.0 is released

2018-04-17 Thread Kohei KaiGai
Hello,

PG-Strom v2.0 is released. It is an extension module for PostgreSQL
that accelerates reporting and analytics workloads on large-scale
data sets using GPUs.

Major enhancement in PG-Strom v2.0 includes:

- Overall redesign of the internal infrastructure to manage GPU and
stabilization
- CPU+GPU hybrid parallel execution (PgSQL v9.6 / v10 support)
- SSD-to-GPU Direct SQL Execution
- In-memory columnar cache
- GPU memory store (gstore_fdw)
- Redesign of GpuJoin and GpuPreAgg for more efficient GPU usage
- GpuPreAgg + GpuJoin + GpuScan combined GPU kernel

See the slides below for the summary of new features and benchmarks.

  http://heterodb.github.io/pg-strom/blob/20180417_PGStrom_v2.0_TechBrief.pdf

Related URL:
  github: https://github.com/heterodb/pg-strom
  document: http://heterodb.github.io/pg-strom/

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 



Re: FP16 Support?

2018-01-08 Thread Kohei KaiGai
Just for your information.

I tried to implement "float2" according to the IEEE 754 specification,
as a custom data type of PG-Strom.

https://github.com/heterodb/pg-strom/blob/master/src/float2.c
https://github.com/heterodb/pg-strom/blob/master/sql/float2.sql

Recent GPU devices (Maxwell or later) support the "half" precision
data type in hardware, so it might be valuable for somebody.
Thanks,

2017-11-14 14:49 GMT+09:00 Kohei KaiGai <kai...@heterodb.com>:
> 2017-11-14 10:33 GMT+09:00 Thomas Munro <thomas.mu...@enterprisedb.com>:
>> On Tue, Nov 14, 2017 at 1:11 PM, Kohei KaiGai <kai...@heterodb.com> wrote:
>>> Any opinions?
>>
>> The only reason I can think of for having it in core is that you might
>> want to use standard SQL notation FLOAT(10) to refer to it.  Right now
>> our parser converts that to float4 but it could map precisions up to
>> 10 to float2.  The need for such special treatment is one of my
>> arguments for considering SQL:2016 DECFLOAT(n) in core PostgreSQL.
>> But this case is different: FLOAT(10) already works, it just maps to a
>> type with a larger significand, as permitted by the standard.  So why
>> not just do these short floats as an extension type?
>>
> Our extension will be able to provide its own "half" or "float2" data type
> using CREATE TYPE, indeed. I thought it is useful to other people, even
> if they are not interested in the in-database analytics with GPU, to reduce
> amount of storage consumption.
>
> Of course, it is my opinion.
>
> Thanks,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kai...@heterodb.com>



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>



Re: FP16 Support?

2017-11-13 Thread Kohei KaiGai
2017-11-14 10:33 GMT+09:00 Thomas Munro <thomas.mu...@enterprisedb.com>:
> On Tue, Nov 14, 2017 at 1:11 PM, Kohei KaiGai <kai...@heterodb.com> wrote:
>> Any opinions?
>
> The only reason I can think of for having it in core is that you might
> want to use standard SQL notation FLOAT(10) to refer to it.  Right now
> our parser converts that to float4 but it could map precisions up to
> 10 to float2.  The need for such special treatment is one of my
> arguments for considering SQL:2016 DECFLOAT(n) in core PostgreSQL.
> But this case is different: FLOAT(10) already works, it just maps to a
> type with a larger significand, as permitted by the standard.  So why
> not just do these short floats as an extension type?
>
Indeed, our extension can provide its own "half" or "float2" data type
using CREATE TYPE. I thought it would be useful to other people, even
those not interested in in-database analytics with GPUs, to reduce the
amount of storage consumption.

Of course, that is just my opinion.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>



Re: FP16 Support?

2017-11-13 Thread Kohei KaiGai
2017-11-14 10:21 GMT+09:00 Tom Lane <t...@sss.pgh.pa.us>:
> Kohei KaiGai <kai...@heterodb.com> writes:
>> How about your thought for support of half-precision floating point,
>> FP16 in short?
>
> This sounds like a whole lotta work for little if any gain.  There's not
> going to be any useful performance gain from using half-width floats
> except in an environment where it's the individual FLOPs that dominate
> your costs.  PG is not designed for that sort of high-throughput
> number-crunching, and it's not likely to get there anytime soon.
>
> When we can show real workloads where float32 ops are actually the
> dominant time sink, it would be appropriate to think about whether
> float16 is a useful solution.  I don't deny that we could get there
> someday, but I think putting in float16 now would be a fine example
> of having your priorities reversed.
>
A typical workload I expect is a data scientist storing more than a
million records, each containing a feature vector of hundreds or
thousands of dimensions.
This consumes storage space according to the width and number of items,
and it also affects scan performance. In addition, it may need extra
data format conversion if the user's script wants to process the
feature vector as half-precision floating point.

If a record contains a 1000-dimension feature vector in FP32,
a million records consume 4GB of storage space.
When FP16 is sufficient, it consumes only half the space, and thus
it takes half the time to export data arrays from the database.
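The arithmetic above, as a quick check (1,000,000 rows x 1,000 dimensions):

```c
#include <assert.h>
#include <stdint.h>

/* Raw bytes needed for nrows feature vectors of ndims elements each,
 * ignoring tuple headers and other per-row overhead. */
static uint64_t vector_bytes(uint64_t nrows, uint64_t ndims, uint64_t width)
{
    return nrows * ndims * width;
}
```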

Of course, our own extension can define its own "half" or "float2" data
types regardless of the core feature; however, I thought it would be a
useful feature for other people as well.
Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>



FP16 Support?

2017-11-13 Thread Kohei KaiGai
Hello,

What are your thoughts on support for half-precision floating point,
FP16 for short?
  https://en.wikipedia.org/wiki/Half-precision_floating-point_format

Probably it does not make sense for most of our known workloads. Our supported
hardware platforms do not support FP16 operations, so they would be executed by
FP32 logic instead.

On the other hand, machine-learning folks say FP32 provides more
accuracy than necessary, and FP16 can deliver twice the calculation
throughput of FP32.
In fact, recent GPU models have begun to support FP16 operations in hardwired logic.
  https://en.wikipedia.org/wiki/Pascal_(microarchitecture)

People often make inquiries about management of the data sets to be processed
for machine learning. Literally, a DBMS is software for database management.
There are some advantages, like flexible selection of the parent population,
pre- or post-processing of the data set (some algorithms require normalizing
the input data to the [0.0 - 1.0] range), and so on.

If we allow calculating/manipulating/storing FP16 data in a
binary-compatible form,
it is a much more efficient way to fetch binary data for machine-learning engines.

No machine-learning-specific operations are needed; the usual arithmetic
operations, type casts, and array operations will be useful, even though they
internally use FP32 hardware operations on CPUs.

Any opinions?

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei