Re: Partial aggregates pushdown

2024-08-22 Thread Tomas Vondra


On 8/22/24 22:07, Bruce Momjian wrote:
> On Thu, Aug 22, 2024 at 09:54:02PM +0200, Tomas Vondra wrote:
>> On 8/22/24 20:56, Bruce Momjian wrote:
>>> You make a very good point above.  Would there ever be cases where a
>>> targetlist would have multiple aggregates, and some can be pushed down,
>>> and some have to return all matching rows so the sender can compute the
>>> aggregate?  If so, how would we handle that?  How does parallelism
>>> handle that now?
>>
>> I think the only sane way to handle this is to disable partial pushdown
>> for that query, fetch all rows and do the aggregate locally.
> 
> Okay.
> 
>>> It is more than different versions.  Different CPU architectures can
>>> store data types differently in binary.  It would be kind of interesting
>>> to have a serial/deserial that was in text format.  I guess that is
>>> where my RECORD idea came from that I just emailed to the list.
>>>
>>
>> I believe everything in this thread assumes "binary" in the same sense
>> as the protocol, with the same rules for sending data in text/binary
>> formats, etc. With binary for integers meaning "network order" and that
>> sort of stuff. Which is mostly independent of PG version.
> 
> If the binary is transferable between servers of the same PG version but
> perhaps different CPU architectures, you are saying we are only worried
> about not using the exact same aggregate passing method we use for
> parallelism, except for PG version differences.  Seems we should just
> assume the same version for this optimization and call it done.

IMHO restricting this to the exact same version (it'd really have to be
the same minor version, not just major) would be pretty annoying and I'm
afraid it would seriously restrict how often we'd be able to enable this
optimization.

Consider a big sharded cluster using postgres_fdw to run queries. AFAIK
it's not uncommon to do minor version upgrades node by node, possibly
even with failover to make it as smooth as possible. For the duration of
this upgrade the pushdown would be impossible, because at least one node
has a different minor version. That doesn't seem great.

> Except
> we don't always check for the same PG version and that could lead to
> crashes or security problems?
> 

Yes, we'd need to check this comprehensively. I'm not sure if this might
cause crashes - presumably the input functions should be careful about
this (because otherwise it'd be a problem even without this feature).
Also not sure about security issues.

>>> What I would really like is something similar to pg_proc.proargtypes
>>> (which is data type oidvector) for pg_aggregate where we can supply the
>>> oids of the pg_aggregate.aggtranstype and construct a record on the fly
>>> to return to the FDW caller, rather than having to create specialized
>>> functions for every aggregate that needs to return several different
>>> data types, e.g., AVG(interval), #4 in my previous email.  I realize
>>> this would require the creation of data types int128 and int128 array.
>>>
>>
>> Isn't that really just a definition of a composite type?
> 
> Probably.  I am trying to leverage what we already have.
> 
>> I still don't understand which part of the code constructs the record.
>> If the aggregate has aggstate internal, I don't see how it could be
>> anything else than the aggregate itself.
> 
> Yes, but I was unclear if we needed a special function for every
> construction.
> 

OK


-- 
Tomas Vondra




Re: Partial aggregates pushdown

2024-08-22 Thread Bruce Momjian
On Thu, Aug 22, 2024 at 09:54:02PM +0200, Tomas Vondra wrote:
> On 8/22/24 20:56, Bruce Momjian wrote:
> > You make a very good point above.  Would there ever be cases where a
> > targetlist would have multiple aggregates, and some can be pushed down,
> > and some have to return all matching rows so the sender can compute the
> > aggregate?  If so, how would we handle that?  How does parallelism
> > handle that now?
> 
> I think the only sane way to handle this is to disable partial pushdown
> for that query, fetch all rows and do the aggregate locally.

Okay.

> > It is more than different versions.  Different CPU architectures can
> > store data types differently in binary.  It would be kind of interesting
> > to have a serial/deserial that was in text format.  I guess that is
> > where my RECORD idea came from that I just emailed to the list.
> >
> 
> I believe everything in this thread assumes "binary" in the same sense
> as the protocol, with the same rules for sending data in text/binary
> formats, etc. With binary for integers meaning "network order" and that
> sort of stuff. Which is mostly independent of PG version.

If the binary is transferable between servers of the same PG version but
perhaps different CPU architectures, you are saying we are only worried
about not using the exact same aggregate passing method we use for
parallelism, except for PG version differences.  Seems we should just
assume the same version for this optimization and call it done.  Except
we don't always check for the same PG version and that could lead to
crashes or security problems?

> > What I would really like is something similar to pg_proc.proargtypes
> > (which is data type oidvector) for pg_aggregate where we can supply the
> > oids of the pg_aggregate.aggtranstype and construct a record on the fly
> > to return to the FDW caller, rather than having to create specialized
> > functions for every aggregate that needs to return several different
> > data types, e.g., AVG(interval), #4 in my previous email.  I realize
> > this would require the creation of data types int128 and int128 array.
> > 
> 
> Isn't that really just a definition of a composite type?

Probably.  I am trying to leverage what we already have.

> I still don't understand which part of the code constructs the record.
> If the aggregate has aggstate internal, I don't see how it could be
> anything else than the aggregate itself.

Yes, but I was unclear if we needed a special function for every
construction.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-22 Thread Tomas Vondra
On 8/22/24 20:56, Bruce Momjian wrote:
> On Wed, Aug 21, 2024 at 05:41:02PM +0200, Tomas Vondra wrote:
>> On 8/8/24 13:48, Jelte Fennema-Nio wrote:
>>> SUMMARY OF THREAD
>>>
>>> The design of patch 0001 is agreed upon by everyone on the thread (so
>>> far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
>>> cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
>>> for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
>>> is only supported for aggregates with a non-internal/pseudo type as
>>> the stype.
>>
>> I don't have a strong opinion on this, but I wonder if someone might
>> object that this essentially extends the syntax with something that is not
>> (and never will be) in the SQL standard. I wonder if there's some
>> precedent for encoding such explicit execution instructions into the
>> query itself?
>>
>> That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I
>> don't think it makes much sense to do this only for some aggregates,
>> right? Do we have a way to make sure the query is "consistent"? I'm not
>> sure if doing this on the source (before pushdown) is enough. Could
>> there be a difference in what the remote instance supports?
>>
>> The only alternative that I can think of (and that I believe was already
>> mentioned in this thread) is to set some GUC that forces the top-most
>> query level to do this (all aggregates at that level). That'd have the
>> benefit of always affecting all aggregates.
> 
> You make a very good point above.  Would there ever be cases where a
> targetlist would have multiple aggregates, and some can be pushed down,
> and some have to return all matching rows so the sender can compute the
> aggregate?  If so, how would we handle that?  How does parallelism
> handle that now?

I think the only sane way to handle this is to disable partial pushdown
for that query, fetch all rows and do the aggregate locally.

>>> The design for patch 0002 is still under debate. This would expand on
>>> the functionality added by adding support for PARTIAL_AGGREGATE for
>>> aggregates with an internal stype. This is done by returning a byte
>>> array containing the bytes that the serialfunc of the aggregate
>>> returns.
>>>
>>> A competing proposal for 0002 is to instead change aggregates to not
>>> use an internal stype anymore, and create dedicated types. The main
>>> downside here is that infunc and outfunc would need to be added for
>>> text serialization, in addition to the binary serialization. An open
>>> question is: Can we change the requirements for CREATE TYPE, so that
>>> types can be created without infunc and outfunc.
>>>
>>
>> I think it's +0.5 for the new dedicated data types from me.
>>
>> I admit I'm too lazy to read the whole thread from scratch, but I
>> believe we did discuss the possibility to reuse the serial/deserial
>> functions we already have, but the reason against that was the missing
>> cross-version stability. Parallel queries always run within a single
>> instance, hence there are no concerns about other versions. But this is
>> meant to support the remote node having a wildly different version.
> 
> It is more than different versions.  Different CPU architectures can
> store data types differently in binary.  It would be kind of interesting
> to have a serial/deserial that was in text format.  I guess that is
> where my RECORD idea came from that I just emailed to the list.
>

I believe everything in this thread assumes "binary" in the same sense
as the protocol, with the same rules for sending data in text/binary
formats, etc. With binary for integers meaning "network order" and that
sort of stuff. Which is mostly independent of PG version.
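[Editor's note: as a minimal sketch of what "binary in the same sense as the protocol" means, here is network-order encoding in Python. This is not PostgreSQL's actual wire code, and the function names are invented for illustration.]

```python
import struct

# Encode a bigint (int8) the way a binary wire format does:
# big-endian ("network order"), 8 bytes, independent of host CPU.
def encode_int8(value: int) -> bytes:
    return struct.pack("!q", value)

def decode_int8(data: bytes) -> int:
    return struct.unpack("!q", data)[0]

# The byte sequence is identical on little- and big-endian hosts,
# which is why this representation is architecture-independent.
payload = encode_int8(1186)
assert decode_int8(payload) == 1186
```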

When I say version dependency, I mean the structure of the aggregate
state itself. That is, in one version the state may store (A,B), while
in the other version it may be (B,C). For example, we could store
different values in the array, or something like that.
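[Editor's note: the "(A,B) vs (B,C)" hazard above can be sketched in Python. The layouts here are hypothetical, not actual PostgreSQL aggregate states; the point is that a layout change between versions can deserialize "successfully" yet be silently wrong.]

```python
import struct

# Hypothetical: version A stores an avg state as (sum, count), while
# version B stores it as (count, sum). Both are two int64s, so the
# bytes deserialize without error on either side -- but the result is
# silently wrong if the layouts are mixed.
def serialize_v_a(total: int, count: int) -> bytes:
    return struct.pack("!qq", total, count)

def deserialize_v_b(data: bytes):
    count, total = struct.unpack("!qq", data)
    return total, count

state = serialize_v_a(100, 4)        # sum=100 over 4 rows, avg 25
total, count = deserialize_v_b(state)
assert (total, count) == (4, 100)    # fields swapped: avg would be 0.04
```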


> What I would really like is something similar to pg_proc.proargtypes
> (which is data type oidvector) for pg_aggregate where we can supply the
> oids of the pg_aggregate.aggtranstype and construct a record on the fly
> to return to the FDW caller, rather than having to create specialized
> functions for every aggregate that needs to return several different
> data types, e.g., AVG(interval), #4 in my previous email.  I realize
> this would require the creation of data types int128 and int128 array.
> 

Isn't that really just a definition of a composite type?

I still don't understand which part of the code constructs the record.
If the aggregate has aggstate internal, I don't see how it could be
anything else than the aggregate itself.

>> I guess we might introduce another pair of serial/deserial functions,
>> with this guarantee. I know we (me and Jelte) discussed that in person
>> at some point, and there were arguments for doing the data types. But I
>> managed to forget the details :-(
>>
>>

Re: Partial aggregates pushdown

2024-08-22 Thread Bruce Momjian
On Thu, Aug 22, 2024 at 08:31:11PM +0200, Tomas Vondra wrote:
> > My question is related to #3 and #4.  For #3, if we are going to be
> > building infrastructure to handle passing int128 for AVG, wouldn't it be
> > wiser to create an int128 type and an int128 array type, and then use
> > method #2 to handle those, rather than creating custom code just to
> > read/write int128 values for FDWs aggregate passing alone.
> > 
> 
> Yep, adding int128 as a data type would extend this to aggregates that
> store state as int128 (or array of int128).

Great, I am not too far off then.

> > For #4, can we use or improve the RECORD data type to handle #4 --- that
> > seems preferable to creating custom FDWs aggregate passing code.
> > 
> > I know the open question was whether we should create custom FDWs
> > aggregate passing functions or custom data types for FDWs aggregate
> > passing, but I am asking if we can improve existing facilities, like
> > int128 or record passing, to reduce the need for some of these.
> > 
> 
> But which code would produce the record? AFAIK it can't happen in some
> generic executor code, because that only sees "internal" for each
> aggregate. The exact structure of the aggstate is private within the
> code of each aggregate - the record would have to be built there, no?
> 
> I imagine we'd add this for each aggregate as a new pair of functions to
> build/parse the record, but that's kinda the serial/deserial way we
> discussed earlier.
> 
> Or are you suggesting we'd actually say:
> 
>   CREATE AGGREGATE xyz(...) (
> STYPE = record,
> ...
>   )

So my idea from the email I just sent is to create a
pg_proc.proargtypes-like column (data type oidvector) for pg_aggregate
which stores the oids of the values we want to return, so AVG(interval)
would have an array of the oids for interval and int8, e.g.:

SELECT oid FROM pg_type WHERE typname = 'interval';
 oid
--
 1186

SELECT oid FROM pg_type WHERE typname = 'int8';
 oid
-
  20

SELECT '1186 20'::oidvector;
 oidvector
---
 1186 20

It seems all four methods could use this, again assuming we create
int128/int16 and whatever other types we need.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-22 Thread Bruce Momjian
On Wed, Aug 21, 2024 at 05:41:02PM +0200, Tomas Vondra wrote:
> On 8/8/24 13:48, Jelte Fennema-Nio wrote:
> > SUMMARY OF THREAD
> > 
> > The design of patch 0001 is agreed upon by everyone on the thread (so
> > far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> > cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> > for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> > is only supported for aggregates with a non-internal/pseudo type as
> > the stype.
> 
> I don't have a strong opinion on this, but I wonder if someone might
> object that this essentially extends the syntax with something that is not
> (and never will be) in the SQL standard. I wonder if there's some
> precedent for encoding such explicit execution instructions into the
> query itself?
> 
> That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I
> don't think it makes much sense to do this only for some aggregates,
> right? Do we have a way to make sure the query is "consistent"? I'm not
> sure if doing this on the source (before pushdown) is enough. Could
> there be a difference in what the remote instance supports?
> 
> The only alternative that I can think of (and that I believe was already
> mentioned in this thread) is to set some GUC that forces the top-most
> query level to do this (all aggregates at that level). That'd have the
> benefit of always affecting all aggregates.

You make a very good point above.  Would there ever be cases where a
targetlist would have multiple aggregates, and some can be pushed down,
and some have to return all matching rows so the sender can compute the
aggregate?  If so, how would we handle that?  How does parallelism
handle that now?

> > The design for patch 0002 is still under debate. This would expand on
> > the functionality added by adding support for PARTIAL_AGGREGATE for
> > aggregates with an internal stype. This is done by returning a byte
> > array containing the bytes that the serialfunc of the aggregate
> > returns.
> > 
> > A competing proposal for 0002 is to instead change aggregates to not
> > use an internal stype anymore, and create dedicated types. The main
> > downside here is that infunc and outfunc would need to be added for
> > text serialization, in addition to the binary serialization. An open
> > question is: Can we change the requirements for CREATE TYPE, so that
> > types can be created without infunc and outfunc.
> > 
> 
> I think it's +0.5 for the new dedicated data types from me.
> 
> I admit I'm too lazy to read the whole thread from scratch, but I
> believe we did discuss the possibility to reuse the serial/deserial
> functions we already have, but the reason against that was the missing
> cross-version stability. Parallel queries always run within a single
> instance, hence there are no concerns about other versions. But this is
> meant to support the remote node having a wildly different version.

It is more than different versions.  Different CPU architectures can
store data types differently in binary.  It would be kind of interesting
to have a serial/deserial that was in text format.  I guess that is
where my RECORD idea came from that I just emailed to the list.

What I would really like is something similar to pg_proc.proargtypes
(which is data type oidvector) for pg_aggregate where we can supply the
oids of the pg_aggregate.aggtranstype and construct a record on the fly
to return to the FDW caller, rather than having to create specialized
functions for every aggregate that needs to return several different
data types, e.g., AVG(interval), #4 in my previous email.  I realize
this would require the creation of data types int128 and int128 array.

> I guess we might introduce another pair of serial/deserial functions,
> with this guarantee. I know we (me and Jelte) discussed that in person
> at some point, and there were arguments for doing the data types. But I
> managed to forget the details :-(
> 
> > WHAT IS NEEDED?
> > 
> > The things needed for this patch are that docs need to be added, and
> > detailed codereview needs to be done.
> 
> Yeah, I think the docs are must-have for a proper review.

I think the docs are on hold until we decide on a transfer method.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-22 Thread Tomas Vondra
On 8/22/24 19:22, Bruce Momjian wrote:
> On Wed, Aug 21, 2024 at 04:59:12PM +0200, Tomas Vondra wrote:
>> On 8/20/24 20:41, Bruce Momjian wrote:
>>> SELECT (oid, relname) FROM pg_class LIMIT 1;
>>>  row
>>> -
>>>  (2619,pg_statistic)
>>>
>>> SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1;
>>>  pg_typeof
>>> ---
>>>  record
>>>
>>> SELECT pg_typeof(oid) FROM pg_class LIMIT 1;
>>>  pg_typeof
>>> ---
>>>  oid
>>> 
>>> SELECT pg_typeof(relname) FROM pg_class LIMIT 1;
>>>  pg_typeof
>>> ---
>>>  name
>>>
>>
>> How would this help with the AVG(bigint) case? We don't have int128 as
>> SQL type, so what would be part of the record? Also, which part of the
>> code would produce the record? If the internal state is "internal", that
>> would probably need to be something aggregate specific, and that's kinda
>> what this patch series is adding, no?
>>
>> Or am I missing some cases where the record would make it work?
> 
> Right now, my goal in this thread is to try to concretely explain what
> is being proposed.  Therefore, I think I need to go back and make four
> categories instead of two:
> 
> 1.  cases like MAX(int), where we return only one value, and the FDW
> return value is an existing data type, e.g., int
> 
> 2.  cases like AVG(int) where we return multiple FDW values of the same
> type and can use an array, e.g.,  bigint array
> 
> 3.  cases like AVG(bigint) where we return multiple FDW values of the
> same type (or can), but one of the FDW return values is not an
> existing data type, e.g.  int128
> 
> 4.  cases like AVG(interval) where we return multiple FDW values of
> different types, e.g. interval and an integral count
> 
> For #1, all MAX cases have aggregate input parameters the same as the
> FDW return types (aggtranstype):
> 
>   SELECT proargtypes[0]::regtype, aggtranstype::regtype
>   FROM pg_aggregate a JOIN pg_proc p ON a.aggfnoid = p.oid
>   WHERE proname = 'max' AND proargtypes[0] != aggtranstype;
> 
>proargtypes | aggtranstype
>   -+--
> 
> For #2-4, we have for AVG:
> 
>   SELECT proargtypes[0]::regtype, aggtranstype::regtype
>   FROM pg_aggregate a JOIN pg_proc p ON a.aggfnoid = p.oid
>   WHERE proname = 'avg';
>   
>  proargtypes|aggtranstype
>   --+
> 3->bigint   | internal
> 2->integer  | bigint[]
> 2->smallint | bigint[]
> 3->numeric  | internal
> 2->real | double precision[]
> 2->double precision | double precision[]
> 4->interval | internal
> 
> You can see which AVG items fall into which categories.  It seems we
> have #1 and #2 handled cleanly in the patch.
> 

Agreed.

> My question is related to #3 and #4.  For #3, if we are going to be
> building infrastructure to handle passing int128 for AVG, wouldn't it be
> wiser to create an int128 type and an int128 array type, and then use
> method #2 to handle those, rather than creating custom code just to
> read/write int128 values for FDWs aggregate passing alone.
> 

Yep, adding int128 as a data type would extend this to aggregates that
store state as int128 (or array of int128).
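[Editor's note: a quick illustration of why a 128-bit transition sum matters for the bigint case. Python integers are arbitrary precision, so the widths can be checked directly.]

```python
# Summing even a few values near the int64 maximum overflows 64 bits,
# which is why AVG(bigint) keeps its running sum in an int128.
INT64_MAX = 2 ** 63 - 1

total = sum([INT64_MAX, INT64_MAX, INT64_MAX])
assert total > INT64_MAX          # no longer fits in int64
assert total.bit_length() <= 127  # but fits easily in int128
```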

> For #4, can we use or improve the RECORD data type to handle #4 --- that
> seems preferable to creating custom FDWs aggregate passing code.
> 
> I know the open question was whether we should create custom FDWs
> aggregate passing functions or custom data types for FDWs aggregate
> passing, but I am asking if we can improve existing facilities, like
> int128 or record passing, to reduce the need for some of these.
> 

But which code would produce the record? AFAIK it can't happen in some
generic executor code, because that only sees "internal" for each
aggregate. The exact structure of the aggstate is private within the
code of each aggregate - the record would have to be built there, no?

I imagine we'd add this for each aggregate as a new pair of functions to
build/parse the record, but that's kinda the serial/deserial way we
discussed earlier.

Or are you suggesting we'd actually say:

  CREATE AGGREGATE xyz(...) (
STYPE = record,
...
  )

or something like that? I have no idea if that would work, maybe it
would. In a way it'd not be that different from the "custom data type"
except that it doesn't actually need a custom data type (assuming we
know how to serialize such "generic" records - I'm not familiar with
this code enough to answer that).
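[Editor's note: the appeal of a dedicated state type is that its text in/out functions fix a representation that stays stable across versions and architectures. A minimal hypothetical sketch, with invented names, of such a pair for a (sum, count) state:]

```python
# Hypothetical text in/out functions for a dedicated "avg state" type.
# A text representation is trivially stable across versions and CPU
# architectures -- the guarantee binary serial/deserial does not make.
def state_out(total: int, count: int) -> str:
    return f"({total},{count})"

def state_in(text: str):
    total, count = text.strip("()").split(",")
    return int(total), int(count)

# Round-trips even values wider than 64 bits, since the text form
# carries no fixed-width assumptions.
assert state_in(state_out(2 ** 100, 7)) == (2 ** 100, 7)
```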

The reason why I found the "custom data type" approach interesting is
that it sets clear guarantees / expectations about the stability of the
output between versions. For serial/deserial we make no guarantees; it
can change even in a minor release. But here we need something stable
even for major releases, to allow pushdown when querying an older server.
We have that strong expect

Re: Partial aggregates pushdown

2024-08-22 Thread Bruce Momjian
On Wed, Aug 21, 2024 at 04:59:12PM +0200, Tomas Vondra wrote:
> On 8/20/24 20:41, Bruce Momjian wrote:
> > SELECT (oid, relname) FROM pg_class LIMIT 1;
> >  row
> > -
> >  (2619,pg_statistic)
> > 
> > SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1;
> >  pg_typeof
> > ---
> >  record
> > 
> > SELECT pg_typeof(oid) FROM pg_class LIMIT 1;
> >  pg_typeof
> > ---
> >  oid
> > 
> > SELECT pg_typeof(relname) FROM pg_class LIMIT 1;
> >  pg_typeof
> > ---
> >  name
> > 
> 
> How would this help with the AVG(bigint) case? We don't have int128 as
> SQL type, so what would be part of the record? Also, which part of the
> code would produce the record? If the internal state is "internal", that
> would probably need to be something aggregate specific, and that's kinda
> what this patch series is adding, no?
> 
> Or am I missing some cases where the record would make it work?

Right now, my goal in this thread is to try to concretely explain what
is being proposed.  Therefore, I think I need to go back and make four
categories instead of two:

1.  cases like MAX(int), where we return only one value, and the FDW
return value is an existing data type, e.g., int

2.  cases like AVG(int) where we return multiple FDW values of the same
type and can use an array, e.g.,  bigint array

3.  cases like AVG(bigint) where we return multiple FDW values of the
same type (or can), but one of the FDW return values is not an
existing data type, e.g.  int128

4.  cases like AVG(interval) where we return multiple FDW values of
different types, e.g. interval and an integral count

For #1, all MAX cases have aggregate input parameters the same as the
FDW return types (aggtranstype):

SELECT proargtypes[0]::regtype, aggtranstype::regtype
FROM pg_aggregate a JOIN pg_proc p ON a.aggfnoid = p.oid
WHERE proname = 'max' AND proargtypes[0] != aggtranstype;

 proargtypes | aggtranstype
-+--

For #2-4, we have for AVG:

SELECT proargtypes[0]::regtype, aggtranstype::regtype
FROM pg_aggregate a JOIN pg_proc p ON a.aggfnoid = p.oid
WHERE proname = 'avg';

   proargtypes|aggtranstype
--+
3->  bigint   | internal
2->  integer  | bigint[]
2->  smallint | bigint[]
3->  numeric  | internal
2->  real | double precision[]
2->  double precision | double precision[]
4->  interval | internal

You can see which AVG items fall into which categories.  It seems we
have #1 and #2 handled cleanly in the patch.
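[Editor's note: a hypothetical end-to-end sketch of category #2. The function names are invented; in PostgreSQL this role is played by the aggregate's combinefunc and finalfunc.]

```python
# Category #2: AVG(integer) keeps its partial state as a two-element
# bigint array [sum, count]. A coordinator can combine such states
# pulled from several shards and finalize the average locally.
def combine(state_a, state_b):
    return [state_a[0] + state_b[0], state_a[1] + state_b[1]]

def finalize(state):
    return state[0] / state[1]

shard_states = [[10, 2], [30, 3]]    # per-shard [sum, count]
merged = [0, 0]
for s in shard_states:
    merged = combine(merged, s)
assert finalize(merged) == 8.0       # 40 / 5
```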

My question is related to #3 and #4.  For #3, if we are going to be
building infrastructure to handle passing int128 for AVG, wouldn't it be
wiser to create an int128 type and an int128 array type, and then use
method #2 to handle those, rather than creating custom code just to
read/write int128 values for FDWs aggregate passing alone.

For #4, can we use or improve the RECORD data type to handle #4 --- that
seems preferable to creating custom FDWs aggregate passing code.

I know the open question was whether we should create custom FDWs
aggregate passing functions or custom data types for FDWs aggregate
passing, but I am asking if we can improve existing facilities, like
int128 or record passing, to reduce the need for some of these.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-22 Thread Ashutosh Bapat
On Wed, Aug 21, 2024 at 9:11 PM Tomas Vondra  wrote:
>
>
>
> On 8/8/24 13:48, Jelte Fennema-Nio wrote:
> > SUMMARY OF THREAD
> >
> > The design of patch 0001 is agreed upon by everyone on the thread (so
> > far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> > cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> > for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> > is only supported for aggregates with a non-internal/pseudo type as
> > the stype.
> >
>
> I don't have a strong opinion on this, but I wonder if someone might
> > object that this essentially extends the syntax with something that is not
> (and never will be) in the SQL standard. I wonder if there's some
> precedent for encoding such explicit execution instructions into the
> query itself?

This could be a useful feature for running aggregation in a
federated database across many source databases, so people in the
community who participate in the SQL standard may add it there. While
implementing the feature we might think of it as influencing the
execution, but I don't see it that way. It's a feature allowing users
to access the pre-finalization state of an aggregate, which they can
combine with such states from other data sources. There may be
other uses as well. But adding it as a SQL feature means standardizing
what the partial aggregate is for each aggregate function - the notions
are intuitive, but they still need to be spelled out. Of course, going
this way means it will take longer for the feature to become available,
but at least it won't look like a kludge.

-- 
Best Wishes,
Ashutosh Bapat




Re: Partial aggregates pushdown

2024-08-21 Thread Tomas Vondra



On 8/8/24 13:48, Jelte Fennema-Nio wrote:
> SUMMARY OF THREAD
> 
> The design of patch 0001 is agreed upon by everyone on the thread (so
> far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> is only supported for aggregates with a non-internal/pseudo type as
> the stype.
> 

I don't have a strong opinion on this, but I wonder if someone might
object that this essentially extends the syntax with something that is not
(and never will be) in the SQL standard. I wonder if there's some
precedent for encoding such explicit execution instructions into the
query itself?

That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I
don't think it makes much sense to do this only for some aggregates,
right? Do we have a way to make sure the query is "consistent"? I'm not
sure if doing this on the source (before pushdown) is enough. Could
there be a difference in what the remote instance supports?

The only alternative that I can think of (and that I believe was already
mentioned in this thread) is to set some GUC that forces the top-most
query level to do this (all aggregates at that level). That'd have the
benefit of always affecting all aggregates.

> The design for patch 0002 is still under debate. This would expand on
> the functionality added by adding support for PARTIAL_AGGREGATE for
> aggregates with an internal stype. This is done by returning a byte
> array containing the bytes that the serialfunc of the aggregate
> returns.
> 
> A competing proposal for 0002 is to instead change aggregates to not
> use an internal stype anymore, and create dedicated types. The main
> downside here is that infunc and outfunc would need to be added for
> text serialization, in addition to the binary serialization. An open
> question is: Can we change the requirements for CREATE TYPE, so that
> types can be created without infunc and outfunc.
> 

I think it's +0.5 for the new dedicated data types from me.

I admit I'm too lazy to read the whole thread from scratch, but I
believe we did discuss the possibility to reuse the serial/deserial
functions we already have, but the reason against that was the missing
cross-version stability. Parallel queries always run within a single
instance, hence there are no concerns about other versions. But this is
meant to support the remote node having a wildly different version.

I guess we might introduce another pair of serial/deserial functions,
with this guarantee. I know we (me and Jelte) discussed that in person
at some point, and there were arguments for doing the data types. But I
managed to forget the details :-(

> WHAT IS NEEDED?
> 
> The things needed for this patch are that docs need to be added, and
> detailed codereview needs to be done.

Yeah, I think the docs are must-have for a proper review.

> Feedback from more people on the two competing proposals for 0002
> would be very helpful in making a decision.
> 



regards

-- 
Tomas Vondra




Re: Partial aggregates pushdown

2024-08-21 Thread Tomas Vondra



On 8/20/24 20:41, Bruce Momjian wrote:
> On Tue, Aug 20, 2024 at 07:03:56PM +0200, Jelte Fennema-Nio wrote:
>> On Tue, 20 Aug 2024 at 18:50, Bruce Momjian  wrote:
>>> Okay, so we can do MAX easily, and AVG if the count can be represented
>>> as the same data type as the sum?  Is that correct?  Our only problem is
>>> that something like AVG(interval) can't use an array because arrays have
>>> to have the same data type for all array elements, and an interval can't
>>> represent a count?
>>
>> Close, but still not completely correct. AVG(bigint) can also not be
>> supported by patch 1, because the sum and the count for that are both
>> stored using an int128. So we'd need an array of int128, and there's
>> currently no int128 SQL type.
> 
> Okay.  Have we considered having the FDW return a record:
> 
>   SELECT (oid, relname) FROM pg_class LIMIT 1;
>row
>   -
>(2619,pg_statistic)
> 
>   SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1;
>pg_typeof
>   ---
>record
> 
>   SELECT pg_typeof(oid) FROM pg_class LIMIT 1;
>pg_typeof
>   ---
>oid
>   
>   SELECT pg_typeof(relname) FROM pg_class LIMIT 1;
>pg_typeof
>   ---
>name
> 

How would this help with the AVG(bigint) case? We don't have int128 as an
SQL type, so what would be part of the record? Also, which part of the
code would produce the record? If the internal state is "internal", that
would probably need to be something aggregate-specific, and that's kind
of what this patch series is adding, no?

Or am I missing some cases where the record would make it work?
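For illustration, the catalog shows why there is nothing to put in a
record for this case (checked against a recent release; output could
differ on other versions):

```sql
-- avg(bigint) accumulates into an "internal" state (an int128 sum plus a
-- count), so there is no SQL type that could carry it as a record field.
SELECT aggtransfn, aggtranstype::regtype
FROM pg_aggregate
WHERE aggfnoid = 'avg(bigint)'::regprocedure;
--    aggtransfn   | aggtranstype
-- ----------------+--------------
--  int8_avg_accum | internal
```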


regards

-- 
Tomas Vondra




Re: Partial aggregates pushdown

2024-08-20 Thread Bruce Momjian
On Tue, Aug 20, 2024 at 07:03:56PM +0200, Jelte Fennema-Nio wrote:
> On Tue, 20 Aug 2024 at 18:50, Bruce Momjian  wrote:
> > Okay, so we can do MAX easily, and AVG if the count can be represented
> > as the same data type as the sum?  Is that correct?  Our only problem is
> > that something like AVG(interval) can't use an array because arrays have
> > to have the same data type for all array elements, and an interval can't
> > represent a count?
> 
> Close, but still not completely correct. AVG(bigint) can also not be
> supported by patch 1, because the sum and the count for it are both
> stored using an int128. So we'd need an array of int128, and there's
> currently no int128 SQL type.

Okay.  Have we considered having the FDW return a record:

SELECT (oid, relname) FROM pg_class LIMIT 1;
 row
-
 (2619,pg_statistic)

SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1;
 pg_typeof
---
 record

SELECT pg_typeof(oid) FROM pg_class LIMIT 1;
 pg_typeof
---
 oid

SELECT pg_typeof(relname) FROM pg_class LIMIT 1;
 pg_typeof
---
 name

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-20 Thread Jelte Fennema-Nio
On Tue, 20 Aug 2024 at 18:50, Bruce Momjian  wrote:
> Okay, so we can do MAX easily, and AVG if the count can be represented
> as the same data type as the sum?  Is that correct?  Our only problem is
> that something like AVG(interval) can't use an array because arrays have
> to have the same data type for all array elements, and an interval can't
> represent a count?

Close, but still not completely correct. AVG(bigint) can also not be
supported by patch 1, because the sum and the count for it are both
stored using an int128. So we'd need an array of int128, and there's
currently no int128 SQL type.




Re: Partial aggregates pushdown

2024-08-20 Thread Bruce Momjian
On Tue, Aug 20, 2024 at 10:07:32AM +0200, Jelte Fennema-Nio wrote:
> On Thu, 15 Aug 2024 at 23:12, Bruce Momjian  wrote:
> > Third, I would like to show a more specific example to clarify what is
> > being considered above.  If we look at MAX(), we can have FDWs return
> > the max for each FDW, and the coordinator can choose the highest value.
> > This is the patch 1 listed above.  These can return the
> > pg_aggregate.aggtranstype data type using the pg_type.typoutput text
> > output.
> >
> > The second case is for something like AVG(), which must return the SUM()
> > and COUNT(), and we currently have no way to return multiple text values
> > on the wire.  For patch 0002, we have the option of creating functions
> > that can do this and record them in new pg_attribute columns, or we can
> > create a data type with these functions, and assign the data type to
> > pg_aggregate.aggtranstype.
> >
> > Is that accurate?
> 
> It's close to accurate, but not entirely. Patch 1 actually solves
> some AVG cases too, because some AVG implementations use an SQL
> array type to store the transtype instead of an internal type. And by
> using an SQL array type we *can* send multiple text values on the
> wire. See below for a list of those aggregates:

Okay, so we can do MAX easily, and AVG if the count can be represented
as the same data type as the sum?  Is that correct?  Our only problem is
that something like AVG(interval) can't use an array because arrays have
to have the same data type for all array elements, and an interval can't
represent a count?
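To make the contrast concrete, avg(integer) is the case that does fit the
array scheme (catalog query against a recent release; output could differ
on other versions):

```sql
-- avg(integer) keeps {count, sum} in a plain bigint[] transition value,
-- which the existing type machinery can send as text or binary.
-- avg(interval) has no such option, since an interval array cannot hold
-- the count.
SELECT aggtransfn, aggtranstype::regtype
FROM pg_aggregate
WHERE aggfnoid = 'avg(integer)'::regprocedure;
--    aggtransfn   | aggtranstype
-- ----------------+--------------
--  int4_avg_accum | bigint[]
```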

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-20 Thread Jelte Fennema-Nio
On Thu, 15 Aug 2024 at 23:12, Bruce Momjian  wrote:
> Third, I would like to show a more specific example to clarify what is
> being considered above.  If we look at MAX(), we can have FDWs return
> the max for each FDW, and the coordinator can choose the highest value.
> This is the patch 1 listed above.  These can return the
> pg_aggregate.aggtranstype data type using the pg_type.typoutput text
> output.
>
> The second case is for something like AVG(), which must return the SUM()
> and COUNT(), and we currently have no way to return multiple text values
> on the wire.  For patch 0002, we have the option of creating functions
> that can do this and record them in new pg_attribute columns, or we can
> create a data type with these functions, and assign the data type to
> pg_aggregate.aggtranstype.
>
> Is that accurate?

It's close to accurate, but not entirely. Patch 1 actually solves
some AVG cases too, because some AVG implementations use an SQL
array type to store the transtype instead of an internal type. And by
using an SQL array type we *can* send multiple text values on the
wire. See below for a list of those aggregates:

select p.oid::regprocedure
from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid
where aggfinalfn != 0 and aggtranstype::regtype not in ('internal',
'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange');
oid
───
 avg(integer)
 avg(smallint)
 avg(real)
 avg(double precision)
 avg(interval)
 var_pop(real)
 var_pop(double precision)
 var_samp(real)
 var_samp(double precision)
 variance(real)
 variance(double precision)
 stddev_pop(real)
 stddev_pop(double precision)
 stddev_samp(real)
 stddev_samp(double precision)
 stddev(real)
 stddev(double precision)
 regr_sxx(double precision,double precision)
 regr_syy(double precision,double precision)
 regr_sxy(double precision,double precision)
 regr_avgx(double precision,double precision)
 regr_avgy(double precision,double precision)
 regr_r2(double precision,double precision)
 regr_slope(double precision,double precision)
 regr_intercept(double precision,double precision)
 covar_pop(double precision,double precision)
 covar_samp(double precision,double precision)
 corr(double precision,double precision)
(28 rows)

And to be clear, these are in addition to the MAX type of aggregates
you were describing:
select p.oid::regprocedure
from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid
where aggfinalfn = 0 and aggtranstype::regtype not in ('internal',
'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange');
  oid
───
 sum(integer)
 sum(smallint)
 sum(real)
 sum(double precision)
 sum(money)
 sum(interval)
 max(bigint)
 max(integer)
 max(smallint)
 max(oid)
 max(real)
 max(double precision)
 max(date)
 max(time without time zone)
 max(time with time zone)
 max(money)
 max(timestamp without time zone)
 max(timestamp with time zone)
 max(interval)
 max(text)
 max(numeric)
 max(character)
 max(tid)
 max(inet)
 max(pg_lsn)
 max(xid8)
 min(bigint)
 min(integer)
 min(smallint)
 min(oid)
 min(real)
 min(double precision)
 min(date)
 min(time without time zone)
 min(time with time zone)
 min(money)
 min(timestamp without time zone)
 min(timestamp with time zone)
 min(interval)
 min(text)
 min(numeric)
 min(character)
 min(tid)
 min(inet)
 min(pg_lsn)
 min(xid8)
 count("any")
 count()
 regr_count(double precision,double precision)
 bool_and(boolean)
 bool_or(boolean)
 every(boolean)
 bit_and(smallint)
 bit_or(smallint)
 bit_xor(smallint)
 bit_and(integer)
 bit_or(integer)
 bit_xor(integer)
 bit_and(bigint)
 bit_or(bigint)
 bit_xor(bigint)
 bit_and(bit)
 bit_or(bit)
 bit_xor(bit)
 xmlagg(xml)
(65 rows)




Re: Partial aggregates pushdown

2024-08-15 Thread Bruce Momjian
On Thu, Aug  8, 2024 at 01:48:49PM +0200, Jelte Fennema-Nio wrote:
> SUMMARY OF THREAD
> 
> The design of patch 0001 is agreed upon by everyone on the thread (so
> far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> is only supported for aggregates with a non-internal/pseudo type as
> the stype.
> 
> The design for patch 0002 is still under debate. This would expand on
> the functionality added by adding support for PARTIAL_AGGREGATE for
> aggregates with an internal stype. This is done by returning a byte
> array containing the bytes that the serialfunc of the aggregate
> returns.
> 
> A competing proposal for 0002 is to instead change aggregates to not
> use an internal stype anymore, and create dedicated types. The main
> downside here is that infunc and outfunc would need to be added for
> text serialization, in addition to the binary serialization. An open
> question is: Can we change the requirements for CREATE TYPE, so that
> types can be created without infunc and outfunc.
> 
> WHAT IS NEEDED?
> 
> The things needed for this patch are that docs need to be added, and
> a detailed code review needs to be done.
> 
> Feedback from more people on the two competing proposals for 0002
> would be very helpful in making a decision.

First, I am sorry to be replying so late --- I have been traveling for
the past four weeks.  Second, I consider this feature a big part of
sharding, and I think sharding is Postgres's biggest missing feature. I
talk about this patch often when asked about what Postgres is working
on.

Third, I would like to show a more specific example to clarify what is
being considered above.  If we look at MAX(), we can have FDWs return
the max for each FDW, and the coordinator can choose the highest value. 
This is the patch 1 listed above.  These can return the
pg_aggregate.aggtranstype data type using the pg_type.typoutput text
output.

The second case is for something like AVG(), which must return the SUM()
and COUNT(), and we currently have no way to return multiple text values
on the wire.  For patch 0002, we have the option of creating functions
that can do this and record them in new pg_attribute columns, or we can
create a data type with these functions, and assign the data type to
pg_aggregate.aggtranstype.

Is that accurate?
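To make the second option concrete, it could look roughly like this
(every name below is invented; this is a sketch of the idea, not of the
patch):

```sql
-- Option 2 sketch: a dedicated SQL type standing in for avg(numeric)'s
-- internal state, which would then be assigned to
-- pg_aggregate.aggtranstype. All four support functions are hypothetical.
CREATE TYPE numeric_avg_state (
    input          = numeric_avg_state_in,    -- text -> state
    output         = numeric_avg_state_out,   -- state -> text
    receive        = numeric_avg_state_recv,  -- binary, reusing deserial logic
    send           = numeric_avg_state_send,  -- binary, reusing serial logic
    internallength = variable
);
```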

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-15 Thread Bruce Momjian
On Sun, Jul  7, 2024 at 09:52:27PM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Bruce.
> 
> > From: Bruce Momjian 
> > Is there a reason the documentation is no longer a part of this patch?
> > Can I help you keep it current?
> 
> Here are the reasons:
>   Reason1. The approach differs significantly from the previous patch that 
> included documentation, the below.
> 
> https://www.postgresql.org/message-id/attachment/152086/0001-Partial-aggregates-push-down-v34.patch
>   Reason2. I have two options for transmitting the state value and I'm 
> evaluating which one is optimal.
>One is what I presented to you at PGConf.dev 2024. The other is Jelte's.
>He listened to my talk at the conference and gave me some useful comments 
> on hackers. I'm very grateful for that.
>   Reason3. The implementation and tests have not been finished yet.
> Regarding Reason 2, I provided my conclusion in the previous message.
> 
> My plan for advancing the patch involves the following steps:
>   Step1. Decide the approach on transmitting state value.
>   Step2. Implement code (including comments) and tests to support a subset of 
> aggregate functions.
> Specifically, I plan to support avg, sum, and other aggregate functions 
> like min and max which don't need export/import functions.
>   Step3. Add documentations.
> 
> To clarify my approach, should I proceed with Step 3 before Step2?
> I would appreciate your feedback on this.

Thanks, I now understand why the docs were removed, and I agree.  I will
post about the options in a new email now.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-08-08 Thread Jelte Fennema-Nio
SUMMARY OF THREAD

The design of patch 0001 is agreed upon by everyone on the thread (so
far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
is only supported for aggregates with a non-internal/pseudo type as
the stype.

The design for patch 0002 is still under debate. This would expand on
the functionality added by adding support for PARTIAL_AGGREGATE for
aggregates with an internal stype. This is done by returning a byte
array containing the bytes that the serialfunc of the aggregate
returns.
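As a concrete illustration of that 0002 behaviour (PARTIAL_AGGREGATE is
the syntax added by these patches, not something in core today, and the
table name is invented):

```sql
-- With an internal stype, the remote side would return the serialfunc's
-- bytea instead of a finalized value; the local node would feed it
-- through the deserialfunc and combinefunc before finalizing.
SELECT avg(price) PARTIAL_AGGREGATE FROM items;
```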

A competing proposal for 0002 is to instead change aggregates to not
use an internal stype anymore, and create dedicated types. The main
downside here is that infunc and outfunc would need to be added for
text serialization, in addition to the binary serialization. An open
question is: Can we change the requirements for CREATE TYPE, so that
types can be created without infunc and outfunc.

WHAT IS NEEDED?

The things needed for this patch are that docs need to be added, and
a detailed code review needs to be done.

Feedback from more people on the two competing proposals for 0002
would be very helpful in making a decision.




Re: Partial aggregates pushdown

2024-07-08 Thread Jelte Fennema-Nio
On Mon, 8 Jul 2024 at 14:12, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> The best thing about Approach2 is that it lets us send state values using the 
> existing data type system.
> I'm worried that if we choose Approach2, we might face some limits because we 
> have to create new types.
> But, we might be able to fix these limits if we look into it more.
>
> Approach1 doesn't make new types, so we can avoid these limits.
> But, it means we have to make export/import functions that are similar to the 
> typsend/typreceive functions.
> So, we need to make sure if we really need this method.
>
> Is this the right understanding?

Yeah, correct. To clarify my reasoning a bit more: IMHO, the main
downside of implementing Approach1 is that we then end up with two
different mechanisms to "take data from memory and serialize it in a
way in which it can be sent over the network". I'd very much prefer if
we could have a single system responsible for that task. So if there's
issues with the current system (e.g. having to implement
typinput/typoutput), then I'd rather address these problems in the
existing system. Only if that turns out to be impossible for some
reason, then I think I'd prefer Approach1.

Personally, even if the Approach2 requires a bit more code, then I'd
still prefer a single serialization system over having two
serializiation systems.




RE: Partial aggregates pushdown

2024-07-08 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte,

Thank you for comments and advises.

> From: Jelte Fennema-Nio 
> Sent: Monday, July 8, 2024 5:31 PM
> On Sun, 7 Jul 2024 at 23:46, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > In my mind, Approach1 is superior. Therefore, if there are no objections 
> > this week, I plan to resume implementing
> Approach1 next week. I would appreciate it if anyone could discuss the topic 
> with me or ask questions.
> 
> Honestly, the more I think about this, the more I like Approach2. Not because 
> I disagree with you about some of the
> limitations of Approach2, but because I'd rather see those limitations fixed 
> in CREATE TYPE, instead of working around
> these limitations in CREATE AGGREGATE. That way more usages can benefit. 
> Detailed explanation and arguments below.
Firstly, I may have jumped to conclusions too quickly. I apologize for that.
I would appreciate it if we could define Approach 1 and Approach 2 more 
precisely so that we can proceed with the discussion.

Before we get into the details, let me break down the main differences between 
Approach 1 and Approach 2.

The best thing about Approach2 is that it lets us send state values using the 
existing data type system.
I'm worried that if we choose Approach2, we might face some limits because we 
have to create new types.
But, we might be able to fix these limits if we look into it more. 

Approach1 doesn't make new types, so we can avoid these limits.
But, it means we have to make export/import functions that are similar to the 
typsend/typreceive functions.
So, we need to make sure if we really need this method.

Is this the right understanding?

> From: Jelte Fennema-Nio 
> Sent: Monday, July 8, 2024 5:59 PM
> On Sun, 30 Jun 2024 at 23:42, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Instead, I can make PARTIAL_AGGREGATE an unreserved word by placing it 
> > after the FILTER clause, like avg(c1) FILTER
> (WHERE c2 > 0) PARTIAL_AGGREGATE, and by marking it as an ASLABEL word like 
> FILTER.
> > I attached the patch of the method.
> > If there are no objections, I would like to proceed with the method 
> > described above.
> > I'd appreciate it if anyone comment the method.
> 
> I like this approach of using PARTIAL_AGGREGATE in the same place as the 
> FILTER clause.
Thank you for comment.

> On Sun, 7 Jul 2024 at 23:52, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > My plan for advancing the patch involves the following steps:
> >   Step1. Decide the approach on transmitting state value.
> >   Step2. Implement code (including comments) and tests to support a subset 
> > of aggregate functions.
> > Specifically, I plan to support avg, sum, and other aggregate functions 
> > like min and max which don't need
> export/import functions.
> >   Step3. Add documentations.
> >
> > To clarify my approach, should I proceed with Step 3 before Step2?
> 
> (my opinion, Bruce might have a different one)
> 
> I think it's good that you split the original patch in two:
> 0001: non-internal partial aggregates
> 0002: internal partial aggregates
> 
> I think we're aligned on the general design of 0001. So I think now is 
> definitely the time to include documentation there, so
> we can discuss this patch in more detail, and move it forward.
> 
> I think generally for 0002 it would also be useful to have documentation, I 
> personally like reading it to understand the
> general design and then comparing that to the code. But I also understand 
> that the language differences between Japanese
> and English, makes writing such docs a significant effort for you. So I think 
> it would be fine to skip docs for 0002 for now
> until we decide on the approach we want to take for internal partial 
> aggregates.
At least for 0001, it seems like it would be a good idea to attach 
documentation at this stage.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2024-07-08 Thread Jelte Fennema-Nio
On Sun, 7 Jul 2024 at 23:52, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> My plan for advancing the patch involves the following steps:
>   Step1. Decide the approach on transmitting state value.
>   Step2. Implement code (including comments) and tests to support a subset of 
> aggregate functions.
> Specifically, I plan to support avg, sum, and other aggregate functions 
> like min and max which don't need export/import functions.
>   Step3. Add documentations.
>
> To clarify my approach, should I proceed with Step 3 before Step2?

(my opinion, Bruce might have a different one)

I think it's good that you split the original patch in two:
0001: non-internal partial aggregates
0002: internal partial aggregates

I think we're aligned on the general design of 0001. So I think now is
definitely the time to include documentation there, so we can discuss
this patch in more detail, and move it forward.

I think generally for 0002 it would also be useful to have
documentation, I personally like reading it to understand the general
design and then comparing that to the code. But I also understand that
the language differences between Japanese and English, makes writing
such docs a significant effort for you. So I think it would be fine to
skip docs for 0002 for now until we decide on the approach we want to
take for internal partial aggregates.




Re: Partial aggregates pushdown

2024-07-08 Thread Jelte Fennema-Nio
On Sun, 30 Jun 2024 at 23:42, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Instead, I can make PARTIAL_AGGREGATE an unreserved word by placing it after 
> the FILTER clause, like avg(c1) FILTER (WHERE c2 > 0) PARTIAL_AGGREGATE, and 
> by marking it as an ASLABEL word like FILTER.
> I attached the patch of the method.
> If there are no objections, I would like to proceed with the method described 
> above.
> I'd appreciate it if anyone comment the method.

I like this approach of using PARTIAL_AGGREGATE in the same place as
the FILTER clause.
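In other words, the placement mirrors FILTER exactly, e.g. (syntax from
the attached patch, not in core):

```sql
-- PARTIAL_AGGREGATE occupies the same syntactic slot as FILTER, which is
-- what lets it stay an unreserved (as-label) keyword.
SELECT avg(c1) FILTER (WHERE c2 > 0) PARTIAL_AGGREGATE FROM t1;
```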




Re: Partial aggregates pushdown

2024-07-08 Thread Jelte Fennema-Nio
On Sun, 7 Jul 2024 at 23:46, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> In my mind, Approach1 is superior. Therefore, if there are no objections this 
> week, I plan to resume implementing Approach1 next week. I would appreciate 
> it if anyone could discuss the topic with me or ask questions.

Honestly, the more I think about this, the more I like Approach2. Not
because I disagree with you about some of the limitations of
Approach2, but because I'd rather see those limitations fixed in
CREATE TYPE, instead of working around these limitations in CREATE
AGGREGATE. That way more usages can benefit. Detailed explanation and
arguments below.

> 1. Extendability
> I believe it is crucial to support scenarios where the local and remote major 
> versions may differ in the future (see the below).
>
> https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

From my reading, Tom's concern is that different server versions
cannot talk to each other anymore. So as long as this perf
optimization is only enabled when server versions are the same, I
don't think there is a huge problem if we never implement this.
Honestly, I even think making this optimization opt-in at the FDW
server creation level would already solve Tom's concern. I do agree
that it would be good if we could have cross-version partial
aggregates though, so it's definitely something to consider.

> Regarding this aspect, I consider Approach1 superior to Approach2. The reason 
> is that:
> ・The data type of an aggregate function's state value may change with each 
> major version increment.
> ・In Approach1, by extending the export/import functionalities to include the 
> major version in which the state value was created (refer to p.16 and p.17 of 
> [1]), I can handle such situations.
> ・On the other hand, it appears that Approach2 fundamentally lacks the 
> capability to support these scenarios.

Approach 2 definitely has some cross-version capabilities, e.g.
jsonb_send includes a version. Such an approach can be used to solve a
newer coordinator talking to an older worker, if the transtypes are
the same.
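For reference, the version byte in jsonb's binary format is easy to see
from SQL:

```sql
-- The first byte of jsonb_send's output is JSONB_FORMAT_VERSION (1),
-- which is what lets a newer reader recognize an older writer's format.
SELECT substring(jsonb_send('{"a": 1}'::jsonb) from 1 for 1);
--  substring
-- -----------
--  \x01
```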

I personally don't think it's worth supporting this optimization for
an older coordinator talking to a newer worker. Using the binary protocol
to talk from an older server to a newer server doesn't work either.

Finally, based on p.16 & p.17 it's unclear to me how cross-version
with different transtypes would work. That situation seems inherently
incompatible to me.

> 2. Amount of codes
> Regarding this aspect, I find Approach1 to be better than Approach2.
> In Approach1, developers only need to export/import functions and can use a 
> standardized format for transmitting state values.
> In Approach2, developers have two options:
>   Option1: Adding typinput/typoutput and typsend/typreceive.
>   Option2: Adding typinput/typoutput only.
> Option1 requires more lines of code, which may be seen as cumbersome by some 
> developers.
> Option2 restricts developers to using only text representation for 
> transmitting state values, which I consider limiting.

In my opinion this is your strongest argument for Approach1. But you
haven't answered my previous two questions yet:

On Tue, 25 Jun 2024 at 11:28, Jelte Fennema-Nio  wrote:
> So that leaves me two questions:
> 1. Maybe CREATE TYPE should allow types without input/output functions
> as long as send/receive are defined. For these types text
> representation could fall back to the hex representation of bytea.
> 2. If for some reason 1 is undesired, then why are transtypes so
> special. Why is it fine for them to only have send/receive functions
> and not for other types?

Basically: I agree with this argument, but I feel like this being a
problem for this use case is probably a sign that we should take the
solution a step further and solve this at the CREATE TYPE level
instead of letting people hack around CREATE TYPE's limitations
just for these partial aggregates.
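Spelled out, the relaxation from question 1 might look like this (not
valid CREATE TYPE today; all names invented):

```sql
-- Hypothetical: a type declared with only binary I/O. Its text output
-- would fall back to the hex representation of the bytea produced by
-- the send function.
CREATE TYPE interval_avg_state (
    send           = interval_avg_state_send,
    receive        = interval_avg_state_recv,
    internallength = variable
);
```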

> 3. Catalog size
> Regarding this point, I believe Approach1 is better than Approach2.
> In Approach1, theoretically, it is necessary to add export/import functions 
> to pg_proc for each aggregate.
> In Approach2, theoretically, it is necessary to add typoutput/typinput 
> functions (and typsend/typreceive if necessary) to pg_proc and add a native 
> type to pg_type for each aggregate.
> I would like to emphasize that we should consider user-defined functions in 
> addition to built-in aggregate functions.
> I think most developers prefer to avoid bloating catalogs, even if they may 
> not be able to specify exact reasons.
> In fact, in Robert's previous review, he expressed a similar concern (see 
> below).
>
> https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com

So, to summarize the difference (assuming we change CREATE TYPE to
allow only typsend/typreceive): "Approach 2 adds an additional pg_type
entry per aggregate"

IMHO this is fine, especially sin

RE: Partial aggregates pushdown

2024-07-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Bruce.

> From: Bruce Momjian 
> Is there a reason the documentation is no longer a part of this patch?
> Can I help you keep it current?

Here are the reasons:
  Reason1. The approach differs significantly from the previous patch that 
included documentation, the below.

https://www.postgresql.org/message-id/attachment/152086/0001-Partial-aggregates-push-down-v34.patch
  Reason2. I have two options for transmitting the state value and I'm 
evaluating which one is optimal.
   One is what I presented to you at PGConf.dev 2024. The other is Jelte's.
   He listened to my talk at the conference and gave me some useful comments on 
hackers. I'm very grateful for that.
  Reason3. The implementation and tests have not been finished yet.
Regarding Reason 2, I provided my conclusion in the previous message.

My plan for advancing the patch involves the following steps:
  Step1. Decide the approach on transmitting state value.
  Step2. Implement code (including comments) and tests to support a subset of 
aggregate functions.
Specifically, I plan to support avg, sum, and other aggregate functions 
like min and max which don't need export/import functions.
  Step3. Add documentations.

To clarify my approach, should I proceed with Step 3 before Step2?
I would appreciate your feedback on this.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2024-07-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte and hackers,

I've reconsidered which of the following two approaches is best.
  Approach1: Adding export/import functions to transmit state values.
  Approach2: Adding native types that correspond to the state values.

In my mind, Approach1 is superior. Therefore, if there are no objections this 
week, I plan to resume implementing Approach1 next week. I would appreciate it 
if anyone could discuss the topic with me or ask questions.

I believe that while Approach1 has the extendability to support situations 
where local and remote major versions differ, Approach2 lacks this 
extendability. Additionally, it seems that Approach1 requires fewer additional 
lines of code compared to Approach2. I'm also concerned that Approach2 may 
cause the catalog pg_type to bloat.

Although Approach2 offers the benefit of avoiding the addition of columns to 
pg_aggregate, I think this benefit is smaller than the advantages of Approach1 
mentioned above.

Next, I will present my complete comparison. The comparison points are as 
follows:
  1. Extendability
  2. Amount of codes
  3. Catalog size
  4. Developer burden
  5. Additional columns to catalogs

1. Extendability
I believe it is crucial to support scenarios where the local and remote major 
versions may differ in the future (see the below).

https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reason 
is that:
・The data type of an aggregate function's state value may change with each 
major version increment.
・In Approach1, by extending the export/import functionalities to include the 
major version in which the state value was created (refer to p.16 and p.17 of 
[1]), I can handle such situations.
・On the other hand, it appears that Approach2 fundamentally lacks the 
capability to support these scenarios.

2. Amount of codes
Regarding this aspect, I find Approach1 to be better than Approach2.
In Approach1, developers only need to export/import functions and can use a 
standardized format for transmitting state values.
In Approach2, developers have two options:
  Option1: Adding typinput/typoutput and typsend/typreceive.
  Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some 
developers.
Option2 restricts developers to using only text representation for transmitting 
state values, which I consider limiting.

3. Catalog size
Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to 
pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput 
functions (and typsend/typreceive if necessary) to pg_proc and add a native 
type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in 
addition to built-in aggregate functions.
I think most developers prefer to avoid bloating catalogs, even if they may not 
be able to specify exact reasons.
In fact, in Robert's previous review, he expressed a similar concern (see 
below).

https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com

4. Developer burden.
Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional tasks:
  Task1-1: Create and define export/import functions.

In Approach2, developers have the following additional tasks:
  Task2-1: Create and define typinput/typoutput functions (and typsend/typreceive 
functions if necessary).
  Task2-2: Define a native type.

Approach1 requires fewer additional tasks, although the difference may not be 
substantial.

5. Additional columns to catalogs.
Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires additional three columns in pg_aggregate, specifically the 
aggpartialpushdownsafe flag, export function reference, and import function 
reference.
Approach2 does not require any additional columns in catalogs.
However, over the past four years of discussions, no one has expressed concerns 
about additional columns in catalogs.

[1] 
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2024-07-05 Thread Bruce Momjian
On Sun, Jun 30, 2024 at 09:42:19PM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> On Mon, Jun 24, 2024 at 6:09 PM Jelte Fennema-Nio  wrote:
> > 4. Related to 3, I think it would be good to have some tests of
> > PARTIAL_AGGREGATE that don't involve postgres_fdw at all. I also
> > spotted some comments too that mention FDW, even though they apply to
> > the "pure" PARTIAL_AGGREGATE code.
> > 5. This comment now seems incorrect:
> > -* Apply the agg's finalfn if one is provided, else return transValue.
> > +* If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
> > +* not specified, apply the agg's finalfn.
> > +* If PARTIAL_AGGREGATE keyword is specified and the transValue type
> > +* is internal, apply the agg's serialfn. In this case the agg's
> > +* serialfn must not be invalid. Otherwise return transValue.
> >
> > 6. These errors are not on purpose afaict (if they are a comment in
> > the test would be good to explain why)
> >
> > +SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
> > +ERROR:  could not connect to server "loopback"
> > +DETAIL:  invalid connection option "partial_aggregate_support"
> Fixed.

Is there a reason the documentation is no longer a part of this patch? 
Can I help you keep it current?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2024-06-30 Thread fujii.y...@df.mitsubishielectric.co.jp
> From: Fujii Yuki 
> Sent: Monday, July 1, 2024 6:42 AM
> Hi hackers.
> 
> On Wed, Jun 5, 2024 at 9:15 AM fujii.y...@df.mitsubishielectric.co.jp 
>  wrote:
> >   Requirement2. Consider appropriate position of the new keyword 
> > "PARTIAL_AGGREGATE". (with Robert)
> > Existing patch: Before the target expression. Ex. avg(PARTIAL_AGGREGATE 
> > c1)
> > Ideal: Before the aggregate function. Ex. PARTIAL_AGGREGATE avg(c1)
> >   Requirement3. Consider to avoid to make the new keyword 
> > "PARTIAL_AGGREGATE" become a reserved word. (with
> Robert)
> > In the existing patch, "PARTIAL_AGGREGATE" is a reserved word.
> I considered the above two requirement.
> Based on my research, there is no way to use PARTIAL_AGGREGATE in front of a 
> function name without making it a
> reserved word.
With that approach, I could not resolve the shift/reduce conflicts.


Re: Partial aggregates pushdown

2024-06-25 Thread Jelte Fennema-Nio
On Tue, 25 Jun 2024 at 08:33, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Actually, I have other tasks about the "PARTIAL_AGGREGATE" keyword
> to respond Requirement1 and Requirement2 in the following mail.
> https://www.postgresql.org/message-id/TYAPR01MB3088755F2281D41F5EEF06D495F92%40TYAPR01MB3088.jpnprd01.prod.outlook.com

No problem. I totally think it makes sense to focus on basic
PARTIAL_AGGREGATE first. Which is also why I suggested splitting the
patchset up in multiple patches. That way it's easier to get everyone
aligned on PARTIAL_AGGREGATE behaviour for non-internal transtypes,
which would already be a huge improvement over the current situation
in my opinion.

> After those tasks, I plan to compare your proposal with mine seriously, with 
> additional POC patches if necessary.

Sounds great! To be clear, I'm not sure which proposal is best. I
mainly thought mine seemed interesting because it doesn't require
additional columns. But maybe the benefits that the extra columns in
your proposal brings are worth adding those extra columns.

> I see. It seems that adding new native types might make it easier to transmit
> the state values between local and remote servers that have different major
> versions.
> However, in my opinion, we should be careful about supporting the case in
> which local and remote have different major versions, because the transtype
> of an aggregate function may change in a future major version due to
> implementation details.
> Actually, something like that occurred recently; see
> https://github.com/postgres/postgres/commit/519fc1bd9e9d7b408903e44f55f83f6db30742b7
> I think the transtype of an aggregate function is much more changeable than
> its rettype.
> Consequently, so far, I want to support only the case in which local and
> remote have the same major version.
> Resolving that limitation seems to require considerably more code.

Hmm, that's a very good point. Indeed cross-major-version partial
aggregates pushdown would not be fully solved with this yet.

> And I'm afraid that requiring typinput/typoutput burdens developers:
> they would have to create new native types in addition to their new
> aggregate functions.
> I wonder if this concern might outweigh the benefits for debugging.
> And if they skip send/receive, they can use only the text representation to
> transmit the state value, which I find restrictive.

I kinda agree with this argument. But really this same argument
applies just as well for regular CREATE TYPE. Developers are forced to
implement typinput/typoutput, even though send/receive might really be
enough for their usecase. So in a sense with your proposal, you give
transtypes a special status over regular types: i.e. transtypes are
the only types where only send/receive is necessary.

So that leaves me two questions:
1. Maybe CREATE TYPE should allow types without input/output functions
as long as send/receive are defined. For these types text
representation could fall back to the hex representation of bytea.
2. If for some reason 1 is undesired, then why are transtypes so
special. Why is it fine for them to only have send/receive functions
and not for other types?
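Question 1 above can be illustrated with a small sketch. This is hypothetical
Python (not PostgreSQL source, which is C), showing what a bytea-style hex
fallback for the text representation could look like for a type that defines
only send/receive; the function names are invented for illustration.

```python
# Hypothetical sketch: if CREATE TYPE accepted types that define only
# send/receive, their text representation could fall back to the
# bytea-style "\x..." hex encoding of the binary (send) form.

def fallback_text_output(state_bytes: bytes) -> str:
    """Render a binary (send-format) state value as bytea-style hex text."""
    return "\\x" + state_bytes.hex()

def fallback_text_input(text: str) -> bytes:
    """Parse the hex fallback text back into the binary (receive) form."""
    if not text.startswith("\\x"):
        raise ValueError("expected bytea-style hex input")
    return bytes.fromhex(text[2:])
```

The round trip is lossless by construction, which is what would make such a
fallback usable for transmitting state values even without bespoke text I/O.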




RE: Partial aggregates pushdown

2024-06-24 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte, hackers.

Thank you for explanations.

Actually, I have other tasks about the "PARTIAL_AGGREGATE" keyword
to respond Requirement1 and Requirement2 in the following mail.
https://www.postgresql.org/message-id/TYAPR01MB3088755F2281D41F5EEF06D495F92%40TYAPR01MB3088.jpnprd01.prod.outlook.com

After those tasks, I plan to compare your proposal with mine seriously, with 
additional POC patches if necessary.

Your proposal might be the more fundamental solution.
However, to be honest, I do not yet fully understand the benefits and impacts
of replacing internal types with native types, especially regarding the
memory-context handling of the existing deserialization functions and the
amount of code that would need to be modified or added.
The following are my answers based on the knowledge I have right now.

> From: Jelte Fennema-Nio 
> Sent: Tuesday, June 25, 2024 5:49 AM
> > However, I still have the following two questions.
> >
> > 1. Unnecessary components of new native types. Refer to pg_type.dat:
> > typinput and typoutput are required.
> > I think that in your proposal they are unnecessary, and therefore
> > wasted. I think that is not acceptable.
> > How can I resolve the problem?
> 
> I think requiring typinput/typoutput is a benefit personally, because that 
> makes it possible to do PARTIAL_AGGREGATE
> pushdown to a different PG major version. Also it makes it easier to debug 
> the partial aggregate values when using
> psql/pgregress. So yes, it requires implementing both binary (send/receive) 
> and text (input/output) serialization, but it also
> has some benefits. And in theory you might be able to skip implementing the 
> binary serialization, and rely purely on the text
> serialization to send partial aggregates between servers.
I see. It seems that adding new native types might make it easier to transmit
the state values between local and remote servers that have different major
versions.
However, in my opinion, we should be careful about supporting the case in
which local and remote have different major versions, because the transtype
of an aggregate function may change in a future major version due to
implementation details.
Actually, something like that occurred recently; see
https://github.com/postgres/postgres/commit/519fc1bd9e9d7b408903e44f55f83f6db30742b7
I think the transtype of an aggregate function is much more changeable than
its rettype.
Consequently, so far, I want to support only the case in which local and
remote have the same major version.
Resolving that limitation seems to require considerably more code.

And I'm afraid that requiring typinput/typoutput burdens developers:
they would have to create new native types in addition to their new
aggregate functions.
I wonder if this concern might outweigh the benefits for debugging.
And if they skip send/receive, they can use only the text representation to
transmit the state value, which I find restrictive.

> > 2. Many new native types
> > I think that, basically, each aggregate function does need a new native 
> > type.
> > For example,
> > avg(int8), avg(numeric), and var_pop(int4) have the same transtype, 
> > PolyNumAggState.
> > You said that it is enough to add only one native type like
> > state_poly_num_agg for supporting them, right?
> 
> Yes, correct. That's what I had in mind.
> 
> > But the combine functions of them might have quite different
> > expectation on the data items of PolyNumAggState like the range of
> > N(means count) and the true/false of calcSumX2 (the flag of
> > calculating sum of squares).
> > The final functions of them have the similar expectation.
> > So, I think that, in response to your proposal, each of them needs a
> > native data type like state_int8_avg or state_numeric_avg, for safety.
> >
> > And, we do need a native type for an aggregate function whose
> > transtype is not internal and not pseudo.
> > For avg(int4), the current transtype is _int8.
> > However, I do need a validation check on the length of the array and
> > the positiveness of the count (the first element of the array).
> > In response to your proposal,
> > I would need a new native type like state_int4_avg.
> 
> To help avoid confusion let me try to restate what I think you mean
> here: You're worried about someone passing in a bogus native type into the 
> final/combine functions and then getting
> crashes and/or wrong results. With internal type people cannot do this 
> because they cannot manually call the
> combinefunc/finalfunc because the argument type is internal. To solve this 
> problem your suggestion is to make the type
> specific to the specific aggregate such that send/receive or input/output can 
> validate the input as reasonable. But this
> would then mean that we have many native types (and also many 
> deserialize/serialize functions).
Yes, right.

> Assuming that's indeed what you meant, that's an interesting thought, I 
> didn't consider that much indeed. My thinking was
> that we only

Re: Partial aggregates pushdown

2024-06-24 Thread Jelte Fennema-Nio
On Mon, 24 Jun 2024 at 15:03, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I see. I think I now understand your proposal.
> Referring to your proposal, for avg(int8),
> I create a new native type like state_int8_avg
> with new typsend/typreceive functions
> and use them to transmit the state value, right?

Yes, that's roughly what I had in mind indeed.

> That might be a more fundamental solution,
> because it avoids the export/import functions of my proposal,
> which are new components of an aggregate function.
> I had not considered that solution.
> I appreciate your proposal.

Thank you :)

> However, I still have the following two questions.
>
> 1. Unnecessary components of new native types
> Refer to pg_type.dat: typinput and typoutput are required.
> I think that in your proposal they are unnecessary,
> and therefore wasted. I think that is not acceptable.
> How can I resolve the problem?

I think requiring typinput/typoutput is a benefit personally, because
that makes it possible to do PARTIAL_AGGREGATE pushdown to a different
PG major version. Also it makes it easier to debug the partial
aggregate values when using psql/pgregress. So yes, it requires
implementing both binary (send/receive) and text (input/output)
serialization, but it also has some benefits. And in theory you might
be able to skip implementing the binary serialization, and rely purely
on the text serialization to send partial aggregates between servers.

> 2. Many new native types
> I think that, basically, each aggregate function does need a new native type.
> For example,
> avg(int8), avg(numeric), and var_pop(int4) have the same transtype, 
> PolyNumAggState.
> You said that it is enough to add only one native type like state_poly_num_agg
> for supporting them, right?

Yes, correct. That's what I had in mind.

> But the combine functions of them might have quite different expectation
> on the data items of PolyNumAggState like
> the range of N(means count) and the true/false of calcSumX2
> (the flag of calculating sum of squares).
> The final functions of them have the similar expectation.
> So, I think that, in response to your proposal,
> each of them needs a native data type
> like state_int8_avg or state_numeric_avg, for safety.
>
> And, we do need a native type for an aggregate function
> whose transtype is not internal and not pseudo.
> For avg(int4), the current transtype is _int8.
> However, I do need a validation check on the length of the array
> and the positiveness of the count (the first element of the array).
> In response to your proposal,
> I would need a new native type like state_int4_avg.

To help avoid confusion let me try to restate what I think you mean
here: You're worried about someone passing in a bogus native type into
the final/combine functions and then getting crashes and/or wrong
results. With internal type people cannot do this because they cannot
manually call the combinefunc/finalfunc because the argument type is
internal. To solve this problem your suggestion is to make the type
specific to the specific aggregate such that send/receive or
input/output can validate the input as reasonable. But this would then
mean that we have many native types (and also many
deserialize/serialize functions).

Assuming that's indeed what you meant, that's an interesting thought,
I didn't consider that much indeed. My thinking was that we only need
to implement send/receive & input/output functions for these types,
and even though their meaning is very different we can serialize them
in the same way.

As you say though, something like that is already true for avg(int4)
today. The way avg(int4) handles this issue is by doing some input
validation for every call to its trans/final/combinefunc (see
int4_avg_accum, int4_avg_combine, and int8_avg). It checks the length
of the array there, but it doesn't check the positiveness of the
count. I think that makes sense. IMHO these functions only need to
protect against crashes (e.g. null pointer dereferences). But I don't
think there is a good reason for them to protect the user against
passing in weird data. These functions aren't really meant to be
called manually in the first place anyway, so if the user does that
and they pass in weird data then I'm fine with them getting a weird
result back, even errors are fine (only crashes are not).

So as long as our input/output & send/receive functions for
state_poly_num_agg handle all the inconsistencies that could cause
crashes later on (which I think is pretty simple to do for
PolyNumAggState), then I don't think we need state_int8_avg,
state_numeric_avg, etc.
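The crash-safety-only validation policy described above can be sketched as
follows. This is a hypothetical Python model (PostgreSQL's real functions are
C) of a text-input function for a PolyNumAggState-like value, assuming an
invented "(N,sumX,sumX2)" text format: it rejects only structurally impossible
input, while well-formed but "weird" values, such as a negative N, pass through.

```python
def state_poly_num_input(text: str) -> dict:
    """Hypothetical typinput for a PolyNumAggState-like value.

    Only structural problems are rejected (they could crash later code);
    semantically odd but well-formed values are deliberately accepted,
    matching the policy that these functions prevent crashes, not
    weird results.
    """
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("malformed state value")
    fields = text[1:-1].split(",")
    if len(fields) != 3:  # wrong arity could crash a later finalfn
        raise ValueError("state value must have exactly 3 fields")
    n, sum_x, sum_x2 = (int(f) for f in fields)
    return {"N": n, "sumX": sum_x, "sumX2": sum_x2}
```

A caller passing in a bogus-but-parsable state simply gets a weird aggregate
result back, which is the outcome argued for above.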

> > Not a full review but some initial notes:
> Thank you. I don't have time today, so I'll answer after tomorrow.

Sure, no rush.




RE: Partial aggregates pushdown

2024-06-24 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Jelte, hackers.

Thank you for your proposal and comments.

> From: Jelte Fennema-Nio 
> Sent: Monday, June 24, 2024 6:09 PM
> > 1. Generality
> > I believe we should develop a method that can theoretically apply to any
> aggregate function, even if we cannot implement it immediately. However, I 
> find
> it exceptionally challenging to demonstrate that any internal transtype can be
> universally converted to a native data type for aggregate functions that are
> parallel-safe under the current parallel query feature. Specifically, proving 
> this
> for user-defined aggregate functions presents a significant difficulty in my
> view.
> > On the other hand, I think that the usage of export and import functions can
> theoretically apply to any aggregate functions.
> 
> The only thing required when doing CREATE TYPE is having an INPUT and
> OUTPUT function for the type, which (de)serialize the type to text format. As
> far as I can tell by definition that requirement should be fine for any 
> aggregates
> that we can do partial aggregation pushdown for. To clarify I'm not suggesting
> we should change any of the internal representation of the type for the 
> current
> internal aggregates. I'm suggesting we create new native types (i.e. do CREATE
> TYPE) for those internal representations and then use the name of that type
> instead of internal.
I see. I think I now understand your proposal.
Referring to your proposal, for avg(int8),
I create a new native type like state_int8_avg
with new typsend/typreceive functions
and use them to transmit the state value, right?

That might be a more fundamental solution,
because it avoids the export/import functions of my proposal,
which are new components of an aggregate function.
I had not considered that solution.
I appreciate your proposal.

However, I still have the following two questions.

1. Unnecessary components of new native types
Refer to pg_type.dat: typinput and typoutput are required.
I think that in your proposal they are unnecessary,
and therefore wasted. I think that is not acceptable.
How can I resolve the problem?

2. Many new native types
I think that, basically, each aggregate function does need a new native type.
For example,
avg(int8), avg(numeric), and var_pop(int4) have the same transtype, 
PolyNumAggState.
You said that it is enough to add only one native type like state_poly_num_agg
for supporting them, right?

But the combine functions of them might have quite different expectation
on the data items of PolyNumAggState like
the range of N(means count) and the true/false of calcSumX2
(the flag of calculating sum of squares).
The final functions of them have the similar expectation.
So, I think that, in response to your proposal,
each of them needs a native data type
like state_int8_avg or state_numeric_avg, for safety.

And, we do need a native type for an aggregate function
whose transtype is not internal and not pseudo.
For avg(int4), the current transtype is _int8.
However, I do need a validation check on the length of the array
and the positiveness of the count (the first element of the array).
In response to your proposal,
I would need a new native type like state_int4_avg.

Consequently, I think that, in response to your proposal, ultimately
each aggregate function would need a new native type
with typinput and typoutput.
That seems to need the same amount of code and more catalog data,
right?
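For comparison, the avg(int4) behaviour discussed in this thread can be
modelled roughly as follows. This is a simplified Python sketch of the C
functions int4_avg_combine and int8_avg, not their actual code: the transvalue
is a two-element {count, sum} array, and only the array's length is validated,
not the sign of the count.

```python
def int4_avg_combine(state1, state2):
    """Model of int4_avg_combine: merge two {count, sum} partial states."""
    for s in (state1, state2):
        if len(s) != 2:  # the structural (length) check the C code performs
            raise ValueError("expected a 2-element {count, sum} state")
    return [state1[0] + state2[0], state1[1] + state2[1]]

def int4_avg_final(state):
    """Model of int8_avg: NULL (None) for an empty state, else sum/count."""
    count, total = state
    return None if count == 0 else total / count
```

Combining the partial states [2, 10] and [3, 20] yields [5, 30], from which
the final function produces 30 / 5 = 6.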

> > 2. Amount of codes.
> > It could need more codes.
> 
> I think it would be about the same as your proposal. Instead of serializing 
> to an
> intermediary existing type you serialize to string straight away. I think it 
> would
> probably be slightly less code actually, since you don't have to add code to
> handle the new aggpartialexportfn and aggpartialimportfn columns.
> 
> > 3. Concern about performance
> > I'm concerned that altering the current internal data types could impact
> performance.
> 
> As explained above in my proposal all the aggregation code would remain
> unchanged, only new native types will be added. Thus performance won't be
> affected, because all aggregation code will be the same. The only thing that's
> changed is that the internal type now has a name and an INPUT and OUTPUT
> function.
I got it. Thank you.

> Not a full review but some initial notes:
Thank you. I don't have time today, so I'll answer after tomorrow.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2024-06-24 Thread Jelte Fennema-Nio
On Sun, 23 Jun 2024 at 10:24, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I attached the POC patch (the second one), which supports avg(int8), whose 
> standard format is the _numeric type.

Okay, very interesting. So instead of defining the
serialization/deserialization functions to text/binary, you serialize
the internal type to an existing non-internal type, which then in turn
gets serialized to text. In the specific case of avg(int8) this is
done to an array of numeric (with length 2).
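That export/import shape can be sketched in Python, under the simplifying
assumption (for illustration only) that avg(int8)'s internal state is a
{count, sum} pair; the real PolyNumAggState carries more fields. Export
produces the two-element numeric array, modelled here as decimal strings,
and import rebuilds the internal state on the receiving side. The function
names are hypothetical.

```python
def avg_int8_export(state: dict) -> list:
    """Hypothetical export: internal state -> 2-element numeric array."""
    return [str(state["N"]), str(state["sumX"])]

def avg_int8_import(numeric_array: list) -> dict:
    """Hypothetical import: 2-element numeric array -> internal state."""
    if len(numeric_array) != 2:
        raise ValueError("expected a 2-element numeric array")
    return {"N": int(numeric_array[0]), "sumX": int(numeric_array[1])}
```

Because the wire format is an ordinary array of numerics rather than a raw
serialfn blob, it stays readable and independent of the internal struct layout.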

> However, I do not agree with changing the internal transtype to a native
> data type. The reasons are the following three.
> 1. Generality
> I believe we should develop a method that can theoretically apply to any 
> aggregate function, even if we cannot implement it immediately. However, I 
> find it exceptionally challenging to demonstrate that any internal transtype 
> can be universally converted to a native data type for aggregate functions 
> that are parallel-safe under the current parallel query feature. 
> Specifically, proving this for user-defined aggregate functions presents a 
> significant difficulty in my view.
> On the other hand, I think that the usage of export and import functions can 
> theoretically apply to any aggregate functions.

The only thing required when doing CREATE TYPE is having an INPUT and
OUTPUT function for the type, which (de)serialize the type to text
format. As far as I can tell by definition that requirement should be
fine for any aggregates that we can do partial aggregation pushdown
for. To clarify I'm not suggesting we should change any of the
internal representation of the type for the current internal
aggregates. I'm suggesting we create new native types (i.e. do CREATE
TYPE) for those internal representations and then use the name of that
type instead of internal.

> 2. Amount of codes.
> It could need more codes.

I think it would be about the same as your proposal. Instead of
serializing to an intermediary existing type you serialize to string
straight away. I think it would probably be slightly less code
actually, since you don't have to add code to handle the new
aggpartialexportfn and aggpartialimportfn columns.

> 3. Concern about performance
> I'm concerned that altering the current internal data types could impact 
> performance.

As explained above in my proposal all the aggregation code would
remain unchanged, only new native types will be added. Thus
performance won't be affected, because all aggregation code will be
the same. The only thing that's changed is that the internal type now
has a name and an INPUT and OUTPUT function.

> I know. In my proposal, the standard format is not the data serialized by 
> serialfn; instead, it is text or another native data type.
> Just to clarify, I'm writing this to avoid any potential misunderstanding.

Ah alright, that definitely clarifies the proposal. I was looking at
the latest patch file on the thread and that one was still using
serialfn. Your new one indeed doesn't, so this is fine.

> Basically responding to your advice,
> for now, I prepare two POC patches.

Great! I definitely think this makes the review/discussion easier.

> The first supports case a, currently covering only avg(int4) and other 
> aggregate functions that do not require import or export functions, such as 
> min, max, and count.

Not a full review but some initial notes:

1. Why does this patch introduce aggpartialpushdownsafe? I'd have
expected that any type with a non-pseudo/internal type as aggtranstype
would be safe to partially push down.
2. It seems the int4_avg_import function shouldn't be part of this
patch (but maybe of a future one).
3. I think splitting this patch in two pieces would make it even
easier to review: First adding support for the new PARTIAL_AGGREGATE
keyword (adds the new feature). Second, using PARTIAL_AGGREGATE in
postgres_fdw (starts using the new feature). Citus would only need the
first patch not the second one, so I think the PARTIAL_AGGREGATE
feature has merit to be added on its own, even without the
postgres_fdw usage.
4. Related to 3, I think it would be good to have some tests of
PARTIAL_AGGREGATE that don't involve postgres_fdw at all. I also
spotted some comments too that mention FDW, even though they apply to
the "pure" PARTIAL_AGGREGATE code.
5. This comment now seems incorrect:
-* Apply the agg's finalfn if one is provided, else return transValue.
+* If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
+* not specified, apply the agg's finalfn.
+* If PARTIAL_AGGREGATE keyword is specified and the transValue type
+* is internal, apply the agg's serialfn. In this case the agg's
+* serialfn must not be invalid. Otherwise return transValue.

6. These errors are not on purpose afaict (if they are a comment in
the test would be good to explain why)

+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
+ERROR:  could not connect to server "loopback"
+DETAIL:  inval

RE: Partial aggregates pushdown

2024-06-23 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Alexander, hackers.

> From: Alexander Pyhalov 
> Sent: Tuesday, May 28, 2024 2:45 PM
> The fix was to set child_agg->agg_partial to orig_agg->agg_partial in
> convert_combining_aggrefs(), it's already in the patch, as well as the 
> example -
> without this fix
I've just realized that you've added the necessary tests. I forgot to respond, 
my apologies.

> From: Alexander Pyhalov 
> Sent: Tuesday, May 28, 2024 2:58 PM
> BTW, I have an issue with test results in the last version of the 
> patch.
> Attaching regression diffs.
> I have partial sum over c_interval instead of sum(c_interval).
I think the difference stems from the commit [1], which added a serialization 
function to sum(interval). I will fix it.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

[1] 
https://github.com/postgres/postgres/commit/519fc1bd9e9d7b408903e44f55f83f6db30742b7






Re: Partial aggregates pushdown

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 07:27, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Could you please clarify what you mean?
> Are you referring to:
>   Option 1: Modifying existing aggregate functions to minimize the use of 
> internal state values.
>   Option 2: Not supporting the push down of partial aggregates for functions 
> with internal state values.

Basically I mean both Option 1 and Option 2 together. i.e. once we do
option 1, supporting partial aggregate pushdown for all important
aggregates with internal state values, then supporting pushdown of
internal state values becomes unnecessary.

> There are many aggregate functions with internal state values, so if we go 
> with Option 1,
> we might need to change a lot of existing code, like transition functions and 
> finalize functions.
> Also, I'm not sure how many existing aggregate functions can be modified this 
> way.

There are indeed 57 aggregate functions with internal state values:

> SELECT count(*) from pg_aggregate where aggtranstype = 'internal'::regtype;
 count
───
57
(1 row)

But there are only 26 different aggtransfns. And the most used 8 of
those cover 39 of those 57 aggregates.

> SELECT
sum (count) OVER(ORDER BY count desc, aggtransfn ROWS BETWEEN
unbounded preceding and current row) AS cumulative_count
, *
FROM (
SELECT
count(*),
aggtransfn
from pg_aggregate
where aggtranstype = 'internal'::regtype
group by aggtransfn
order by count(*) desc, aggtransfn
);
 cumulative_count │ count │               aggtransfn
──────────────────┼───────┼────────────────────────────────────────
                7 │     7 │ ordered_set_transition
               13 │     6 │ numeric_accum
               19 │     6 │ int2_accum
               25 │     6 │ int4_accum
               31 │     6 │ int8_accum
               35 │     4 │ ordered_set_transition_multi
               37 │     2 │ int8_avg_accum
               39 │     2 │ numeric_avg_accum
               40 │     1 │ array_agg_transfn
               41 │     1 │ json_agg_transfn
               42 │     1 │ json_object_agg_transfn
               43 │     1 │ jsonb_agg_transfn
               44 │     1 │ jsonb_object_agg_transfn
               45 │     1 │ string_agg_transfn
               46 │     1 │ bytea_string_agg_transfn
               47 │     1 │ array_agg_array_transfn
               48 │     1 │ range_agg_transfn
               49 │     1 │ multirange_agg_transfn
               50 │     1 │ json_agg_strict_transfn
               51 │     1 │ json_object_agg_strict_transfn
               52 │     1 │ json_object_agg_unique_transfn
               53 │     1 │ json_object_agg_unique_strict_transfn
               54 │     1 │ jsonb_agg_strict_transfn
               55 │     1 │ jsonb_object_agg_strict_transfn
               56 │     1 │ jsonb_object_agg_unique_transfn
               57 │     1 │ jsonb_object_agg_unique_strict_transfn
(26 rows)


And actually most of those don't have a serialfn, so they wouldn't be
supported by your suggested approach either. Looking at the
distribution of aggserialfns instead we see the following:

> SELECT
sum (count) OVER(ORDER BY count desc, aggserialfn ROWS BETWEEN
unbounded preceding and current row) AS cumulative_count
, *
FROM (
SELECT
count(*),
aggserialfn
from pg_aggregate
where
aggtranstype = 'internal'::regtype
AND aggserialfn != 0
group by aggserialfn
order by count(*) desc, aggserialfn
);
 cumulative_count │ count │        aggserialfn
──────────────────┼───────┼───────────────────────────
               12 │    12 │ numeric_serialize
               24 │    12 │ numeric_poly_serialize
               26 │     2 │ numeric_avg_serialize
               28 │     2 │ int8_avg_serialize
               30 │     2 │ string_agg_serialize
               31 │     1 │ array_agg_serialize
               32 │     1 │ array_agg_array_serialize
(7 rows)

So there are only 7 aggserialfns, and thus at most 7 new postgres
types that you would need to create to support the same aggregates as
in your current proposal. But looking at the implementations of these
serialize functions even that is an over-estimation: numeric_serialize
and numeric_avg_serialize both serialize a NumericAggState, and
numeric_poly_serialize and int8_avg_serialize both serialize a
PolyNumAggState. So probably we could even make do with only 5 types. And
to be clear: only converting PolyNumAggState and NumericAggState to
actual postgres types would already cover 28 out of the 32 aggregates.
That seems quite feasible to do.

So I agree it's probably more code than your current approach. At the
very least because you would need to implement in/out text
serialization functions for these internal types that currently don't
have them. But I do think it would be quite a feasible amount. And to
clarify, I see a few benefits of using the approach that I'm
proposing:

1. So far aggserialfn and aggdese

RE: Partial aggregates pushdown

2024-06-11 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte,

Thank you for your comment!
> From: Jelte Fennema-Nio 
> Sent: Tuesday, June 11, 2024 11:00 PM
> How about instead of trying to serialize the output of serialfn/deserialfn, 
> instead we don't use the "internal" type and
> create actual types in pg_type for these transtypes? Then we can simply use 
> the in/out and recv/send functions of those
> types to serialize the values of the partial aggregate over the network.
> Instead of having to rely on serialfn/deserialfn to be network-safe (which 
> they probably aren't).
> 
> That indeed still leaves the pseudo types. Since non of those pseudotypes 
> have a working in/recv function (they always
> error by definition), I agree that we can simply not support those.
> 
> Basically that would mean that any aggregate with a non-internal and 
> non-pseudotype as a transtype could be used in this
> multi-node partial aggregate pushdown.
Could you please clarify what you mean?
Are you referring to:
  Option 1: Modifying existing aggregate functions to minimize the use of 
internal state values.
  Option 2: Not supporting the push down of partial aggregates for functions 
with internal state values.
  Option 3: Something other than Option 1 and Option 2.

There are many aggregate functions with internal state values, so if we go with 
Option 1,
we might need to change a lot of existing code, like transition functions and 
finalize functions.
Also, I'm not sure how many existing aggregate functions can be modified this 
way.

There are also many popular functions with internal state values,
like sum(int8) and avg(int8) (see [1]), so I don't think Option 2 would be 
acceptable.

Best regards, Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

[1] 
https://github.com/postgres/postgres/blob/REL_16_STABLE/src/include/catalog/pg_aggregate.dat


Re: Partial aggregates pushdown

2024-06-11 Thread Jelte Fennema-Nio
On Tue, 11 Jun 2024 at 13:40, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> > From: Bruce Momjian 
> > So, we need import/export text representation for the partial aggregate 
> > mode for these eight, and call the base data type
> > text import/export functions for the zero ones when in this mode?
>
> I think that you are basically right.
> But, I think, in a perfect world we should also add an import/export function 
> for the following
> two category.
>
>   Category 1. A validation check is needed for safety.
> For example, I think a validation check is needed for avg(float4),
>  whose transition type is not internal. (See p.18 in [1])
> I plan to add import functions of avg, count (See p.18, p.19 in [1]).
>   Category 2. The transition type is a pseudo data type.
> Aggregate functions of this category need to accept many actual data
> types, including user-defined types. So I think that it is hard to
> implement import/export functions.
> Consequently, I do not plan to support this category. (See p.19 in [1])

How about instead of trying to serialize the output of
serialfn/deserialfn, instead we don't use the "internal" type and
create actual types in pg_type for these transtypes? Then we can
simply use the in/out and recv/send functions of those types to
serialize the values of the partial aggregate over the network.
Instead of having to rely on serialfn/deserialfn to be network-safe
(which they probably aren't).

That indeed still leaves the pseudo types. Since none of those
pseudotypes have a working in/recv function (they always error by
definition), I agree that we can simply not support those.

Basically that would mean that any aggregate with a non-internal and
non-pseudotype as a transtype could be used in this multi-node partial
aggregate pushdown.
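To illustrate the mechanics under discussion - only the transition state, never the raw rows, crosses the wire, and the finalfn runs once on the coordinator - here is a minimal standalone sketch for avg over bigint. The struct and function names are invented for illustration; this is not PostgreSQL's actual implementation (the real int8 avg state uses numeric to avoid overflow).

```c
/* Toy transition state for avg(bigint): a running sum and row count. */
typedef struct AvgState
{
    long long   sum;
    long long   count;
} AvgState;

/* sfunc: fold one row into the local (per-shard) state. */
static void
avg_accum(AvgState *st, long long val)
{
    st->sum += val;
    st->count += 1;
}

/* combinefunc: merge a partial state received from a remote node. */
static void
avg_combine(AvgState *st, const AvgState *remote)
{
    st->sum += remote->sum;
    st->count += remote->count;
}

/* finalfunc: applied exactly once, on the coordinating server. */
static double
avg_final(const AvgState *st)
{
    return st->count ? (double) st->sum / (double) st->count : 0.0;
}
```

Each shard runs avg_accum over its own rows and ships only its (sum, count) pair; the coordinator runs avg_combine once per shard and avg_final once, so the final division stays local.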




RE: Partial aggregates pushdown

2024-06-11 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi, Bruce.

Sorry for the late reply. Thank you for the comment.

> From: Bruce Momjian 
> Sent: Wednesday, June 5, 2024 11:04 PM
> > > *  Passes unsafe binary data from the foreign server.
> > >
> > > Can someone show me where that last item is in the patch, and why
> > > can't we just pass back values cast to text?
> >
> > In finalize_aggregate() when we see partial aggregate with
> > peragg->aggref->aggtranstype = INTERNALOID
> > we call aggregate's serialization function and return it as bytea.
> >
> > The issue is that this internal representation isn't guaranteed to be
> > compatible between servers of different versions (or architectures?).
> > So, likely, we instead should have called some export function for
> > aggregate and later - some import function on postgres_fdw side. It
> > doesn't matter much, what this export function generates - text, json
> > or some portable binary format,
> > 1) export/import functions should just "understand" it,
> > 2) it should be a stable representation.
> 
> Okay, so looking at the serialization output functions already defined, I see 
> many zeros, which I assume means just the base
> data type, and eight
> more:
> 
>   SELECT DISTINCT aggserialfn from pg_aggregate WHERE aggserialfn::oid != 
> 0;
>   aggserialfn
>   ---
>numeric_avg_serialize
>string_agg_serialize
>array_agg_array_serialize
>numeric_serialize
>int8_avg_serialize
>array_agg_serialize
>interval_avg_serialize
>numeric_poly_serialize
> 
> I realize we need to return the sum and count for average, so that makes 
> sense.
> 
> So, we need import/export text representation for the partial aggregate mode 
> for these eight, and call the base data type
> text import/export functions for the zero ones when in this mode?

I think that you are basically right.
But, I think, in a perfect world we should also add an import/export function 
for the following
two category.

  Category 1. A validation check is needed for safety.
For example, I think a validation check is needed for avg(float4),
 whose transition type is not internal. (See p.18 in [1])
I plan to add import functions of avg, count (See p.18, p.19 in [1]).
  Category 2. The transition type is a pseudo data type.
Aggregate functions of this category need to accept many actual data
types, including user-defined types. So I think that it is hard to
implement import/export functions.
Consequently, I do not plan to support this category. (See p.19 in [1])

Sincerely yours,
Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

[1] 
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf




Re: Partial aggregates pushdown

2024-06-05 Thread Bruce Momjian
On Wed, Jun  5, 2024 at 08:19:04AM +0300, Alexander Pyhalov wrote:
> > *  Passes unsafe binary data from the foreign server.
> > 
> > Can someone show me where that last item is in the patch, and why can't
> > we just pass back values cast to text?
> 
> In finalize_aggregate() when we see partial aggregate with
> peragg->aggref->aggtranstype = INTERNALOID
> we call aggregate's serialization function and return it as bytea.
> 
> The issue is that this internal representation isn't guaranteed to be
> compatible between servers
> of different versions (or architectures?). So, likely, we instead should
> have called some export function for aggregate
> and later - some import function on postgres_fdw side. It doesn't matter
> much, what this export function
> generates - text, json or some portable binary format,
> 1) export/import functions should just "understand" it,
> 2) it should be a stable representation.

Okay, so looking at the serialization output functions already defined, I
see many zeros, which I assume means just the base data type, and eight
more:

SELECT DISTINCT aggserialfn from pg_aggregate WHERE aggserialfn::oid != 
0;
aggserialfn
---
 numeric_avg_serialize
 string_agg_serialize
 array_agg_array_serialize
 numeric_serialize
 int8_avg_serialize
 array_agg_serialize
 interval_avg_serialize
 numeric_poly_serialize

I realize we need to return the sum and count for average, so that makes
sense.

So, we need import/export text representation for the partial aggregate
mode for these eight, and call the base data type text import/export
functions for the zero ones when in this mode?
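As a toy illustration of what such a text-format export/import pair could look like for a (sum, count)-style state - the names and the "(sum,count)" format are invented here, and a real implementation would need numeric-aware formatting plus stricter validation:

```c
#include <stdio.h>
#include <string.h>

typedef struct AvgState
{
    long long   sum;
    long long   count;
} AvgState;

/* Export the state as text: architecture- and endianness-independent. */
static void
avg_export(const AvgState *st, char *buf, size_t len)
{
    snprintf(buf, len, "(%lld,%lld)", st->sum, st->count);
}

/* Import with validation: reject input that does not parse completely,
 * since the receiving server cannot trust the sender's bytes. */
static int
avg_import(const char *buf, AvgState *st)
{
    int consumed = 0;

    if (sscanf(buf, "(%lld,%lld)%n", &st->sum, &st->count, &consumed) != 2 ||
        buf[consumed] != '\0')
        return 0;               /* malformed */
    return 1;
}
```

The round trip is stable across machines because no raw memory layout is shipped, which is the property the serialfn/deserialfn bytea format does not guarantee.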

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-06-04 Thread Alexander Pyhalov

Bruce Momjian wrote on 2024-06-04 20:12:
On Mon, May 27, 2024 at 09:30:59PM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:

Hi Mr. Pyhalov.

Sorry for the late reply.
Thank you for your modification and detailed review.
I attach a fixed patch; it has not yet been rebased.


I know this patch was discussed at the Vancouver conference.  What are
the open issues?  I know of several:

*  add tests that were requested by Fujii-san and now posted by
   Alexander Pyhalov
*  Where is the documentation?  I know the original patch had some, and
   I improved it, but it seems to be missing.
*  Passes unsafe binary data from the foreign server.

Can someone show me where that last item is in the patch, and why can't
we just pass back values cast to text?


Hi.

In finalize_aggregate(), when we see a partial aggregate with
peragg->aggref->aggtranstype = INTERNALOID,
we call the aggregate's serialization function and return it as bytea.

The issue is that this internal representation isn't guaranteed to be
compatible between servers of different versions (or architectures?).
So, likely, we instead should have called some export function for the
aggregate and later some import function on the postgres_fdw side. It
doesn't matter much what this export function generates - text, JSON or
some portable binary format - as long as:
1) export/import functions just "understand" it, and
2) it is a stable representation.
--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2024-06-04 Thread Bruce Momjian
On Wed, Jun  5, 2024 at 12:14:45AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> I will add sufficient document and comments to the next patch as Bruce and 
> Robert said.

Great, I am available to help improve the documentation.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-06-04 Thread Bruce Momjian
On Mon, May 27, 2024 at 09:30:59PM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr. Pyhalov.
> 
> Sorry for the late reply.
> Thank you for your modification and detailed review.
> I attach a fixed patch; it has not yet been rebased.

I know this patch was discussed at the Vancouver conference.  What are
the open issues?  I know of several:

*  add tests that were requested by Fujii-san and now posted by
   Alexander Pyhalov
*  Where is the documentation?  I know the original patch had some, and
   I improved it, but it seems to be missing.
*  Passes unsafe binary data from the foreign server.

Can someone show me where that last item is in the patch, and why can't
we just pass back values cast to text?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-05-27 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:

Hi Mr. Pyhalov.

Sorry for the late reply.
Thank you for your modification and detailed review.
I attach a fixed patch; it has not yet been rebased.

Monday, 25 March 2024 16:01 Alexander Pyhalov 
:.

Comment in nodeAgg.c seems to be strange:

1079 /*
1080 * If the agg's finalfn is provided and PARTIAL_AGGREGATE
keyword is
1081 * not specified, apply the agg's finalfn.
1082 * If PARTIAL_AGGREGATE keyword is specified and the
transValue type
1083 * is internal, apply the agg's serialfn. In this case, if
the agg's
1084 * serialfn must not be invalid. Otherwise return
transValue.
1085 */

Likely, you mean:

... In this case the agg's serialfn must not be invalid...

Fixed.


Lower, in the same file, please, correct error message:

1136 if(!OidIsValid(peragg->serialfn_oid))
1137  elog(ERROR, "serialfunc is note provided
for partial aggregate");

it should be "serialfunc is not provided for partial aggregate"

Fixed.


Also something is wrong with the following test :

SELECT /* aggregate <> partial aggregate */
 array_agg(c_int4array), array_agg(b),
 avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval),
 avg(b::float4), avg(b::float8),
 corr(b::float8, (b * b)::float8),
 covar_pop(b::float8, (b * b)::float8),
 covar_samp(b::float8, (b * b)::float8),
 regr_avgx((2 * b)::float8, b::float8),
.

Its results have changed since last patch. Do they depend on daylight
saving time?

You are right. In my environment, TimeZone is set to 'PST8PDT',
with which timetz values depend on daylight saving time.
I changed TimeZone to 'UTC' in this test.

You can see that the filter is applied before the append. The result is
correct only by chance, as the sum in every partition is actually < 700.
If you lower this bound, let's say, to 200, you'll start getting wrong
results as data is filtered prior to aggregation.

It seems, however, that in the partial case you should just avoid pulling
conditions from the having qual at all; all filters will be applied at the
upper level. Something like
level. Something like

Thank you for your modification.

Found one more problem. You can fire a partial aggregate over a
partitioned table, but convert_combining_aggrefs() will make a
non-partial copy, which leads to a
'variable not found in subplan target list' error.

Thanks for the correction as well.
As you pointed out,
the original patch certainly had the potential to cause problems.
However, I could not actually reproduce the problem in cases such as 
the following.


  Settings:
t(c1, c2) is a partitioned table whose partition key is c1.
t1, t2 are partitions of t and are partitioned tables.
t11, t12: partitions of t1 and foreign tables of postgres_fdw.
t21, t22: partitions of t2 and foreign tables of postgres_fdw.
  Query:
select c2 / 2, sum(c1) from t group by c2 / 2 order by 1

If you have a reproducible example, I would like to add it to
the regression test.
Do you have a reproducible example?


Also denied partial aggregates pushdown on server version mismatch.
Should check_partial_aggregate_support be 'true' by default?

Could we discuss this point after we determine how to transfer state
values? Once we determine this point, we can easily determine whether
check_partial_aggregate_support should be 'true' by default.



I'm not sure what to do with the current grammar - it precludes partial
distinct aggregates. I understand that it's currently impossible to have
partial aggregation for distinct aggregates - but is it worth having
such a restriction at the grammar level?

If partial aggregation for distinct aggregates becomes possible in the
future, I see no problem with the policy of accepting new SQL keywords,
such as "PARTIAL_AGGREGATE DISTINCT".


BTW, I have an issue with test results in the last version of
the patch. Attaching regression diffs.

I have partial sum over c_interval instead of sum(c_interval).


--
Best regards,
Alexander Pyhalov,
Postgres Professional

diff -U3 /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out
--- /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out	2024-05-28 08:48:27.236520098 +0300
+++ /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out	2024-05-28 08:51:24.395704846 +0300
@@ -10438,15 +10438,15 @@
  ->  Foreign Scan
Output: (PARTIAL array_agg(pagg_tab.c_int4array)), (PARTIAL array_agg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::smallint)), (PARTIAL avg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::bigint)), (PARTIAL avg(pagg_tab.c_interval)), (PARTIAL avg((pagg_tab.b)::real)), (PARTIAL avg((pagg_tab.b)::double precision)), (PARTIAL avg((pagg_tab.b)::numeric)), (PARTIAL corr((pagg_tab.b)::double precision, ((pagg_tab.b * pagg_tab.b))::double precision)), (PARTIAL covar_pop((pagg_tab.b)::double precision, ((pagg_tab.b * pagg_tab.b))::double 

Re: Partial aggregates pushdown

2024-05-27 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:

Hi Mr. Pyhalov.


Hi.

Found one more problem. You can fire a partial aggregate over a
partitioned table, but convert_combining_aggrefs() will make a
non-partial copy, which leads to a
'variable not found in subplan target list' error.

Thanks for the correction as well.
As you pointed out,
the original patch certainly had the potential to cause problems.
However, I could not actually reproduce the problem in cases such as 
the following.


  Settings:
t(c1, c2) is a partitioned table whose partition key is c1.
t1, t2 are partitions of t and are partitioned tables.
t11, t12: partitions of t1 and foreign tables of postgres_fdw.
t21, t22: partitions of t2 and foreign tables of postgres_fdw.
  Query:
select c2 / 2, sum(c1) from t group by c2 / 2 order by 1

If you have a reproducible example, I would like to add it to
the regression test.
Do you have a reproducible example?



The fix was to set child_agg->agg_partial to orig_agg->agg_partial in
convert_combining_aggrefs(); it's already in the patch, as well as the
example. Without this fix,

-- Check partial aggregate over partitioned table
EXPLAIN (VERBOSE, COSTS OFF)
SELECT avg(PARTIAL_AGGREGATE a), avg(a) FROM pagg_tab;

fails with

ERROR:  variable not found in subplan target list
--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2024-03-25 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-03-16 05:28:

Hi. Mr.Pyhalov.

>>
>> I don't see it that way. What we would push to the foreign server
>> would be something like SELECT count(a) FROM t. Then, after we get
>> the results back and combine the various partial counts locally, we
>> would locally evaluate the HAVING clause afterward. That is, partial
>> aggregation is a barrier to pushing down HAVING clause itself, but it
>> doesn't preclude pushing down the aggregation.
> I have made modifications in the attached patch to ensure that when
> the HAVING clause is present, the HAVING clause is executed locally
> while the partial aggregations are pushed down.
>
>

Sorry, I don't see how it works. When we have partial aggregates and 
having clause, foreign_grouping_ok() returns false and

add_foreign_grouping_paths() adds no paths.
I'm not saying it's necessary to fix this in the first patch version.

Our sincere apologies. I had attached an older version before this
modification.




Hi.

In foreign_grouping_ok() having qual is added to local conds here:

6635 if (is_foreign_expr(root, grouped_rel, 
expr) && !partial)
6636 fpinfo->remote_conds = 
lappend(fpinfo->remote_conds, rinfo);

6637 else
6638 fpinfo->local_conds = 
lappend(fpinfo->local_conds, rinfo);

6639 }
6640 }


This is incorrect. If you look at the plan for the query in postgres_fdw.sql


-- Partial aggregates are safe to push down when there is a HAVING 
clause

EXPLAIN (VERBOSE, COSTS OFF)
SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING 
sum(a) < 700 ORDER BY 1;
 
QUERY PLAN

-
 Finalize GroupAggregate
   Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
   Group Key: pagg_tab.b
   Filter: (sum(pagg_tab.a) < 700)
   ->  Sort
 Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL 
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))

 Sort Key: pagg_tab.b
 ->  Append
   ->  Foreign Scan
 Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), 
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))

 Filter: ((sum(pagg_tab.a)) < 700)
 Relations: Aggregate on (public.fpagg_tab_p1 
pagg_tab)
 Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p1 GROUP BY 1

   ->  Foreign Scan
 Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), 
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_1.a))

 Filter: ((sum(pagg_tab_1.a)) < 700)
 Relations: Aggregate on (public.fpagg_tab_p2 
pagg_tab_1)
 Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p2 GROUP BY 1

   ->  Foreign Scan
 Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), 
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_2.a))

 Filter: ((sum(pagg_tab_2.a)) < 700)
 Relations: Aggregate on (public.fpagg_tab_p3 
pagg_tab_2)
 Remote SQL: SELECT b, avg(PARTIAL_AGGREGATE a), 
max(a), count(*), sum(a), sum(a) FROM public.pagg_tab_p3 GROUP BY 1



You can see that the filter is applied before the append. The result is
correct only by chance, as the sum in every partition is actually < 700.
If you lower this bound, let's say, to 200, you'll start getting wrong
results as data is filtered prior to aggregation.


It seems, however, that in the partial case you should just avoid pulling
conditions from the having qual at all; all filters will be applied at the
upper level. Something like


diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c

index 42eb17ae7c0..54918b9f1a4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6610,7 +6610,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo 
*grouped_rel,
 * Classify the pushable and non-pushable HAVING clauses and 
save them in

 * remote_conds and local_conds of the grouped rel's fpinfo.
 */
-   if (extra->havingQual)
+   if (extra->havingQual && !partial)
{
foreach(lc, (List *) extra->havingQual)
{
@@ -6632,7 +6632,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo 
*grouped_rel,
 
 grouped_rel->relids,
 
 NULL,
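The wrong-results hazard described above (HAVING conditions pushed into per-partition Foreign Scan filters) can be reproduced outside the planner with a toy model. The code below is illustrative C, not PostgreSQL's: it compares filtering each partition's partial sum against the HAVING bound before combining versus only after.

```c
/*
 * Toy model: per-partition partial sums for one group, with a HAVING
 * bound "sum < bound".  filter_early mimics the incorrect plan that
 * applies the bound to each partition's partial sum before the Append.
 */
static long long
combined_sum(const long long *partial, int n, long long bound,
             int filter_early)
{
    long long   total = 0;

    for (int i = 0; i < n; i++)
    {
        /* Wrong: per-partition "sum < bound" silently drops partitions. */
        if (filter_early && !(partial[i] < bound))
            continue;
        total += partial[i];
    }
    return total;
}
```

With partition sums 250, 60 and 80 and HAVING sum < 200, the correct combined sum is 390 (so the group fails HAVING), but early filtering drops the 250 partition and yields 140, which then wrongly passes HAVING.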

Re: Partial aggregates pushdown

2024-03-21 Thread Bruce Momjian
On Thu, Mar 21, 2024 at 11:37:50AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi. Mr.Momjian, Mr.Lane, Mr.Haas, hackers.
> 
> I apologize for any misunderstanding regarding the context of the attached 
> patch and
> the points on which I requested a review. Could you please allow me to 
> clarify?
> 
> In the review around early December 2023, I received the following three 
> issues pointed out by Mr.Haas[1].
> 1. Transmitting state value safely between machines
> 2. Making the patch clearer by adding SQL keywords
> 3. Fixing the behavior when the HAVING clause is present
> 
> In the email sent on February 22, 2024[2], I provided an update on the 
> progress made in addressing these issues.
> Regarding issue 1, I have only provided a proposed solution in the email and 
> have not started the programming. 
> Therefore, the latest patch is not in a commit-ready state. As mentioned 
> later, we have also temporarily reverted the changes made to the 
> documentation.
> Before proceeding with the programming, I would like to discuss the proposed 
> solution with the community and seek consensus.
> If it is necessary to have source code in order to discuss, I can create a 
> simple prototype so that I can receive your feedback.
> Would you be able to provide your opinions on it?
> 
> Regarding issue 2., I have confirmed that creating a prototype allows us to 
> address the issue and clear the patch.
> In this prototype creation, the main purpose was to verify if the patch can 
> be cleared and significant revisions were made to the previous version.
> Therefore, I have removed all the document differences.
> I have submitted a patch [3] that includes the fixes for issue 3. to the 
> patch that was posted in [2].
> Regarding the proposed solution for issue 1, unlike the patch posted in [3], 
> we have a policy of not performing partial aggregation pushdown if we cannot 
> guarantee compatibility and safety.
> The latest patch in [3] is a POC patch. The patch that Mr. Momjian reviewed 
> is this.
> If user-facing documentation is needed for this POC patch, it can be added.
> 
> I apologize for the lack of explanation regarding this positioning, which may 
> have caused misunderstandings regarding the patch posted in [3].

That makes sense.  Let's get you answers to those questions first before
you continue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2024-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Mr.Momjian, Mr.Lane, Mr.Haas, hackers.

I apologize for any misunderstanding regarding the context of the attached 
patch and
the points on which I requested a review. Could you please allow me to clarify?

In the review around early December 2023, I received the following three issues 
pointed out by Mr.Haas[1].
1. Transmitting state value safely between machines
2. Making the patch clearer by adding SQL keywords
3. Fixing the behavior when the HAVING clause is present

In the email sent on February 22, 2024[2], I provided an update on the progress 
made in addressing these issues.
Regarding issue 1, I have only provided a proposed solution in the email and
have not started programming.
Therefore, the latest patch is not in a commit-ready state. As mentioned later, 
we have also temporarily reverted the changes made to the documentation.
Before proceeding with the programming, I would like to discuss the proposed 
solution with the community and seek consensus.
If it is necessary to have source code in order to discuss, I can create a 
simple prototype so that I can receive your feedback.
Would you be able to provide your opinions on it?

Regarding issue 2., I have confirmed that creating a prototype allows us to 
address the issue and clear the patch.
In this prototype creation, the main purpose was to verify if the patch can be 
cleared and significant revisions were made to the previous version.
Therefore, I have removed all the document differences.
I have submitted a patch [3] that includes the fixes for issue 3. to the patch 
that was posted in [2].
Regarding the proposed solution for issue 1, unlike the patch posted in [3], 
we have a policy of not performing partial aggregation pushdown if we cannot 
guarantee compatibility and safety.
The latest patch in [3] is a POC patch. The patch that Mr. Momjian reviewed is 
this.
If user-facing documentation is needed for this POC patch, it can be added.

I apologize for the lack of explanation regarding this positioning, which may 
have caused misunderstandings regarding the patch posted in [3].

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYCrtOvk2f32qQKZV%3DjNL35tandf2A2Dp_2F5ASuiG1BA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/TYAPR01MB5514F0CBD9CD4F84A261198195562%40TYAPR01MB5514.jpnprd01.prod.outlook.com
[3] 
https://www.postgresql.org/message-id/TYAPR01MB55141D18188AC86ADCE35FCB952F2%40TYAPR01MB5514.jpnprd01.prod.outlook.com

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2024-03-19 Thread Bruce Momjian
On Tue, Mar 19, 2024 at 05:29:07PM -0400, Tom Lane wrote:
> I'd like to vociferously protest both of those decisions.
> 
> "No version check by default" means "unsafe by default", which is not
> project style in general and is especially not so for postgres_fdw.
> We have tried very hard for years to ensure that postgres_fdw will
> work with a wide range of remote server versions, and generally been
> extremely conservative about what we think will work (example:
> collations); but this patch seems ready to throw that principle away.
> 
> Also, surely "remoteversion < PG_VERSION_NUM" is backwards.  What
> this would mean is that nobody can ever change a partial aggregate's
> implementation, because that would break queries issued from older
> servers (that couldn't know about the change) to newer ones.

Well it is the origin server that is issuing the PUSHDOWN syntax, so an
older origin server should be able to push to a newer remote server.

> Realistically, I think it's fairly unsafe to try aggregate pushdown
> to anything but the same PG major version; otherwise, you're buying
> into knowing which aggregates have partial support in which versions,
> as well as predicting the future about incompatible state changes.

Yes, incompatible state changes would be a problem with an older origin
server with a newer remote server setup.

If we require matching versions, we must accept that upgrades will
require more downtime.

> Even that isn't bulletproof --- e.g., maybe somebody wasn't careful
> about endianness-independence of the serialized partial state, making
> it unsafe to ship --- so there had better be a switch whereby the user
> can disable it.

Makes sense.  I was also wondering how a user would know whether the
pushdown is happening, or not.

> Maybe we could define a three-way setting:
> 
> * default: push down partial aggs only to same major PG version
> * disable: don't push down, period
> * force: push down regardless of remote version

What would be the default?  If it is the first one, it requires a
remote version check on first use in the session.

> With the "force" setting, it's the user's responsibility not to
> issue any remote-able aggregation that would be unsafe to push
> down.  This is still a pretty crude tool: I can foresee people
> wanting to have per-aggregate control over things, especially
> extension-supplied aggregates.  But it'd do for starters.

We have the postgres_fdw extensions option to control function pushdown
to extensions.

> I'm not super thrilled by the fact that the patch contains zero
> user-facing documentation, even though it's created new SQL syntax,
> not to mention a new postgres_fdw option.  I assume this means that
> nobody thinks it's anywhere near ready to commit.

Previous versions of the patch had docs since I know I worked on
improving them.  I am not sure what happened to them.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-03-19 Thread Tom Lane
Bruce Momjian  writes:
> The current patch has:

>   if ((OidIsValid(aggform->aggfinalfn) ||
>   (aggform->aggtranstype == INTERNALOID)) &&
>   fpinfo->check_partial_aggregate_support)
>   {
>   if (fpinfo->remoteversion == 0)
>   {
>   PGconn *conn = GetConnection(fpinfo->user, false, 
> NULL);

>   fpinfo->remoteversion = PQserverVersion(conn);
>   }

>   if (fpinfo->remoteversion < PG_VERSION_NUM)
>   partial_agg_ok = false;
>   }

> It uses check_partial_aggregate_support, which defaults to false,
> meaning partial aggregates will be pushed down with no version check by
> default.  If set to true, pushdown will happen if the remote server is
> the same version or newer, which seems acceptable to me.

I'd like to vociferously protest both of those decisions.

"No version check by default" means "unsafe by default", which is not
project style in general and is especially not so for postgres_fdw.
We have tried very hard for years to ensure that postgres_fdw will
work with a wide range of remote server versions, and generally been
extremely conservative about what we think will work (example:
collations); but this patch seems ready to throw that principle away.

Also, surely "remoteversion < PG_VERSION_NUM" is backwards.  What
this would mean is that nobody can ever change a partial aggregate's
implementation, because that would break queries issued from older
servers (that couldn't know about the change) to newer ones.

Realistically, I think it's fairly unsafe to try aggregate pushdown
to anything but the same PG major version; otherwise, you're buying
into knowing which aggregates have partial support in which versions,
as well as predicting the future about incompatible state changes.
Even that isn't bulletproof --- e.g., maybe somebody wasn't careful
about endianness-independence of the serialized partial state, making
it unsafe to ship --- so there had better be a switch whereby the user
can disable it.

Maybe we could define a three-way setting:

* default: push down partial aggs only to same major PG version
* disable: don't push down, period
* force: push down regardless of remote version

With the "force" setting, it's the user's responsibility not to
issue any remote-able aggregation that would be unsafe to push
down.  This is still a pretty crude tool: I can foresee people
wanting to have per-aggregate control over things, especially
extension-supplied aggregates.  But it'd do for starters.
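The proposed three-way setting boils down to a small gating decision, sketched below. The enum and function names are invented for illustration; version numbers follow the libpq PQserverVersion() convention (major * 10000 + minor, e.g. 160002 for 16.2), so the major version is version / 10000.

```c
#include <stdbool.h>

/* Hypothetical three-way setting, per the proposal above. */
typedef enum
{
    PDA_DEFAULT,                /* push down only to same major version */
    PDA_DISABLE,                /* never push down */
    PDA_FORCE                   /* push down regardless of remote version */
} PartialAggPushdown;

static bool
partial_agg_pushdown_ok(PartialAggPushdown setting,
                        int localversion, int remoteversion)
{
    switch (setting)
    {
        case PDA_DISABLE:
            return false;
        case PDA_FORCE:
            return true;
        case PDA_DEFAULT:
            /* PQserverVersion-style ints: major is version / 10000. */
            return localversion / 10000 == remoteversion / 10000;
    }
    return false;
}
```

Note the symmetry of the default check: unlike a "remoteversion < PG_VERSION_NUM" test, it does not privilege either side, matching the objection that older origins must still be able to talk to newer remotes and vice versa.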

I'm not super thrilled by the fact that the patch contains zero
user-facing documentation, even though it's created new SQL syntax,
not to mention a new postgres_fdw option.  I assume this means that
nobody thinks it's anywhere near ready to commit.

regards, tom lane




Re: Partial aggregates pushdown

2024-03-19 Thread Bruce Momjian
On Sat, Mar 16, 2024 at 02:28:50AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi. Mr.Pyhalov.
>
> > From: Alexander Pyhalov  Sent: Wednesday,
> > February 28, 2024 10:43 PM
> > > 1. Transmitting state value safely between machines
> > >> From: Robert Haas  Sent: Wednesday,
> > >> December 6, 2023 10:25 PM the problems around transmitting
> > >> serialized bytea blobs between machines that can't be assumed to
> > >> fully trust each other will need to be addressed in some way,
> > >> which seems like it will require a good deal of design work,
> > >> forming some kind of consensus, and then implementation work to
> > >> follow.
> > > I have considered methods for safely transmitting state values
> > > between different machines.  I have taken into account the version
> > > policy of PostgreSQL (5 years of support) and the major version
> > > release cycle over the past 10 years (1 year), and as a result, I
> > > have made the assumption that transmission is allowed only when
> > > the difference between the local version and the remote version
> > > is 5 or less.  I believe that by adding new components, "export
> > > function" and "import function", to the aggregate functions, and
> > > further introducing a new SQL keyword to the query syntax of
> > > aggregate expressions, we can address this issue.
> >
> > I honestly think that achieving cross-version compatibility in
> > this way puts a significant burden on developers. Can we instead
> > always use the more or less universal export and import function
> > to fix possible issues with binary representations on different
> > architectures and just refuse to push down partial aggregates on
> > server version mismatch? At least at the first step?
>
> Thank you for your comment. I agree with your point that the proposed
> method would impose a significant burden on developers. In order
> to ensure cross-version compatibility, it is necessary to impose
> constraints on the format of the state values exchanged between
> servers, which would indeed burden developers. As you mentioned, I
> think that it is realistic to allow partial aggregation pushdown only
> when coordinating between the same versions in the first step.

The current patch has:

	if ((OidIsValid(aggform->aggfinalfn) ||
		 (aggform->aggtranstype == INTERNALOID)) &&
		fpinfo->check_partial_aggregate_support)
	{
		if (fpinfo->remoteversion == 0)
		{
			PGconn	   *conn = GetConnection(fpinfo->user, false, NULL);

			fpinfo->remoteversion = PQserverVersion(conn);
		}

		if (fpinfo->remoteversion < PG_VERSION_NUM)
			partial_agg_ok = false;
	}

It uses check_partial_aggregate_support, which defaults to false,
meaning partial aggregates will be pushed down with no version check by
default.  If set to true, pushdown will happen if the remote server is
the same version or newer, which seems acceptable to me.
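
Restated as a sketch, the decision the quoted patch code makes (a hypothetical Python stand-in for the C logic, not the actual function):

```python
def partial_agg_ok(has_finalfn_or_internal: bool,
                   check_partial_aggregate_support: bool,
                   remote_version: int,
                   local_version: int) -> bool:
    """Sketch of the quoted postgres_fdw check: pushdown of aggregates
    with a finalfn or internal transtype is rejected only when the
    option is on and the remote server is older than the local one."""
    if has_finalfn_or_internal and check_partial_aggregate_support:
        if remote_version < local_version:
            return False
    return True

# Default (option off): pushed down with no version check at all.
print(partial_agg_ok(True, False, 150000, 170000))  # -> True
# Option on: same-or-newer remote is accepted, older is not.
print(partial_agg_ok(True, True, 170000, 170000))   # -> True
print(partial_agg_ok(True, True, 160000, 170000))   # -> False
```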

FYI, the patch is much smaller now.  :-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2024-02-28 Thread Alexander Pyhalov

Hi.

fujii.y...@df.mitsubishielectric.co.jp wrote 2024-02-22 10:20:

Hi. Mr.Haas, hackers.

I apologize for the significant delay since my last post.
I have conducted investigations and considerations regarding the 
remaining tasks as follows.

Would it be possible for you to review them?
In particular, could you please confirm if the approach mentioned in 1. 
is acceptable?
If there are no issues with the direction outlined in 1., I plan to 
make a simple prototype based on this approach.


1. Transmitting state value safely between machines

From: Robert Haas 
Sent: Wednesday, December 6, 2023 10:25 PM
the problems around transmitting
serialized bytea blobs between machines that can't be assumed to fully 
trust each other will need to be addressed in some
way, which seems like it will require a good deal of design work, 
forming some kind of consensus, and then implementation

work to follow.
I have considered methods for safely transmitting state values between 
different machines.
I have taken into account the version policy of PostgreSQL (5 years of 
support) and the major version release cycle over the past 10 years (1 
year), and as a result, I have made the assumption that transmission is 
allowed only when the difference between the local version and the 
remote version is 5 or less.
I believe that by adding new components, "export function" and "import 
function", to the aggregate functions, and further introducing a new 
SQL keyword to the query syntax of aggregate expressions, we can 
address this issue.
If the version of the local server is higher than or equal to the 
version of the remote server, the proposed method can be simplified. 
The export version mentioned later in (1) would not be necessary. 
Furthermore, if the version of the local server matches the version of 
the remote server, the proposed method can be further simplified.
I would appreciate your input on reasonable assumptions regarding the 
differences in versions between the local server and the remote server.
I will explain the specifications of the export function, import 
function, the new SQL keyword for aggregate expressions, and the 
behavior of query processing for partial aggregation separately.

(1) Export Function Specification
This function is another final function for partial aggregate.
This function converts the state value that represents the result of 
partial aggregation into a format that can be read by the local server.
This function is called instead of the existing finalfunc during the 
final stage of aggregation when performing partial aggregation.

The conversion process described above will be referred to as "export".
The argument of an export function is the version of the server that 
will receive the return value.

Hereafter, this version will be referred to as the export version.
The concept of an export version is necessary to handle cases where the 
version of the local server is smaller than the version of the remote 
server.
The return value of the export function is the transformed state value, 
and its data type is bytea.
For backward compatibility, the developer of the export function must 
ensure that the export can be performed for major versions up to five 
versions prior to the major version of PostgreSQL that the export 
function is being developed for.
For built-in functions, I believe it is necessary to allow for the 
possibility of not developing the export functionality for specific 
versions in the future (due to reasons such as development burden) 
after the export function is developed for a certain version.
To achieve this, for built-in functions, we will add a column to the 
pg_aggregate catalog that indicates the presence or absence of export 
functionality for each major version, including the major version being 
developed and the previous five major versions. This column will be 
named safety_export_versions and will have a data type of boolean[6].
For user-defined functions, we will refer to the extensions option and 
add an external server option called safety_export_extensions, which 
will maintain a list of extensions that include only the aggregate 
functions that can be exported to the local server version.
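
A sketch of the proposed export/import pair under these rules (the wire layout, names, and version arithmetic are all hypothetical; the proposal itself defines no concrete format):

```python
import struct

SUPPORTED_EXPORT_MAJORS = 5   # must export to up to five prior majors

def export_avg_state(count: int, total: int, export_version: int,
                     local_version: int) -> bytes:
    """Export function: convert the partial-aggregate state into a
    portable bytea, tagged with the format the receiver understands."""
    if local_version - export_version > SUPPORTED_EXPORT_MAJORS:
        raise ValueError("receiver too old: no export format available")
    # network byte order: version tag, then the state fields
    return struct.pack("!iqq", export_version, count, total)

def import_avg_state(blob: bytes, local_version: int) -> tuple[int, int]:
    """Import function: validate and reconstruct the state value."""
    version, count, total = struct.unpack("!iqq", blob)
    if version > local_version:
        raise ValueError("state exported for a newer server")
    if count < 0:
        raise ValueError("corrupt state: negative row count")
    return count, total

blob = export_avg_state(3, 600, export_version=16, local_version=17)
print(import_avg_state(blob, local_version=17))  # -> (3, 600)
```

The version tag is what lets the receiving side refuse a state it does not know how to interpret, rather than misreading it.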

...


I honestly think that achieving cross-version compatibility in this way 
puts a significant burden on developers. Can we instead always use the 
more or less universal export and import function to fix possible issues 
with binary representations on different architectures and just refuse 
to push down partial aggregates on server version mismatch? At least at 
the first step?




3. Fixing the behavior when the HAVING clause is present

From: Robert Haas 
Sent: Tuesday, November 28, 2023 4:08 AM

On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov 
 wrote:

> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> foreign server as HAVING needs full aggregate result, but foreign
> server don't know it.

Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2024-01-26 Thread vignesh C
On Thu, 7 Dec 2023 at 05:41, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
>
> Hi Mr.Haas.
>
> > -Original Message-
> > From: Robert Haas 
> > Sent: Wednesday, December 6, 2023 10:25 PM
> > On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
> >  wrote:
> > > Are you concerned about the hassle and potential human errors of
> > > manually adding new partial aggregation functions, rather than the 
> > > catalog becoming bloated?
> >
> > I'm concerned about both.
> Understood. Thank you for your response.
>
> > > The process of creating partial aggregation functions from aggregation
> > > functions can be automated, so I believe this issue can be resolved.
> > > However, automating it may increase the size of the patch even more, so 
> > > overall, approach#2 might be better.
> > > To implement approach #2, it would be necessary to investigate how much 
> > > additional code is required.
> >
> > Yes. Unfortunately I fear that there is quite a lot of work left to do here 
> > in order to produce a committable feature. To me it
> > seems necessary to conduct an investigation of approach #2. If the result 
> > of that investigation is that nothing major
> > stands in the way of approach #2, then I think we should adopt it, which is 
> > more work. In addition, the problems around
> > transmitting serialized bytea blobs between machines that can't be assumed 
> > to fully trust each other will need to be
> > addressed in some way, which seems like it will require a good deal of 
> > design work, forming some kind of consensus, and
> > then implementation work to follow. In addition to that there may be some 
> > small problems that need to be solved at a
> > detail level, such as the HAVING issue. I think the last category won't be 
> > too hard to sort out, but that still leaves two really
> > major areas to address.
> Yes, I agree with you. It is clear that further investigation and discussion 
> are still needed.
> I would be grateful if we can resolve this issue gradually. I would also like 
> to continue the discussion if possible in the future.

Thanks for all the efforts on this patch. I have changed the status of
the commitfest entry to "Returned with Feedback" as there is still
some work to get this patch out. Feel free to continue the discussion
and add a new entry when the patch is in a reviewable shape.

Regards,
Vignesh




Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-11 Thread Robert Haas
On Thu, Dec 7, 2023 at 4:12 PM Bruce Momjian  wrote:
> Second, the patch already has a mechanism to check the remote server
> version to see if it is the same or newer.   Here is the version check
> documentation patch:

Right. This feature can certainly be implemented in a
backward-compatible way. I'm not sure that we have as much control
over what does and does not get pushed down as we really want here,
but it's completely possible to do this in a way that doesn't break
other use cases.

> However, functions don't have pre-created records, and internal
> > functions don't seem to have an SQL-defined structure, but as I remember
> the internal aggregate functions all take the same internal structure,
> so I guess we only need one fixed input and one output that would
> output/input such records.  Performance might be an issue, but at this
> point let's just implement this and measure the overhead since there are
> few/any(?) other viable options.

IMHO records will be the easiest approach, but it will be some work to try it.

> Fourth, going with #2 where we do the pushdown using an SQL keyword also
> allows extensions to automatically work, while requiring partial
> aggregate functions for every non-partial aggregate will require work
> for extensions, and potentially lead to more version mismatch issues.

Yeah.

> Finally, I am now concerned that this will not be able to be in PG 17,
> which I was hoping for.

Getting it ready to ship by March seems very difficult. I'm not saying
it couldn't happen, but I think more likely it won't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-07 Thread Bruce Momjian
On Thu, Dec  7, 2023 at 09:56:08AM -0500, Robert Haas wrote:
> On Wed, Dec 6, 2023 at 7:11 PM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > I would be grateful if we can resolve this issue gradually. I would also 
> > like to continue the discussion if possible in the future.
> 
> I think that would be good. Thanks for your work on this. It is a hard 
> problem.

Agreed.  First, Robert is right that this feature is long overdue.  It
might not help many of our existing workloads, but it opens us up to
handling new, larger workloads.

Second, the patch already has a mechanism to check the remote server
version to see if it is the same or newer.   Here is the version check
documentation patch:

check_partial_aggregate_support (boolean)

If this option is false, postgres_fdw always
uses partial aggregate pushdown by assuming that each built-in
aggregate function has a partial aggregate function defined on
the remote server.  If this option is true, local aggregates
whose partial computation function references itself are assumed
to exist on the remote server.  If not, during query planning,
postgres_fdw will connect to the remote
server and retrieve the remote server version.  If the remote
version is the same or newer, partial aggregate functions will be
assumed to exist.  If older, postgres_fdw
checks that the remote server has a matching partial aggregate
function before performing partial aggregate pushdown.  The default
is false.

There is also an extension list that specifies which extension-owned
functions can be pushed down;  from the doc patch:

To reduce the risk of misexecution of queries, WHERE clauses and
aggregate expressions are not sent to the remote server unless they
only use data types, operators, and functions that are built-in
or belong to an extension that is listed in the foreign server's
extensions option.

Third, we already have a way of creating records for tables:

SELECT pg_language FROM pg_language;
                pg_language
--------------------------------------------
 (12,internal,10,f,f,0,0,2246,)
 (13,c,10,f,f,0,0,2247,)
 (14,sql,10,f,t,0,0,2248,)
 (13576,plpgsql,10,t,t,13573,13574,13575,)

And we do have record input functionality:

CREATE TABLE test (x int, language pg_language);

INSERT INTO test SELECT 0, pg_language FROM pg_language;

SELECT * FROM test;
 x |                  language
---+--------------------------------------------
 0 | (12,internal,10,f,f,0,0,2246,)
 0 | (13,c,10,f,f,0,0,2247,)
 0 | (14,sql,10,f,t,0,0,2248,)
 0 | (13576,plpgsql,10,t,t,13573,13574,13575,)
(4 rows)

However, functions don't have pre-created records, and internal
functions don't seem to have an SQL-defined structure, but as I remember
the internal aggregate functions all take the same internal structure,
so I guess we only need one fixed input and one output that would
output/input such records.  Performance might be an issue, but at this
point let's just implement this and measure the overhead since there are
few/any(?) other viable options.
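
A sketch of that record-text idea (the format is invented for illustration; PostgreSQL's real record I/O also handles quoting, NULLs, and arbitrary composite types):

```python
def state_to_record_text(count: int, total: int) -> str:
    """Serialize an avg-style (count, sum) state as record text,
    the way '(12,internal,10,...)' above renders a pg_language row."""
    return f"({count},{total})"

def record_text_to_state(text: str) -> tuple[int, int]:
    """Parse the record text back, validating each field on input --
    the point of the record approach: reuse per-field input checks."""
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("malformed record")
    fields = text[1:-1].split(",")
    if len(fields) != 2:
        raise ValueError("wrong number of fields")
    count, total = int(fields[0]), int(fields[1])  # int() rejects junk
    if count < 0:
        raise ValueError("negative row count")
    return count, total

print(record_text_to_state(state_to_record_text(3, 600)))  # -> (3, 600)
```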

Fourth, going with #2 where we do the pushdown using an SQL keyword also
allows extensions to automatically work, while requiring partial
aggregate functions for every non-partial aggregate will require work
for extensions, and potentially lead to more version mismatch issues.

Finally, I am now concerned that this will not be able to be in PG 17,
which I was hoping for.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-07 Thread Robert Haas
On Wed, Dec 6, 2023 at 7:11 PM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I would be grateful if we can resolve this issue gradually. I would also like 
> to continue the discussion if possible in the future.

I think that would be good. Thanks for your work on this. It is a hard problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Haas.

> -Original Message-
> From: Robert Haas 
> Sent: Wednesday, December 6, 2023 10:25 PM
> On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Are you concerned about the hassle and potential human errors of
> > manually adding new partial aggregation functions, rather than the catalog 
> > becoming bloated?
> 
> I'm concerned about both.
Understood. Thank you for your response.

> > The process of creating partial aggregation functions from aggregation
> > functions can be automated, so I believe this issue can be resolved.
> > However, automating it may increase the size of the patch even more, so 
> > overall, approach#2 might be better.
> > To implement approach #2, it would be necessary to investigate how much 
> > additional code is required.
> 
> Yes. Unfortunately I fear that there is quite a lot of work left to do here 
> in order to produce a committable feature. To me it
> seems necessary to conduct an investigation of approach #2. If the result of 
> that investigation is that nothing major
> stands in the way of approach #2, then I think we should adopt it, which is 
> more work. In addition, the problems around
> transmitting serialized bytea blobs between machines that can't be assumed to 
> fully trust each other will need to be
> addressed in some way, which seems like it will require a good deal of design 
> work, forming some kind of consensus, and
> then implementation work to follow. In addition to that there may be some 
> small problems that need to be solved at a
> detail level, such as the HAVING issue. I think the last category won't be 
> too hard to sort out, but that still leaves two really
> major areas to address.
Yes, I agree with you. It is clear that further investigation and discussion 
are still needed. 
I would be grateful if we can resolve this issue gradually. I would also like 
to continue the discussion if possible in the future.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2023-12-06 Thread Robert Haas
On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Are you concerned about the hassle and potential human errors of manually 
> adding new partial
> aggregation functions, rather than the catalog becoming bloated?

I'm concerned about both.

> The process of creating partial aggregation functions from aggregation 
> functions can be automated,
> so I believe this issue can be resolved. However, automating it may increase 
> the size of the patch
> even more, so overall, approach#2 might be better.
> To implement approach #2, it would be necessary to investigate how much 
> additional code is required.

Yes. Unfortunately I fear that there is quite a lot of work left to do
here in order to produce a committable feature. To me it seems
necessary to conduct an investigation of approach #2. If the result of
that investigation is that nothing major stands in the way of approach
#2, then I think we should adopt it, which is more work. In addition,
the problems around transmitting serialized bytea blobs between
machines that can't be assumed to fully trust each other will need to
be addressed in some way, which seems like it will require a good deal
of design work, forming some kind of consensus, and then
implementation work to follow. In addition to that there may be some
small problems that need to be solved at a detail level, such as the
HAVING issue. I think the last category won't be too hard to sort out,
but that still leaves two really major areas to address.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Haas, hackers.

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 5:03 AM
> Also, I want to make one other point here about security and reliability. 
> Right now, there is no way for a user to feed
> arbitrary data to a deserialization function. Since serialization and 
> deserialization functions are only used in the context of
> parallel query, we always know that the data fed to the deserialization 
> function must have come from the serialization
> function on the same machine. Nor can users call the deserialization function 
> directly with arbitrary data of their own
> choosing, because users cannot call functions that take or return internal. 
> But with this system, it becomes possible to
> feed arbitrary data to a deserialization function.
> The user could redefine the function on the remote side so that it produces 
> arbitrary data of their choosing, and the local
> deserialization function will ingest it.
> 
> That's potentially quite a significant problem. Consider for example that 
> numericvar_deserialize() does no validity
> checking on any of the weight, sign, or dscale, but not all values for those 
> fields are legal. Right now that doesn't matter,
> but if you can feed arbitrary data to that function, then it is. I don't know 
> exactly what the consequences are if you can get
> that function to spit out a NumericVar with values outside the normal legal 
> range. What can you do then?
> Store a bogus numeric on disk? Crash the server? Worst case, some problem 
> like this could be a security issue allowing for
> escalation to superuser; more likely, it would be a crash bug, corrupt your 
> database, or lead to unexpected and strange
> error messages.
> 
> Unfortunately, I have the unpleasant suspicion that most internal-type 
> aggregates will be affected by this problem.
> Consider, for example, string_agg_deserialize(). Generally, strings are one 
> of the least-constrained data types, so you
> might hope that this function would be OK. But it doesn't look very 
> promising. The first int4 in the serialized representation
> is the cursor, which would have to be bounds-checked, lest someone provide a 
> cursor that falls outside the bounds of the
> StringInfo and, maybe, cause a reference to an arbitrary memory location. 
> Then the rest of the representation is the actual
> data, which could be anything. This function is used for both bytea and
> text, and for bytea, letting the payload be anything is OK.
> anything is OK.
> But for text, the supplied data shouldn't contain an embedded zero byte, or 
> otherwise be invalid in the server encoding. If
> it were, that would provide a vector to inject invalidly encoded data into 
> the database. This feature can't be allowed to do
> that.
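
For illustration, the kind of bounds checking being asked for might look like this (the layout — an int32 cursor followed by the accumulated bytes — only approximates the real serialized string_agg state):

```python
import struct

def validate_string_agg_state(blob: bytes, is_text: bool) -> None:
    """Sketch of sanity checks a deserialization function would need
    before trusting a bytea blob from an untrusted remote server."""
    if len(blob) < 4:
        raise ValueError("truncated state")
    (cursor,) = struct.unpack_from("!i", blob, 0)
    payload = blob[4:]
    # bounds-check the cursor so it cannot point outside the buffer
    if cursor < 0 or cursor > len(payload):
        raise ValueError("cursor out of bounds")
    # for text (unlike bytea), reject embedded zero bytes
    if is_text and b"\x00" in payload:
        raise ValueError("invalid byte in text payload")

validate_string_agg_state(struct.pack("!i", 3) + b"abc", is_text=True)  # ok
try:
    validate_string_agg_state(struct.pack("!i", 99) + b"abc", is_text=True)
except ValueError as e:
    print(e)  # -> cursor out of bounds
```

(Encoding validation for text would need the server-encoding machinery as well; the zero-byte check is only the simplest case.)
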
I completely overlooked this issue. I should have considered the risks of 
sending raw state values or serialized state
data directly from remote to local. I apologize.

> What could be a solution to this class of problems? One option is to just 
> give up on supporting this feature for internal-type
> aggregates for now. That's easy enough to do, and just means we have less 
> functionality, but it's sad because that's
> functionality we'd like to have. Another approach is to add necessary sanity 
> checks to the relevant deserialization
> functions, but that seems a little hard to get right, and it would slow down 
> parallel query cases which are probably going to
> be more common than the use of this feature. I think the slowdown might be 
> significant, too. A third option is to change
> those aggregates in some way, like giving them a transition function that 
> operates on some data type other than internal,
> but there again we have to be careful of slowdowns. A final option is to 
> rethink the infrastructure in some way, like having
> a way to serialize to something other than bytea, for which we already have 
> input functions with adequate checks. For
> instance, if string_agg_serialize() produced a record containing an integer 
> column and a text or bytea column, we could
> attempt to ingest that record on the other side and presumably the right 
> things would happen in the case of any invalid
> data. But I'm not quite sure what infrastructure would be required to make 
> this kind of idea work.
Thank you very much for providing a direction towards resolving this issue.
As you have suggested as the last option, it seems that expanding the current 
mechanism of the aggregation
function is the only choice. It may take some time, but I will consider 
specific solutions.

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 4:08 AM
> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov  
> wrote:
> > Hi. HAVING is also a problem. Consider the following query
> >
> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> > foreign server as HAVING needs full aggregate result, but foreign
> > server don't know it.
> 
> I don't see it that way. What we would push to the foreig

Re: Partial aggregates pushdown

2023-11-28 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Nov 27, 2023 at 4:23 PM Tom Lane  wrote:
> > Well, one of the founding principles of postgres_fdw was to be able
> > to talk to PG servers that are not of the same version as yours.
> > If we break that in the name of performance, we are going to have
> > a lot of unhappy users.  Even the ones who do get the benefit of
> > the speedup are going to be unhappy when it breaks because they
> > didn't upgrade local and remote at exactly the same time.
> 
> I agree with this.

+1.  We do want to continue to make this work- to the extent possible.
I don't think there's any problem with saying that when talking to an
older server, you don't get the same capabilities as you do when talking
to a newer server.

> > Just because we'd like to have it doesn't make the patch workable
> > in the real world.
> 
> And also with this in concept - I'd like to plan arbitrarily
> complicated queries perfectly and near-instantly, and then execute
> them at faster-than-light speed, but we can't. However, I don't
> understand the fatalism with respect to the feature at hand. As I said
> before, it's not like no other product has made this work. Sure, some
> of those products may not have the extensible system of data types
> that we do, or may not care about cross-version communication, but
> those don't seem like good enough reasons to just immediately give up.

Certainly there are other projects out there which are based on PG that
have managed to make this work and work really quite well.

> TBH, I suspect even some PG forks have made this work, like maybe PGXC
> or PGXL, although I don't know for certain. We might not like the
> trade-offs they made to get there, but we haven't even talked through
> possible design ideas yet, so it seems way too early to give up.

Yes, Citus[1] and Greenplum[2], to just name two.

I certainly understand the concern around the security of this and would
have thought the approach we'd use would be to not just take internal
state and pass it along but rather to provide a way for aggregates to
opt-in to supporting this and have them serialize/deserialize with
new dedicated functions that have appropriate checks to avoid bad things
happening.  That could also be versioned, perhaps, if we feel that's
necessary (I'm a bit skeptical, but it would hopefully address the
concern about different versions having different data that they want to
pass along).

> One of the things that I think is a problem in this area is that the
> ways we have to configure FDW connections are just not very rich.

Agreed.

> We're trying to cram everything into a set of strings that can be
> attached to the foreign server or the user mapping, but that's not a
> very good fit for something like how all the local SQL functions that
> might exist map onto all of the remote SQL functions that might exist.
> Now you might well say that we don't want the act of configuring a
> foreign data wrapper to be insanely complicated, and I would agree
> with that. But, on the other hand, as Larry Wall once said, a good
> programming language makes simple things simple and complicated things
> possible. I think our current configuration system is only
> accomplishing the first of those goals.

We've already got issues in this area with extensions- there's no way
for a user to say what version of an extension exists on the remote side
and no way for an extension to do anything different based on that
information.  Perhaps we could work on a solution to both of these
issues, but at the least I don't see holding back on this effort for a
problem that already exists but which we've happily accepted because of
the benefit it provides, like being able to push-down postgis bounding
box conditionals to allow for indexed lookups.

Thanks,

Stephen

[1]: https://docs.citusdata.com/en/v11.1/develop/reference_sql.html
[2]: 
https://postgresconf.org/conferences/Beijing/program/proposals/implementation-of-distributed-aggregation-in-greenplum




Re: Partial aggregates pushdown

2023-11-28 Thread Robert Haas
On Tue, Nov 28, 2023 at 5:24 AM Ashutosh Bapat
 wrote:
> If my memory serves me right, PGXC implemented partial aggregation
> only when the output of partial aggregate was a SQL data type
> (non-Internal, non-Unknown). But I may be wrong. But at that time,
> JSONB wasn't there or wasn't that widespread.
>
> Problem with Internal is that it's just a binary string whose content
> can change across versions and can be interpreted differently by
> different versions. There is no metadata in it to say how to
> interpret it. We can add that metadata with JSONB. The result of a
> partial aggregate can be sent as JSONB. If the local server finds the
> JSONB familiar, it will construct the right partial aggregate value;
> otherwise it will throw an error. If there's a way to avoid even that
> error (by looking at the server version etc.), it can be avoided too.
> But JSONB leaves very little chance that the value will be
> interpreted wrongly. The downside is that we are tying PARTIAL's
> output to JSONB, thus tying SQL syntax to a data type.
>
> Does that look acceptable?

If somebody had gone to the trouble of making this work, and had done
a good job, I wouldn't vote against it, but in a vacuum, I'm not sure
it's the best design. The problem in my view is that working with JSON
is not actually very pleasant. It's super-easy to generate, and
super-easy for humans to read. But parsing and validating it is a
pain. You basically have to have two parsers, one to do syntactical
validation and then a second one to ensure that the structure of the
document and the contents of each item are as expected. See
parse_manifest.c for an example of what I mean by that. Now, if we add
new code, it can reuse the JSON parser we've already got, so it's not
that you need to write a new JSON parser for every new application of
JSON, but the semantic validator (a la parse_manifest.c) isn't
necessarily any less code than a whole new parser for a bespoke
format.
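
To make the two-pass point concrete (the state layout is invented; a generic JSON parser only gets you through pass 1):

```python
import json

def load_partial_state(payload: str) -> tuple[int, int]:
    """Two-stage validation in the parse_manifest.c spirit."""
    # Pass 1: syntactic validation -- is it well-formed JSON at all?
    doc = json.loads(payload)
    # Pass 2: semantic validation -- is it the structure we expect?
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    if set(doc) != {"count", "sum"}:
        raise ValueError("unexpected fields")
    if not isinstance(doc["count"], int) or doc["count"] < 0:
        raise ValueError("bad count")
    if not isinstance(doc["sum"], int):
        raise ValueError("bad sum")
    return doc["count"], doc["sum"]

print(load_partial_state('{"count": 3, "sum": 600}'))  # -> (3, 600)
```

Pass 2 here is the semantic validator that has to be written per format, whether the envelope is JSON or a bespoke bytea layout.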

To make that a bit more concrete, for something like string_agg(), is
it easier to write a validator for the existing deserialization
function that accepts a bytea blob, or to write a validator for a JSON
blob that we could be passing instead? My suspicion is that the former
is less work and easier to verify, but it's possible I'm wrong about
that and they're more or less equal. I don't really see any way that
the JSON thing is straight-up better; at best it's a toss-up in terms
of amount of code. Now somebody could still make an argument that they
would like JSON better because it would be more useful for some
purpose other than this feature, and that is fine, but here I'm just
thinking about this feature in particular.

My personal suspicion is that the easiest way to support internal-type
aggregates here is to convert them to use an array or record type as a
transition state instead, or maybe serialize the internal state to one
of those things instead of to bytea. I suspect that would allow us to
leverage more of our existing validation infrastructure than using
JSON or sticking with bytea. But I'm certainly amenable to other
points of view. I'm not trying to pretend that my gut feeling is
necessarily correct; I'm just explaining what I currently think.
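As a sketch of the array/record-transition-state idea (a Python model rather than C, with invented function names): an avg whose transition state is the pair (count, sum) — representable as an int8 array or a record rather than an opaque internal struct — can be shipped, combined, and finalized as plain SQL-typed data, which is what makes existing validation machinery applicable.

```python
def avg_trans(state, value):
    # transition step: fold one input value into the (count, sum) state
    count, total = state
    return (count + 1, total + value)

def avg_combine(a, b):
    # combine two partial states; this pair is all a remote node must ship
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    # final step: runs only once, after all partial states are combined
    count, total = state
    return total / count if count else None

shard1 = (0, 0)
for v in (1, 2, 3):
    shard1 = avg_trans(shard1, v)
shard2 = (0, 0)
for v in (4, 5):
    shard2 = avg_trans(shard2, v)
print(avg_final(avg_combine(shard1, shard2)))  # 3.0
```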

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Partial aggregates pushdown

2023-11-28 Thread Ashutosh Bapat
On Tue, Nov 28, 2023 at 5:21 AM Robert Haas  wrote:
>
> TBH, I suspect even some PG forks have made this work, like maybe PGXC
> or PGXL, although I don't know for certain. We might not like the
> trade-offs they made to get there, but we haven't even talked through
> possible design ideas yet, so it seems way too early to give up.

If my memory serves me right, PGXC implemented partial aggregation
only when the output of the partial aggregate was a SQL data type
(non-Internal, non-Unknown), though I may be wrong. In any case, at
that time JSONB either didn't exist or wasn't that widespread.

The problem with Internal is that it's just a binary string whose content
can change across versions and can be interpreted differently by
different versions. There is no metadata in it telling the receiver how to
interpret it. We can add that metadata to JSONB. The result of a partial
aggregate can be sent as a JSONB. If the local server finds the JSONB
familiar it will construct the right partial aggregate value otherwise
it will throw an error. If the mismatch can be detected up front (by
looking at the server version, etc.), even that error can be avoided. But
JSONB leaves very little chance that the value will be interpreted
wrongly. The downside is that we are tying PARTIAL's output to be JSONB, thus
tying SQL syntax to a data type.

Does that look acceptable?
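A minimal Python sketch of that negotiation (the envelope fields "agg", "fmt", and "state" are invented for illustration): the receiver reconstructs the partial aggregate value only if it recognizes the embedded metadata, and errors out otherwise.

```python
import json

# (aggregate, state-format version) pairs this server knows how to read
KNOWN_FORMATS = {("avg(int8)", 1)}

def decode_partial(blob: str):
    """Reconstruct a partial state only if its metadata is familiar."""
    env = json.loads(blob)
    key = (env["agg"], env["fmt"])
    if key not in KNOWN_FORMATS:
        raise ValueError(f"unfamiliar partial aggregate state: {key}")
    return env["state"]

blob = json.dumps({"agg": "avg(int8)", "fmt": 1, "state": [3, 6]})
print(decode_partial(blob))  # [3, 6]
```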

-- 
Best Wishes,
Ashutosh Bapat




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
First of all, that last email of mine was snippy, and I apologize for it. Sorry.

That said:

On Mon, Nov 27, 2023 at 4:23 PM Tom Lane  wrote:
> Well, one of the founding principles of postgres_fdw was to be able
> to talk to PG servers that are not of the same version as yours.
> If we break that in the name of performance, we are going to have
> a lot of unhappy users.  Even the ones who do get the benefit of
> the speedup are going to be unhappy when it breaks because they
> didn't upgrade local and remote at exactly the same time.

I agree with this.

> Just because we'd like to have it doesn't make the patch workable
> in the real world.

And also with this in concept - I'd like to plan arbitrarily
complicated queries perfectly and near-instantly, and then execute
them at faster-than-light speed, but we can't. However, I don't
understand the fatalism with respect to the feature at hand. As I said
before, it's not like no other product has made this work. Sure, some
of those products may not have the extensible system of data types
that we do, or may not care about cross-version communication, but
those don't seem like good enough reasons to just immediately give up.

TBH, I suspect even some PG forks have made this work, like maybe PGXC
or PGXL, although I don't know for certain. We might not like the
trade-offs they made to get there, but we haven't even talked through
possible design ideas yet, so it seems way too early to give up.

One of the things that I think is a problem in this area is that the
ways we have to configure FDW connections are just not very rich.
We're trying to cram everything into a set of strings that can be
attached to the foreign server or the user mapping, but that's not a
very good fit for something like how all the local SQL functions that
might exist map onto all of the remote SQL functions that might exist.
Now you might well say that we don't want the act of configuring a
foreign data wrapper to be insanely complicated, and I would agree
with that. But, on the other hand, as Larry Wall once said, a good
programming language makes simple things simple and complicated things
possible. I think our current configuration system is only
accomplishing the first of those goals.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Partial aggregates pushdown

2023-11-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 27, 2023 at 3:59 PM Tom Lane  wrote:
>> TBH, I think this entire proposal is dead in the water.  Which is
>> sad from a performance standpoint, but I can't see any way that
>> we would not regret shipping a feature that makes such assumptions.

> I think it's ridiculous to just hold our breath and pretend like this
> feature isn't needed -- it's at least half a decade overdue. We engage
> in endless hand-wringing over local-remote symmetry in cases where
> other systems seem to effortlessly make that assumption and then get
> on with building new features.

Well, one of the founding principles of postgres_fdw was to be able
to talk to PG servers that are not of the same version as yours.
If we break that in the name of performance, we are going to have
a lot of unhappy users.  Even the ones who do get the benefit of
the speedup are going to be unhappy when it breaks because they
didn't upgrade local and remote at exactly the same time.

Just because we'd like to have it doesn't make the patch workable
in the real world.

regards, tom lane




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Mon, Nov 27, 2023 at 3:59 PM Tom Lane  wrote:
> Even if the partial-aggregate serialization value isn't "internal"
> but some more-narrowly-defined type, it is still an internal
> implementation detail of the aggregate.  You have no right to assume
> that the remote server implements the aggregate the same way the
> local one does.  If we start making such an assumption then we'll
> be unable to revise the implementation of an aggregate ever again.
>
> TBH, I think this entire proposal is dead in the water.  Which is
> sad from a performance standpoint, but I can't see any way that
> we would not regret shipping a feature that makes such assumptions.

I think it's ridiculous to just hold our breath and pretend like this
feature isn't needed -- it's at least half a decade overdue. We engage
in endless hand-wringing over local-remote symmetry in cases where
other systems seem to effortlessly make that assumption and then get
on with building new features. It's not that I disagree with the
concern; we're *already* doing stuff that is unprincipled in a bunch
of different areas and that could and occasionally does cause queries
that push things to the remote side to return wrong answers, and I
hate that. But the response to that can't be to refuse to add new
features and maybe rip out the features we already have. Users don't
like it when pushdown causes queries to return wrong answers, but they
like it even less when the pushdown doesn't happen in the first place
and the query runs until the heat death of the universe. I'm not
entirely sure what the right design ideas are here, but giving up and
refusing to add features ought to be completely off the table.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Partial aggregates pushdown

2023-11-27 Thread Tom Lane
Robert Haas  writes:
> Also, I want to make one other point here about security and
> reliability. Right now, there is no way for a user to feed arbitrary
> data to a deserialization function. Since serialization and
> deserialization functions are only used in the context of parallel
> query, we always know that the data fed to the deserialization
> function must have come from the serialization function on the same
> machine. Nor can users call the deserialization function directly with
> arbitrary data of their own choosing, because users cannot call
> functions that take or return internal. But with this system, it
> becomes possible to feed arbitrary data to a deserialization function.

Ouch.  That is absolutely horrid --- we have a lot of stuff that
depends on users not being able to get at "internal" values, and
it sounds like the current proposal breaks all of that.

Quite aside from security concerns, there is no justification for
assuming that the "internal" values used on one platform/PG version
are identical to those used on another.  So if the idea is to
ship back "internal" values from the remote server to the local one,
I think it's basically impossible to make that work.

Even if the partial-aggregate serialization value isn't "internal"
but some more-narrowly-defined type, it is still an internal
implementation detail of the aggregate.  You have no right to assume
that the remote server implements the aggregate the same way the
local one does.  If we start making such an assumption then we'll
be unable to revise the implementation of an aggregate ever again.
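To make that hazard concrete, here is a contrived Python sketch (both state formats are invented): if one server version serializes an avg state as (count, sum) and another expects (sum, count), the bytes round-trip without error but the answer is silently wrong.

```python
def serialize_v1(count, total):
    # hypothetical remote server: state laid out as (count, sum)
    return (count, total)

def finalize_v2(state):
    # hypothetical local server: expects the state as (sum, count)
    total, count = state
    return total / count

state = serialize_v1(2, 10)   # two rows summing to 10; true mean is 5.0
print(finalize_v2(state))     # 0.2 -- silently wrong, no error raised
```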

TBH, I think this entire proposal is dead in the water.  Which is
sad from a performance standpoint, but I can't see any way that
we would not regret shipping a feature that makes such assumptions.

regards, tom lane




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Wed, Nov 22, 2023 at 5:16 AM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I did not choose Approach2 because I was not confident that the disadvantage 
> mentioned in 2.(2)(a)
> would be accepted by the PostgreSQL development community.
> If it is accepted, I think Approach 2 is smarter.
> Could you please provide your opinion on which
> approach is preferable after comparing these two approaches?
> If we cannot say anything without comparing the amount of source code, as 
> Mr.Momjian mentioned,
> we need to estimate the amount of source code required to implement Approach2.

I've had the same concern, that approach #2 would draw objections, so
I think you were right to be cautious about it. I don't think it is a
wonderful approach in all ways, but I do think that it is superior to
approach #1. If we add dedicated support to the grammar, it is mostly
a one-time effort, and after that, there should not be much need for
anyone to be concerned about it. If we instead add extra aggregates,
then that generates extra work every time someone writes a patch that
adds a new aggregate to core. I have a difficult time believing that
anyone will prefer an approach that involves an ongoing maintenance
effort of that type over one that doesn't.

One point that seems to me to be of particular importance is that if
we add new aggregates, there is a risk that some future aggregate
might do that incorrectly, so that the main aggregate works, but the
secondary aggregate created for this feature does not work. That seems
like it would be very frustrating for future code authors so I'd like
to avoid the risk as much as we can.

Also, I want to make one other point here about security and
reliability. Right now, there is no way for a user to feed arbitrary
data to a deserialization function. Since serialization and
deserialization functions are only used in the context of parallel
query, we always know that the data fed to the deserialization
function must have come from the serialization function on the same
machine. Nor can users call the deserialization function directly with
arbitrary data of their own choosing, because users cannot call
functions that take or return internal. But with this system, it
becomes possible to feed arbitrary data to a deserialization function.
The user could redefine the function on the remote side so that it
produces arbitrary data of their choosing, and the local
deserialization function will ingest it.

That's potentially quite a significant problem. Consider for example
that numericvar_deserialize() does no validity checking on any of the
weight, sign, or dscale, but not all values for those fields are
legal. Right now that doesn't matter, but if you can feed arbitrary
data to that function, then it does. I don't know exactly what the
consequences are if you can get that function to spit out a NumericVar
with values outside the normal legal range. What can you do then?
Store a bogus numeric on disk? Crash the server? Worst case, some
problem like this could be a security issue allowing for escalation to
superuser; more likely, it would be a crash bug, corrupt your
database, or lead to unexpected and strange error messages.

Unfortunately, I have the unpleasant suspicion that most internal-type
aggregates will be affected by this problem. Consider, for example,
string_agg_deserialize(). Generally, strings are one of the
least-constrained data types, so you might hope that this function
would be OK. But it doesn't look very promising. The first int4 in the
serialized representation is the cursor, which would have to be
bounds-checked, lest someone provide a cursor that falls outside the
bounds of the StringInfo and, maybe, cause a reference to an arbitrary
memory location. Then the rest of the representation is the actual
data, which could be anything. This function is used for both bytea
and for text, and for bytea, letting the payload be anything is OK.
But for text, the supplied data shouldn't contain an embedded zero
byte, or otherwise be invalid in the server encoding. If it were, that
would provide a vector to inject invalidly encoded data into the
database. This feature can't be allowed to do that.
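The checks described above can be sketched in Python (the layout is modeled loosely on the int4-cursor-plus-payload shape described in the paragraph, not on actual PostgreSQL code, and UTF-8 stands in for "the server encoding"):

```python
import struct

def validate_string_agg_state(blob: bytes, is_text: bool) -> None:
    """Sanity checks a deserializer would need before trusting a
    remote-supplied string_agg-style state: a big-endian int4 cursor
    followed by the accumulated payload."""
    if len(blob) < 4:
        raise ValueError("truncated state")
    (cursor,) = struct.unpack_from("!i", blob, 0)
    payload = blob[4:]
    if not (0 <= cursor <= len(payload)):   # bounds-check the cursor
        raise ValueError("cursor out of bounds")
    if is_text:
        if b"\x00" in payload:              # text must not embed a zero byte
            raise ValueError("embedded zero byte in text state")
        payload.decode("utf-8")             # must be valid in the encoding

validate_string_agg_state(struct.pack("!i", 0) + b"ab", is_text=True)  # ok
```

For bytea the payload checks vanish, which matches the observation above that bytea is the easy case and text is where invalid data could sneak in.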

What could be a solution to this class of problems? One option is to
just give up on supporting this feature for internal-type aggregates
for now. That's easy enough to do, and just means we have less
functionality, but it's sad because that's functionality we'd like to
have. Another approach is to add necessary sanity checks to the
relevant deserialization functions, but that seems a little hard to
get right, and it would slow down parallel query cases which are
probably going to be more common than the use of this feature. I think
the slowdown might be significant, too. A third option is to change
those aggregates in some way, like giving them a transition function
that operates on some data type other than internal, but there again
we have to be careful of slowdowns. A final option i

Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov
 wrote:
> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to the
> foreign server, as HAVING needs the full aggregate result, but the
> foreign server doesn't know that.

I don't see it that way. What we would push to the foreign server
would be something like SELECT count(a) FROM t. Then, after we get the
results back and combine the various partial counts locally, we would
locally evaluate the HAVING clause afterward. That is, partial
aggregation is a barrier to pushing down HAVING clause itself, but it
doesn't preclude pushing down the aggregation.
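The split described above can be sketched in a few lines of Python (the shard counts are made up): each remote server answers only the partial count, and the HAVING predicate runs locally on the combined total.

```python
# Each shard answers only "SELECT count(a) FROM t" for its partition.
partial_counts = [4, 5, 3]            # pushed-down partial aggregates

# Local combine step, then local evaluation of HAVING count(a) > 10.
total = sum(partial_counts)
result = [total] if total > 10 else []
print(result)  # [12]
```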

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Partial aggregates pushdown

2023-11-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Haas, hackers.

> From: Bruce Momjian 
> Sent: Thursday, November 23, 2023 6:16 AM
> On Wed, Nov 22, 2023 at 10:16:16AM +, 
> fujii.y...@df.mitsubishielectric.co.jp wrote:
> > 2. Approach 2
> > (1) Advantages
> > (a) No need to add partial aggregate functions to the catalogs for
> > each aggregation
> > (2) Disadvantages
> > (a) Need to add non-standard keywords to the SQL syntax.
> >
> > I did not choose Approach2 because I was not confident that the
> > disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL 
> > development community.
> > If it is accepted, I think Approach 2 is smarter.
> > Could you please provide your opinion on which approach is preferable
> > after comparing these two approaches?
> 
> I didn't know #2 was possible, but given the great number of catalog entries, 
> doing it in the SQL grammar seems cleaner
> to me.
Thank you for comments. Yes, I understand.

> From: Bruce Momjian 
> Sent: Wednesday, November 22, 2023 5:34 AM
> On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote:
> > On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian  wrote:
> > > > I do have a concern about this, though. It adds a lot of bloat. It
> > > > adds a whole lot of additional entries to pg_aggregate, and every
> > > > new aggregate we add in the future will require a bonus entry for
> > > > this, and it needs a bunch of new pg_proc entries as well. One
> > > > idea that I've had in the past is to instead introduce syntax that
> > > > just does this, without requiring a separate aggregate definition in 
> > > > each case.
> > > > For example, maybe instead of changing string_agg(whatever) to
> > > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > > > something. Then all aggregates could be treated in a generic way.
> > > > I'm not completely sure that's better, but I think it's worth 
> > > > considering.
> > >
> > > So use an SQL keyword to indicate a pushdown call?  We could then
> > > automate the behavior rather than requiring special catalog functions?
> >
> > Right. It would require more infrastructure in the parser, planner,
> > and executor, but it would be infinitely reusable instead of needing a
> > new thing for every aggregate. I think that might be better, but to be
> > honest I'm not totally sure.
> 
> It would make it automatic.  I guess we need to look at how big the patch is 
> to do it.
I will investigate specifically which parts of the PostgreSQL source code need 
to be modified and how big the patch will be if you take this approach.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 10:16:16AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> 2. Approach 2
> (1) Advantages
> (a) No need to add partial aggregate functions to the catalogs for each 
> aggregation
> (2) Disadvantages
> (a) Need to add non-standard keywords to the SQL syntax.
> 
> I did not choose Approach2 because I was not confident that the disadvantage 
> mentioned in 2.(2)(a)
> would be accepted by the PostgreSQL development community.
> If it is accepted, I think Approach 2 is smarter.
> Could you please provide your opinion on which
> approach is preferable after comparing these two approaches?

I didn't know #2 was possible, but given the great number of catalog
entries, doing it in the SQL grammar seems cleaner to me.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-11-22 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas 
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a 
> whole lot of additional entries to pg_aggregate, and
> every new aggregate we add in the future will require a bonus entry for this, 
> and it needs a bunch of new pg_proc entries
> as well. One idea that I've had in the past is to instead introduce syntax 
> that just does this, without requiring a separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to 
> string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. 
> Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's 
> worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the 
approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, 
pg_proc)

The approach proposed by Mr.Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation 
requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation 
functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and 
pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to 
manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain 
aggregate function can be automated,
so this problem can be resolved by improving the patch.
For user-defined functions, the automation would go into the core code that
executes the CREATE AGGREGATE command (AggregateCreate() and related functions).
For built-in functions, it would go into generating the initial data for the
pg_aggregate and pg_proc catalogs from pg_aggregate.dat and pg_proc.dat
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each 
aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.

I did not choose Approach2 because I was not confident that the disadvantage 
mentioned in 2.(2)(a)
would be accepted by the PostgreSQL development community.
If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which
approach is preferable after comparing these two approaches?
If we cannot say anything without comparing the amount of source code, as 
Mr.Momjian mentioned,
we need to estimate the amount of source code required to implement Approach2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2023-11-21 Thread Alexander Pyhalov

Robert Haas wrote 2023-11-21 20:16:


> I don't think the patch does a good job explaining why HAVING,
> DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> shouldn't really be a problem, because HAVING is basically a WHERE
> clause that occurs after aggregation is complete, and whether or not
> the aggregation is safe shouldn't depend on what we're going to do
> with the value afterward. The HAVING clause can't necessarily be
> pushed to the remote side, but I don't see how or why it could make
> the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> little trickier: if we pushed down DISTINCT, we'd still have to
> re-DISTINCT-ify when combining locally, and if we pushed down ORDER
> BY, we'd have to do a merge pass to combine the returned values unless
> we could prove that the partitions were non-overlapping ranges that
> would be visited in the correct order. Although that all sounds
> doable, I think it's probably a good thing that the current patch
> doesn't try to handle it -- this is complicated already. But it should
> explain why it's not handling it and maybe even a bit about how it
> could be handling in the future, rather than just saying "well, this
> kind of thing is not safe." The trouble with that explanation is that
> it does nothing to help the reader understand whether the thing in
> question is *fundamentally* unsafe or whether we just don't have the
> right code to make it work.

Makes sense.


Actually, I think I was wrong about this. We can't handle ORDER BY or
DISTINCT because we can't distinct-ify or order after we've already
partially aggregated. At least not in general, and not without
additional aggregate support functions. So what I said above was wrong
with respect to those. Or so I believe, anyway. But I still don't see
why HAVING should be a problem.


Hi. HAVING is also a problem. Consider the following query

SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to the
foreign server, as HAVING needs the full aggregate result, but the
foreign server doesn't know that.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2023-11-21 Thread Bruce Momjian
On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote:
> On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian  wrote:
> > > I do have a concern about this, though. It adds a lot of bloat. It
> > > adds a whole lot of additional entries to pg_aggregate, and every new
> > > aggregate we add in the future will require a bonus entry for this,
> > > and it needs a bunch of new pg_proc entries as well. One idea that
> > > I've had in the past is to instead introduce syntax that just does
> > > this, without requiring a separate aggregate definition in each case.
> > > For example, maybe instead of changing string_agg(whatever) to
> > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > > something. Then all aggregates could be treated in a generic way. I'm
> > > not completely sure that's better, but I think it's worth considering.
> >
> > So use an SQL keyword to indicate a pushdown call?  We could then
> > automate the behavior rather than requiring special catalog functions?
> 
> Right. It would require more infrastructure in the parser, planner,
> and executor, but it would be infinitely reusable instead of needing a
> new thing for every aggregate. I think that might be better, but to be
> honest I'm not totally sure.

It would make it automatic.  I guess we need to look at how big the
patch is to do it.

> > > I don't think the patch does a good job explaining why HAVING,
> > > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> > > shouldn't really be a problem, because HAVING is basically a WHERE
> > > clause that occurs after aggregation is complete, and whether or not
> > > the aggregation is safe shouldn't depend on what we're going to do
> > > with the value afterward. The HAVING clause can't necessarily be
> > > pushed to the remote side, but I don't see how or why it could make
> > > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> > > little trickier: if we pushed down DISTINCT, we'd still have to
> > > re-DISTINCT-ify when combining locally, and if we pushed down ORDER
> > > BY, we'd have to do a merge pass to combine the returned values unless
> > > we could prove that the partitions were non-overlapping ranges that
> > > would be visited in the correct order. Although that all sounds
> > > doable, I think it's probably a good thing that the current patch
> > > doesn't try to handle it -- this is complicated already. But it should
> > > explain why it's not handling it and maybe even a bit about how it
> > > could be handling in the future, rather than just saying "well, this
> > > kind of thing is not safe." The trouble with that explanation is that
> > > it does nothing to help the reader understand whether the thing in
> > > question is *fundamentally* unsafe or whether we just don't have the
> > > right code to make it work.
> >
> > Makes sense.
> 
> Actually, I think I was wrong about this. We can't handle ORDER BY or
> DISTINCT because we can't distinct-ify or order after we've already
> partially aggregated. At least not in general, and not without
> additional aggregate support functions. So what I said above was wrong
> with respect to those. Or so I believe, anyway. But I still don't see
> why HAVING should be a problem.

This should probably be documented in the patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-11-21 Thread Robert Haas
On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian  wrote:
> > I do have a concern about this, though. It adds a lot of bloat. It
> > adds a whole lot of additional entries to pg_aggregate, and every new
> > aggregate we add in the future will require a bonus entry for this,
> > and it needs a bunch of new pg_proc entries as well. One idea that
> > I've had in the past is to instead introduce syntax that just does
> > this, without requiring a separate aggregate definition in each case.
> > For example, maybe instead of changing string_agg(whatever) to
> > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > something. Then all aggregates could be treated in a generic way. I'm
> > not completely sure that's better, but I think it's worth considering.
>
> So use an SQL keyword to indicate a pushdown call?  We could then
> automate the behavior rather than requiring special catalog functions?

Right. It would require more infrastructure in the parser, planner,
and executor, but it would be infinitely reusable instead of needing a
new thing for every aggregate. I think that might be better, but to be
honest I'm not totally sure.

> > I don't think the patch does a good job explaining why HAVING,
> > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> > shouldn't really be a problem, because HAVING is basically a WHERE
> > clause that occurs after aggregation is complete, and whether or not
> > the aggregation is safe shouldn't depend on what we're going to do
> > with the value afterward. The HAVING clause can't necessarily be
> > pushed to the remote side, but I don't see how or why it could make
> > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> > little trickier: if we pushed down DISTINCT, we'd still have to
> > re-DISTINCT-ify when combining locally, and if we pushed down ORDER
> > BY, we'd have to do a merge pass to combine the returned values unless
> > we could prove that the partitions were non-overlapping ranges that
> > would be visited in the correct order. Although that all sounds
> > doable, I think it's probably a good thing that the current patch
> > doesn't try to handle it -- this is complicated already. But it should
> > explain why it's not handling it and maybe even a bit about how it
> > could be handling in the future, rather than just saying "well, this
> > kind of thing is not safe." The trouble with that explanation is that
> > it does nothing to help the reader understand whether the thing in
> > question is *fundamentally* unsafe or whether we just don't have the
> > right code to make it work.
>
> Makes sense.

Actually, I think I was wrong about this. We can't handle ORDER BY or
DISTINCT because we can't distinct-ify or order after we've already
partially aggregated. At least not in general, and not without
additional aggregate support functions. So what I said above was wrong
with respect to those. Or so I believe, anyway. But I still don't see
why HAVING should be a problem.
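The DISTINCT part of that can be shown in a few lines of Python (shard contents are made up): per-shard distinct counts cannot be combined, because a value appearing on two shards gets counted twice.

```python
shard1 = [1, 2, 3]
shard2 = [3, 4]

# Wrong: summing per-shard distinct counts double-counts the value 3.
naive = len(set(shard1)) + len(set(shard2))   # 5

# Right: DISTINCT needs the raw (or merged) values, not partial counts.
correct = len(set(shard1) | set(shard2))      # 4

print(naive, correct)  # 5 4
```

This is why partial aggregation is off the table for DISTINCT without extra support, whereas a plain count(a) combines with a simple sum.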

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Partial aggregates pushdown

2023-11-20 Thread Bruce Momjian
On Mon, Nov 20, 2023 at 03:51:33PM -0500, Robert Haas wrote:
> On Mon, Nov 13, 2023 at 3:26 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > In postgres_fdw.sql, I have corrected the output format for floating point 
> > numbers
> > by extra_float_digits.
> 
> Looking at this, I find that it's not at all clear to me how the
> partial aggregate function is defined. Let's look at what we have for
> documentation:
> 
> +  
> +   Parameter AGGPARTIALFUNC optionally defines a
> +   partial aggregate function used for partial aggregate pushdown; see
> +for details.
> +  
> 
> +   Partial aggregate function (zero if none).
> +   See  for the definition
> +   of partial aggregate function.
> 
> +   Partial aggregate pushdown is an optimization for queries that contains
> +   aggregate expressions for a partitioned table across one or more remote
> +   servers. If multiple conditions are met, partial aggregate function
> 
> +   When partial aggregate pushdown is used for aggregate expressions,
> +   remote queries replace aggregate function calls with partial
> +   aggregate function calls.  If the data type of the state value is not
> 
> But there's no definition of what the behavior of the function is
> anywhere that I can see, not even in <sect2 id="partial-aggregate-pushdown">. Everywhere it only describes how the
> partial aggregate function is used, not what it is supposed to do.

Yes, I had to figure that out myself, and I was wondering how much
detail to have in our docs vs README files vs. C comments.  I think we
should put more details somewhere.

> Looking at the changes in pg_aggregate.dat, it seems like the partial
> aggregate function is a second aggregate defined in a way that mostly
> matches the original, except that (1) if the original final function
> would have returned a data type other than internal, then the final
> function is removed; and (2) if the original final function would have
> returned a value of internal type, then the final function is the
> serialization function of the original aggregate. I think that's a
> reasonable definition, but the documentation and code comments need to
> be a lot clearer.

Agreed.  I wasn't sure enough about this to add it when I was reviewing
the patch.

> I do have a concern about this, though. It adds a lot of bloat. It
> adds a whole lot of additional entries to pg_aggregate, and every new
> aggregate we add in the future will require a bonus entry for this,
> and it needs a bunch of new pg_proc entries as well. One idea that
> I've had in the past is to instead introduce syntax that just does
> this, without requiring a separate aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to
> string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> something. Then all aggregates could be treated in a generic way. I'm
> not completely sure that's better, but I think it's worth considering.

So use an SQL keyword to indicates a pushdown call?  We could then
automate the behavior rather than requiring special catalog functions?

> I think that the control mechanism needs some thought. Right now,
> there are two possible behaviors: either we assume that the local and
> remote sides are the same unconditionally, or we assume that they're
> the same if the remote side is a new enough version. I do like having
> those behaviors available, but I wonder if we need to do something
> better or different. What if somebody wants to push down a
> non-built-in aggregate, for example? I realize that we don't have

It does allow specification of extensions that can be pushed down.

> great solutions to the problem of knowing which functions are
> push-downable in general, and I don't know that partial aggregation
> needs to be any better than anything else, but it's probably worth
> comparing and contrasting the approach we take here with the
> approaches we've taken in other, similar cases. From that point of
> view, I think check_partial_aggregate_support is a novelty: we don't
> do those kinds of checks in other cases, AFAIK. But on the other hand,
> there is the 'extensions' argument to postgres_fdw.

Right.  I am not sure how to improve what the patch does.

> I don't think the patch does a good job explaining why HAVING,
> DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> shouldn't really be a problem, because HAVING is basically a WHERE
> clause that occurs after aggregation is complete, and whether or not
> the aggregation is safe shouldn't depend on what we're going to do
> with the value afterward. The HAVING clause can't necessarily be
> pushed to the remote side, but I don't see how or why it could make
> the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> little trickier: if we pushed down DISTINCT, we'd still have to
> re-DISTINCT-ify when combining locally, and if we pushed down ORDER

Re: Partial aggregates pushdown

2023-11-20 Thread Robert Haas
On Mon, Nov 13, 2023 at 3:26 AM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> In postgres_fdw.sql, I have corrected the output format for floating point 
> numbers
> by extra_float_digits.

Looking at this, I find that it's not at all clear to me how the
partial aggregate function is defined. Let's look at what we have for
documentation:

+  
+   Paraemter AGGPARTIALFUNC optionally defines a
+   partial aggregate function used for partial aggregate pushdown; see
+for details.
+  

+   Partial aggregate function (zero if none).
+   See  for the definition
+   of partial aggregate function.

+   Partial aggregate pushdown is an optimization for queries that contains
+   aggregate expressions for a partitioned table across one or more remote
+   servers. If multiple conditions are met, partial aggregate function

+   When partial aggregate pushdown is used for aggregate expressions,
+   remote queries replace aggregate function calls with partial
+   aggregate function calls.  If the data type of the state value is not

But there's no definition of what the behavior of the function is
anywhere that I can see, not even in <sect2 id="partial-aggregate-pushdown">. Everywhere it only describes how the
partial aggregate function is used, not what it is supposed to do.

Looking at the changes in pg_aggregate.dat, it seems like the partial
aggregate function is a second aggregate defined in a way that mostly
matches the original, except that (1) if the original final function
would have returned a data type other than internal, then the final
function is removed; and (2) if the original final function would have
returned a value of internal type, then the final function is the
serialization function of the original aggregate. I think that's a
reasonable definition, but the documentation and code comments need to
be a lot clearer.
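In plain Python terms, that inferred definition can be sketched as follows (illustrative names only; the real machinery is C transition/combine/final functions, plus serial/deserial functions for internal-typed states): the remote side runs the transition but skips the final function, and the local side combines states before finalizing once.

```python
# Illustrative sketch of a partial aggregate for avg(int): the remote
# side returns the transition state (sum, count) instead of the
# finalized value. avg_partial/avg_combine/avg_final are hypothetical
# names, not actual catalog entries.

def avg_partial(rows):
    # what the remote "partial aggregate function" would return
    return (sum(rows), len(rows))

def avg_combine(a, b):
    # what the local combine step does with two partial states
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    # the final function, applied only once, locally
    s, n = state
    return s / n if n else None

shard1 = avg_partial([1, 2, 3])   # (6, 3)
shard2 = avg_partial([4, 5])      # (9, 2)
print(avg_final(avg_combine(shard1, shard2)))  # 3.0
```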

I do have a concern about this, though. It adds a lot of bloat. It
adds a whole lot of additional entries to pg_aggregate, and every new
aggregate we add in the future will require a bonus entry for this,
and it needs a bunch of new pg_proc entries as well. One idea that
I've had in the past is to instead introduce syntax that just does
this, without requiring a separate aggregate definition in each case.
For example, maybe instead of changing string_agg(whatever) to
string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
something. Then all aggregates could be treated in a generic way. I'm
not completely sure that's better, but I think it's worth considering.

I think that the control mechanism needs some thought. Right now,
there are two possible behaviors: either we assume that the local and
remote sides are the same unconditionally, or we assume that they're
the same if the remote side is a new enough version. I do like having
those behaviors available, but I wonder if we need to do something
better or different. What if somebody wants to push down a
non-built-in aggregate, for example? I realize that we don't have
great solutions to the problem of knowing which functions are
push-downable in general, and I don't know that partial aggregation
needs to be any better than anything else, but it's probably worth
comparing and contrasting the approach we take here with the
approaches we've taken in other, similar cases. From that point of
view, I think check_partial_aggregate_support is a novelty: we don't
do those kinds of checks in other cases, AFAIK. But on the other hand,
there is the 'extensions' argument to postgres_fdw.

I don't think the patch does a good job explaining why HAVING,
DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
shouldn't really be a problem, because HAVING is basically a WHERE
clause that occurs after aggregation is complete, and whether or not
the aggregation is safe shouldn't depend on what we're going to do
with the value afterward. The HAVING clause can't necessarily be
pushed to the remote side, but I don't see how or why it could make
the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
little trickier: if we pushed down DISTINCT, we'd still have to
re-DISTINCT-ify when combining locally, and if we pushed down ORDER
BY, we'd have to do a merge pass to combine the returned values unless
we could prove that the partitions were non-overlapping ranges that
would be visited in the correct order. Although that all sounds
doable, I think it's probably a good thing that the current patch
doesn't try to handle it -- this is complicated already. But it should
explain why it's not handling it and maybe even a bit about how it
could be handling in the future, rather than just saying "well, this
kind of thing is not safe." The trouble with that explanation is that
it does nothing to help the reader understand whether the thing in
question is *fundamentally* unsafe or whether we just don't have the
right code to make it work.
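The merge pass mentioned for ORDER BY can be sketched with sorted streams (plain Python, illustrative only): each partition returns rows already sorted, and the local side merges the sorted streams instead of re-sorting everything.

```python
# Sketch of the local "merge pass" over per-partition ORDER BY output:
# each shard's rows arrive sorted, and heapq.merge combines the sorted
# streams into one globally sorted result.
import heapq

partition1 = [1, 4, 7]   # remote ORDER BY output, shard 1
partition2 = [2, 4, 9]   # remote ORDER BY output, shard 2

merged = list(heapq.merge(partition1, partition2))
print(merged)  # [1, 2, 4, 4, 7, 9]
```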

Typo: Paraemter

I'm so sorry to keep complaining about comm

Re: Partial aggregates pushdown

2023-10-26 Thread Bruce Momjian
On Fri, Oct 27, 2023 at 02:44:42AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Momjian.
> 
> Thank you for your improvement.
> As a matter of detail, I think that the areas marked below are erroneous.
> 
> --
> +   Pushdown causes aggregate function cals to send partial aggregate
>  ^
> +   function calls to the remote server. If the partial aggregate
> +   function doesn't doesn't exist on the remote server, it causes
>  ^^^
> --

Agreed.  Do you want to fix that on your version?  I don't have any more
improvements to make.

> > What else needs to be done before committers start to review
> > this?
> There are no others. May I make a new version of v31 with your
> suggested improvements for the committer's review?

Yes, please.  I think the updated docs will help people understand how
the patch works.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-10-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Momjian.

Thank you for your improvement.
As a matter of detail, I think that the areas marked below are erroneous.

--
+   Pushdown causes aggregate function cals to send partial aggregate
 ^
+   function calls to the remote server. If the partial aggregate
+   function doesn't doesn't exist on the remote server, it causes
 ^^^
--

> What else needs to be done before committers start to review
> this?
There are no others. May I make a new version of v31 with your
suggested improvements for the committer's review?

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2023-10-26 Thread Bruce Momjian
On Thu, Oct 26, 2023 at 11:11:09AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> >and checks if the remote server version is older than the local 
> > server version.
> >If so,
> >postgres_fdw
> > -->assumes that for each built-in aggregate function, the partial 
> > aggregate function is not defined
> > -->on the remote server unless the partial aggregate function and the 
> > aggregate
> > -->function match.
> >Otherwise postgres_fdw assumes that for each 
> > built-in aggregate function,
> >the partial aggregate function is defined on the remote server.
> >The default is false.
> >   
> >  
> > 
> > 
> > What does that marked sentence mean?  What is match?  Are one or both of 
> > these remote?  It sounds like you are
> > checking the local aggregate against the remote partial aggregate, but I 
> > don't see any code that does this in the patch.
> This sentence means that
> "If the partial aggregate function has the same OID as the aggregate function,
> then postgres_fdw assumes that for each built-in aggregate function, the 
> partial aggregate function is not defined
>  on the remote server."
> "Match" means that the partial aggregate function has the same OID as the 
> aggregate function in local server.
> But, in v30, there is no code which checks the partial aggregate function has 
> the same OID as the aggregate function in local server.
> So I modified the code of is_builtin_aggpartialfunc_shippable().
> Also, I modified wording postgres-fdw.sgml.

Yes, that is what I needed.  Attached is a modification of your v31
patch (the most recent) that mostly improves the documentation and
comments.  What else needs to be done before committers start to review
this?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.


fdw_partial.diff.gz
Description: application/gzip


Re: Partial aggregates pushdown

2023-10-25 Thread Bruce Momjian
On Tue, Oct 24, 2023 at 12:12:41AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Momjian.
> 
> > Fujii-san, to get this patch closer to finished, can I modify this version 
> > of the patch to improve some wording and post an
> > updated version you can use for future changes?
> Yes, I greatly appreciate your offer.
> I would very much appreciate your modifications.

I am almost done updating the patch, but I got stuck on how the feature
is supposed to work.  This documentation sentence is where I got
confused:


 check_partial_aggregate_support 
(boolean)
 
  
   If this option is false, postgres_fdw assumes
   that for each built-in aggregate function,
   the partial aggregate function is defined on the remote server
   without checking the remote server version.
   If this option is true, during query planning,
   postgres_fdw connects to the remote server
   and checks if the remote server version is older than the local 
server version.
   If so,
   postgres_fdw
-->assumes that for each built-in aggregate function, the partial 
aggregate function is not defined
-->on the remote server unless the partial aggregate function and the 
aggregate
-->function match.
   Otherwise postgres_fdw assumes that for each 
built-in aggregate function,
   the partial aggregate function is defined on the remote server.
   The default is false.
  
 


What does that marked sentence mean?  What is match?  Are one or both of
these remote?  It sounds like you are checking the local aggregate
against the remote partial aggregate, but I don't see any code that does
this in the patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-10-23 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

> Fujii-san, to get this patch closer to finished, can I modify this version of 
> the patch to improve some wording and post an
> updated version you can use for future changes?
Yes, I greatly appreciate your offer.
I would very much appreciate your modifications.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2023-10-23 Thread Bruce Momjian
On Wed, Oct 18, 2023 at 05:22:34AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi hackers.
> 
> Because there is a degrade in pg_dump.c, I fixed it.

Fujii-san, to get this patch closer to finished, can I modify this
version of the patch to improve some wording and post an updated version
you can use for future changes?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-09-27 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote 2023-09-28 07:40:


I'm not sure that I like these mechanics of adding sort group clauses -
it seems we do additional work in core which is of use only for
one extension, but at least it seems to be working.

We cannot deparse the original sort group clauses and pathtarget
when performing partial aggregate pushdown by any FDWs.
So I think the additional sort group clauses and pathtarget are
needed by any FDWs, not only postgres_fdw.



Hi.
It seems to me that *fdw postfixes don't clarify things, but just make 
the naming uglier.


+ * Adding these Vars and PlaceHolderVars to PathTarget,
+ * FDW cannot deparse this by the original List of SortGroupClauses.
+ * So, before this adding process,
+ * setGroupClausePartial generates another Pathtarget and another
+ * List of SortGroupClauses for FDW.

It seems that something like:

/*
 * Modified PathTarget cannot be used by FDW as-is to deparse this 
statement.

 * So, before modifying PathTarget, setGroupClausePartial generates
 * another PathTarget and another List of SortGroupClauses
 * to make deparsing possible.
 */

sounds better.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2023-09-27 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote 2023-09-27 01:35:

Hi Mr.Momjian, Mr.Pyhalov.

Tuesday, 26 September 2023 22:15 Alexander Pyhalov 
:

Do you mean that extra->partial_target->sortgrouprefs is not replaced,
and so we preserve tlesortgroupref numbers?

Yes, that is correct.


I'm suspicious about rewriting extra->partial_target->exprs with
partial_target->exprs - I'm still not sure why we
don't lose information added by add_column_to_pathtarget() to
extra->partial_target->exprs?



Hi.

In postgres_fdw.sql

"Partial aggregates are unsafe to push down having clause when there are 
partial aggregates" - this comment likely should be fixed.


Some comments should be added to setGroupClausePartial() and to 
make_partial_grouping_target() - especially why setGroupClausePartial()

is called prior to add_new_columns_to_pathtarget().

I'm not sure that I like these mechanics of adding sort group clauses - 
it seems we do additional work in core which is of use only for
one extension, but at least it seems to be working.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2023-09-26 Thread Bruce Momjian
On Tue, Sep 26, 2023 at 06:26:25AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Bruce.
> > I think this needs to be explained in the docs.  I am ready to adjust the 
> > patch to improve the wording whenever you are
> > ready.  Should I do it now and post an updated version for you to use?
> The following explanation was omitted from the documentation, so I added it.
> > *  false - no check
> > *  true, remove version < sender - check
> I have responded to your comment, but if there is a problem with the wording, 
> could you please suggest a correction?

I like your new wording, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-09-26 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote 2023-09-25 06:18:

Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers.

Thank you for your valuable comments. I sincerely apologize for the
very late reply.
Here is a response to your comments or a fix to the patch.

Tuesday, August 8, 2023 at 3:31 Bruce Momjian

> I have modified the program except for the point "if the version of the remote 
server is less than PG17".
> Instead, we have addressed the following.
> "If check_partial_aggregate_support is true and the remote server version is 
older than the local server
> version, postgres_fdw does not assume that the partial aggregate function is 
on the remote server unless
> the partial aggregate function and the aggregate function match."
> The reason for this is to maintain compatibility with any aggregate function 
that does not support partial
> aggregate in one version of V1 (V1 is PG17 or higher), even if the next 
version supports partial aggregate.
> For example, string_agg does not support partial aggregation in PG15, but it 
will support partial aggregation
> in PG16.

Just to clarify, I think you are saying:

If check_partial_aggregate_support is true and the remote server
version is older than the local server version, postgres_fdw
checks if the partial aggregate function exists on the remote
server during planning and only uses it if it does.

I tried to phrase it in a positive way, and mentioned the plan time
distinction.  Also, I am sorry I was away for most of July and am just
getting to this.

Thanks for your comment. In the documentation, the description of
check_partial_aggregate_support is as follows
(please see postgres-fdw.sgml).
--
check_partial_aggregate_support (boolean)
Only if this option is true, during query planning, postgres_fdw
connects to the remote server and checks if the remote server version
is older than the local server version. If so, postgres_fdw assumes
that for each built-in aggregate function, the partial aggregate
function is not defined on the remote server unless the partial
aggregate function and the aggregate function match. The default is
false.
--

Thursday, 20 July 2023 19:23 Alexander Pyhalov 
:

fujii.y...@df.mitsubishielectric.co.jp wrote 2023-07-19 03:43:
> Hi Mr.Pyhalov, hackers.

> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
>

Hi.
Sorry, but I don't see how it could work.

We apologize for any inconvenience caused.
Thanks to Pyhalov's and Jim's comments, I have realized that I made
a fundamental mistake regarding pushdown of the HAVING clause and the
difficulty of achieving it when performing partial aggregate pushdown.
So I removed the code for pushdown of the HAVING clause during partial
aggregate pushdown.

Thursday, 20 July 2023 19:23 Alexander Pyhalov 
:

As for changes in planner.c (setGroupClausePartial()) I have several
questions.

1) Why don't we add non_group_exprs to pathtarget->exprs when
partial_target->exprs is not set?

2) We replace extra->partial_target->exprs with partial_target->exprs
after processing. Why are we sure that after this tleSortGroupRef is
correct?

Response to 1)
The code you pointed out was unnecessary. I have removed this code.
Also, the process of adding PlaceHolderVar's expr to partial_target was 
missing.

So I fixed this.

Response to 2)
The procedure for making extra->groupClausePartial and extra->partial_target
in make_partial_grouping_target in this patch is as follows.
STEP1. From grouping_target->exprs, extract Aggref, Var and
Placeholdervar that are not included in Aggref.
STEP2. setGroupClausePartial sets the copy of original groupClause to
extra->groupClausePartial
and sets the copy of original partial_target to extra->partial_target.
STEP3. setGroupClausePartial adds Var and Placeholdervar in STEP1 to
partial_target.
The sortgroupref set for each entry added to partial_target->sortgrouprefs is
(the maximum value of the existing sortgrouprefs) + 1.
setGroupClausePartial then adds a SortGroupClause sgc whose
sgc->tleSortGroupRef matches that sortgroupref to the GroupClause list.
STEP4. add_new_columns_to_pathtarget adds STEP1's Aggref to 
partial_target.


Due to STEP2, the list of tlesortgrouprefs set in
extra->groupClausePartial is not duplicated.
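STEP3's ref assignment can be sketched in a few lines (illustrative only; the real code manipulates PathTarget sortgrouprefs arrays in C):

```python
# Hypothetical sketch of STEP3: each expression added to the partial
# target gets a fresh sortgroupref of (max existing) + 1, so the refs
# copied in STEP2 are never duplicated.

def add_with_fresh_refs(sortgrouprefs, new_exprs):
    next_ref = max(sortgrouprefs, default=0) + 1
    for _ in new_exprs:
        sortgrouprefs.append(next_ref)
        next_ref += 1
    return sortgrouprefs

print(add_with_fresh_refs([1, 2, 3], ["b/2", "placeholder"]))  # [1, 2, 3, 4, 5]
```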


Do you mean that extra->partial_target->sortgrouprefs is not replaced, 
and so we preserve tlesortgroupref numbers?
I'm suspicious about rewriting extra->partial_target->exprs with 
partial_target->exprs - I'm still not sure why we
don't lose information added by add_column_to_pathtarget() to 
extra->partial_target->exprs?


Also look at the following example.

EXPLAIN VERBOSE SELECT  count(*) , (b/2)::numeric FROM pagg_tab GROUP BY 
b/2 ORDER BY 1;

QUERY PLAN
---
 Sort  (cost=511.35..511.47 rows=50 width=44)

Re: Partial aggregates pushdown

2023-09-25 Thread Bruce Momjian
On Mon, Sep 25, 2023 at 03:18:13AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers.
> 
> Thank you for your valuable comments. I sincerely apologize for the very late 
> reply.
> Here is a response to your comments or a fix to the patch.
> 
> Tuesday, August 8, 2023 at 3:31 Bruce Momjian
> > > I have modified the program except for the point "if the version of the 
> > > remote server is less than PG17".
> > > Instead, we have addressed the following.
> > > "If check_partial_aggregate_support is true and the remote server version 
> > > is older than the local server
> > > version, postgres_fdw does not assume that the partial aggregate function 
> > > is on the remote server unless
> > > the partial aggregate function and the aggregate function match."
> > > The reason for this is to maintain compatibility with any aggregate 
> > > function that does not support partial
> > > aggregate in one version of V1 (V1 is PG17 or higher), even if the next 
> > > version supports partial aggregate.
> > > For example, string_agg does not support partial aggregation in PG15, but 
> > > it will support partial aggregation
> > > in PG16.
> >
> > Just to clarify, I think you are saying:
> >
> > If check_partial_aggregate_support is true and the remote server
> > version is older than the local server version, postgres_fdw
> > checks if the partial aggregate function exists on the remote
> > server during planning and only uses it if it does.
> >
> > I tried to phrase it in a positive way, and mentioned the plan time
> > distinction.  Also, I am sorry I was away for most of July and am just
> > getting to this.
> Thanks for your comment. In the documentation, the description of 
> check_partial_aggregate_support is as follows
> (please see postgres-fdw.sgml).
> --
> check_partial_aggregate_support (boolean)
> Only if this option is true, during query planning, postgres_fdw connects to 
> the remote server and checks if the remote server version is older than the 
> local server version. If so, postgres_fdw assumes that for each built-in 
> aggregate function, the partial aggregate function is not defined on the 
> remote server unless the partial aggregate function and the aggregate 
> function match. The default is false.
> --

My point is that there are three behaviors:

*  false - no check
*  true, remote version >= sender - no check
*  true, remove version < sender - check

Here is your code:

+ * Check that a built-in aggpartialfunc exists on the remote server. If
+ * check_partial_aggregate_support is false, we assume the partial aggregate
+ * function exists on the remote server. Otherwise we assume the partial
+ * aggregate function exists on the remote server only if the remote server
+ * version is not less than the local server version.
+ */
+static bool
+is_builtin_aggpartialfunc_shippable(Oid aggpartialfn, PgFdwRelationInfo *fpinfo)
+{
+	bool		shippable = true;
+
+	if (fpinfo->check_partial_aggregate_support)
+	{
+		if (fpinfo->remoteversion == 0)
+		{
+			PGconn	   *conn = GetConnection(fpinfo->user, false, NULL);
+
+			fpinfo->remoteversion = PQserverVersion(conn);
+		}
+		if (fpinfo->remoteversion < PG_VERSION_NUM)
+			shippable = false;
+	}
+	return shippable;
+}
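Stripped of connection handling, the decision this function implements reduces to the following (a Python paraphrase of the quoted C, not the actual code path; PG_VERSION_NUM here is an assumed local version, with PostgreSQL encoding 17.0 as 170000):

```python
# Python paraphrase of is_builtin_aggpartialfunc_shippable: the three
# behaviors listed above (false: no check; true with newer-or-equal
# remote: shippable; true with older remote: not shippable).

PG_VERSION_NUM = 170000  # assumed local server version

def partial_agg_shippable(check_option: bool, remote_version: int) -> bool:
    if not check_option:
        return True                          # no check: assume shippable
    return remote_version >= PG_VERSION_NUM  # ship only if remote >= local

print(partial_agg_shippable(False, 150000))  # True
print(partial_agg_shippable(True, 180000))   # True
print(partial_agg_shippable(True, 150000))   # False
```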

I think this needs to be explained in the docs.  I am ready to adjust
the patch to improve the wording whenever you are ready.  Should I do it
now and post an updated version for you to use?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-08-07 Thread Bruce Momjian
On Mon, Jul 10, 2023 at 07:35:27AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> > > I will add a postgres_fdw option "check_partial_aggregate_support".
> > > This option is false, default.
> > > Only if this option is true, postgres_fdw connect to the remote server 
> > > and get the version of the remote server.
> > > And if the version of the remote server is less than PG17, then partial 
> > > aggregate push down to the remote server is
> > disable.
> > 
> > Great!
> I have modified the program except for the point "if the version of the 
> remote server is less than PG17".
> Instead, we have addressed the following.
> "If check_partial_aggregate_support is true and the remote server version is 
> older than the local server
> version, postgres_fdw does not assume that the partial aggregate function is 
> on the remote server unless
> the partial aggregate function and the aggregate function match."
> The reason for this is to maintain compatibility with any aggregate function 
> that does not support partial
> aggregate in one version of V1 (V1 is PG17 or higher), even if the next 
> version supports partial aggregate.
> For example, string_agg does not support partial aggregation in PG15, but it 
> will support partial aggregation
> in PG16.

Just to clarify, I think you are saying:

If check_partial_aggregate_support is true and the remote server
version is older than the local server version, postgres_fdw
checks if the partial aggregate function exists on the remote
server during planning and only uses it if it does.

I tried to phrase it in a positive way, and mentioned the plan time
distinction.  Also, I am sorry I was away for most of July and am just
getting to this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-08-01 Thread Finnerty, Jim
When it is valid to filter based on a HAVING clause predicate, it should 
already have been converted into a WHERE clause predicate, except in the 
special case of a LIMIT k ... ORDER BY query, where the HAVING clause 
predicate can be determined approximately after having found k fully 
qualified tuples, and then that predicate is successively tightened as more 
qualified records are found.

*that*, by the way, is a very powerful optimization.

/Jim F

On 7/20/23, 6:24 AM, "Alexander Pyhalov" <a.pyha...@postgrespro.ru> wrote:



fujii.y...@df.mitsubishielectric.co.jp wrote 2023-07-19 03:43:
> Hi Mr.Pyhalov, hackers.


> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
>


Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns wrong result:


CREATE FUNCTION f() RETURNS INT AS $$
begin
return 10;
end
$$ LANGUAGE PLPGSQL;


SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY
1;
  b | sum
----+-----
  0 |   0
 10 |   0
 20 |   0
 30 |   0
 40 |   0
(5 rows)


In fact the above query should have returned 0 rows, as


SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
  b | sum
----+------
  0 |  600
  1 |  660
  2 |  720
  3 |  780
  4 |  840
  5 |  900
  6 |  960
  7 | 1020
  8 | 1080
  9 | 1140
 10 |  600
 11 |  660
 12 |  720
shows no such rows.


Or, on the same data


SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY
1;


You'll get 0 rows.


But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
  b | sum
----+------
 42 |  720
 29 | 1140
  4 |  840
 34 |  840
 41 |  660
  0 |  600
 40 |  600


The issue is that you can't calculate a "partial" HAVING. You should
compare the full aggregate in the filter, but that's not possible at the
level of one partition.
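The failure mode can be reproduced in a few lines of plain Python (a sketch, not the planner code): filtering each partition's partial sum drops a group whose per-partition sums are all small even though its full sum qualifies.

```python
# Sketch of the bug: applying HAVING sum(a) > 660 to each partition's
# partial sum drops group "b0" on every shard, even though the combined
# sum (800) passes the filter.

partition1 = {"b0": 400}   # partial sum for group b0 on shard 1
partition2 = {"b0": 400}   # partial sum for group b0 on shard 2

# Wrong: filter the partial states, as the pushed-down plan did
survivors = {k: v for k, v in partition1.items() if v > 660}
# b0 is dropped on every shard, so the final result is empty ...

# Right: combine the partial states first, then filter
combined = {k: partition1.get(k, 0) + partition2.get(k, 0)
            for k in set(partition1) | set(partition2)}
correct = {k: v for k, v in combined.items() if v > 660}
print(survivors)  # {}
print(correct)    # {'b0': 800}
```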
And you have this in plans


Finalize GroupAggregate
Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
Group Key: pagg_tab.b
Filter: (sum(pagg_tab.a) < 700)
-> Sort
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Sort Key: pagg_tab.b
-> Append
-> Foreign Scan
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)),
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
Filter: ((PARTIAL sum(pagg_tab.a)) < 700) 
<--- here we can't compare anything yet, sum is incomplete.
Relations: Aggregate on (public.fpagg_tab_p1
pagg_tab)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)),
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_1.a))
Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p2
pagg_tab_1)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
-> Foreign Scan
Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)),
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL
sum(pagg_tab_2.a))
Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p3
pagg_tab_2)
Remote SQL: SELECT b, avg_p_int4(a), max(a),
count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1


In foreign_grouping_ok()
6586 if (IsA(expr, Aggref))
6587 {
6588 if (partial)
6589 {
6590 mark_partial_aggref((Aggref
*) expr, AGGSPLIT_INITIAL_SERIAL);
6591 continue;
6592 }
6593 else if (!is_foreign_expr(root,
grouped_rel, expr))
6594 return false;
6595
6596 tlist = add_to_flat_tlist(tlist,
list_make1(expr));
6597 }


at least you shouldn't do anything with expr, if is_foreign_expr()
returned false. If we restrict pushing down queries with havingQuals,
I'm not quite sure how Aggref can appear in local_conds.


As for changes in planner.c (setGroupClausePartial()) I have several
questions.


1) Why don't we add non_group_exprs to pathtarget->exprs when
partial_target->exprs is not set?


2) We replace extra->partial_target->exprs with partial_target->exprs
after processing. Why are we sure that after this tleSortGroupRef is
correct?


--
Best regards,
Alexander Pyhalov,
Postgres Professional





Re: Partial aggregates pushdown

2023-07-20 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2023-07-19 03:43:

Hi Mr.Pyhalov, hackers.



3)
I modified the patch to safely do a partial aggregate pushdown for
queries which contain having clauses.



Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns a wrong result:

CREATE FUNCTION f() RETURNS INT AS $$
begin
  return 10;
end
$$ LANGUAGE PLPGSQL;

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 
1;

 b  | sum
----+-----
  0 |   0
 10 |   0
 20 |   0
 30 |   0
 40 |   0
(5 rows)

In fact the above query should have returned 0 rows, as

SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
 b  | sum
----+------
  0 |  600
  1 |  660
  2 |  720
  3 |  780
  4 |  840
  5 |  900
  6 |  960
  7 | 1020
  8 | 1080
  9 | 1140
 10 |  600
 11 |  660
 12 |  720

shows no such rows.

Or, on the same data

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 
1;


You'll get 0 rows.

But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
 b  | sum
----+------
 42 |  720
 29 | 1140
  4 |  840
 34 |  840
 41 |  660
  0 |  600
 40 |  600
gives.

The issue is that you can't calculate a "partial" HAVING. The filter must 
compare the full aggregate value, but that's not possible at the level of 
one partition.
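A tiny sketch of this point (hypothetical data, not PostgreSQL internals): filtering per-partition partial states lets groups survive whose full aggregate would fail the HAVING condition.

```python
# Two partitions of the same table, rows as (b, a) pairs.
partition1 = [(0, 300), (1, 400)]   # rows on the first remote server
partition2 = [(0, 350), (1, 400)]   # rows on the second remote server

def partial_sums(rows):
    """Per-group partial sum(a), as each remote server would compute it."""
    sums = {}
    for b, a in rows:
        sums[b] = sums.get(b, 0) + a
    return sums

p1, p2 = partial_sums(partition1), partial_sums(partition2)

# Correct: combine the partial states first, then apply HAVING sum(a) < 700.
total = {}
for part in (p1, p2):
    for b, s in part.items():
        total[b] = total.get(b, 0) + s
correct = {b for b, s in total.items() if s < 700}

# Broken: apply the filter to each incomplete partial state.  Every partial
# sum is below 700, so no group is eliminated on any shard.
broken = {b for part in (p1, p2) for b, s in part.items() if s < 700}

print(correct)  # group 1 totals 800 and is filtered out
print(broken)   # group 1 wrongly survives the per-partition filter
```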

And you have this in plans

 Finalize GroupAggregate
   Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
   Group Key: pagg_tab.b
   Filter: (sum(pagg_tab.a) < 700)
   ->  Sort
 Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL 
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))

 Sort Key: pagg_tab.b
 ->  Append
   ->  Foreign Scan
 Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), 
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
 Filter: ((PARTIAL sum(pagg_tab.a)) < 700)    
<--- here we can't compare anything yet, sum is incomplete.
 Relations: Aggregate on (public.fpagg_tab_p1 
pagg_tab)
 Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1

   ->  Foreign Scan
 Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), 
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_1.a))

 Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
 Relations: Aggregate on (public.fpagg_tab_p2 
pagg_tab_1)
 Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1

   ->  Foreign Scan
 Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), 
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_2.a))

 Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
 Relations: Aggregate on (public.fpagg_tab_p3 
pagg_tab_2)
 Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1
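The PARTIAL entries in the plan above ship transition states, not finished values. A rough model of what the remote avg_p_int4(a) call and the local Finalize step exchange (a sketch only; the actual C implementation and the avg_p_int4 name belong to the patch, and avg's real state also tracks more than this):

```python
def avg_partial(values):
    """What the remote server ships back: the avg transition state (sum, count)."""
    return (sum(values), len(values))

def avg_combine(state1, state2):
    """What the local Finalize GroupAggregate does with two shards' states."""
    return (state1[0] + state2[0], state1[1] + state2[1])

def avg_final(state):
    """Finalization: only now can the true average be computed."""
    s, n = state
    return s / n

shard1 = avg_partial([10, 20, 30])   # (60, 3)
shard2 = avg_partial([40])           # (40, 1)
print(avg_final(avg_combine(shard1, shard2)))  # 25.0
```

Note that averaging the per-shard averages (20.0 and 40.0) would give 30.0, not 25.0, which is why the state must cross the wire instead of the finished value.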


In foreign_grouping_ok()
6586 if (IsA(expr, Aggref))
6587 {
6588 if (partial)
6589 {
6590 mark_partial_aggref((Aggref 
*) expr, AGGSPLIT_INITIAL_SERIAL);

6591 continue;
6592 }
6593 else if (!is_foreign_expr(root, 
grouped_rel, expr))

6594 return false;
6595
6596 tlist = add_to_flat_tlist(tlist, 
list_make1(expr));

6597 }

At least you shouldn't do anything with expr if is_foreign_expr() 
returned false. If we restrict pushing down queries with havingQuals, 
I'm not quite sure how an Aggref can appear in local_conds.


As for changes in planner.c (setGroupClausePartial()) I have several 
questions.


1) Why don't we add non_group_exprs to pathtarget->exprs when 
partial_target->exprs is not set?


2) We replace extra->partial_target->exprs with partial_target->exprs 
after processing. Why are we sure that after this tleSortGroupRef is 
correct?


--
Best regards,
Alexander Pyhalov,
Postgres Professional

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1824cb67fe9..c6f613019c3 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3025,6 +3025,22 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
 
+CREATE FUNCTION f() RETURNS INT AS $$
+begin
+  return 10;
+end 
+$$ LANGUAGE PLPGSQL;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1;
+SELECT b, sum(a

RE: Partial aggregates pushdown

2023-07-17 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> From: Alexander Pyhalov 
> Sent: Friday, July 14, 2023 10:40 PM
> 1) In foreign_join_ok() should we set fpinfo->user if
> fpinfo->check_partial_aggregate_support is set like it's done for 
> fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user =
> NULL if use_remote_estimate is not set.
You are right. I will modify this patch according to your advice.
Thank you for advice.

> 2) It seems we found an additional issue with the original patch, which 
> is present in the current one. I'm attaching a patch which seems to fix 
> it, but I'm not quite sure about it.
Thank you for pointing out the issue.
If a query's GROUP BY clause contains an expression based on a variable (not a
bare variable) and the query's SELECT clause contains another expression,
the partial aggregate could be unsafe to push down.

An example of such queries:
SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2

Your patch disables partial aggregate pushdown for such queries.
I'll see if we can modify the patch to safely do a partial aggregate pushdown 
for such queries as well.
Such a query expects the variable in the SELECT clause expression to be 
included in the target of the grouped rel
(see make_partial_grouping_target), 
but the original GROUP BY clause has no reference to this variable;
this seems to be the direct cause (see foreign_grouping_ok). 
I will examine whether a safe pushdown can be achieved by matching the
GROUP BY clause information referenced by foreign_grouping_ok against the 
grouped rel's target information.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


Re: Partial aggregates pushdown

2023-07-14 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote on 2023-07-10 10:35:

I have modified the program except for the point "if the version of
the remote server is less than PG17".
Instead, we have addressed the following.
"If check_partial_aggregate_support is true and the remote server
version is older than the local server
version, postgres_fdw does not assume that the partial aggregate
function is on the remote server unless
the partial aggregate function and the aggregate function match."
The reason for this is to maintain compatibility with any aggregate
function that does not support partial
aggregate in one version of V1 (V1 is PG17 or higher), even if the
next version supports partial aggregate.
For example, string_agg does not support partial aggregation in PG15,
but it will support partial aggregation
in PG16.



Hi.

1) In foreign_join_ok() should we set fpinfo->user if 
fpinfo->check_partial_aggregate_support is set like it's done for 
fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user = 
NULL if use_remote_estimate is not set.


2) It seems we found an additional issue with the original patch, which is 
present in the current one. I'm attaching a patch which seems to fix it, but 
I'm not quite sure about it.




We have not been able to add a test for the case where the remote
server version is older than the
local server version to the regression test. Is there any way to add
such tests to the existing regression
tests?



--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 187d15185200aabc22c5219bbe636bc950670a78 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Fri, 14 Jul 2023 16:34:02 +0300
Subject: [PATCH] For partial aggregation we can't rely on the fact that every
 var is a part of some GROUP BY expression

---
 .../postgres_fdw/expected/postgres_fdw.out| 120 ++
 contrib/postgres_fdw/postgres_fdw.c   |  31 +++--
 contrib/postgres_fdw/sql/postgres_fdw.sql |   9 ++
 3 files changed, 152 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 8d80ba0a6be..31ee461045c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9955,6 +9955,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
  49 | 19. |  29 |60
 (50 rows)
 
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+   QUERY PLAN   
+
+ Sort
+   Output: ((pagg_tab.b / 2)), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*))
+   Sort Key: ((pagg_tab.b / 2))
+   ->  Finalize HashAggregate
+ Output: ((pagg_tab.b / 2)), avg(pagg_tab.a), max(pagg_tab.a), count(*)
+ Group Key: ((pagg_tab.b / 2))
+ ->  Append
+   ->  Foreign Scan
+ Output: ((pagg_tab.b / 2)), (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p1 GROUP BY 1
+   ->  Foreign Scan
+ Output: ((pagg_tab_1.b / 2)), (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p2 GROUP BY 1
+   ->  Foreign Scan
+ Output: ((pagg_tab_2.b / 2)), (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*))
+ Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2)
+ Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p3 GROUP BY 1
+(19 rows)
+
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+ ?column? | avg | max | count 
+--+-+-+---
+0 | 10.5000 |  21 |   120
+1 | 12.5000 |  23 |   120
+2 | 14.5000 |  25 |   120
+3 | 16.5000 |  27 |   120
+4 | 18.5000 |  29 |   120
+5 | 10.5000 |  21 |   120
+6 | 12.5000 |  23 |   120
+7 | 14.5000 |  25 |   120
+8 | 16.5000 |  27 |   120
+9 | 18.5000 |  29 |   120
+   10 | 10.5000 |  21 |   120
+   11 | 12.5000 |  23 |   120
+   12 | 14.5000 |  25 |   120
+   13 | 16.5000 |  27 |   120
+  

Re: Partial aggregates pushdown

2023-06-22 Thread Bruce Momjian
On Thu, Jun 22, 2023 at 05:23:33AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Approach 1-3:
> I will add a postgres_fdw option "check_partial_aggregate_support".
> This option is false by default.
> Only if this option is true does postgres_fdw connect to the remote server and 
> get the version of the remote server.
> And if the version of the remote server is less than PG17, then partial 
> aggregate pushdown to the remote server is disabled.

Great!

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-06-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Pyhalov, hackers.

> From: Bruce Momjian 
> Sent: Thursday, June 22, 2023 12:44 AM
> On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > > Therefore, it seems like it would be near-zero cost to just call
> > > conn =
> > > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > > You can then use the return value of PQserverVersion() to determine
> > > if you can push down partial aggregates.
> >
> > Hi.
> > Currently we don't get remote connection while planning if
> > use_remote_estimate is not set.
> > Such change would require to get remote connection in planner, not in
> > executor.
> > This can lead to change of behavior (like errors in explain when user
> > mapping is wrong - e.g. bad password is specified).
> > Also this potentially can lead to establishing connections even when
> > plan node is not actually used (like extreme example - select
> > sum(score) from t limit 0).
> > I'm not saying we shouldn't do it - just hint at possible consequences.
> 
> Agreed.  I noticed it was doing FDW connections during optimization, but 
> didn't see the postgres_fdw option that would
> turn it off.
> Interestingly, it is disabled by default.
> 
> After considering the options, I think we should have a postgres_fdw option 
> called "planner_version_check", and default it to false.  When false, a remote 
> server version check will not be performed during planning and partial 
> aggregates will always be considered.  When true, a version check will be 
> performed during planning and partial aggregate pushdown will be disabled for 
> pre-PG 17 foreign servers during the query.
> 
> If we want to be more specific, we can call it 
> "check_partial_aggregate_support".
Thank you both for your advice.
We will address the compatibility issues as follows.

Approach 1-3:
I will add a postgres_fdw option "check_partial_aggregate_support".
This option is false by default.
Only if this option is true does postgres_fdw connect to the remote server and get 
the version of the remote server.
And if the version of the remote server is less than PG17, then partial 
aggregate pushdown to the remote server is disabled.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation





Re: Partial aggregates pushdown

2023-06-21 Thread Bruce Momjian
On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > Therefore, it seems like it would be near-zero cost to just call conn =
> > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > You can then use the return value of PQserverVersion() to determine if
> > you can push down partial aggregates.
> 
> Hi.
> Currently we don't get remote connection while planning if
> use_remote_estimate is not set.
> Such change would require to get remote connection in planner, not in
> executor.
> This can lead to change of behavior (like errors in explain when user
> mapping is wrong - e.g. bad password is specified).
> Also this potentially can lead to establishing connections even when plan
> node is not actually used
> (like extreme example - select sum(score) from t limit 0).
> I'm not saying we shouldn't do it - just hint at possible consequences.

Agreed.  I noticed it was doing FDW connections during optimization, but
didn't see the postgres_fdw option that would turn it off. 
Interestingly, it is disabled by default.

After considering the options, I think we should have a postgres_fdw
option called "planner_version_check", and default it to false.  When
false, a remote server version check will not be performed during
planning and partial aggregates will always be considered.  When
true, a version check will be performed during planning and partial
aggregate pushdown will be disabled for pre-PG 17 foreign servers during
the query.

If we want to be more specific, we can call it
"check_partial_aggregate_support".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-06-19 Thread Alexander Pyhalov

Bruce Momjian wrote on 2023-06-20 03:42:

Apologies for the delay in my reply to this email.  I looked into the
existing code and I found three things:

1)  PQserverVersion() just pulls the conn->sversion value from the
existing connection because pqSaveParameterStatus() pulls the
server_version sent by the backend;  no need to issue SELECT version().

2)  postgres_fdw already has nine calls to GetConnection(), and only
opens a connection if it already doesn't have one.  Here is an example:

/* Get the remote estimate */
conn = GetConnection(fpinfo->user, false, NULL);
get_remote_estimate(sql.data, conn, &rows, &width,
&startup_cost, &total_cost);
ReleaseConnection(conn);

Therefore, it seems like it would be near-zero cost to just call conn =
GetConnection() and then PQserverVersion(conn), and 
ReleaseConnection().

You can then use the return value of PQserverVersion() to determine if
you can push down partial aggregates.



Hi.
Currently we don't get remote connection while planning if 
use_remote_estimate is not set.
Such change would require to get remote connection in planner, not in 
executor.
This can lead to change of behavior (like errors in explain when user 
mapping is wrong - e.g. bad password is specified).
Also this potentially can lead to establishing connections even when 
plan node is not actually used

(like extreme example - select sum(score) from t limit 0).
I'm not saying we shouldn't do it - just hint at possible consequences.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2023-06-19 Thread Bruce Momjian
On Tue, Jun 13, 2023 at 02:18:15AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Momjian.
> 
> Thank you for advises.
> 
> > From: Bruce Momjian 
> > Sent: Monday, June 12, 2023 10:38 PM
> > > I understand what the problem is. I will put a mechanism maintaining 
> > > compatibility into the patch.
> > > I believe there are three approaches.
> > > Approach 1-1 is preferable because it does not require additional options 
> > > for postgres_fdw.
> > > I will revise the patch according to Approach 1-1, unless otherwise 
> > > commented.
> > >
> > > Approach1:
> > > I ensure that postgres_fdw retrieves the version of each remote server
> > > and does not push down partial aggregates if the server version is less 
> > > than 17.
> > > There are two approaches to obtaining remote server versions.
> > > Approach1-1: postgres_fdw connects a remote server and use 
> > > PQserverVersion().
> > > Approach1-2: Adding a postgres_fdw option about a remote server version 
> > > (like "server_version").
> > >
> > > Approach2:
> > > Adding a postgres_fdw option for partial aggregate pushdown is enable
> > > or not (like enable_partial_aggregate_pushdown).
> > 
> > These are good questions.  Adding a postgres_fdw option called 
> > enable_partial_aggregate_pushdown helps make the
> > purpose of the option clear, but remote_version can be used for future 
> > breakage as well.
> > 
> > I think remote_version is the best idea, and in the documentation for the 
> > option, let's explicitly say it is useful to disable
> > partial aggregates pushdown on pre-PG 17 servers.  If we need to use the 
> > option for other cases, we can just update the
> > documentation.  When the option is blank, the default, everything is pushed 
> > down.
> > 
> > I see remote_version a logical addition to match our "extensions" option 
> > that controls what extension functions can be
> > pushed down.
> 
> Thank you for your perspective.
> So, of the approaches I have presented, you think that Approach 1-2 is
> preferable and that the option name remote_version is preferable?
> Indeed, the option of a remote version may have other uses.
> However, since this information can be obtained by connecting to the remote 
> server, I'm concerned that some people may find this option redundant.
> 
> Is the problem with approach 1-1 because the user cannot decide whether to 
> include the compatibility check in the decision to do partial aggregate 
> pushdown or not?
> # If Approach 1-1 is taken, the problem is that this feature cannot be used 
> # for all built-in aggregate functions when the remote server is older than PG17.
> If so, Approach 1-3 below seems more desirable.
> Would it be possible to hear your thoughts?
> 
> Approach 1-3: We add a postgres_fdw option for a compatibility check for 
> partial aggregate pushdown
> (like "enable_aggpartialfunc_compatibility_check"). This option is false by 
> default.
> When this option is true, postgres_fdw obtains the remote server version by 
> connecting to the remote server and using PQserverVersion(). 
> And if the remote server version is older than PG17, then the partial 
> aggregate pushdown feature is disabled for all built-in aggregate functions.
> Otherwise the partial aggregate pushdown feature is enabled for all built-in 
> aggregate functions.

Apologies for the delay in my reply to this email.  I looked into the
existing code and I found three things:

1)  PQserverVersion() just pulls the conn->sversion value from the
existing connection because pqSaveParameterStatus() pulls the
server_version sent by the backend;  no need to issue SELECT version().

2)  postgres_fdw already has nine calls to GetConnection(), and only
opens a connection if it already doesn't have one.  Here is an example:

/* Get the remote estimate */
conn = GetConnection(fpinfo->user, false, NULL);
get_remote_estimate(sql.data, conn, &rows, &width,
&startup_cost, &total_cost);
ReleaseConnection(conn);

Therefore, it seems like it would be near-zero cost to just call conn =
GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
You can then use the return value of PQserverVersion() to determine if
you can push down partial aggregates.
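A sketch of that gate. PQserverVersion() returns the version packed as an integer (major * 10000 + minor for PG 10 and later, e.g. 170002 for 17.2); the 170000 threshold below assumes the feature ships in PG 17:

```python
def decode_version(v: int) -> tuple:
    """Unpack a PQserverVersion()-style integer into (major, minor)."""
    return (v // 10000, v % 10000)

def can_push_down_partial_aggregates(v: int) -> bool:
    """Gate partial aggregate pushdown on the remote major version."""
    major, _minor = decode_version(v)
    return major >= 17

print(decode_version(170002))                     # (17, 2)
print(can_push_down_partial_aggregates(160004))   # False: PG 16.4 remote
```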

3)  Looking at postgresAcquireSampleRowsFunc(), I see this exact method
used:

conn = GetConnection(user, false, NULL);

/* We'll need server version, so fetch it now. */
server_version_num = PQserverVersion(conn);

...

if ((server_version_num < 95000) &&
(method == ANALYZE_SAMPLE_SYSTEM ||
 method == ANALYZE_SAMPLE_BERNOULLI))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("remote server does not support TABLESAMPLE feature")));

I am sorry if you already knew all this, but I didn't.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.

RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

Thank you for advises.

> From: Bruce Momjian 
> Sent: Monday, June 12, 2023 10:38 PM
> > I understand what the problem is. I will put a mechanism maintaining 
> > compatibility into the patch.
> > I believe there are three approaches.
> > Approach 1-1 is preferable because it does not require additional options 
> > for postgres_fdw.
> > I will revise the patch according to Approach 1-1, unless otherwise 
> > commented.
> >
> > Approach1:
> > I ensure that postgres_fdw retrieves the version of each remote server
> > and does not push down partial aggregates if the server version is less 
> > than 17.
> > There are two approaches to obtaining remote server versions.
> > Approach1-1: postgres_fdw connects a remote server and use 
> > PQserverVersion().
> > Approach1-2: Adding a postgres_fdw option about a remote server version 
> > (like "server_version").
> >
> > Approach2:
> > Adding a postgres_fdw option for partial aggregate pushdown is enable
> > or not (like enable_partial_aggregate_pushdown).
> 
> These are good questions.  Adding a postgres_fdw option called 
> enable_partial_aggregate_pushdown helps make the
> purpose of the option clear, but remote_version can be used for future 
> breakage as well.
> 
> I think remote_version is the best idea, and in the documentation for the 
> option, let's explicitly say it is useful to disable
> partial aggregates pushdown on pre-PG 17 servers.  If we need to use the 
> option for other cases, we can just update the
> documentation.  When the option is blank, the default, everything is pushed 
> down.
> 
> I see remote_version a logical addition to match our "extensions" option that 
> controls what extension functions can be
> pushed down.

Thank you for your perspective.
So, of the approaches I have presented, you think that Approach 1-2 is
preferable and that the option name remote_version is preferable?
Indeed, the option of a remote version may have other uses.
However, since this information can be obtained by connecting to the remote 
server, I'm concerned that some people may find this option redundant.

Is the problem with approach 1-1 because the user cannot decide whether to 
include the compatibility check in the decision to do partial aggregate 
pushdown or not?
# If Approach 1-1 is taken, the problem is that this feature cannot be used for 
# all built-in aggregate functions when the remote server is older than PG17.
If so, Approach 1-3 below seems more desirable.
Would it be possible to hear your thoughts?

Approach 1-3: We add a postgres_fdw option for a compatibility check for 
partial aggregate pushdown
(like "enable_aggpartialfunc_compatibility_check"). This option is false by 
default.
When this option is true, postgres_fdw obtains the remote server version by 
connecting to the remote server and using PQserverVersion(). 
And if the remote server version is older than PG17, then the partial aggregate 
pushdown feature is disabled for all built-in aggregate functions.
Otherwise the partial aggregate pushdown feature is enabled for all built-in 
aggregate functions.
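A sketch of that decision logic (names are the ones proposed in this thread; the PG 17 threshold is an assumption, and I read the check as: with the option enabled, pushdown is disabled for pre-PG17 remotes):

```python
def partial_pushdown_enabled(check_option: bool, server_version_num: int) -> bool:
    """check_option models enable_aggpartialfunc_compatibility_check;
    server_version_num is a PQserverVersion()-style integer."""
    if not check_option:
        return True   # no compatibility check: pushdown is always considered
    return server_version_num >= 170000

print(partial_pushdown_enabled(False, 150000))  # True: no check performed
print(partial_pushdown_enabled(True, 150000))   # False: remote too old
```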

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2023-06-12 Thread Bruce Momjian
On Mon, Jun 12, 2023 at 08:51:30AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Bruce, Mr.Pyhalov, hackers.
> 
> Thank you for comments. I will try to respond to both of your comments as 
> follows.
> I plan to start revising the patch next week. If you have any comments on the 
> following
> respondences, I would appreciate it if you could give them to me this week.
> 
> > From: Bruce Momjian 
> > Sent: Saturday, June 10, 2023 1:44 AM
> > I agree that this feature is designed for built-in sharding, but it is 
> > possible people could be using aggregates on partitions
> > backed by foreign tables without sharding.  Adding a requirement for 
> > non-sharding setups to need PG 17+ servers might
> > be unreasonable.
Indeed, it is possible to use the partial aggregate pushdown feature for purposes 
other than sharding.
> The description of the section "F.38.6. Built-in sharding in PostgreSQL" 
> assumes the use of
> Built-in sharding and will be modified to eliminate this assumption.
> The title of this section should be changed to something like "Aggregate on 
> partitioned table".

Sounds good.

> > From: Bruce Momjian 
> > Sent: Saturday, June 10, 2023 1:44 AM
> > Looking at previous release note incompatibilities, we don't normally 
> > change non-administrative functions in a way that
> > causes errors if run on older servers.  Based on Alexander's observations, 
> > I wonder if we need to re-add the postgres_fdw
> > option to control partial aggregate pushdown, and default it to enabled.
> > 
> > If we ever add more function breakage we might need more postgres_fdw 
> > options.  Fortunately, such changes are rare.
> 
> I understand what the problem is. I will put a mechanism maintaining 
> compatibility into the patch.
> I believe there are three approaches.
> Approach 1-1 is preferable because it does not require additional options for 
> postgres_fdw.
> I will revise the patch according to Approach 1-1, unless otherwise commented.
> 
> Approach1:
> I ensure that postgres_fdw retrieves the version of each remote server
> and does not push down partial aggregates if the server version is less than 
> 17.
> There are two approaches to obtaining remote server versions.
> Approach1-1: postgres_fdw connects a remote server and use PQserverVersion().
> Approach1-2: Adding a postgres_fdw option about a remote server version (like 
> "server_version").
> 
> Approach2:
> Adding a postgres_fdw option for partial aggregate pushdown is enable or not
> (like enable_partial_aggregate_pushdown).

These are good questions.  Adding a postgres_fdw option called
enable_partial_aggregate_pushdown helps make the purpose of the option
clear, but remote_version can be used for future breakage as well.

I think remote_version is the best idea, and in the documentation for the
option, let's explicitly say it is useful to disable partial aggregates
pushdown on pre-PG 17 servers.  If we need to use the option for other
cases, we can just update the documentation.  When the option is blank,
the default, everything is pushed down.

I see remote_version a logical addition to match our "extensions" option
that controls what extension functions can be pushed down.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Bruce, Mr.Pyhalov, hackers.

Thank you for your comments. I respond to both of them below.
I plan to start revising the patch next week. If you have any comments on
the following responses, I would appreciate receiving them this week.

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> I agree that this feature is designed for built-in sharding, but it is 
> possible people could be using aggregates on partitions
> backed by foreign tables without sharding.  Adding a requirement for 
> non-sharding setups to need PG 17+ servers might
> be unreasonable.
Indeed, it is possible to use the partial aggregate pushdown feature for
purposes other than sharding.
The description in the section "F.38.6. Built-in sharding in PostgreSQL"
assumes the use of built-in sharding; I will modify it to eliminate this
assumption.
The title of this section should be changed to something like "Aggregate
on partitioned table".

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> Looking at previous release note incompatibilities, we don't normally change 
> non-administrative functions in a way that
> causes errors if run on older servers.  Based on Alexander's observations, I 
> wonder if we need to re-add the postgres_fdw
> option to control partial aggregate pushdown, and default it to enabled.
> 
> If we ever add more function breakage we might need more postgres_fdw 
> options.  Fortunately, such changes are rare.

I understand the problem. I will put a mechanism for maintaining
compatibility into the patch.
I believe there are three approaches.
Approach 1-1 is preferable because it does not require additional options
for postgres_fdw.
I will revise the patch according to Approach 1-1, unless otherwise commented.

Approach 1:
I ensure that postgres_fdw retrieves the version of each remote server
and does not push down partial aggregates if the server version is less
than 17. There are two ways to obtain the remote server version.
Approach 1-1: postgres_fdw connects to the remote server and uses
PQserverVersion().
Approach 1-2: Add a postgres_fdw option for the remote server version
(like "server_version").

Approach 2:
Add a postgres_fdw option controlling whether partial aggregate pushdown
is enabled (like enable_partial_aggregate_pushdown).

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




Re: Partial aggregates pushdown

2023-06-09 Thread Bruce Momjian
On Tue, Jun  6, 2023 at 03:08:50AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> > I've seen this message. But introduction of new built-in function will break
> > requests to old servers only if this new function is used in the request 
> > (this
> > means that query changes). However, this patch changes the behavior of old
> > queries, which worked prior to update. This seems to be different to me.
> 
> You are right.
> However, for now, partial aggregates pushdown is mainly used when using 
> built-in sharding in PostgreSQL.
> I believe when using built-in sharding in PostgreSQL, the version of the 
> primary node server and
> the version of the remote server will usually be the same.
> So I think it would be sufficient to include in the documentation a note
> about this problem and a workaround for it.

I agree that this feature is designed for built-in sharding, but it is
possible people could be using aggregates on partitions backed by
foreign tables without sharding.  Adding a requirement for non-sharding
setups to need PG 17+ servers might be unreasonable.

Looking at previous release note incompatibilities, we don't normally
change non-administrative functions in a way that causes errors if run
on older servers.  Based on Alexander's observations, I wonder if we
need to re-add the postgres_fdw option to control partial aggregate
pushdown, and default it to enabled.

If we ever add more function breakage we might need more postgres_fdw
options.  Fortunately, such changes are rare.

Yuki, thank you for writing and updating this patch, and Alexander,
thank you for helping with this patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Partial aggregates pushdown

2023-06-09 Thread Alexander Pyhalov

Hi.

+   An aggregate function, called the partial aggregate function for 
partial aggregate
+   that corresponding to the aggregate function, is defined on the 
primary node and

+   the postgres_fdw node.

Something is clearly wrong here.

+   When using built-in sharding feature in PostgreSQL is used,

And here.

Overall the code looks good to me, but I suppose that documentation 
needs further review from some native speaker.



--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Partial aggregates pushdown

2023-06-08 Thread Alexander Pyhalov

fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-08 02:08:

From: Alexander Pyhalov 
Sent: Wednesday, June 7, 2023 6:47 PM
This seems to be more robust, but the interface became stranger.
I'm not sure what to do with it. Here are some ideas I had to avoid
introducing this parameter; I'm not sure I like any of them.

1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName
and still compare its namespace and function name against aggName and
aggNamespace.
Seems not ideal, but avoids introducing new parameters.

2) You can look up the partial aggregate function after ProcedureCreate()
in AggregateCreate(), if it wasn't found at earlier stages. If it is the
aggregate itself - check it. If it's still not found, error out. Also
seems a bit ugly - you leave uncommitted garbage for vacuum in the
catalogue.

Thank you for suggesting alternatives.
The disadvantages of alternative 2) appear undesirable, so I have
modified the patch according to alternative 1).

Another issue - the patch misses recording dependency between 
aggpartialfn

and aggregate procedure.

I added code to record dependencies between aggpartialfn and the
aggregate procedure, similar to the code for functions such as
combinefunc.




Hi.

Looks better. The only question I have is whether we should record the
dependency between procOid and aggpartialfn if aggpartialfn == procOid.


Also it seems new code likely should be run through pgindent.

doc/src/sgml/postgres-fdw.sgml:

+   For WHERE clauses,
+   JOIN clauses, this sending is active if
+   conditions in <xref linkend="postgres-fdw-remote-query-optimization"/>
+   hold and enable_partitionwise_join is true(this 
condition

+   is need for only JOIN clauses).
+   For aggregate expressions, this sending is active if conditions in

No space between "true" and "(" in "is true(this condition".

Some sentences in documentation, like one starting with
"For aggregate expressions, this sending is active if conditions in..."
seem to be too long, but I'm not the best man to read out documentation.

In "Built-in sharding in PostgreSQL" term "shard" doesn't have a 
definition.


By the way, I'm not sure that "sharding" documentation belongs to this 
patch,

at least it needs a review from native speaker.

--
Best regards,
Alexander Pyhalov,
Postgres Professional



