RE: Partial aggregates pushdown

2024-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Mr.Momjian, Mr.Lane, Mr.Haas, hackers.

I apologize for any misunderstanding regarding the context of the attached 
patch and
the points on which I requested a review. Could you please allow me to clarify?

In the review around early December 2023, I received the following three issues 
pointed out by Mr.Haas[1].
1. Transmitting state value safely between machines
2. Making the patch clearer by adding SQL keywords
3. Fixing the behavior when the HAVING clause is present

In the email sent on February 22, 2024[2], I provided an update on the progress 
made in addressing these issues.
Regarding issue 1, I have only provided a proposed solution in the email and 
have not started the programming. 
Therefore, the latest patch is not in a commit-ready state. As mentioned later, 
we have also temporarily reverted the changes made to the documentation.
Before proceeding with the programming, I would like to discuss the proposed 
solution with the community and seek consensus.
If it is necessary to have source code in order to discuss, I can create a 
simple prototype so that I can receive your feedback.
Would you be able to provide your opinions on it?

Regarding issue 2., I have confirmed that creating a prototype allows us to 
address the issue and clear the patch.
In this prototype creation, the main purpose was to verify if the patch can be 
cleared and significant revisions were made to the previous version.
Therefore, I have removed all the document differences.
I have submitted a patch [3] that includes the fixes for issue 3. to the patch 
that was posted in [2].
Regarding the proposed solution for issue 1, unlike the patch posted in [3], 
we have a policy of not performing partial aggregation pushdown if we cannot 
guarantee compatibility and safety.
The latest patch in [3] is a POC patch. The patch that Mr. Momjian reviewed is 
this.
If user-facing documentation is needed for this POC patch, it can be added.

I apologize for the lack of explanation regarding this positioning, which may 
have caused misunderstandings regarding the patch posted in [3].

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYCrtOvk2f32qQKZV%3DjNL35tandf2A2Dp_2F5ASuiG1BA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/TYAPR01MB5514F0CBD9CD4F84A261198195562%40TYAPR01MB5514.jpnprd01.prod.outlook.com
[3] 
https://www.postgresql.org/message-id/TYAPR01MB55141D18188AC86ADCE35FCB952F2%40TYAPR01MB5514.jpnprd01.prod.outlook.com

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Haas.

> -Original Message-
> From: Robert Haas 
> Sent: Wednesday, December 6, 2023 10:25 PM
> On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Are you concerned about the hassle and potential human errors of
> > manually adding new partial aggregation functions, rather than the catalog 
> > becoming bloated?
> 
> I'm concerned about both.
Understood. Thank you for your response.

> > The process of creating partial aggregation functions from aggregation
> > functions can be automated, so I believe this issue can be resolved.
> > However, automating it may increase the size of the patch even more, so 
> > overall, approach#2 might be better.
> > To implement approach #2, it would be necessary to investigate how much 
> > additional code is required.
> 
> Yes. Unfortunately I fear that there is quite a lot of work left to do here 
> in order to produce a committable feature. To me it
> seems necessary to conduct an investigation of approach #2. If the result of 
> that investigation is that nothing major
> stands in the way of approach #2, then I think we should adopt it, which is 
> more work. In addition, the problems around
> transmitting serialized bytea blobs between machines that can't be assumed to 
> fully trust each other will need to be
> addressed in some way, which seems like it will require a good deal of design 
> work, forming some kind of consensus, and
> then implementation work to follow. In addition to that there may be some 
> small problems that need to be solved at a
> detail level, such as the HAVING issue. I think the last category won't be 
> too hard to sort out, but that still leaves two really
> major areas to address.
Yes, I agree with you. It is clear that further investigation and discussion 
are still needed. 
I would be grateful if we can resolve this issue gradually. I would also like 
to continue the discussion if possible in the future.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
g query
> >
> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> > foreign server as HAVING needs full aggregate result, but foreign
> > server don't know it.
> 
> I don't see it that way. What we would push to the foreign server would be 
> something like SELECT count(a) FROM t. Then,
> after we get the results back and combine the various partial counts locally, 
> we would locally evaluate the HAVING clause
> afterward. That is, partial aggregation is a barrier to pushing down HAVING 
> clause itself, but it doesn't preclude pushing
> down the aggregation.
I understand what the problem is. I will try to fix it in the next version.

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 5:03 AM
> On Wed, Nov 22, 2023 at 5:16 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > I did not choose Approach2 because I was not confident that the
> > disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL 
> > development community.
> > If it is accepted, I think Approach 2 is smarter.
> > Could you please provide your opinion on which approach is preferable
> > after comparing these two approaches?
> > If we cannot say anything without comparing the amount of source code,
> > as Mr.Momjian mentioned, we need to estimate the amount of source code 
> > required to implement Approach2.
> 
> I've had the same concern, that approach #2 would draw objections, so I think 
> you were right to be cautious about it. I
> don't think it is a wonderful approach in all ways, but I do think that it is 
> superior to approach #1. If we add dedicated
> support to the grammar, it is mostly a one-time effort, and after that, there 
> should not be much need for anyone to be
> concerned about it. If we instead add extra aggregates, then that generates 
> extra work every time someone writes a patch
> that adds a new aggregate to core. I have a difficult time believing that 
> anyone will prefer an approach that involves an
> ongoing maintenance effort of that type over one that doesn't.
> 
> One point that seems to me to be of particular importance is that if we add 
> new aggregates, there is a risk that some
> future aggregate might do that incorrectly, so that the main aggregate works, 
> but the secondary aggregate created for this
> feature does not work. That seems like it would be very frustrating for 
> future code authors so I'd like to avoid the risk as
> much as we can.
Are you concerned about the hassle and potential human errors of manually 
adding new partial
aggregation functions, rather than the catalog becoming bloated?
The process of creating partial aggregation functions from aggregation 
functions can be automated,
so I believe this issue can be resolved. However, automating it may increase 
the size of the patch
even more, so overall, approach#2 might be better.
To implement approach #2, it would be necessary to investigate how much 
additional code is required.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-11-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Haas, hackers.

> From: Bruce Momjian 
> Sent: Thursday, November 23, 2023 6:16 AM
> On Wed, Nov 22, 2023 at 10:16:16AM +, 
> fujii.y...@df.mitsubishielectric.co.jp wrote:
> > 2. Approach 2
> > (1) Advantages
> > (a) No need to add partial aggregate functions to the catalogs for
> > each aggregation
> > (2) Disadvantages
> > (a) Need to add non-standard keywords to the SQL syntax.
> >
> > I did not choose Approach2 because I was not confident that the
> > disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL 
> > development community.
> > If it is accepted, I think Approach 2 is smarter.
> > Could you please provide your opinion on which approach is preferable
> > after comparing these two approaches?
> 
> I didn't know #2 was possible, but given the great number of catalog entries, 
> doing it in the SQL grammar seems cleaner
> to me.
Thank you for comments. Yes, I understand.

> From: Bruce Momjian 
> Sent: Wednesday, November 22, 2023 5:34 AM
> On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote:
> > On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian  wrote:
> > > > I do have a concern about this, though. It adds a lot of bloat. It
> > > > adds a whole lot of additional entries to pg_aggregate, and every
> > > > new aggregate we add in the future will require a bonus entry for
> > > > this, and it needs a bunch of new pg_proc entries as well. One
> > > > idea that I've had in the past is to instead introduce syntax that
> > > > just does this, without requiring a separate aggregate definition in 
> > > > each case.
> > > > For example, maybe instead of changing string_agg(whatever) to
> > > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > > > something. Then all aggregates could be treated in a generic way.
> > > > I'm not completely sure that's better, but I think it's worth 
> > > > considering.
> > >
> > > So use an SQL keyword to indicates a pushdown call?  We could then
> > > automate the behavior rather than requiring special catalog functions?
> >
> > Right. It would require more infrastructure in the parser, planner,
> > and executor, but it would be infinitely reusable instead of needing a
> > new thing for every aggregate. I think that might be better, but to be
> > honest I'm not totally sure.
> 
> It would make it automatic.  I guess we need to look at how big the patch is 
> to do it.
I will investigate specifically which parts of the PostgreSQL source code need 
to be modified and how big the patch will be if you take this approach.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-11-22 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas 
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a 
> whole lot of additional entries to pg_aggregate, and
> every new aggregate we add in the future will require a bonus entry for this, 
> and it needs a bunch of new pg_proc entries
> as well. One idea that I've had in the past is to instead introduce syntax 
> that just does this, without requiring a separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to 
> string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. 
> Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's 
> worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the 
approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, 
pg_proc)

The approach proposed by Mr.Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation 
requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation 
functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and 
pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to 
manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain 
aggregate function can be automated,
so this problem can be resolved by improving the patch.
The automation method involves the core part (AggregateCreate() and related 
functions) that executes
the CREATE AGGREGATE command for user-defined functions.
For built-in functions, it involves generating the initial data for the 
pg_aggregate catalog and pg_proc catalog from pg_aggregate.dat and pg_proc.dat
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each 
aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.

I did not choose Approach2 because I was not confident that the disadvantage 
mentioned in 2.(2)(a)
would be accepted by the PostgreSQL development community.
If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which
approach is preferable after comparing these two approaches?
If we cannot say anything without comparing the amount of source code, as 
Mr.Momjian mentioned,
we need to estimate the amount of source code required to implement Approach2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-10-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Momjian.

Thank you for your improvement.
As a matter of detail, I think that the areas marked below are erroneous.

--
+   Pushdown causes aggregate function cals to send partial aggregate
 ^
+   function calls to the remote server. If the partial aggregate
+   function doesn't doesn't exist on the remote server, it causes
 ^^^
--

> What else needs to be done before committers start to review
> this?
There are no others. May I make a new version of v31 with your
suggested improvements for the committer's review?

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-10-23 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

> Fujii-san, to get this patch closer to finished, can I modify this version of 
> the patch to improve some wording and post an
> updated version you can use for future changes?
Yes, I greatly appreciate your offer.
I would very much appreciate your modifications.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-07-17 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> From: Alexander Pyhalov 
> Sent: Friday, July 14, 2023 10:40 PM
> 1) In foreign_join_ok() should we set fpinfo->user if
> fpinfo->check_partial_aggregate_support is set like it's done for 
> fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user 
> fpinfo->=
> NULL if use_remote_estimate is not set.
You are right. I will modify this patch according to your advice.
Thank you for advice.

> 2) It seeems we found an additional issue with original patch, which 
> is present in current one. I'm attaching a patch which seems to fix 
> it, but I'm not quite sure in it.
Thank you for pointing out the issue.
If a query's group-by clause contains variable based expression(not variable)
and the query's select clause contains another expression,
the partial aggregate could be unsafe to push down.

An example of such queries:
SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2

Your patch disables partial aggregate pushdown for such queries.
I'll see if we can modify the patch to safely do a partial aggregate pushdown 
for such queries as well.
Such a query expects the variable in the select clause expression to be 
included in the target of the grouped rel
(let see make_partial_grouping_target), 
but the original groupby clause has no reference to this variable,
this seems to be the direct cause(let see foreign_grouping_ok). 
I will examine whether a safe pushdown can be achieved by matching the
groupby clause information referenced by foreign_grouping_ok with the grouped 
rel target information.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-06-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Pyhalov, hackers.

> From: Bruce Momjian 
> Sent: Thursday, June 22, 2023 12:44 AM
> On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > > Therefore, it seems like it would be near-zero cost to just call
> > > conn =
> > > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > > You can then use the return value of PQserverVersion() to determine
> > > if you can push down partial aggregates.
> >
> > Hi.
> > Currently we don't get remote connection while planning if
> > use_remote_estimate is not set.
> > Such change would require to get remote connection in planner, not in
> > executor.
> > This can lead to change of behavior (like errors in explain when user
> > mapping is wrong - e.g. bad password is specified).
> > Also this potentially can lead to establishing connections even when
> > plan node is not actually used (like extreme example - select
> > sum(score) from t limit 0).
> > I'm not saying we shouldn't do it - just hint at possible consequences.
> 
> Agreed.  I noticed it was doing FDW connections during optimization, but 
> didn't see the postgres_fdw option that would
> turn it off.
> Interestingly, it is disabled by default.
> 
> After considering the options, I think we should have a postgres_fdw option 
> called "planner_version_check", and default
> that false.  When false, a remote server version check will not be performed 
> during planning and partial aggregates will be
> always be considered.  When true, a version check will be performed during 
> planning and partial aggregate pushdown
> disabled for pre-PG 17 foreign servers during the query.
> 
> If we want to be more specific, we can call it 
> "check_partial_aggregate_support".
Thank you both for your advice.
We will address the compatibility issues as follows.

Approach1-3:
I will add a postgres_fdw option "check_partial_aggregate_support".
This option is false, default.
Only if this option is true, postgres_fdw connect to the remote server and get 
the version of the remote server.
And if the version of the remote server is less than PG17, then partial 
aggregate push down to the remote server is disable.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation

> -Original Message-
> From: Bruce Momjian 
> Sent: Thursday, June 22, 2023 12:44 AM
> To: Alexander Pyhalov 
> Cc: Fujii Yuki/藤井 雄規(MELCO/情報総研 DM最適G) 
> ;
> PostgreSQL-development ; Andres Freund 
> ; Tom Lane
> ; Tomas Vondra ; Julien 
> Rouhaud ;
> Daniel Gustafsson ; Ilya Gladyshev 
> 
> Subject: Re: Partial aggregates pushdown
> 
> On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > > Therefore, it seems like it would be near-zero cost to just call
> > > conn =
> > > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > > You can then use the return value of PQserverVersion() to determine
> > > if you can push down partial aggregates.
> >
> > Hi.
> > Currently we don't get remote connection while planning if
> > use_remote_estimate is not set.
> > Such change would require to get remote connection in planner, not in
> > executor.
> > This can lead to change of behavior (like errors in explain when user
> > mapping is wrong - e.g. bad password is specified).
> > Also this potentially can lead to establishing connections even when
> > plan node is not actually used (like extreme example - select
> > sum(score) from t limit 0).
> > I'm not saying we shouldn't do it - just hint at possible consequences.
> 
> Agreed.  I noticed it was doing FDW connections during optimization, but 
> didn't see the postgres_fdw option that would
> turn it off.
> Interestingly, it is disabled by default.
> 
> After considering the options, I think we should have a postgres_fdw option 
> called "planner_version_check", and default
> that false.  When false, a remote server version check will not be performed 
> during planning and partial aggregates will be
> always be considered.  When true, a version check will be performed during 
> planning and partial aggregate pushdown
> disabled for pre-PG 17 foreign servers during the query.
> 
> If we want to be more specific, we can call it 
> "check_partial_aggregate_support".
> 
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
> 
>   Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

Thank you for advises.

> From: Bruce Momjian 
> Sent: Monday, June 12, 2023 10:38 PM
> > I understand what the problem is. I will put a mechanism maintaining 
> > compatibility into the patch.
> > I believe there are three approaches.
> > Approach 1-1 is preferable because it does not require additional options 
> > for postgres_fdw.
> > I will revise the patch according to Approach 1-1, unless otherwise 
> > commented.
> >
> > Approach1:
> > I ensure that postgres_fdw retrieves the version of each remote server
> > and does not partial aggregate pushd down if the server version is less 
> > than 17.
> > There are two approaches to obtaining remote server versions.
> > Approach1-1: postgres_fdw connects a remote server and use 
> > PQserverVersion().
> > Approach1-2: Adding a postgres_fdw option about a remote server version 
> > (like "server_version").
> >
> > Approach2:
> > Adding a postgres_fdw option for partial aggregate pushdown is enable
> > or not (like enable_partial_aggregate_pushdown).
> 
> These are good questions.  Adding a postgres_fdw option called 
> enable_partial_aggregate_pushdown helps make the
> purpose of the option clear, but remote_version can be used for future 
> breakage as well.
> 
> I think remote_version is the best idea, and in the documention for the 
> option, let's explcitly say it is useful to disable
> partial aggreates pushdown on pre-PG 17 servers.  If we need to use the 
> option for other cases, we can just update the
> documentation.  When the option is blank, the default, everything is pushed 
> down.
> 
> I see remote_version a logical addition to match our "extensions" option that 
> controls what extension functions can be
> pushed down.

Thank you for your perspective.
So, of the approaches I have presented, you think that approach 1-2 is
preferable and that the option name remote_server is preferable?
Indeed, the option of a remote version may have other uses.
However, this information can be obtained by connecting to a remote server, 
I'm concerned that some people may find this option redundant.

Is the problem with approach 1-1 because the user cannot decide whether to 
include the compatibility check in the decision to do partial aggregate 
pushdown or not?
# If Approach 1-1 is taken, the problem is that this feature cannot be used for 
all buit-in aggregate functions
# when the remote server is older than PG17.
If so, Approache1-3 below seem more desirable.
Would it be possible for us to hear your thoughts?

Approache1-3:We add a postgres_fdw option about a compatibility check for 
partial aggregate pushdown
(like "enable_aggpartialfunc_compatibility_check"). This option is false, the 
default.
When this option is true, postgres_fdw obtains the remote server version by 
connecting the remote server and using PQserverVersion(). 
And if the remote server version is older than PG17, then the partial aggregate 
pushdown feature is enable for all buit-in aggregate functions.
Otherwise the partial aggregate pushdown feature is disable for all buit-in 
aggregate functions.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Bruce, Mr.Pyhalov, hackers.

Thank you for comments. I will try to respond to both of your comments as 
follows.
I plan to start revising the patch next week. If you have any comments on the 
following
respondences, I would appreciate it if you could give them to me this week.

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> I agree that this feature is designed for built-in sharding, but it is 
> possible people could be using aggregates on partitions
> backed by foreign tables without sharding.  Adding a requirement for 
> non-sharding setups to need PG 17+ servers might
> be unreasonable.
Indeed, it is possible to use partial aggregate pushdown feature for purposes 
other than sharding.
The description of the section "F.38.6. Built-in sharding in PostgreSQL" 
assumes the use of
Built-in sharding and will be modified to eliminate this assumption.
The title of this section should be changed to something like "Aggregate on 
partitioned table".

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> Looking at previous release note incompatibilities, we don't normally change 
> non-administrative functions in a way that
> causes errors if run on older servers.  Based on Alexander's observations, I 
> wonder if we need to re-add the postgres_fdw
> option to control partial aggregate pushdown, and default it to enabled.
> 
> If we ever add more function breakage we might need more postgres_fdw 
> options.  Fortunately, such changes are rare.

I understand what the problem is. I will put a mechanism maintaining 
compatibility into the patch.
I believe there are three approaches.
Approach 1-1 is preferable because it does not require additional options for 
postgres_fdw.
I will revise the patch according to Approach 1-1, unless otherwise commented.

Approach1:
I ensure that postgres_fdw retrieves the version of each remote server
and does not partial aggregate pushd down if the server version is less than 17.
There are two approaches to obtaining remote server versions.
Approach1-1: postgres_fdw connects a remote server and use PQserverVersion().
Approach1-2: Adding a postgres_fdw option about a remote server version (like 
"server_version").

Approach2:
Adding a postgres_fdw option for partial aggregate pushdown is enable or not
(like enable_partial_aggregate_pushdown).

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-13 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

> > There is one more thing I would like your opinion on.
> > As the major version of PostgreSQL increase, it is possible that the
> > new builtin aggregate functions are added to the newer PostgreSQL.
> > This patch assume that aggpartialfns definitions exist in BKI files.
> > Due to this assumption, someone should add aggpartialfns definitions of
> new builtin aggregate functions to BKI files.
> > There are two possible ways to address this issue. Would the way1 be
> sufficient?
> > Or would way2 be preferable?
> >   way1) Adding documentaion for how to add these definitions to BKI files
> >   way2) Improving this patch to automatically add these definitions to BKI
> files by some tool such as initdb.
> 
> I think documentation is sufficient.  You already showed that someone can do
> this with CREATE AGGREGATE for non-builtin aggregates.
Thank you for your opinion. I will modify this patch according to the way1.

> > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and
> > > just make it automatic --- we push down builtin partial aggregates
> > > if the remote server is the same or newer _major_ version than the
> > > sending server.  For extensions, if people have older extensions on
> > > the same or newer foreign servers, they can adjust 'extensions' above.
> > Okay, I understand. I will remove PARTIALAGG_MINVERSION option from
> > the patch and I will add check whether aggpartialfn depends on some
> > extension which is containd in extensions list of the postgres_fdw's foreign
> server.
> 
> Yes, good.  Did we never push down aggregates before?  I thought we
> pushed down partitionwise aggregates already, and such a check should
> already be done.  If the check isn't there, it should be.
Yes. The last version of this patch(and original postgres_fdw) checks if
user-defined aggregate depends some extension which is contained in 
'extensions'.
But, in the last version of this patch, there is no such check for 
aggpartialfn of user-defined aggregate. So, I will add such check to this 
patch. 
I think that this modification is easy to do . 

> In summary, we don't do any version check for built-in function pushdown, so
> we don't need it for aggregates either.  We check extension functions against
> the extension pushdown list, so we should be checking this for partial
> aggregate pushdown, and for partition-wise aggregate pushdown.  If we don't
> do that last check already, it is a bug.
I understood.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-09 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Lane, Mr.Freund.

Thank you for advices.

> From: Bruce Momjian 
> > > > 2. Automation of creating definition of partialaggfuncs In
> > > > development of v17, I manually create definition of
> > > > partialaggfuncs for avg, min, max, sum,
> > > count.
> > > > I am concerned that this may be undesirable.
> > > > So I am thinking that v17 should be modified to automate creating
> > > > definition of partialaggfuncs for all built-in aggregate functions.
> > >
> > > Are there any other builtin functions that need this?  I think we
> > > can just provide documention for extensions on how to do this.
> > For practical purposes, it is sufficient if partial aggregate for the
> > above functions can be pushed down.
> > I think you are right, it would be sufficient to document how to
> > achieve  partial aggregate pushdown for other built-in functions.
> 
> Uh, we actually want the patch to implement partial aggregate pushdown for all
> builtin data types that can support it.  Is that done?  I think it is only 
> extension
> aggregates, which we do not control, that need this documentation.
The last version of this patch can't pushdown partial aggregate for all builtin 
aggregate functions that can support it.
I will improve this patch to pushdown partial aggregate for all builtin 
aggregate functions
that can support it.

There is one more thing I would like your opinion on.
As the major version of PostgreSQL increase, it is possible that
the new builtin aggregate functions are added to the newer PostgreSQL.
This patch assume that aggpartialfns definitions exist in BKI files.
Due to this assumption, someone should add aggpartialfns definitions of new 
builtin aggregate functions to BKI files.
There are two possible ways to address this issue. Would the way1 be sufficient?
Or would way2 be preferable?
  way1) Adding documentaion for how to add these definitions to BKI files
  way2) Improving this patch to automatically add these definitions to BKI 
files by some tool such as initdb.

> From: Bruce Momjian 
> On Fri, Apr  7, 2023 at 09:16:14PM -0700, Andres Freund wrote:
> > On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote:
> > > > postgres_fdw has no business pushing down calls to non-builtin
> > > > functions unless the user has explicitly authorized that via the
> > > > existing whitelisting mechanism.  I think you're reinventing the
> > > > wheel, and not very well.
> > >
> > > The patch has you assign an option at CREATE AGGREGATE time if it
> > > can do push down, and postgres_fdw checks that.  What whitelisting
> > > mechanism are you talking about?  async_capable?
> >
> > extensions (string)
> >
> > This option is a comma-separated list of names of PostgreSQL
> extensions that are installed, in compatible versions, on both the local and
> remote servers. Functions and operators that are immutable and belong to a
> listed extension will be considered shippable to the remote server. This 
> option
> can only be specified for foreign servers, not per-table.
> >
> > When using the extensions option, it is the user's responsibility that 
> > the
> listed extensions exist and behave identically on both the local and remote
> servers. Otherwise, remote queries may fail or behave unexpectedly.
> 
> Okay, this is very helpful --- it is exactly the issue we are dealing with 
> --- how
> can we know if partial aggregate functions exists on the remote server.  (I
> knew I was going to need API help on this.)
> 
> So, let's remove the PARTIALAGG_MINVERSION option from the patch and just
> make it automatic --- we push down builtin partial aggregates if the remote
> server is the same or newer _major_ version than the sending server.  For
> extensions, if people have older extensions on the same or newer foreign
> servers, they can adjust 'extensions' above.
Okay, I understand. I will remove PARTIALAGG_MINVERSION option from the patch
and I will add check whether aggpartialfn depends on some extension which
is containd in extensions list of the postgres_fdw's foreign server.
In the next version of this patch,
we can pushdown partial aggregate for an user-defined aggregate function only 
when the function pass through this check.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian

> First, my apologies for not addressing this sooner.  I was so focused on my
> own tasks that I didn't realize this very important patch was not getting
> attention.  I will try my best to get it into PG 17.
Thank you very much for your comments. 
I will improve this patch for PG17.
I believe that this patch will help us use PostgreSQL's built-in sharding for 
OLAP.

> What amazes me is that you didn't need to create _any_ actual aggregate
> functions.  Rather, you just needed to hook existing functions into the
> aggregate tables for partial FDW execution.
Yes. This patch enables partial aggregate pushdown using 
only existing functions which belong to existing aggregate function
and are needed by parallel query(such as state transition function and 
serialization function).
This patch does not need new types of function belonging to aggregate functions
and does not need new functions belonging to existing aggregate functions.

> I suggest we remove the version check requirement --- instead just document
> that the FDW Postgres version should be the same or newer than the calling
> Postgres server --- that way, we can assume that whatever is in the system
> catalogs of the caller is in the receiving side.  
Thanks for the comment. I will modify this patch according to your comment.

> We should add a GUC to turn off
> this optimization for cases where the FDW Postgres version is older than the
> caller.  This handles case 1-2.
Thanks for the advice here too.
I thought it would be more appropriate to add a foregin server option of 
postgres_fdw rather than adding GUC. 
Would you mind if I ask you what you think about it?

> > 2. Automation of creating definition of partialaggfuncs In development
> > of v17, I manually create definition of partialaggfuncs for avg, min, max, 
> > sum,
> count.
> > I am concerned that this may be undesirable.
> > So I am thinking that v17 should be modified to automate creating
> > definition of partialaggfuncs for all built-in aggregate functions.
> 
> Are there any other builtin functions that need this?  I think we can just
> provide documention for extensions on how to do this.
For practical purposes, it is sufficient 
if partial aggregate for the above functions can be pushed down.
I think you are right, it would be sufficient to document how to achieve
 partial aggregate pushdown for other built-in functions.

> > 3. Documentation
> > I need add explanation of partialaggfunc to documents on postgres_fdw and
> other places.
> 
> I can help with that once we decide on the above.
Thank you. In the next verion of this patch, I will add documents on 
postgres_fdw
and other places. 

> I think 'partialaggfn' should be named 'aggpartialfn' to match other columns 
> in
> pg_aggregate.
Thanks for the comment. I will modify this patch according to your comment.

> For case 3, I don't even know how much pushdown those do of _any_
> aggregates to non-PG servers, let along parallel FDW ones.  Does anyone
> know the details?
To allow partial aggregate pushdown for non-PG FDWs,
I think we need to add pushdown logic to their FDWs for each function.
For example, we need to add logic avg() -> sum()/count() to their FDWs for avg.
To allow parallel partial aggregate by non-PG FDWs,
I think we need to add FDW Routines for Asynchronous Execution to their FDWs[1].

> I am confused by these changes to pg_aggegate:
> 
> +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
> +  aggfinalfn => 'int8_avg_serialize', aggcombinefn =>
> +'int8_avg_combine',
> +  aggserialfn => 'int8_avg_serialize', aggdeserialfn =>
> +'int8_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '48' },
> 
> ...
> 
> +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
> +  aggfinalfn => 'numeric_avg_serialize', aggcombinefn =>
> +'numeric_avg_combine',
> +  aggserialfn => 'numeric_avg_serialize',
> +  aggdeserialfn => 'numeric_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '128' },
> 
> Why are these marked as 'sum' but use 'avg' functions?
This reason is that sum(int8)/sum(numeric) shares some functions with 
avg(int8)/avg(numeric),
and sum_p_int8 is aggpartialfn of sum(int8) and sum_p_numeric is aggpartialfn 
of sum(numeric).

--Part of avg(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
--

--Part of sum(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'sum(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 

RE: Partial aggregates pushdown

2023-03-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian

> First, am I correct?
Yes, you are correct. This patch uses new special aggregate functions for 
partial aggregate
(then we call this partialaggfunc).

> Second, how far away is this from being committable
> and/or what work needs to be done to get it committable, either for PG 16 or 
> 17?
I believe there are three: 1. and 2. are not clear if they are necessary or 
not; 3. are clearly necessary.
I would like to hear the opinions of the development community on whether or 
not 1. and 2. need to be addressed.

1. Making partialaggfunc user-defined function
In v17, I make partialaggfuncs as built-in functions.
Because of this approach, v17 changes specification of BKI file and 
pg_aggregate.
For now, partialaggfuncs are needed by only postgres_fdw which is just an 
extension of PostgreSQL.
In the future, when revising the specifications for BKI files and pg_aggregate 
when modifying existing PostgreSQL functions,
It is necessary to align them with this patch's changes.
I am concerned that this may be undesirable.
So I am thinking that v17 should be modified to making partialaggfunc as user 
defined function.

2. Automation of creating definition of partialaggfuncs
In development of v17, I manually create definition of partialaggfuncs for avg, 
min, max, sum, count.
I am concerned that this may be undesirable.
So I am thinking that v17 should be modified to automate creating definition of 
partialaggfuncs
for all built-in aggregate functions.

3. Documentation
I need add explanation of partialaggfunc to documents on postgres_fdw and other 
places.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2022-12-15 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Freund.

> cfbot complains about some compiler warnings when building with clang:
> https://cirrus-ci.com/task/6606268580757504
> 
> deparse.c:3459:22: error: equality comparison with extraneous parentheses
> [-Werror,-Wparentheses-equality]
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
>  ~~~^~
> deparse.c:3459:22: note: remove extraneous parentheses around the
> comparison to silence this warning
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> ~   ^ ~
> deparse.c:3459:22: note: use '=' to turn this equality comparison into an
> assignment
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> ^~
> =
I fixed this error.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v17.patch
Description: 0001-Partial-aggregates-push-down-v17.patch


RE: Add semi-join pushdown to postgres_fdw

2022-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for fixing it and giving more explanation.

> IIRC, planner can create semi-join, which targetlist references Vars 
> from inner join relation. However, it's deparsed as exists and so we 
> can't reference it from SQL. So, there's this check - if Var is 
> referenced in semi-join target list, it can't be pushed down.
> You can see this if comment out this check.
> 
> EXPLAIN (verbose, costs off)
>  SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
>  (SELECT * FROM ft4 WHERE EXISTS (
>  SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
>  ON ft2.c2 = ft4.c1
>  INNER JOIN
>  (SELECT * FROM ft2 WHERE EXISTS (
>  SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
>  ON ft2.c2 = ft21.c2
>  WHERE ft2.c1 > 900
>  ORDER BY ft2.c1 LIMIT 10;
> 
> will fail with
> EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT 
> NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
> 
> Here you can see that
> SELECT * FROM ft2 WHERE EXISTS (
>  SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)
> 
> was transformed to
> SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL 
> FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
> 
> where our exists subquery is referenced from tlist. It's fine for plan 
> (relations, participating in semi-join, can be referenced in tlist), 
> but is not going to work with EXISTS subquery.
> BTW, there's a comment in joinrel_target_ok(). It tells exactly that -
> 
> 5535 if (jointype == JOIN_SEMI &&
> bms_is_member(var->varno,
> innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> 5536 {
> 5537 /* We deparse semi-join as exists() subquery, and
> so can't deparse references to inner rel in join target list. */
> 5538 ok = false;
> 5539 break;
> 5540 }
> 
> Expanded comment.
Thank you for expanding your comment and giving examples. 
Thanks to the above examples, I understood in what case planner wolud create 
semi-join, 
which targetlist references Vars from inner join relation.

> > question2) In foreign_join_ok
> >   > * Constructing queries representing ANTI joins is hard, hence
> >   Is this true? Is it hard to expand your approach to ANTI join 
> > pushdown?
> 
> I haven't tried, so don't know.
I understand the situation.

> The naming means additional conditions (for WHERE clause, by analogy 
> with ignore_conds and remote_conds). Not sure if subquery_expr sounds 
> better, but if you come with better idea, I'm fine with renaming them.
Sure.

> > question4) Although really detail, there is expression making space 
> > such as
> >   "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
> >   Is there reason for this difference? If not, need we use same 
> > policy for making space?
Thank you.

Later, I'm going to look at other part of your patch.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2022-12-04 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> Attaching minor fixes. I haven't proof-read all comments (but perhaps, they
> need attention from some native speaker).
Thank you. I fixed according to your patch.
And I fixed have proof-read all comments and messages.

> Tested it with queries from
> https://github.com/swarm64/s64da-benchmark-toolkit, works as expected.
Thank you for additional tests.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v16.patch
Description: 0001-Partial-aggregates-push-down-v16.patch


RE: Add semi-join pushdown to postgres_fdw

2022-12-02 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for work on this useful patch.
I'm starting to review v2 patch.
I have cheked we can apply v2 patch to commit 
ec386948948c1708c0c28c48ef08b9c4dd9d47cc
(Date:Thu Dec 1 12:56:21 2022 +0100).
I briefly looked at this whole thing and did step execute this
by running simple queries such as the followings.

query1) select * from f_t1 a1 where a1.c1 in (select c1 from f_t2);
query2) select * from f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1 where a1.c1 in 
(select c1 from f_t3) ;
query3) update f_t2 set c1 = 1 from f_t1 a1 where a1.c2 = f_t2.c2 and exists 
(select null from f_t2 where c1 = a1.c1);

Although I haven't seen all of v2 patch, for now I have the following questions.

question1) 
  > + if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) 
&& !bms_is_member(var->varno, outerrel->relids))
  It takes time for me to find in what case this condition is true.
  There is cases in which this condition is true for semi-join of two baserels 
  when running query which joins more than two relations such as query2 and 
query3.
  Running queries such as query2, you maybe want to pushdown of only semi-join 
path of 
  joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1) and 
baserel(innerrel) f_t3 
  because of safety deparse. So you add this condition.
  Becouase of this limitation, your patch can't push down subquery expression 
  "exists (select null from f_t2 where c1 = a1.c1)" in query3.
  I think, it is one of difficulty points for semi-join pushdown.
  This is my understanding of the intent of this condition and the restrictions 
imposed by this condition.
  Is my understanding right?
  I think if there are comments for the intent of this condition and the 
restrictions imposed by this condition  
  then they help PostgreSQL developper. What do you think?

question2) In foreign_join_ok
  > * Constructing queries representing ANTI joins is hard, hence
  Is this true? Is it hard to expand your approach to ANTI join pushdown?

question3) You use variables whose name is "addl_condXXX" in the following code.
  > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s", 
join_sql_i.data);
  Does this naming mean additional literal?
  Is there more complehensive naming, such as "subquery_exprXXX"?

question4) Although really detail, there is expression making space such as
  "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
  Is there reason for this difference? If not, need we use same policy for 
making space?

Later, I'm going to look at part of your patch which is used when running more 
complex query.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2022-11-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> One more issue I started to think about - now we don't check
> partialagg_minversion for "simple" aggregates at all. Is it correct? It seems 
> that ,
> for example, we could try to pushdown bit_or(int8) to old servers, but it 
> didn't
> exist, for example, in 8.4.  I think it's a broader issue (it would be also 
> the case
> already if we push down
> aggregates) and shouldn't be fixed here. But there is an issue -
> is_shippable() is too optimistic.
I think it is correct for now.
F.38.7 of [1] says "A limitation however is that postgres_fdw generally assumes 
that 
immutable built-in functions and operators are safe to send to the remote 
server for 
execution, if they appear in a WHERE clause for a foreign table." and says that 
we can 
avoid this limitation by rewriting query.
It looks that postgres_fdw follows this policy in case of UPPERREL_GROUP_AGG 
aggregate pushdown.
If a aggreagate has not internal aggtranstype and has not aggfinalfn ,
partialaggfn of it is equal to itself.
So I think that it is adequate to follow this policy in case of partial 
aggregate pushdown
 for such aggregates.

> >> 2) Do we really have to look at pg_proc in partial_agg_ok() and
> >> deparseAggref()? Perhaps, looking at aggtranstype is enough?
> > You are right. I fixed according to your comment.
> >
> 
> partial_agg_ok() still looks at pg_proc.
Sorry for taking up your time. I fixed it.

[1] https://www.postgresql.org/docs/current/postgres-fdw.html

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v15.patch
Description: 0001-Partial-aggregates-push-down-v15.patch


RE: Partial aggregates pushdown

2022-11-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> 1) In previous version of the patch aggregates, which had partialaggfn, were 
> ok
> to push down. And it was a definite sign that aggregate can be pushed down.
> Now we allow pushing down an aggregate, which prorettype is not internal and
> aggfinalfn is not defined. Is it safe for all user-defined (or builtin) 
> aggregates,
> even if they are generally shippable? Aggcombinefn is executed locally and we
> check that aggregate function itself is shippable. Is it enough? Perhaps, we
> could use partialagg_minversion (like aggregates with partialagg_minversion
> == -1 should not be pushed down) or introduce separate explicit flag?
In what case partial aggregate pushdown is unsafe for aggregate which has not 
internal aggtranstype
 and has no aggfinalfn?
By reading [1], I believe that if aggcombinefn of such aggregate recieves 
return values of original
 aggregate functions in each remote then it must produce same value that would 
have resulted 
 from scanning all the input in a single operation.

> 2) Do we really have to look at pg_proc in partial_agg_ok() and
> deparseAggref()? Perhaps, looking at aggtranstype is enough?
You are right. I fixed according to your comment.

> 3) I'm not sure if CREATE AGGREGATE tests with invalid
> PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in postgres_fdw
> tests or better should be moved to src/test/regress/sql/create_aggregate.sql,
> as they are not specific to postgres_fdw
Thank you. I moved these tests to src/test/regress/sql/create_aggregate.sql.

[1] https://www.postgresql.org/docs/15/xaggr.html#XAGGR-PARTIAL-AGGREGATES

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v14.patch
Description: 0001-Partial-aggregates-push-down-v14.patch


RE: Partial aggregates pushdown

2022-11-29 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for comments.

> I've looked through the patch. Overall I like this approach, but have
> the following comments.
> 
> 1) Why should we require partialaggfn for min()/max()/count()? We could
> just use original functions for a lot of aggregates, and so it would be
> possible to push down some partial aggregates to older servers. I'm not
> sure that it's a strict requirement, but a nice thing to think about.
> Can we use the function itself as partialaggfn, for example, for
> sum(int4)?
> For functions with internal aggtranstype (like sum(int8) it
> would be more difficult).
Thank you. I realized that partial aggregate pushdown is fine 
without partialaggfn if original function has no aggfinalfn and 
aggtranstype of it is not internal. So I have improved v12 by
this realization.
However, v13 requires partialaggfn for aggregate if it has aggfinalfn or 
aggtranstype of it is internal such as sum(int8). 

> 2) fpinfo->server_version is not aggregated, for example, when we form
> fpinfo in foreign_join_ok(), it seems we should spread it in more places
> in postgres_fdw.c.
I have responded to your comment by adding copy of server_version in 
merge_fdw_options.
 
> 3) In add_foreign_grouping_paths() it seems there's no need for
> additional argument, we can look at extra->patype. Also Assert() in
> add_foreign_grouping_paths() will fire in --enable-cassert build.
I have fixed according to your comment.

> 4) Why do you modify lookup_agg_function() signature? I don't see tests,
> showing that it's neccessary. Perhaps, more precise function naming
> should be used instead?
I realized that there is no need of modification lookup_agg_function().
Instead, I use LookupFuncName().

> 5) In tests:
>   - Why version_num does have "name" type in
> f_alter_server_version() function?
>   - You modify server_version option of 'loopback' server, but
> don't reset it after test. This could affect further tests.
>   - "It's unsafe to push down partial aggregates with distinct"
> in postgres_fdw.sql:3002 seems to be misleading.
> 3001
> 3002 -- It's unsafe to push down partial aggregates with distinct
> 3003 SELECT f_alter_server_version('loopback', 'set', -1);
I have fixed according to your comment.

> 6) While looking at it, could cause a crash with something like
I have fixed this problem by using LookupFuncName() instead of 
lookup_agg_function.

The following is readme of v13.
--readme of Partial aggregates push down v13
1. interface
1) pg_aggregate
There are the following additional columns.
a) partialaggfn
  data type: regproc.
  default value: zero(means invalid).
  description  : This field refers to the special aggregate function(then we 
call
 this partialaggfunc)
corresponding to aggregation function(then we call src) which has aggfnoid.
partialaggfunc is used for partial aggregation pushdown by postgres_fdw.
The followings are differences between the src and the special aggregate 
function.
  difference1) result type
The result type is same as the src's transtype if the src's transtype
is not internal.
Otherwise the result type is bytea.
  difference2) final func
The final func does not exist if the src's transtype is not internal.
Otherwize the final func returns serialized value.
For example, there is a partialaggfunc avg_p_int4 which corresponds to 
avg(int4)
whose aggtranstype is _int4.
The result value of avg_p_int4 is a float8 array which consists of count 
and 
summation. avg_p_int4 does not have finalfunc.
For another example, there is a partialaggfunc avg_p_int8 which corresponds 
to 
avg(int8) whose aggtranstype is internal.
The result value of avg_p_int8 is a bytea serialized array which consists 
of count 
and summation. avg_p_int8 has finalfunc int8_avg_serialize which is 
serialize function
of avg(int8). This field is zero if there is no partialaggfunc.

b) partialagg_minversion
  data type: int4.
  default value: zero(means current version).
  description  : This field is the minimum PostgreSQL server version which has 
partialaggfunc. This field is used for checking compatibility of 
partialaggfunc.

The above fields are valid in tuples for builtin avg, sum, min, max, count.
There are additional records which correspond to partialaggfunc for avg, sum, 
min, max, count.

2) pg_proc
There are additional records which correspond to partialaggfunc for avg, sum, 
min, max, count.

3) postgres_fdw
postgres_fdw has an additional foreign server option server_version. 
server_version is 
integer value which means remote server version number. Default value of 
server_version 
is zero. server_version is used for checking compatibility of partialaggfunc.

2. feature
Partial aggregation pushdown is fine when either of the following conditions is 
true.
  condition1) aggregate function has not internal aggtranstype and has no 
aggfinalfn.
  condition2) 

RE: Partial aggregates pushdown

2022-07-31 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Vondra, Mr.Pyhalov.

I'm interesied in Mr.Pyhalov's patch due to the following background.
--Background
I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 
I think the function of a previous patch "Push aggregation down to base 
relations and joins"[1] is desiable. I rebased the previous patch and register 
the rebased patch on the next commitfest[2].
And I think it would be more useful if the previous patch works on a foreign 
table of postgres_fdw.
I realized the function of partial aggregation pushdown is necessary  to make 
the previous patch work on a foreign table of postgres_fdw.
--

So I reviewed Mr.Pyhalov's patch and discussions on this thread.
I made a draft of approach to respond to Mr.Vondra's comments.
Would you check whether my draft is right or not?

--My draft
> 1) It's not clear to me how could this get extended to aggregates with 
> more complex aggregate states, to support e.g. avg() and similar 
> fairly common aggregates.
We add a special aggregate function every aggregate function (hereafter we call 
this src)  which supports partial aggregation.
The followings are differences between the src and the special aggregate 
function.
difference1) result type
The result type is same with the src's transtype if the src's transtype is not 
internal.
Otherwise the result type is bytea.

difference2) final func
The final func does not exist if the src's transtype is not internal.
Otherwize the final func returns serialized value.

For example, let me call the special aggregate function of avg(float8) 
avg_p(float8).
The result value of avg_p is a float8 array which consists of count and 
summation.
avg_p does not have finalfunc.

We pushdown the special aggregate function instead of a src.
For example, we issue "select avg_p(c) from t" instead of "select avg(c) from t"
in the above example.

We add a new column partialaggfn to pg_aggregate to get the oid of  the special 
aggregate function from the the src's oid.
This column is the oid of the special aggregate function which corresponds to 
the src.

If an aggregate function does not have any special aggregate function,  then we 
does not pushdown any partial aggregation of the aggregate function.

> 2) I'm not sure relying on aggpartialpushdownsafe without any version 
> checks etc. is sufficient. I mean, how would we know the remote node 
> has the same idea of representing the aggregate state. I wonder how 
> this aligns with assumptions we do e.g. for functions etc.
We add compatible server versions infomation to pg_aggregate and  the set of 
options of postgres_fdw's foreign server.
We check compatibility of an aggregate function using this infomation.

An additional column of pg_aggregate is compatibleversonrange.
This column is a range of postgresql server versions which  has compatible 
aggregate function.
An additional options of postgres_fdw's foreign server are serverversion and 
bwcompatibleverson.
serverversion is remote postgresql server version.
bwcompatibleverson is the maximum version in which any aggregate function is 
compatible with local noed's one.
Our version check passes if and only if at least one of the following 
conditions is true.
condition1) the option value of serverversion is in compatibleversonrange.
condition2) the local postgresql server version is between bwcompatibleverson 
and the option value of serverversion.

We can get the local postgresql server version from PG_VERSION_NUM macro.
We use condition1 if the local postgresql server version is not more than the 
remote one.
and use condition2 if the local postgresql server version is greater than the 
remote one.
--

Sincerely yours,
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/1247/
[2] https://commitfest.postgresql.org/39/3764/

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


WIP: Aggregation push-down - take2

2022-04-15 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi everyone.

I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 
After [1] having been withdrawn, I reviewed [1].
I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign 
table.
So, I would like to ask you a question on this patch in this new thread.

I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and 
this patch. 
  1. Things I found
   I execute a query which contain join of postgres_fdw's foreign table and a 
table and aggregation of the join result.
   In my setting, my prototype reduce this query's response by 93%.
  2. Differences between my prototype and this patch
   (1) Pushdown aggregation of foeign table if FDW pushdown partial aggregation
   (2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of 
accept of this patch's function.
I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any 
reviewers / committers are 
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation


v17-0004-pushdown-aggregation-foreign-table.patch
Description: v17-0004-pushdown-aggregation-foreign-table.patch
1. Restrictions of my prototype
(1) aggregate functions are avg, sum, count, min, max
(2) argment or sum of avg is bigint var

2. My experiment settings
(1) hardware settings
  cpu:Intel Corei5-9500 processor(3.0 GHz/9MB cache)(6 cores)
  ram:DDR4 3200Mhz 128GB
  storage:512GB SSD(M.2 NVMe)
(2) software
  os: CentOS7
  postgresql source code:14(in development) with commit-id 
947456a823d6b0973b68c6b38c8623a0504054e7
  # build by "make world"
  patchs in [1] I applied:
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
  [1] https://commitfest.postgresql.org/32/
(3) PostgreSQL settings
  max_parallel_workers_per_gather=6
  work_mem=100
(4) tables and data
  execute the following commands using postgresql's client tool such as psql
--
create database tmp;
\c tmp;
create table log as select 1::bigint as id, 1::bigint as size, 1::bigint as 
total from generate_series(1,10) t;

create table master as select t as id, 'hoge' || mod(t, 1000) as name from 
generate_series(1,100) t;
create index master_idx on master(id);

create extension postgres_fdw;
create server local_server foreign data wrapper postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'tmp');
create user mapping for user server local_server options (user 'postgres', 
password 'postgres');
create foreign table f_log(id bigint, size bigint, total bigint) server 
local_server options (table_name 'log');

analyze;

set enable_agg_pushdown=true;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;

set enable_agg_pushdown=false;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;
--


RE: WIP: Aggregation push-down

2022-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi everyone.

I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 

I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign 
table.
Actually, I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and 
this patch. 
  1. Things I found
   I execute a query which contain join of postgres_fdw's foreign table and a 
table and aggregation of the josin result.
   In my setting, my prototype reduce this query's response by 93%.
  2. Differences between my prototype and this patch
   (1) Pushdown aggregation of foeign table if FDW pushdown partioal aggregation
   (2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of 
accept of this patch's function.
. I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any 
reviewers / committers are 
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

--
Yuuki Fujii
Information Technology R Center Mitsubishi Electric Corporation



v17-0004-pushdown-aggregation-foreign-table.patch
Description: v17-0004-pushdown-aggregation-foreign-table.patch
1. Restrictions of my prototype
(1) aggregate functions are avg, sum, count, min, max
(2) argment or sum of avg is bigint var

2. My experiment settings
(1) hardware settings
  cpu:Intel Corei5-9500 processor(3.0 GHz/9MB cache)(6 cores)
  ram:DDR4 3200Mhz 128GB
  storage:512GB SSD(M.2 NVMe)
(2) software
  os: CentOS7
  postgresql source code:14(in development) with commit-id 
947456a823d6b0973b68c6b38c8623a0504054e7
  # build by "make world"
  patchs I applied:
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
(3) PostgreSQL settings
  max_parallel_workers_per_gather=6
  work_mem=100
(4) tables and data
  execute the following commands using postgresql's client tool such as psql
--
\c tmp;
create table log as select 1::bigint as id, 1::bigint as size, 1::bigint as 
total from generate_series(1,10) t;

create table master as select t as id, 'hoge' || mod(t, 1000) as name from 
generate_series(1,100) t;
create index master_idx on master(id);

create extension postgres_fdw;
create server local_server foreign data wrapper postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'tmp');
create user mapping for user server local_server options (user 'postgres', 
password 'postgres');
create foreign table f_log(id bigint, size bigint, total bigint) server 
local_server options (table_name 'log');

analyze;

set enable_agg_pushdown=true;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;

set enable_agg_pushdown=false;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;
--