Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-07 Thread Tom Lane
Tomas Vondra writes: > However, maybe views are not the best / most common example to think > about. I'd imagine it's much more common to reference a regular table, > but the table gets truncated / populated quickly, and/or the autovacuum > workers are busy so it takes time to update reltuples. Bu

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-07 Thread Tomas Vondra
On 1/6/23 23:41, Tomas Vondra wrote: > On 1/6/23 17:58, Tom Lane wrote: >> Tomas Vondra writes: >>> The one difference is that I realized the relkind check does not >>> actually say we can't do sampling - it just means we can't use >>> TABLESAMPLE to do it. We could still use "random()" ... >> >>>

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tomas Vondra
On 1/6/23 17:58, Tom Lane wrote: > Tomas Vondra writes: >> The one difference is that I realized the relkind check does not >> actually say we can't do sampling - it just means we can't use >> TABLESAMPLE to do it. We could still use "random()" ... > >> Furthermore, I don't think we should silent

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tom Lane
Tomas Vondra writes: > The one difference is that I realized the relkind check does not > actually say we can't do sampling - it just means we can't use > TABLESAMPLE to do it. We could still use "random()" ... > Furthermore, I don't think we should silently disable sampling when the > user expli

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tomas Vondra
I've pushed the comment/assert cleanup. Here's a cleaned up version of the relkind check. This is almost identical to the patch from yesterday (plus the renames and updates of comments for modified functions). The one difference is that I realized the relkind check does not actually say we can't

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tom Lane
Tomas Vondra writes: > There are no callers outside postgresAcquireSampleRowsFunc, so what > about renaming them like this? > deparseAnalyzeTuplesSql > -> deparseAnalyzeInfoSql > postgresCountTuplesForForeignTable > -> postgresGetAnalyzeInfoForForeignTable WFM.

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tomas Vondra
On 1/5/23 22:05, Tom Lane wrote: > Tomas Vondra writes: >> The second patch adds the relkind check, so that we only issue >> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure >> we actually need it - the example presented earlier was foreign table >> pointing to a sequenc

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tom Lane
Tomas Vondra writes: > The second patch adds the relkind check, so that we only issue > TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure > we actually need it - the example presented earlier was foreign table > pointing to a sequence. But that actually works fine even witho

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tomas Vondra
Hi, here's two minor postgres_fdw patches, addressing the two issues discussed last week. 1) 0001-Fix-stale-comment-in-postgres_fdw.patch The first one cleans up the comment referencing the sample rate adjustment. While doing that, I realized without the adjustment we should never get sample_ra

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-01 Thread Tomas Vondra
On 12/31/22 18:23, Tom Lane wrote: > Tomas Vondra writes: >> On 12/31/22 05:42, Tom Lane wrote: >>> ERROR: TABLESAMPLE clause can only be applied to tables and materialized >>> views >>> I think the patch avoids that, but only accidentally, because >>> reltuples will be 0 or -1 for a view. Mayb

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-31 Thread Tom Lane
Tomas Vondra writes: > On 12/31/22 05:42, Tom Lane wrote: >> ERROR: TABLESAMPLE clause can only be applied to tables and materialized >> views >> I think the patch avoids that, but only accidentally, because >> reltuples will be 0 or -1 for a view. Maybe it'd be a good >> idea to pull back relk

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-31 Thread Tomas Vondra
On 12/31/22 05:42, Tom Lane wrote: > Tomas Vondra writes: >> After thinking about it a bit more I decided to rip out the 10% sampling >> rate inflation. > > +1. I'm not sure if there's anything more we need to do there, but > that didn't seem like that was it. > > I notice that the committed

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-30 Thread Tom Lane
Tomas Vondra writes: > After thinking about it a bit more I decided to rip out the 10% sampling > rate inflation. +1. I'm not sure if there's anything more we need to do there, but that didn't seem like that was it. I notice that the committed patch still has a reference to that hack though: +

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-30 Thread Tomas Vondra
Pushed. After thinking about it a bit more I decided to rip out the 10% sampling rate inflation. While stale reltuples may be an issue, but I couldn't convince myself this would be an improvement overall, or that this is the right place to deal with it. Firstly, if reltuples is too low (i.e. when

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-21 Thread Tomas Vondra
On 12/15/22 17:46, Tom Lane wrote: > James Finnerty writes: >> This patch looks good to me. I have two very minor nits: The inflation >> of the sample size by 10% is arbitrary but it doesn't seem unreasonable >> or concerning. It just makes me curious if there are any known cases >> that motivat

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-15 Thread Tom Lane
James Finnerty writes: > This patch looks good to me. I have two very minor nits: The inflation > of the sample size by 10% is arbitrary but it doesn't seem unreasonable > or concerning. It just makes me curious if there are any known cases > that motivated adding this logic. I wondered why, to

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-15 Thread James Finnerty
This patch looks good to me. I have two very minor nits: The inflation of the sample size by 10% is arbitrary but it doesn't seem unreasonable or concerning. It just makes me curious if there are any known cases that motivated adding this logic. Secondly, if the earliest non-deprecated versio

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-07-19 Thread Tom Lane
Tomas Vondra writes: > I we want to improve sampling for partitioned cases (where the foreign > table is just one of many partitions), I think we'd have to rework how > we determine sample size for each partition. Now we simply calculate > that from relpages, which seems quite fragile (different a

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-07-19 Thread Tomas Vondra
On 7/18/22 20:45, Tom Lane wrote: > Tomas Vondra writes: >> Thanks. I'll switch this to "needs review" now. > > OK, I looked through this, and attach some review suggestions in the > form of a delta patch. (0001 below is your two patches merged, 0002 > is my delta.) A lot of the delta is com

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-07-18 Thread Tom Lane
Tomas Vondra writes: > Thanks. I'll switch this to "needs review" now. OK, I looked through this, and attach some review suggestions in the form of a delta patch. (0001 below is your two patches merged, 0002 is my delta.) A lot of the delta is comment-smithing, but not all. After reflection I

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-07-18 Thread Tomas Vondra
On 7/16/22 23:57, Tom Lane wrote: > Andres Freund writes: >> On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: >>> And here's the slightly simplified patch, without the part adding the >>> unnecessary GetServerVersion() function. > >> Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-07-16 Thread Tom Lane
Andres Freund writes: > On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: >> And here's the slightly simplified patch, without the part adding the >> unnecessary GetServerVersion() function. > Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log > Marked as waiting-on-author. Here's

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-03-21 Thread Andres Freund
Hi, On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: > And here's the slightly simplified patch, without the part adding the > unnecessary GetServerVersion() function. Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log Marked as waiting-on-author. Greetings, Andres Freund

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-22 Thread Tomas Vondra
On 2/22/22 21:12, Tomas Vondra wrote: > On 2/22/22 01:36, Fujii Masao wrote: >> >> >> On 2022/02/18 22:28, Tomas Vondra wrote: >>> Hi, >>> >>> here's a slightly updated version of the patch series. >> >> Thanks for updating the patches! >> >> >>> The 0001 part >>> adds tracking of server_version_nu

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-22 Thread Tomas Vondra
On 2/22/22 01:36, Fujii Masao wrote: > > > On 2022/02/18 22:28, Tomas Vondra wrote: >> Hi, >> >> here's a slightly updated version of the patch series. > > Thanks for updating the patches! > > >> The 0001 part >> adds tracking of server_version_num, so that it's possible to enable >> other fea

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-21 Thread Fujii Masao
On 2022/02/18 22:28, Tomas Vondra wrote: Hi, here's a slightly updated version of the patch series. Thanks for updating the patches! The 0001 part adds tracking of server_version_num, so that it's possible to enable other features depending on it. Like configure_remote_session() does,

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-18 Thread Tomas Vondra
Hi, here's a slightly updated version of the patch series. The 0001 part adds tracking of server_version_num, so that it's possible to enable other features depending on it. In this case it's used to decide whether TABLESAMPLE is supported. The 0002 part modifies the sampling. I realized we can d

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-16 Thread Sofia Kopikova
Hello,   I find it a great idea to use TABLESAMPLE in postgres_fdw ANALYZE. Let me offer you some ideas how to resolve you problems.   Tomas Vondra < tomas.von...@enterprisedb.com > writes: > The other issue is which sampling method to use - we have SYSTEM and  > BERNOULLI built in, and the tsm_sy

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tomas Vondra
On 2/11/22 19:13, Robert Haas wrote: > On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: >> While I'm not opposed to moving those goalposts at some point, >> I think pushing them all the way up to 9.5 for this one easily-fixed >> problem is not very reasonable. >> >> Given other recent discussion, a

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: > While I'm not opposed to moving those goalposts at some point, > I think pushing them all the way up to 9.5 for this one easily-fixed > problem is not very reasonable. > > Given other recent discussion, an argument could be made for moving > the cu

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tom Lane
Robert Haas writes: > I think it's going to be necessary to compromise on that at some > point. Sure. The existing postgres_fdw documentation [1] already addresses this issue: postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available b

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 12:39 PM Tom Lane wrote: > Tomas Vondra writes: > > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > > to collect/transfer just a small sample from the remote node, saving > > both CPU and network. > > This is great if the remote end has TABLESAMPL

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tom Lane
Tomas Vondra writes: > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > to collect/transfer just a small sample from the remote node, saving > both CPU and network. This is great if the remote end has TABLESAMPLE, but pre-9.5 servers don't, and postgres_fdw is supposed

postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-11 Thread Tomas Vondra
Hi, here's a small patch modifying postgres_fdw to use TABLESAMPLE to collect sample when running ANALYZE on a foreign table. Currently the sampling happens locally, i.e. we fetch all data from the remote server and then pick rows. But that's hugely expensive for large relations and/or with l