Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> However, maybe views are not the best / most common example to think
> about. I'd imagine it's much more common to reference a regular table,
> but the table gets truncated / populated quickly, and/or the autovacuum
> workers are busy so it takes time to update reltuples. But in this case
> it's also quite simple to fix the correlation by just ordering by ctid
> (which I guess we might do based on the relkind).

> There's a minor issue with partitioned tables, with foreign partitions
> pointing to remote views. This is kinda broken, because the sample size
> for individual relations is determined based on relpages. But that's 0
> for views, so these partitions get ignored when building the sample. But
> that's a pre-existing issue.

I wonder if we should stop consulting reltuples directly at all, and
instead do "EXPLAIN SELECT * FROM remote_table" to get the remote
planner's estimate of the number of rows involved.  Even for a plain
table, that's likely to be a better number.

			regards, tom lane
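The EXPLAIN idea boils down to scraping the planner's row estimate out of the plan's top line instead of querying pg_class.reltuples. A minimal sketch of that extraction step in Python (the sample plan line is made up; the real code would of course do this in C over a libpq result):

```python
import re

def rows_from_explain(explain_first_line: str) -> int:
    """Extract the planner's row estimate from the first line of EXPLAIN output."""
    m = re.search(r"rows=(\d+)", explain_first_line)
    if m is None:
        raise ValueError("no row estimate found in plan line")
    return int(m.group(1))

# Hypothetical first line of: EXPLAIN SELECT * FROM remote_table
line = "Seq Scan on remote_table  (cost=0.00..35.50 rows=2550 width=4)"
print(rows_from_explain(line))  # -> 2550
```

Unlike reltuples (which is -1 for a never-analyzed table and 0 for views), the planner always produces some estimate, which is what makes this attractive for the view case discussed upthread.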
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 1/6/23 23:41, Tomas Vondra wrote:
> On 1/6/23 17:58, Tom Lane wrote:
>> Tomas Vondra writes:
>>> The one difference is that I realized the relkind check does not
>>> actually say we can't do sampling - it just means we can't use
>>> TABLESAMPLE to do it. We could still use "random()" ...
>>
>>> Furthermore, I don't think we should silently disable sampling when the
>>> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
>>> the table - IMHO it's less surprising to just fail in that case.
>>
>> Agreed on both points.  This patch looks good to me.
>>
>
> Good, I'll get this committed. The "ORDER BY random()" idea is a
> possible improvement, can be discussed on its own.
>

Pushed.

>>> Of course, all relkinds that don't support TABLESAMPLE currently have a
>>> reltuples value that will disable sampling anyway (e.g. views have -1).
>>> So we won't actually fall back to random() anyway, because we can't
>>> calculate the sample fraction.
>>> That's a bit annoying for foreign tables pointing at a view, which is a
>>> more likely use case than a table pointing at a sequence.
>>
>> Right, that's a case worth being concerned about.
>>
>>> But I realized we could actually still do "random()" sampling:
>>> SELECT * FROM t ORDER BY random() LIMIT $X;
>>
>> Hmm, interesting idea, but it would totally bollix our correlation
>> estimates.  Not sure that those are worth anything for remote views,
>> but still...
>
> But isn't that an issue that we already have? I mean, if we do ANALYZE
> on a foreign table pointing to a view, we fetch all the results. But if
> the view does not have a well-defined ORDER BY, a trivial plan change
> may change the order of results and thus the correlation.
>
> Actually, how is a correlation even defined for a view?
>
> It's true this "ORDER BY random()" thing would make it less stable, as
> it would change the correlation on every run. Although - the calculated
> correlation is actually quite stable, because it's guaranteed to be
> pretty close to 0 because we make the order random.
>
> Maybe that's actually better than the current state where it depends on
> the plan? Why not look at the relkind and just set correlation to 0.0
> in such cases?
>
> But if we want to restore the current behavior (where it depends on the
> actual query plan), we could do something like this:
>
>   SELECT * FROM the_remote_view ORDER BY row_number() over ();
>
> But yeah, this makes the remote sampling more expensive. Probably still
> a win because of network costs, but not great.
>

I've been thinking about this a bit more, and I'll consider submitting a
patch for the next CF. IMHO it's probably better to accept correlation
being 0 in these cases - it's more representative of what we know about
the view / plan output (or rather lack of knowledge).

However, maybe views are not the best / most common example to think
about. I'd imagine it's much more common to reference a regular table,
but the table gets truncated / populated quickly, and/or the autovacuum
workers are busy so it takes time to update reltuples. But in this case
it's also quite simple to fix the correlation by just ordering by ctid
(which I guess we might do based on the relkind).

There's a minor issue with partitioned tables, with foreign partitions
pointing to remote views. This is kinda broken, because the sample size
for individual relations is determined based on relpages. But that's 0
for views, so these partitions get ignored when building the sample. But
that's a pre-existing issue.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
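The claim that a randomly ordered sample pins the computed correlation near zero is easy to check with a quick simulation. This is plain Python over Pearson's formula, not the actual analyze code, but it illustrates why "ORDER BY random()" yields a stable correlation close to 0:

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient, straight from the definition."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
values = list(range(10000))      # perfectly ordered column: correlation 1.0
positions = list(range(10000))   # fetch order (row position in the sample)

assert abs(correlation(values, positions) - 1.0) < 1e-9

random.shuffle(values)           # the "ORDER BY random()" analogue
r = correlation(values, positions)
print(round(r, 3))               # always close to 0 for a shuffled column
```

With n rows the shuffled correlation is O(1/sqrt(n)), so unlike a plan-dependent fetch order it barely moves between runs - which is the "quite stable, close to 0" behavior described above.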
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 1/6/23 17:58, Tom Lane wrote:
> Tomas Vondra writes:
>> The one difference is that I realized the relkind check does not
>> actually say we can't do sampling - it just means we can't use
>> TABLESAMPLE to do it. We could still use "random()" ...
>
>> Furthermore, I don't think we should silently disable sampling when the
>> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
>> the table - IMHO it's less surprising to just fail in that case.
>
> Agreed on both points.  This patch looks good to me.
>

Good, I'll get this committed. The "ORDER BY random()" idea is a
possible improvement, can be discussed on its own.

>> Of course, all relkinds that don't support TABLESAMPLE currently have a
>> reltuples value that will disable sampling anyway (e.g. views have -1).
>> So we won't actually fall back to random() anyway, because we can't
>> calculate the sample fraction.
>> That's a bit annoying for foreign tables pointing at a view, which is a
>> more likely use case than a table pointing at a sequence.
>
> Right, that's a case worth being concerned about.
>
>> But I realized we could actually still do "random()" sampling:
>> SELECT * FROM t ORDER BY random() LIMIT $X;
>
> Hmm, interesting idea, but it would totally bollix our correlation
> estimates.  Not sure that those are worth anything for remote views,
> but still...

But isn't that an issue that we already have? I mean, if we do ANALYZE
on a foreign table pointing to a view, we fetch all the results. But if
the view does not have a well-defined ORDER BY, a trivial plan change
may change the order of results and thus the correlation.

Actually, how is a correlation even defined for a view?

It's true this "ORDER BY random()" thing would make it less stable, as
it would change the correlation on every run. Although - the calculated
correlation is actually quite stable, because it's guaranteed to be
pretty close to 0 because we make the order random.

Maybe that's actually better than the current state where it depends on
the plan? Why not look at the relkind and just set correlation to 0.0
in such cases?

But if we want to restore the current behavior (where it depends on the
actual query plan), we could do something like this:

    SELECT * FROM the_remote_view ORDER BY row_number() over ();

But yeah, this makes the remote sampling more expensive. Probably still
a win because of network costs, but not great.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> The one difference is that I realized the relkind check does not
> actually say we can't do sampling - it just means we can't use
> TABLESAMPLE to do it. We could still use "random()" ...

> Furthermore, I don't think we should silently disable sampling when the
> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
> the table - IMHO it's less surprising to just fail in that case.

Agreed on both points.  This patch looks good to me.

> Of course, all relkinds that don't support TABLESAMPLE currently have a
> reltuples value that will disable sampling anyway (e.g. views have -1).
> So we won't actually fall back to random() anyway, because we can't
> calculate the sample fraction.
> That's a bit annoying for foreign tables pointing at a view, which is a
> more likely use case than a table pointing at a sequence.

Right, that's a case worth being concerned about.

> But I realized we could actually still do "random()" sampling:
> SELECT * FROM t ORDER BY random() LIMIT $X;

Hmm, interesting idea, but it would totally bollix our correlation
estimates.  Not sure that those are worth anything for remote views,
but still...

			regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
I've pushed the comment/assert cleanup.

Here's a cleaned up version of the relkind check. This is almost
identical to the patch from yesterday (plus the renames and updates of
comments for modified functions).

The one difference is that I realized the relkind check does not
actually say we can't do sampling - it just means we can't use
TABLESAMPLE to do it. We could still use "random()" ...

Furthermore, I don't think we should silently disable sampling when the
user explicitly requests TABLESAMPLE by specifying bernoulli/system for
the table - IMHO it's less surprising to just fail in that case.

So we now do this:

    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;

Yes, we may still disable sampling when reltuples is -1, 0 or something
like that. But that's a condition that is expected for new relations and
likely to fix itself, which is not the case for relkind.

Of course, all relkinds that don't support TABLESAMPLE currently have a
reltuples value that will disable sampling anyway (e.g. views have -1).
So we won't actually fall back to random() anyway, because we can't
calculate the sample fraction.

That's a bit annoying for foreign tables pointing at a view, which is a
more likely use case than a table pointing at a sequence. And likely
more of an issue, because views may return many rows (while sequences
have only a single row).

But I realized we could actually still do "random()" sampling:

    SELECT * FROM t ORDER BY random() LIMIT $X;

where $X is the target number of rows for the sample for the table.
Which would result in plans like this (given sufficient work_mem values):

                        QUERY PLAN
    ---------------------------------------------------
     Limit (actual rows=3 loops=1)
       ->  Sort (actual rows=3 loops=1)
             Sort Key: (random())
             Sort Method: top-N heapsort  Memory: 3916kB
             ->  Append (actual rows=100 loops=1)

Even with lower work_mem values this would likely be a win, due to
saving on network transfers.
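The overall decision order described above - degrade AUTO to random() for relkinds that can't use TABLESAMPLE, leave an explicit bernoulli/system request alone so it fails loudly, and turn sampling off entirely when reltuples is unusable - can be sketched like this. The names mirror the patch, but this is illustrative Python, not the actual C code:

```python
ANALYZE_SAMPLE_OFF = "off"
ANALYZE_SAMPLE_AUTO = "auto"
ANALYZE_SAMPLE_RANDOM = "random"
ANALYZE_SAMPLE_SYSTEM = "system"

def choose_sampling(method, can_tablesample, reltuples, targrows):
    """Return (method, sample_frac) following the patch's decision order."""
    # Unsupported relkind: AUTO quietly falls back to random(), while an
    # explicit system/bernoulli request is left alone (and fails remotely).
    if not can_tablesample and method == ANALYZE_SAMPLE_AUTO:
        method = ANALYZE_SAMPLE_RANDOM

    # Unknown or tiny reltuples: sampling can't help, fetch everything.
    if reltuples <= 0 or targrows >= reltuples:
        return ANALYZE_SAMPLE_OFF, None

    sample_frac = targrows / reltuples
    assert 0.0 <= sample_frac <= 1.0   # guaranteed by the checks above
    return method, sample_frac

print(choose_sampling(ANALYZE_SAMPLE_AUTO, False, 1_000_000, 30_000))
print(choose_sampling(ANALYZE_SAMPLE_AUTO, True, -1, 30_000))
```

Note how the assert stands in for the clamping that the earlier 0001 patch removed: once the `reltuples <= 0` and `targrows >= reltuples` cases are filtered out, the fraction cannot leave [0, 1].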
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 76270100186507c1403f7758131966d146db5b39 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Thu, 5 Jan 2023 21:13:20 +0100
Subject: [PATCH] Check relkind before using TABLESAMPLE in postgres_fdw

Before using TABLESAMPLE to acquire a sample from a foreign relation,
check the remote relkind. If it does not support TABLESAMPLE, disable
sampling and fall back to reading all rows.

This may seem unnecessary, because for relkinds that don't support
TABLESAMPLE we'll disable the sampling anyway, due to the reltuples
value (e.g. views use -1, sequences 1). But that's rather accidental,
and might get broken after adding TABLESAMPLE support for other
relkinds. Better to make an explicit check.

Instead of disabling remote sampling for relkinds that do not support
TABLESAMPLE, consider falling back to using random() if the requested
sampling method was AUTO.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/951485.1672461744%40sss.pgh.pa.us
---
 contrib/postgres_fdw/deparse.c      |  7 ++--
 contrib/postgres_fdw/postgres_fdw.c | 60 -
 contrib/postgres_fdw/postgres_fdw.h |  2 +-
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4461fb06f02..21237d18ef8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2368,14 +2368,15 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 }
 
 /*
- * Construct SELECT statement to acquire the number of rows of a relation.
+ * Construct SELECT statement to acquire the number of rows and the relkind of
+ * a relation.
  *
  * Note: we just return the remote server's reltuples value, which might
  * be off a good deal, but it doesn't seem worth working harder.  See
  * comments in postgresAcquireSampleRowsFunc.
  */
 void
-deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+deparseAnalyzeInfoSql(StringInfo buf, Relation rel)
 {
 	StringInfoData relname;
 
@@ -2383,7 +2384,7 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
 	initStringInfo(&relname);
 	deparseRelation(&relname, rel);
 
-	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	appendStringInfoString(buf, "SELECT reltuples, relkind FROM pg_catalog.pg_class WHERE oid = ");
 	deparseStringLiteral(buf, relname.data);
 	appendStringInfoString(buf, "::pg_catalog.regclass");
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f8461bf18dc..44573c9fed2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4974,11 +4974,14 @@ postgresAnalyzeForeignTable(Relation relation,
 }
 
 /*
- * postgresCountTuplesForForeignTable
+ * postgresGetAnalyzeInfoForForeignTable
  *		Count tuples in foreign table (just get pg_cla
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> There are no callers outside postgresAcquireSampleRowsFunc, so what
> about renaming them like this?
>
>   deparseAnalyzeTuplesSql -> deparseAnalyzeInfoSql
>
>   postgresCountTuplesForForeignTable -> postgresGetAnalyzeInfoForForeignTable

WFM.

			regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 1/5/23 22:05, Tom Lane wrote:
> Tomas Vondra writes:
>> The second patch adds the relkind check, so that we only issue
>> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
>> we actually need it - the example presented earlier was a foreign table
>> pointing to a sequence. But that actually works fine even without this
>> patch, because for sequences we have reltuples=1, which disables
>> sampling due to the check mentioned above (because targrows >= 1).
>
> Seems pretty accidental still.
>
>> The other issue with this patch is that it seems wrong to check the
>> relkind value locally - instead we should check this on the remote
>> server, because that's where we'll run TABLESAMPLE. Currently there are
>> no differences between versions, but what if we implement TABLESAMPLE
>> for another relkind in the future? Then we'll either fail to use
>> sampling or (worse) we'll try using TABLESAMPLE for an unsupported
>> relkind, depending on which of the servers has the newer version.
>
> We don't have a way to do that, and I don't think we need one.  The
> worst case is that we don't try to use TABLESAMPLE when the remote
> server is a newer version that would allow it.  If we do extend the
> set of supported relkinds, we'd need a version-specific test in
> postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
> remote server ... but that's hardly complicated, or a new concept.
>
> On the other hand, if we let the code stand, the worst case is that
> ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
> case, presumably with some remote relkind that doesn't exist today.
> I think that's clearly worse than not exploiting TABLESAMPLE
> although we could (with some other new remote relkind).
>

OK

> As far as the patch details go: I'd write 0001 as
>
> +		Assert(sample_frac >= 0.0 && sample_frac <= 1.0);
>
> The way you have it seems a bit contorted.

Yeah, that's certainly cleaner.

> As for 0002, personally
> I'd rename the affected functions to reflect their expanded duties,
> and they *definitely* require adjustments to their header comments.
> Functionally it looks fine though.
>

No argument about the header comments, ofc - I just haven't done that in
the WIP patch.

As for the renaming, any suggestions for the new names? The patch tweaks
two functions in a way that affects what they return:

1) deparseAnalyzeTuplesSql - adds relkind to the "reltuples" SQL

2) postgresCountTuplesForForeignTable - adds the can_sample flag

There are no callers outside postgresAcquireSampleRowsFunc, so what
about renaming them like this?

  deparseAnalyzeTuplesSql -> deparseAnalyzeInfoSql

  postgresCountTuplesForForeignTable -> postgresGetAnalyzeInfoForForeignTable

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> The second patch adds the relkind check, so that we only issue
> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
> we actually need it - the example presented earlier was a foreign table
> pointing to a sequence. But that actually works fine even without this
> patch, because for sequences we have reltuples=1, which disables
> sampling due to the check mentioned above (because targrows >= 1).

Seems pretty accidental still.

> The other issue with this patch is that it seems wrong to check the
> relkind value locally - instead we should check this on the remote
> server, because that's where we'll run TABLESAMPLE. Currently there are
> no differences between versions, but what if we implement TABLESAMPLE
> for another relkind in the future? Then we'll either fail to use
> sampling or (worse) we'll try using TABLESAMPLE for an unsupported
> relkind, depending on which of the servers has the newer version.

We don't have a way to do that, and I don't think we need one.  The
worst case is that we don't try to use TABLESAMPLE when the remote
server is a newer version that would allow it.  If we do extend the
set of supported relkinds, we'd need a version-specific test in
postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
remote server ... but that's hardly complicated, or a new concept.

On the other hand, if we let the code stand, the worst case is that
ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
case, presumably with some remote relkind that doesn't exist today.
I think that's clearly worse than not exploiting TABLESAMPLE
although we could (with some other new remote relkind).

As far as the patch details go: I'd write 0001 as

+		Assert(sample_frac >= 0.0 && sample_frac <= 1.0);

The way you have it seems a bit contorted.  As for 0002, personally
I'd rename the affected functions to reflect their expanded duties,
and they *definitely* require adjustments to their header comments.
Functionally it looks fine though.

			regards, tom lane
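The version-specific test Tom mentions is the same pattern postgres_fdw already uses for other pushdown decisions: gate a feature on the remote server_version_num. A sketch of the idea (the relkind letters are the real pg_class codes; the cutoff 95000 corresponds to 9.5, where TABLESAMPLE first appeared, and any future relkind would get its own, hypothetical, cutoff):

```python
def can_use_tablesample(server_version_num: int, relkind: str) -> bool:
    """Decide locally whether the remote server can honor TABLESAMPLE
    for this relkind, keyed on the remote version number."""
    if server_version_num < 95000:      # TABLESAMPLE added in PostgreSQL 9.5
        return False
    # 'r' = table, 'm' = materialized view, 'p' = partitioned table;
    # a later version supporting more relkinds would add a branch here.
    return relkind in ("r", "m", "p")

assert can_use_tablesample(160000, "r")
assert not can_use_tablesample(90400, "r")      # pre-9.5 remote
assert not can_use_tablesample(160000, "v")     # views: no TABLESAMPLE
```

This keeps the failure mode on the safe side Tom argues for: an old or unusual remote merely loses the optimization instead of breaking ANALYZE.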
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Hi,

here are two minor postgres_fdw patches, addressing the two issues
discussed last week.

1) 0001-Fix-stale-comment-in-postgres_fdw.patch

The first one cleans up the comment referencing the sample rate
adjustment. While doing that, I realized without the adjustment we
should never get sample_rate outside the valid range, because we do:

    if ((reltuples <= 0) || (targrows >= reltuples))
        method = ANALYZE_SAMPLE_OFF;

So I removed the clamping, or rather replaced it with an assert.

2) 0002-WIP-check-relkind-in-FDW-analyze.patch

The second patch adds the relkind check, so that we only issue
TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
we actually need it - the example presented earlier was a foreign table
pointing to a sequence. But that actually works fine even without this
patch, because for sequences we have reltuples=1, which disables
sampling due to the check mentioned above (because targrows >= 1).

The other issue with this patch is that it seems wrong to check the
relkind value locally - instead we should check this on the remote
server, because that's where we'll run TABLESAMPLE. Currently there are
no differences between versions, but what if we implement TABLESAMPLE
for another relkind in the future? Then we'll either fail to use
sampling or (worse) we'll try using TABLESAMPLE for an unsupported
relkind, depending on which of the servers has the newer version.
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 681b3569278bc55b4e2bbfcf129245702616ecd9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Thu, 5 Jan 2023 21:13:20 +0100
Subject: [PATCH 2/2] WIP: check relkind in FDW analyze

---
 contrib/postgres_fdw/deparse.c      |  2 +-
 contrib/postgres_fdw/postgres_fdw.c | 19 +++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4461fb06f02..9f39d792280 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2383,7 +2383,7 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
 	initStringInfo(&relname);
 	deparseRelation(&relname, rel);
 
-	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	appendStringInfoString(buf, "SELECT reltuples, relkind FROM pg_catalog.pg_class WHERE oid = ");
 	deparseStringLiteral(buf, relname.data);
 	appendStringInfoString(buf, "::pg_catalog.regclass");
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 65822b8a9aa..50b03067152 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4978,7 +4978,7 @@ postgresAnalyzeForeignTable(Relation relation,
  * Count tuples in foreign table (just get pg_class.reltuples).
  */
 static double
-postgresCountTuplesForForeignTable(Relation relation)
+postgresCountTuplesForForeignTable(Relation relation, bool *can_sample)
 {
 	ForeignTable *table;
 	UserMapping *user;
@@ -4986,6 +4986,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 	StringInfoData sql;
 	PGresult   *volatile res = NULL;
 	volatile double reltuples = -1;
+	volatile char relkind = 0;
+
+	/* assume the relkind can't be sampled */
+	*can_sample = false;
 
 	/*
 	 * Get the connection to use.  We do the remote access as the table's
@@ -5008,9 +5012,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 		if (PQresultStatus(res) != PGRES_TUPLES_OK)
 			pgfdw_report_error(ERROR, res, conn, false, sql.data);
 
-		if (PQntuples(res) != 1 || PQnfields(res) != 1)
+		if (PQntuples(res) != 1 || PQnfields(res) != 2)
 			elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
 		reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+		relkind = *(PQgetvalue(res, 0, 1));
 	}
 	PG_FINALLY();
 	{
@@ -5021,6 +5026,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 
 	ReleaseConnection(conn);
 
+	*can_sample = (relkind == RELKIND_RELATION ||
+				   relkind == RELKIND_MATVIEW ||
+				   relkind == RELKIND_PARTITIONED_TABLE);
+
 	return reltuples;
 }
 
@@ -5167,13 +5176,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	 */
 	if (method != ANALYZE_SAMPLE_OFF)
 	{
-		reltuples = postgresCountTuplesForForeignTable(relation);
+		bool		can_sample;
+
+		reltuples = postgresCountTuplesForForeignTable(relation, &can_sample);
 
 		/*
 		 * Remote's reltuples could be 0 or -1 if the table has never been
 		 * vacuumed/analyzed.  In that case, disable sampling after all.
 		 */
-		if ((reltuples <= 0) || (targrows >= reltuples))
+		if ((!can_sample) || (reltuples <= 0) || (targrows >= reltuples))
 			method = ANALYZE_SAMPLE_OFF;
 		else
 		{
-- 
2.39.0

From 020ba6afb87287d782477054949d77954377d347 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Thu, 5 Jan 2023 20:11:49 +0100
Subject: [PATCH 1/2] Fix stale comment in postgres_fdw

A comment was left behind referencing a bit (sample rate adjustment) remo
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 12/31/22 18:23, Tom Lane wrote:
> Tomas Vondra writes:
>> On 12/31/22 05:42, Tom Lane wrote:
>>> ERROR:  TABLESAMPLE clause can only be applied to tables and
>>> materialized views
>>>
>>> I think the patch avoids that, but only accidentally, because
>>> reltuples will be 0 or -1 for a view.  Maybe it'd be a good
>>> idea to pull back relkind along with reltuples, and check
>>> that too?
>
>> Not sure. I guess we can rely on reltuples being 0 or -1 in such cases,
>> but maybe it'd be good to at least mention that in a comment? We're not
>> going to use other reltuples values for views etc.
>
> Would anyone ever point a foreign table at a sequence?  I guess it
> would be a weird use-case, but it's possible, or was till now.
>

Yeah, it's a weird use case. I can't quite imagine why anyone would do
that, but I guess the mere possibility is sufficient reason to add the
relkind check ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> On 12/31/22 05:42, Tom Lane wrote:
>> ERROR:  TABLESAMPLE clause can only be applied to tables and
>> materialized views
>>
>> I think the patch avoids that, but only accidentally, because
>> reltuples will be 0 or -1 for a view.  Maybe it'd be a good
>> idea to pull back relkind along with reltuples, and check
>> that too?

> Not sure. I guess we can rely on reltuples being 0 or -1 in such cases,
> but maybe it'd be good to at least mention that in a comment? We're not
> going to use other reltuples values for views etc.

Would anyone ever point a foreign table at a sequence?  I guess it
would be a weird use-case, but it's possible, or was till now.

			regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 12/31/22 05:42, Tom Lane wrote:
> Tomas Vondra writes:
>> After thinking about it a bit more I decided to rip out the 10% sampling
>> rate inflation.
>
> +1.  I'm not sure if there's anything more we need to do there, but
> that didn't seem like that was it.
>
> I notice that the committed patch still has a reference to that hack
> though:
>
> +	 * Ensure the sampling rate is between 0.0 and 1.0, even after the
> +	 * 10% adjustment above.  (Clamping to 0.0 is just paranoia.)
>
> Clamping still seems like a wise idea, but the comment is just
> confusing now.
>

Yeah, I missed that reference. Will fix.

> Also, I wonder if there is any possibility of ANALYZE failing
> with
>
> ERROR:  TABLESAMPLE clause can only be applied to tables and
> materialized views
>
> I think the patch avoids that, but only accidentally, because
> reltuples will be 0 or -1 for a view.  Maybe it'd be a good
> idea to pull back relkind along with reltuples, and check
> that too?

Not sure. I guess we can rely on reltuples being 0 or -1 in such cases,
but maybe it'd be good to at least mention that in a comment? We're not
going to use other reltuples values for views etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes:
> After thinking about it a bit more I decided to rip out the 10% sampling
> rate inflation.

+1.  I'm not sure if there's anything more we need to do there, but
that didn't seem like that was it.

I notice that the committed patch still has a reference to that hack
though:

+	 * Ensure the sampling rate is between 0.0 and 1.0, even after the
+	 * 10% adjustment above.  (Clamping to 0.0 is just paranoia.)

Clamping still seems like a wise idea, but the comment is just
confusing now.

Also, I wonder if there is any possibility of ANALYZE failing
with

ERROR:  TABLESAMPLE clause can only be applied to tables and
materialized views

I think the patch avoids that, but only accidentally, because
reltuples will be 0 or -1 for a view.  Maybe it'd be a good
idea to pull back relkind along with reltuples, and check
that too?

			regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Pushed.

After thinking about it a bit more I decided to rip out the 10% sampling
rate inflation. Stale reltuples may be an issue, but I couldn't convince
myself this would be an improvement overall, or that this is the right
place to deal with it.

Firstly, if reltuples is too low (i.e. when the table grew), everything
is "fine" - we sample more than targrows and then remove some of that
locally. We end up with the same sample size as without this patch -
which is not necessarily the right sample size (more about that in a
minute), but if it was good enough for now ... improving this bit was
not the ambition of this patch.

If reltuples is too high (e.g. right after a bulk DELETE) we may end up
sampling too few rows. If it's 1/2 what it should be, we'll end up
sampling only 15k rows instead of 30k rows. But it's unclear how much we
should adjust the value - 10% as was in the patch (kinda based on
autovacuum_analyze_scale_factor being 10%)? I'd argue that's too low to
make any difference - cases that are only ~10% off should work fine (we
don't have perfect stats anyway). And for the really bad cases 10% is
not going to make a difference. And if reltuples is that far off, it'll
probably affect even queries planned on the remote node (directly or
pushed down), so I guess the right fix should be making sure reltuples
is not actually off - make autovacuum more aggressive or whatever.

A thing that bothers me is that calculating the sample size for
partitioned tables is a bit cyclic - we look at the number of blocks,
and divide the whole sample based on that. That's generally correlated
with reltuples, but OTOH it may also be wrong - and not necessarily in
the same way. For example you might delete 90% of a table, which will
make the table overrepresented in the sample. But then reltuples may be
quite accurate thanks to a recent vacuum - so fixing this by blindly
adjusting the sampling rate seems misguided.

If this turns out to be an issue in practice, we need to rethink
acquire_inherited_sample_rows as a whole (not just for foreign tables).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 12/15/22 17:46, Tom Lane wrote:
> James Finnerty writes:
>> This patch looks good to me.  I have two very minor nits: The inflation
>> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
>> or concerning.  It just makes me curious if there are any known cases
>> that motivated adding this logic.
>
> I wondered why, too.
>

Not sure about known cases, but the motivation is explained in the
comment before the 10% is applied.

The reltuples value is never going to be spot on, and we use it to
determine what fraction of the table to sample (because all built-in
TABLESAMPLE methods only accept a fraction to sample, not an expected
number of rows).

If pg_class.reltuples is lower (than the actual row count), we'll end up
with a sample fraction that is too high. That's mostly harmless, as
we'll then discard some of the rows locally. But if pg_class.reltuples
is higher, we'll end up sampling too few rows (less than targrows). This
can happen e.g. after a bulk delete.

Yes, the 10% value is mostly arbitrary, and maybe it's pointless. How
much may the stats change with a 10% larger sample? Probably not much,
so is it really solving anything?

Also, maybe it's fixing the issue at the wrong level - if stale
reltuples are an issue, maybe the right fix is making the value more
accurate on the source instance. Decrease
autovacuum_vacuum_scale_factor, or maybe also look at pg_stat_all_tables
or something.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
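For concreteness, the 10% inflation being discussed amounted to something like the following - a sketch of the logic, not the actual C code, with the clamp back into [0, 1] that the later cleanup replaced with an assert:

```python
def inflated_sample_frac(targrows, reltuples, inflation=1.10):
    """Sample fraction padded by 10% to guard against a stale (too high)
    reltuples, then clamped back into the valid [0.0, 1.0] range."""
    sample_frac = targrows / reltuples
    sample_frac *= inflation               # the arbitrary 10% pad
    return min(1.0, max(0.0, sample_frac))

print(inflated_sample_frac(30_000, 1_000_000))   # ~0.033 instead of 0.03
print(inflated_sample_frac(30_000, 31_000))      # clamped to 1.0
```

As the message notes, a 10% pad only helps when reltuples is off by roughly 10% - precisely the cases that were already fine - which is the argument that eventually got it removed.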
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
James Finnerty writes:
> This patch looks good to me.  I have two very minor nits: The inflation
> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
> or concerning.  It just makes me curious if there are any known cases
> that motivated adding this logic.

I wondered why, too.

> Secondly, if the earliest non-deprecated version of PostgreSQL supports
> sampling, then you could optionally remove the logic that tests for
> that.  The affected lines should be unreachable.

We've tried to keep postgres_fdw compatible with quite ancient remote
servers (I think the manual claims back to 8.3, though it's unlikely
anyone's tested that far back lately).  This patch should not move those
goalposts, especially if it's easy not to.

			regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
This patch looks good to me. I have two very minor nits: The inflation of the sample size by 10% is arbitrary but it doesn't seem unreasonable or concerning. It just makes me curious if there are any known cases that motivated adding this logic. Secondly, if the earliest non-deprecated version of PostgreSQL supports sampling, then you could optionally remove the logic that tests for that. The affected lines should be unreachable.
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > If we want to improve sampling for partitioned cases (where the foreign > table is just one of many partitions), I think we'd have to rework how > we determine the sample size for each partition. Now we simply calculate > that from relpages, which seems quite fragile (different amounts of > bloat, different tuple densities) and somewhat strange for FDW servers > that don't use the same "page" concept. > So it may easily happen that we determine bogus sample sizes for each > partition. The difficulties when calculating the sample_frac are just a > secondary issue. > OTOH the concept of a "row" seems way more general, so perhaps > acquire_inherited_sample_rows should use reltuples, and if we want to do > a correction it should happen at this stage already. Yeah, there's definitely something to be said for changing that to be based on rowcount estimates instead of physical size. I think it's a matter for a different patch though, and not a reason to hold up this one. regards, tom lane
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 7/18/22 20:45, Tom Lane wrote: > Tomas Vondra writes: >> Thanks. I'll switch this to "needs review" now. > > OK, I looked through this, and attach some review suggestions in the > form of a delta patch. (0001 below is your two patches merged, 0002 > is my delta.) A lot of the delta is comment-smithing, but not all. > Thanks! > After reflection I think that what you've got, ie use reltuples but > don't try to sample if reltuples <= 0, is just fine. The remote > would only have reltuples <= 0 in a never-analyzed table, which > shouldn't be a situation that persists for long unless the table > is tiny. Also, if reltuples is in error, the way to bet is that > it's too small thanks to the table having been enlarged. But > an error in that direction doesn't hurt us: we'll overestimate > the required sample_frac and pull back more data than we need, > but we'll still end up with a valid sample of the right size. > So I doubt it's worth the complication to try to correct based > on relpages etc. (Note that any such correction would almost > certainly end in increasing our estimate of reltuples. But > it's safer to have an underestimate than an overestimate.) > I mostly agree, particularly for the non-partitioned case. I we want to improve sampling for partitioned cases (where the foreign table is just one of many partitions), I think we'd have to rework how we determine sample size for each partition. Now we simply calculate that from relpages, which seems quite fragile (different amounts of bloat, different tuple densities) and somewhat strange for FDW serves that don't use the same "page" concept. So it may easily happen we determine bogus sample sizes for each partition. The difficulties when calculating the sample_frac is just a secondary issue. OTOH the concept of a "row" seems way more general, so perhaps acquire_inherited_sample_rows should use reltuples, and if we want to do correction it should happen at this stage already. 
> I messed around with the sample_frac choosing logic slightly, > to make it skip pointless calculations if we decide right off > the bat to disable sampling. That way we don't need to worry > about avoiding zero divide, nor do we have to wonder if any > of the later calculations could misbehave. Thanks. > I left your logic about "disable if saving fewer than 100 rows" > alone, but I have to wonder if using an absolute threshold rather > than a relative one is well-conceived. Sampling at a rate of > 99.9 percent seems pretty pointless, but this code is perfectly > capable of accepting that if reltuples is big enough. So > personally I'd do that more like > > if (sample_frac > 0.95) > method = ANALYZE_SAMPLE_OFF; > > which is simpler and would also eliminate the need for the previous > range-clamp step. I'm not sure what the right cutoff is, but > your "100 tuples" constant is just as arbitrary. I agree there probably is not much difference between a threshold on sample_frac directly and a threshold on the number of rows, at least in general. My reasoning for switching to "100 rows" is that in most cases the network transfer is probably more costly than "local costs", and 5% may be quite a few rows (particularly with higher statistics target). I guess the proper approach would be to make some simple costing, but that seems like overkill. > I rearranged the docs patch too. Where you had it, analyze_sampling > was between fdw_startup_cost/fdw_tuple_cost and the following para > discussing them, which didn't seem to me to flow well at all. I ended > up putting analyze_sampling in its own separate list. You could almost > make a case for giving it its own , but I concluded that was > probably overkill. Thanks. > One thing I'm not happy about, but did not touch here, is the expense > of the test cases you added. On my machine, that adds a full 10% to > the already excessively long runtime of postgres_fdw.sql --- and I > do not think it's buying us anything. 
> It is not this module's job > to test whether bernoulli sampling works on partitioned tables. > I think you should just add enough to make sure we exercise the > relevant code paths in postgres_fdw itself. Right, I should have commented on that. The purpose of those tests was verifying that if we change the sampling method on the server/table, the generated query changes accordingly, etc. But that's a bit futile, because we don't have a good way of verifying what query was used - it worked during development, as I added an elog(WARNING). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > Thanks. I'll switch this to "needs review" now. OK, I looked through this, and attach some review suggestions in the form of a delta patch. (0001 below is your two patches merged, 0002 is my delta.) A lot of the delta is comment-smithing, but not all. After reflection I think that what you've got, ie use reltuples but don't try to sample if reltuples <= 0, is just fine. The remote would only have reltuples <= 0 in a never-analyzed table, which shouldn't be a situation that persists for long unless the table is tiny. Also, if reltuples is in error, the way to bet is that it's too small thanks to the table having been enlarged. But an error in that direction doesn't hurt us: we'll overestimate the required sample_frac and pull back more data than we need, but we'll still end up with a valid sample of the right size. So I doubt it's worth the complication to try to correct based on relpages etc. (Note that any such correction would almost certainly end in increasing our estimate of reltuples. But it's safer to have an underestimate than an overestimate.) I messed around with the sample_frac choosing logic slightly, to make it skip pointless calculations if we decide right off the bat to disable sampling. That way we don't need to worry about avoiding zero divide, nor do we have to wonder if any of the later calculations could misbehave. I left your logic about "disable if saving fewer than 100 rows" alone, but I have to wonder if using an absolute threshold rather than a relative one is well-conceived. Sampling at a rate of 99.9 percent seems pretty pointless, but this code is perfectly capable of accepting that if reltuples is big enough. So personally I'd do that more like if (sample_frac > 0.95) method = ANALYZE_SAMPLE_OFF; which is simpler and would also eliminate the need for the previous range-clamp step. I'm not sure what the right cutoff is, but your "100 tuples" constant is just as arbitrary. I rearranged the docs patch too. 
Where you had it, analyze_sampling was between fdw_startup_cost/fdw_tuple_cost and the following para discussing them, which didn't seem to me to flow well at all. I ended up putting analyze_sampling in its own separate list. You could almost make a case for giving it its own , but I concluded that was probably overkill. One thing I'm not happy about, but did not touch here, is the expense of the test cases you added. On my machine, that adds a full 10% to the already excessively long runtime of postgres_fdw.sql --- and I do not think it's buying us anything. It is not this module's job to test whether bernoulli sampling works on partitioned tables. I think you should just add enough to make sure we exercise the relevant code paths in postgres_fdw itself. With these issues addressed, I think this'd be committable. regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index a9766f9734..ea2139fbfa 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2369,14 +2369,56 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire a number of rows of a relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * * SELECT command is appended to buf, and list of columns retrieved * is returned to *retrieved_attrs. 
+ * + * XXX We allow customizing the sampling method, but we only support methods + * we can decide based on server version. Allowing custom TSM modules (for + * example tsm_system_rows) might be useful, but it would require detecting + * which extensions are installed, to allow automatic fall-back. Moreover, the + * methods use different parameters (not sampling rate). So we don't do this + * for now, leaving it for future improvements. + * + * XXX Using remote random() to sample rows has advantages & disadvantages. + * The advantages are that this works on all PostgreSQL versions (unlike + * TABLESAMPLE), and that it does the sampling on the remote side (unlike + * the old approach, which transfers everything and then discards most data). + * We could also do "ORDER BY random() LIMIT x", which would always pick + * the expected number of rows, but it requires sorting so it's a bit more + * expensive. + * + * The disadvantage is that we still have to read al
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 7/16/22 23:57, Tom Lane wrote: > Andres Freund writes: >> On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: >>> And here's the slightly simplified patch, without the part adding the >>> unnecessary GetServerVersion() function. > >> Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log >> Marked as waiting-on-author. > > Here's a rebased version that should at least pass regression tests. Thanks. I've been hacking on this over the past few days, and by coincidence I've been improving exactly the stuff you've pointed out in the review. 0001 is just the original patch rebased, 0002 includes all the various changes. > I've not reviewed it in any detail, but: > > * I'm not really on board with defaulting to SYSTEM sample method, > and definitely not on board with not allowing any other choice. > We don't know enough about the situation in a remote table to be > confident that potentially-nonrandom sampling is OK. So personally > I'd default to BERNOULLI, which is more reliable and seems plenty fast > enough given your upthread results. It could be an idea to extend the > sample option to be like "sample [ = methodname ]", if you want more > flexibility, but I'd be happy leaving that for another time. I agree, I came roughly to the same conclusion, so I replaced the simple on/off option with these options:

off - disables remote sampling, so we just fetch everything and do the sampling on the local node, just like today.
random - remote sampling, but a "naive" implementation using the random() function. This reduces the amount of data we need to transfer, but it still reads the whole table. It should work for all server versions, I believe.
system - TABLESAMPLE system method.
bernoulli - TABLESAMPLE bernoulli method (default for 9.5+).
auto - picks bernoulli on 9.5+, random on older servers.
I'm not sure about custom TABLESAMPLE methods - that adds more complexity to detect whether it's installed, it's trickier to decide what's the best choice (for "auto" mode to decide), and the parameter is also different (e.g. system_rows uses a number of rows vs. a sampling rate). > * The code depending on reltuples is broken in recent server versions, > and might give useless results in older ones too (if reltuples = > relpages = 0). Ideally we'd retrieve all of reltuples, relpages, and > pg_relation_size(rel), and do the same calculation the planner does. > Not sure if pg_relation_size() exists far enough back though. Yes, I noticed that too, and the reworked code should deal with the reltuples=0 case (by just disabling remote sampling). I haven't implemented the reltuples/relpages correction yet, but I don't think compatibility would be an issue - deparseAnalyzeSizeSql() already calls pg_relation_size(), after all. FWIW it seems a bit weird being so careful about adjusting reltuples, when acquire_inherited_sample_rows() only really looks at relpages when deciding how many rows to sample from each partition. If our goal is to use a more accurate reltuples, maybe we should do that in the first step already. Otherwise we may end up building a sample that does not reflect the sizes of the partitions correctly. Of course, the sample rate also matters for non-partitioned tables. > * Copying-and-pasting all of deparseAnalyzeSql (twice!) seems pretty > bletcherous. Why not call that and then add a WHERE clause to its > result, or just add some parameters to it so it can do that itself? Right. I ended up refactoring this into a single function, with a "method" parameter that determines if/how we do the remote sampling. > * More attention to updating relevant comments would be appropriate, > eg here you've not bothered to fix the adjacent falsified comment: > > /* We've retrieved all living tuples from foreign server. 
*/ > - *totalrows = astate.samplerows; > + if (do_sample) > + *totalrows = reltuples; > + else > + *totalrows = astate.samplerows; > Yep, fixed. > * Needs docs obviously. I'm not sure if the existing regression > testing covers the new code adequately or if we need more cases. > Yep, I added the "sampling_method" to postgres-fdw.sgml. > Having said that much, I'm going to leave it in Waiting on Author > state. Thanks. I'll switch this to "needs review" now. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 77589ba90e8f3007b0d58f522f9e498b7d8ab277 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sat, 16 Jul 2022 00:37:20 +0200 Subject: [PATCH 1/2] postgres_fdw: sample data on remote node for ANALYZE When performing ANALYZE on a foreign tables, we need to collect sample of rows. Until now, we simply read all data from the remote node and built the sample locally. That is very expensive, especially in terms of network traffic etc. But it's possible to move the sampling to the remote node, and use either TABLESAMPLE or simply random() to transfer just much smaller amount
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Andres Freund writes: > On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: >> And here's the slightly simplified patch, without the part adding the >> unnecessary GetServerVersion() function. > Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log > Marked as waiting-on-author. Here's a rebased version that should at least pass regression tests. I've not reviewed it in any detail, but: * I'm not really on board with defaulting to SYSTEM sample method, and definitely not on board with not allowing any other choice. We don't know enough about the situation in a remote table to be confident that potentially-nonrandom sampling is OK. So personally I'd default to BERNOULLI, which is more reliable and seems plenty fast enough given your upthread results. It could be an idea to extend the sample option to be like "sample [ = methodname ]", if you want more flexibility, but I'd be happy leaving that for another time. * The code depending on reltuples is broken in recent server versions, and might give useless results in older ones too (if reltuples = relpages = 0). Ideally we'd retrieve all of reltuples, relpages, and pg_relation_size(rel), and do the same calculation the planner does. Not sure if pg_relation_size() exists far enough back though. * Copying-and-pasting all of deparseAnalyzeSql (twice!) seems pretty bletcherous. Why not call that and then add a WHERE clause to its result, or just add some parameters to it so it can do that itself? * More attention to updating relevant comments would be appropriate, eg here you've not bothered to fix the adjacent falsified comment: /* We've retrieved all living tuples from foreign server. */ - *totalrows = astate.samplerows; + if (do_sample) + *totalrows = reltuples; + else + *totalrows = astate.samplerows; * Needs docs obviously. I'm not sure if the existing regression testing covers the new code adequately or if we need more cases. Having said that much, I'm going to leave it in Waiting on Author state. 
regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 8f4d8a5022..cdf12e53e8 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2297,6 +2297,31 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire the number of rows in given relation. + * + * We just need an estimate here, so consult pg_class.reltuples instead + * of doing anything as expensive as COUNT(*). + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly, like the planner does? + * XXX: it needs some work anyway, because it won't do anything sane + * for a never-analyzed table with reltuples = -1. + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * @@ -2358,6 +2383,135 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) deparseRelation(buf, rel); } +/* + * Construct SELECT statement to acquire sample rows of given relation, + * by sampling a fraction of the table using TABLESAMPLE SYSTEM. + * + * SELECT command is appended to buf, and list of columns retrieved + * is returned to *retrieved_attrs. + * + * Note: We could allow selecting system/bernoulli, and maybe even the + * optional TSM modules (especially tsm_system_rows would help). 
+ */ +void +deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac) +{ + Oid relid = RelationGetRelid(rel); + TupleDesc tupdesc = RelationGetDescr(rel); + int i; + char *colname; + List *options; + ListCell *lc; + bool first = true; + + *retrieved_attrs = NIL; + + appendStringInfoString(buf, "SELECT "); + for (i = 0; i < tupdesc->natts; i++) + { + /* Ignore dropped columns. */ + if (TupleDescAttr(tupdesc, i)->attisdropped) + continue; + + if (!first) + appendStringInfoString(buf, ", "); + first = false; + + /* Use attribute name or column_name option. */ + colname = NameStr(TupleDescAttr(tupdesc, i)->attname); + options = GetForeignColumnOptions(relid, i + 1); + + foreach(lc, options) + { + DefElem*def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, "column_name") == 0) + { +colname = defGetString(def); +break; + } + } + + appendStringInfoString(buf, quote_identifier(colname)); + + *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1); + } + + /* Don't generat
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Hi, On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote: > And here's the slightly simplified patch, without the part adding the > unnecessary GetServerVersion() function. Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log Marked as waiting-on-author. Greetings, Andres Freund
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/22/22 21:12, Tomas Vondra wrote: > On 2/22/22 01:36, Fujii Masao wrote: >> >> >> On 2022/02/18 22:28, Tomas Vondra wrote: >>> Hi, >>> >>> here's a slightly updated version of the patch series. >> >> Thanks for updating the patches! >> >> >>> The 0001 part >>> adds tracking of server_version_num, so that it's possible to enable >>> other features depending on it. >> >> Like configure_remote_session() does, can't we use PQserverVersion() >> instead of implementing new function GetServerVersion()? >> > > Ah! My knowledge of libpq is limited, so I wasn't sure this function > exists. It'll simplify the patch. > And here's the slightly simplified patch, without the part adding the unnecessary GetServerVersion() function. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 089ecf8c9ee44b89becfbcfb9c7a0ddc6c8f197a Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 00:33:19 +0100 Subject: [PATCH] postgres_fdw: sample data on remote node for ANALYZE When performing ANALYZE on a foreign tables, we need to collect sample of rows. Until now, we simply read all data from the remote node and built the sample locally. That is very expensive, especially in terms of network traffic etc. But it's possible to move the sampling to the remote node, and use either TABLESAMPLE or simply random() to transfer just much smaller amount of data. So we run either SELECT * FROM table TABLESAMPLE SYSTEM(fraction) or SELECT * FROM table WHERE random() < fraction depending on the server version (TABLESAMPLE is supported since 9.5). To do that, we need to determine what fraction of the table to sample. We rely on reltuples (fetched from the remote node) to be sufficiently accurate and up to date, and calculate the fraction based on that. We increase the sample size a bit (in case the table shrunk), and we still do the reservoir sampling (in case it grew). 
Using tsm_system_rows would allow specifying sample size in rows, without determining sampling rate. But the sampling method may not be installed, and we'd still have to determine the relation size. This adds 'sample' option for remote servers / tables. By default, it's set to 'true' which enables remote sampling. Setting it to 'false' uses the old approach of fetching everything and sampling locally. --- contrib/postgres_fdw/deparse.c | 152 contrib/postgres_fdw/option.c | 7 +- contrib/postgres_fdw/postgres_fdw.c | 142 +- contrib/postgres_fdw/postgres_fdw.h | 7 ++ 4 files changed, 304 insertions(+), 4 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..32f2c0d5fb3 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire numbe of rows of given relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * @@ -2328,6 +2348,138 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) deparseRelation(buf, rel); } +/* + * Construct SELECT statement to acquire sample rows of given relation, + * by sampling a fraction of the table using TABLESAMPLE SYSTEM. 
+ * + * SELECT command is appended to buf, and list of columns retrieved + * is returned to *retrieved_attrs. + * + * Note: We could allow selecting system/bernoulli, and maybe even the + * optional TSM modules (especially tsm_system_rows would help). + */ +void +deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac) +{ + Oid relid = RelationGetRelid(rel); + TupleDesc tupdesc = RelationGetDescr(rel); + int i; + char *colname; + List *options; + ListCell *lc; + bool first = true; + + *retrieved_attrs = NIL; + + appendStringInfoString(buf, "SELECT "); + for (i = 0; i < tupdesc->natts; i++) + { + /* Ignore dropped columns. */ + if (TupleDescAttr(tupdesc, i)->attisdropped) + continue; + + if (!first) + appendStringInfoString(buf, ", "); + first = false; + + /* Use attribute name or column_name option. */ + colname = NameStr(TupleDescAttr(tupdesc, i)->attname); + options = GetForeignColumnOptions(relid, i + 1); + + foreach(lc, options) + { + DefElem*def = (DefElem *) lfirst(
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/22/22 01:36, Fujii Masao wrote: > > > On 2022/02/18 22:28, Tomas Vondra wrote: >> Hi, >> >> here's a slightly updated version of the patch series. > > Thanks for updating the patches! > > >> The 0001 part >> adds tracking of server_version_num, so that it's possible to enable >> other features depending on it. > > Like configure_remote_session() does, can't we use PQserverVersion() > instead of implementing new function GetServerVersion()? > Ah! My knowledge of libpq is limited, so I wasn't sure this function exists. It'll simplify the patch. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2022/02/18 22:28, Tomas Vondra wrote: Hi, here's a slightly updated version of the patch series. Thanks for updating the patches! The 0001 part adds tracking of server_version_num, so that it's possible to enable other features depending on it. Like configure_remote_session() does, can't we use PQserverVersion() instead of implementing new function GetServerVersion()? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Hi, here's a slightly updated version of the patch series. The 0001 part adds tracking of server_version_num, so that it's possible to enable other features depending on it. In this case it's used to decide whether TABLESAMPLE is supported. The 0002 part modifies the sampling. I realized we can do something similar even on pre-9.5 releases, by running "WHERE random() < $1". Not perfect, because it still has to read the whole table, but still better than also sending it over the network. There's a "sample" option for foreign server/table, which can be used to disable the sampling if needed. A simple measurement on a table with 10M rows, on localhost. old:6600ms random: 450ms tablesample: 40ms (system) tablesample: 200ms (bernoulli) Local analyze takes ~190ms, so that's quite close. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 6524f8c6f0db13c5dca0438bdda194ee5bedebbb Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 01:32:25 +0100 Subject: [PATCH 1/2] postgres_fdw: track server version for connections To allow using features that only exist on new Postgres versions, we need some information about version of the remote node. We simply request server_version_num from the remote node. We only fetch it if version number is actually needed, and we cache it. --- contrib/postgres_fdw/connection.c | 44 + contrib/postgres_fdw/postgres_fdw.h | 2 ++ 2 files changed, 46 insertions(+) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index f753c6e2324..3ea2948d3ec 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -279,6 +279,50 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state) return entry->conn; } +/* + * Determine remote server version (as an int value). + * + * The value is determined only once and then cached in PgFdwConnState. 
+ */ +int +GetServerVersion(PGconn *conn, PgFdwConnState *state) +{ + PGresult *volatile res = NULL; + + /* + * If we already know server version for this connection, we're done. + */ + if (state->server_version_num != 0) + return state->server_version_num; + + /* PGresult must be released before leaving this function. */ + PG_TRY(); + { + char *line; + + /* + * Execute EXPLAIN remotely. + */ + res = pgfdw_exec_query(conn, "SHOW server_version_num", NULL); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + pgfdw_report_error(ERROR, res, conn, false, "SHOW server_version_num"); + + /* + * Extract the server version number. + */ + line = PQgetvalue(res, 0, 0); + state->server_version_num = strtol(line, NULL, 10); + } + PG_FINALLY(); + { + if (res) + PQclear(res); + } + PG_END_TRY(); + + return state->server_version_num; +} + /* * Reset all transient state fields in the cached connection entry and * establish new connection to the remote server. diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index 8ae79e97e44..1687c62df2d 100644 --- a/contrib/postgres_fdw/postgres_fdw.h +++ b/contrib/postgres_fdw/postgres_fdw.h @@ -132,6 +132,7 @@ typedef struct PgFdwRelationInfo typedef struct PgFdwConnState { AsyncRequest *pendingAreq; /* pending async request */ + int server_version_num; /* remote server version */ } PgFdwConnState; /* in postgres_fdw.c */ @@ -143,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq); extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state); extern void ReleaseConnection(PGconn *conn); +extern int GetServerVersion(PGconn *conn, PgFdwConnState *state); extern unsigned int GetCursorNumber(PGconn *conn); extern unsigned int GetPrepStmtNumber(PGconn *conn); extern void do_sql_command(PGconn *conn, const char *sql); -- 2.34.1 From 546657dd0d3ab3bebdb1145785c62809b78e7644 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 00:33:19 +0100 Subject: [PATCH 
2/2] postgres_fdw: sample data on remote node for ANALYZE When performing ANALYZE on a foreign tables, we need to collect sample of rows. Until now, we simply read all data from the remote node and built the sample locally. That is very expensive, especially in terms of network traffic etc. But it's possible to move the sampling to the remote node, and use either TABLESAMPLE or simply random() to transfer just much smaller amount of data. So we run either SELECT * FROM table TABLESAMPLE SYSTEM(fraction) or SELECT * FROM table WHERE random() < fraction depending on the server version (TABLESAMPLE is supported since 9.5). To do that, we need to determine what fraction of the table to sample. We rely on reltuples (fetched from the remote node) to be sufficiently accurate and up to date, and calculate the fraction based on that. We increase the sample size a bit (in case the table shrunk), and we still do the reservoi
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Hello, I find it a great idea to use TABLESAMPLE in postgres_fdw ANALYZE. Let me offer some ideas on how to resolve your problems.

Tomas Vondra < tomas.von...@enterprisedb.com > writes:
> The other issue is which sampling method to use - we have SYSTEM and
> BERNOULLI built in, and the tsm_system_rows as an extension (and _time,
> but that's not very useful here). I guess we'd use one of the built-in
> ones, because that'll work on more systems out of the box.

It's hard to choose between speed and quality, but I think we need the SYSTEM method here. This patch is about speeding up ANALYZE, and the SYSTEM method will be faster than BERNOULLI for sample fractions up to 50%.

> But that leads to the main issue - determining the fraction of rows to
> sample. We know how many rows we want to sample, but we have no idea how
> many rows there are in total. We can look at reltuples, but we can't be
> sure how accurate / up-to-date that value is.
>
> The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and
> applies some simple sanity checks, but I wonder if we need to do more
> (e.g. look at relation size and adjust reltuples by current/relpages).

I found a query on Stackoverflow (it does a similar thing to what estimate_rel_size does) that may help with it. So that makes tsm_system_rows unnecessary.

--
SELECT (CASE WHEN relpages = 0 THEN float8 '0'
             ELSE reltuples / relpages
        END
        * (pg_relation_size(oid) / pg_catalog.current_setting('block_size')::int)
       )::bigint
FROM pg_class c
WHERE c.oid = 'tablename'::regclass;
--

And one more refactoring note. The new function deparseAnalyzeSampleSql duplicates the existing deparseAnalyzeSql (the previous function in deparse.c) except for the new last line. I guess it would be better to add a new parameter, double sample_frac, to the existing deparseAnalyzeSql and use it as a condition for adding "TABLESAMPLE SYSTEM ..." to the SQL query (set it to zero when do_sample is false).
Or you may also add do_sample as a parameter to deparseAnalyzeSql, but in my opinion that's redundant.

Stackoverflow: https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql

--
Sofia Kopikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/11/22 19:13, Robert Haas wrote: > On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: >> While I'm not opposed to moving those goalposts at some point, >> I think pushing them all the way up to 9.5 for this one easily-fixed >> problem is not very reasonable. >> >> Given other recent discussion, an argument could be made for moving >> the cutoff to 9.2, on the grounds that it's too hard to test against >> anything older. But that still leaves us needing a version check >> if we want to use TABLESAMPLE. > > OK, thanks. > Yeah, I think checking server_version is fairly simple. Which is mostly why I haven't done anything about that in the submitted patch, because the other issues (what fraction to sample) seemed more important. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: > While I'm not opposed to moving those goalposts at some point, > I think pushing them all the way up to 9.5 for this one easily-fixed > problem is not very reasonable. > > Given other recent discussion, an argument could be made for moving > the cutoff to 9.2, on the grounds that it's too hard to test against > anything older. But that still leaves us needing a version check > if we want to use TABLESAMPLE. OK, thanks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Robert Haas writes: > I think it's going to be necessary to compromise on that at some > point. Sure. The existing postgres_fdw documentation [1] already addresses this issue: postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error. This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT. While I'm not opposed to moving those goalposts at some point, I think pushing them all the way up to 9.5 for this one easily-fixed problem is not very reasonable. Given other recent discussion, an argument could be made for moving the cutoff to 9.2, on the grounds that it's too hard to test against anything older. But that still leaves us needing a version check if we want to use TABLESAMPLE. regards, tom lane [1] https://www.postgresql.org/docs/devel/postgres-fdw.html#id-1.11.7.45.17
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On Fri, Feb 11, 2022 at 12:39 PM Tom Lane wrote: > Tomas Vondra writes: > > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > > to collect/transfer just a small sample from the remote node, saving > > both CPU and network. > > This is great if the remote end has TABLESAMPLE, but pre-9.5 servers > don't, and postgres_fdw is supposed to still work with old servers. > So you need some conditionality for that. I think it's going to be necessary to compromise on that at some point. I don't, for example, think it would be reasonable for postgres_fdw to have detailed knowledge of which operators can be pushed down as a function of the remote PostgreSQL version. Nor do I think that we care about whether this works at all against, say, PostgreSQL 8.0. I'm not sure where it's reasonable to draw a line and say we're not going to expend any more effort, and maybe 15 with 9.5 is a small enough gap that we still care at least somewhat about compatibility. But even that is not 100% obvious to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > to collect/transfer just a small sample from the remote node, saving > both CPU and network. This is great if the remote end has TABLESAMPLE, but pre-9.5 servers don't, and postgres_fdw is supposed to still work with old servers. So you need some conditionality for that. regards, tom lane
postgres_fdw: using TABLESAMPLE to collect remote sample
Hi, here's a small patch modifying postgres_fdw to use TABLESAMPLE to collect a sample when running ANALYZE on a foreign table. Currently the sampling happens locally, i.e. we fetch all data from the remote server and then pick rows. But that's hugely expensive for large relations and/or with limited network bandwidth, of course.

Alexander Lepikhov mentioned this in [1], and it was actually proposed by Stephen back in 2020 [2], but no patch ever materialized.

So here we go. The patch does a very simple thing - it uses TABLESAMPLE to collect/transfer just a small sample from the remote node, saving both CPU and network. And indeed, that helps quite a bit:

-
create table t (a int);
Time: 10.223 ms

insert into t select i from generate_series(1,1000) s(i);
Time: 552.207 ms

analyze t;
Time: 310.445 ms

CREATE FOREIGN TABLE ft (a int) SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 't');
Time: 4.838 ms

analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t TABLESAMPLE SYSTEM(0.375001)
Time: 44.632 ms

alter foreign table ft options (sample 'false');
Time: 4.821 ms

analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t
Time: 6690.425 ms (00:06.690)
-

6690ms without sampling, and 44ms with sampling - quite an improvement. Of course, the difference may be much smaller/larger in other cases.

Now, there's a couple of issues ...

Firstly, the FDW API forces a somewhat strange division of work between different methods, duplicating some of it (this is already mentioned in postgresAnalyzeForeignTable). But that's minor and can be fixed.

The other issue is which sampling method to use - we have SYSTEM and BERNOULLI built in, and tsm_system_rows as an extension (and _time, but that's not very useful here). I guess we'd use one of the built-in ones, because that'll work on more systems out of the box.

But that leads to the main issue - determining the fraction of rows to sample.
We know how many rows we want to sample, but we have no idea how many rows there are in total. We can look at reltuples, but we can't be sure how accurate / up-to-date that value is. The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and applies some simple sanity checks, but I wonder if we need to do more (e.g. look at the relation size and adjust reltuples by current/relpages).

FWIW this is yet a bit more convoluted when analyzing a partitioned table with foreign partitions, because we only ever look at relation sizes to determine how many rows to sample from each partition.

regards

[1] https://www.postgresql.org/message-id/bdb0bea2-a0da-1f1d-5c92-96ff90c198eb%40postgrespro.ru
[2] https://www.postgresql.org/message-id/20200829162231.GE29590%40tamriel.snowman.net

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..68f875c47a6 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 	appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }

+/*
+ * Construct SELECT statement to acquire the number of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+	StringInfoData relname;
+
+	/* We'll need the remote relation name as a literal. */
+	initStringInfo(&relname);
+	deparseRelation(&relname, rel);
+
+	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	deparseStringLiteral(buf, relname.data);
+	appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
 *
@@ -2328,6 +2348,72 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
 	deparseRelation(buf, rel);
 }

+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs,
+						double sample_frac)
+{
+	Oid			relid = RelationGetRelid(rel);
+	TupleDesc	tupdesc = RelationGetDescr(rel);
+	int			i;
+	char	   *colname;
+	List	   *options;
+	ListCell   *lc;
+	bool		first = true;
+
+	*retrieved_attrs = NIL;
+
+	appendStringInfoString(buf, "SELECT ");
+	for (i = 0; i < tupdesc->natts; i++)
+	{
+		/* Ignore dropped columns. */
+		if (T