Re: logrep stuck with 'ERROR: int2vector has too many elements'
Andres Freund writes:
> WFM, just wanted to be sure we thought about the errors it could cause. I'm
> not sure we've exercised cases of tuples being too wide due to variable-width
> plain storage types exhaustively. There's only a small number of these types:
> int2vector, oidvector, gtsvector, tsquery

> What's behind using plain for these types? Is it just because we want to use
> it in tables that don't have a toast table (namely pg_index)? Obviously we
> can't change the storage in existing releases...

For int2vector and oidvector, I think it boils down to wanting to access
columns like pg_proc.proargtypes without detoasting.  We could fix that,
but it'd likely be invasive and not a net positive.

It seems a bit broken that tsquery is marked that way, though; I doubt
we are getting any notational benefit from it.  Dunno about gtsvector.

			regards, tom lane
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Hi,

On 2023-01-15 15:53:09 -0500, Tom Lane wrote:
> Andres Freund writes:
> > For the purpose here a limit of MaxTupleAttributeNumber or such instead of
> > FUNC_MAX_ARGS would do the trick, I think?
>
> As long as we have to change the code, we might as well remove the
> arbitrary restriction.

WFM, just wanted to be sure we thought about the errors it could cause. I'm
not sure we've exercised cases of tuples being too wide due to variable-width
plain storage types exhaustively. There's only a small number of these types:
int2vector, oidvector, gtsvector, tsquery

What's behind using plain for these types? Is it just because we want to use
it in tables that don't have a toast table (namely pg_index)? Obviously we
can't change the storage in existing releases...

> > Should this be repalloc0? I don't know if the palloc0 above was just used with
> > the goal of initializing the "header" fields, or also to avoid trailing
> > uninitialized bytes?
>
> I think probably the palloc0 was mostly about belt-and-suspenders
> programming.  But yeah, its only real value is to ensure that all
> the header fields are zero, so I don't think we need repalloc0
> when enlarging.  After we set the array size at the end of the
> loop, it'd be a programming bug to touch any bytes beyond that.

Agreed.

Greetings,

Andres Freund
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Andres Freund writes:
> For the purpose here a limit of MaxTupleAttributeNumber or such instead of
> FUNC_MAX_ARGS would do the trick, I think?

As long as we have to change the code, we might as well remove the
arbitrary restriction.

> Should this be repalloc0? I don't know if the palloc0 above was just used with
> the goal of initializing the "header" fields, or also to avoid trailing
> uninitialized bytes?

I think probably the palloc0 was mostly about belt-and-suspenders
programming.  But yeah, its only real value is to ensure that all
the header fields are zero, so I don't think we need repalloc0
when enlarging.  After we set the array size at the end of the
loop, it'd be a programming bug to touch any bytes beyond that.

			regards, tom lane
Re: logrep stuck with 'ERROR: int2vector has too many elements'
I wrote:
> BTW, fd0b9dceb is in v15, so are you sure this doesn't fail in 15?

Ah-hah: simple test cases only fail since b7ae03953.  Before that,
the default situation was that pg_publication_rel.prattrs was null
and that would be passed on as the transmitted value.  b7ae03953
decided it'd be cool if pg_get_publication_tables() expanded that
to show all the actually-transmitted columns, and as of that point
we can get an overflow in int2vectorin given the default case where
all columns are published.

To break it in v15, you need a test case that publishes more than
100 columns, but not all of them (else the optimization in
fetch_remote_table_info's query prevents the issue).

			regards, tom lane
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Hi,

On 2023-01-15 15:17:16 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
> >> But I suppose we are stuck with that, seeing that this datatype choice
> >> is effectively part of the logrep protocol now.  I think the only
> >> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
> >> in int2vectorin.  We probably ought to back-patch that as far as
> >> pg_publication_rel.prattrs exists, too.
>
> > Are you thinking of introducing another, or just "rely" on too long arrays to
> > trigger errors when forming tuples?
>
> There's enough protections already, eg repalloc will complain if you
> try to go past 1GB.

We'll practically error out at a much lower limit than that, due to reaching
the maximum length of a row and int2vector not being toastable. So I'm just
wondering if we want to set the limit to something that'll commonly avoid
erroring out with something like

ERROR:  54000: row is too big: size 10048, maximum size 8160

For the purpose here a limit of MaxTupleAttributeNumber or such instead of
FUNC_MAX_ARGS would do the trick, I think?
> diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
> index e47c15a54f..44d1c7ad0c 100644
> --- a/src/backend/utils/adt/int.c
> +++ b/src/backend/utils/adt/int.c
> @@ -143,11 +143,13 @@ int2vectorin(PG_FUNCTION_ARGS)
>  	char	   *intString = PG_GETARG_CSTRING(0);
>  	Node	   *escontext = fcinfo->context;
>  	int2vector *result;
> +	int			nalloc;
>  	int			n;
>  
> -	result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
> +	nalloc = 32;				/* arbitrary initial size guess */
> +	result = (int2vector *) palloc0(Int2VectorSize(nalloc));
>  
> -	for (n = 0; n < FUNC_MAX_ARGS; n++)
> +	for (n = 0;; n++)
>  	{
>  		long		l;
>  		char	   *endp;
> @@ -157,6 +159,12 @@ int2vectorin(PG_FUNCTION_ARGS)
>  		if (*intString == '\0')
>  			break;
>  
> +		if (n >= nalloc)
> +		{
> +			nalloc *= 2;
> +			result = (int2vector *) repalloc(result, Int2VectorSize(nalloc));
> +		}

Should this be repalloc0? I don't know if the palloc0 above was just used with
the goal of initializing the "header" fields, or also to avoid trailing
uninitialized bytes?

Greetings,

Andres Freund
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Andres Freund writes:
> On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
>> But I suppose we are stuck with that, seeing that this datatype choice
>> is effectively part of the logrep protocol now.  I think the only
>> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
>> in int2vectorin.  We probably ought to back-patch that as far as
>> pg_publication_rel.prattrs exists, too.

> Are you thinking of introducing another, or just "rely" on too long arrays to
> trigger errors when forming tuples?

There's enough protections already, eg repalloc will complain if you
try to go past 1GB.

I'm thinking of the attached for HEAD (it'll take minor mods to
back-patch).

			regards, tom lane

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index e47c15a54f..44d1c7ad0c 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -143,11 +143,13 @@ int2vectorin(PG_FUNCTION_ARGS)
 	char	   *intString = PG_GETARG_CSTRING(0);
 	Node	   *escontext = fcinfo->context;
 	int2vector *result;
+	int			nalloc;
 	int			n;
 
-	result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
+	nalloc = 32;				/* arbitrary initial size guess */
+	result = (int2vector *) palloc0(Int2VectorSize(nalloc));
 
-	for (n = 0; n < FUNC_MAX_ARGS; n++)
+	for (n = 0;; n++)
 	{
 		long		l;
 		char	   *endp;
@@ -157,6 +159,12 @@ int2vectorin(PG_FUNCTION_ARGS)
 		if (*intString == '\0')
 			break;
 
+		if (n >= nalloc)
+		{
+			nalloc *= 2;
+			result = (int2vector *) repalloc(result, Int2VectorSize(nalloc));
+		}
+
 		errno = 0;
 		l = strtol(intString, &endp, 10);
 
@@ -176,17 +184,11 @@ int2vectorin(PG_FUNCTION_ARGS)
 			ereturn(escontext, (Datum) 0,
 					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 					 errmsg("invalid input syntax for type %s: \"%s\"",
-							"integer", intString)));
+							"smallint", intString)));
 
 		result->values[n] = l;
 		intString = endp;
 	}
-	while (*intString && isspace((unsigned char) *intString))
-		intString++;
-	if (*intString)
-		ereturn(escontext, (Datum) 0,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("int2vector has too many elements")));
 
 	SET_VARSIZE(result, Int2VectorSize(n));
 	result->ndim = 1;
@@ -261,12 +263,6 @@ int2vectorrecv(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 				 errmsg("invalid int2vector data")));
 
-	/* check length for consistency with int2vectorin() */
-	if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("oidvector has too many elements")));
-
 	PG_RETURN_POINTER(result);
 }
diff --git a/src/backend/utils/adt/oid.c b/src/backend/utils/adt/oid.c
index 697588313d..3f7af5b3a0 100644
--- a/src/backend/utils/adt/oid.c
+++ b/src/backend/utils/adt/oid.c
@@ -115,27 +115,30 @@ oidvectorin(PG_FUNCTION_ARGS)
 	char	   *oidString = PG_GETARG_CSTRING(0);
 	Node	   *escontext = fcinfo->context;
 	oidvector  *result;
+	int			nalloc;
 	int			n;
 
-	result = (oidvector *) palloc0(OidVectorSize(FUNC_MAX_ARGS));
+	nalloc = 32;				/* arbitrary initial size guess */
+	result = (oidvector *) palloc0(OidVectorSize(nalloc));
 
-	for (n = 0; n < FUNC_MAX_ARGS; n++)
+	for (n = 0;; n++)
 	{
 		while (*oidString && isspace((unsigned char) *oidString))
 			oidString++;
 		if (*oidString == '\0')
 			break;
+
+		if (n >= nalloc)
+		{
+			nalloc *= 2;
+			result = (oidvector *) repalloc(result, OidVectorSize(nalloc));
+		}
+
 		result->values[n] = uint32in_subr(oidString, &oidString,
 										  "oid", escontext);
 		if (SOFT_ERROR_OCCURRED(escontext))
 			PG_RETURN_NULL();
 	}
-	while (*oidString && isspace((unsigned char) *oidString))
-		oidString++;
-	if (*oidString)
-		ereturn(escontext, (Datum) 0,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("oidvector has too many elements")));
 
 	SET_VARSIZE(result, OidVectorSize(n));
 	result->ndim = 1;
@@ -212,12 +215,6 @@ oidvectorrecv(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 				 errmsg("invalid oidvector data")));
 
-	/* check length for consistency with oidvectorin() */
-	if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("oidvector has too many elements")));
-
 	PG_RETURN_POINTER(result);
 }
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Hi,

On 2023-01-15 14:39:41 -0500, Tom Lane wrote:
> It looks like the proximate cause is that fd0b9dceb started fetching
> the remote's pg_get_publication_tables() result as-is rather than
> unnesting it, so that the on-the-wire representation is now int2vector
> not a series of int2.  However, that just begs the question of who
> thought that making pg_publication_rel.prattrs be int2vector instead
> of int2[] was a good idea.  Quite aside from this issue, int2vector
> isn't toastable, which'll lead to bloat in pg_publication_rel.

There are no easily visible comments about these restrictions of int2vector.
And there are plenty of other places using it, where it's not immediately
obvious that the number of entries is very constrained, even if they are
(e.g. pg_trigger.tgattr).

> But I suppose we are stuck with that, seeing that this datatype choice
> is effectively part of the logrep protocol now.  I think the only
> reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
> in int2vectorin.  We probably ought to back-patch that as far as
> pg_publication_rel.prattrs exists, too.

Are you thinking of introducing another, or just "rely" on too long arrays to
trigger errors when forming tuples?

I guess we'll have to process the input twice? Pre-allocating an int2vector
for 100 elements is one thing, for 1600 another.

Greetings,

Andres Freund
Re: logrep stuck with 'ERROR: int2vector has too many elements'
I wrote:
> Alvaro Herrera writes:
>> At the same time, I don't understand why it fails in 16 but not in 15.
>> Maybe something changed in the way we process the column lists in 16?

> Probably didn't have a dependency on int2vector before.

It looks like the proximate cause is that fd0b9dceb started fetching
the remote's pg_get_publication_tables() result as-is rather than
unnesting it, so that the on-the-wire representation is now int2vector
not a series of int2.  However, that just begs the question of who
thought that making pg_publication_rel.prattrs be int2vector instead
of int2[] was a good idea.  Quite aside from this issue, int2vector
isn't toastable, which'll lead to bloat in pg_publication_rel.

But I suppose we are stuck with that, seeing that this datatype choice
is effectively part of the logrep protocol now.  I think the only
reasonable solution is to get rid of the FUNC_MAX_ARGS restriction
in int2vectorin.  We probably ought to back-patch that as far as
pg_publication_rel.prattrs exists, too.

BTW, fd0b9dceb is in v15, so are you sure this doesn't fail in 15?
It looks like the code path is only taken if the remote is also >= 15,
so maybe your test case didn't expose it?

			regards, tom lane
Re: logrep stuck with 'ERROR: int2vector has too many elements'
Alvaro Herrera writes:
> On 2023-Jan-15, Erik Rijkers wrote:
>> Logical replication sometimes gets stuck with
>>    ERROR:  int2vector has too many elements

> Weird.  This error comes from int2vectorin which amusingly only wants to
> read up to FUNC_MAX_ARGS values in the array (100 in the default config,
> but it can be changed in pg_config_manual.h).  I wonder how come we
> haven't noticed this before ... surely we use int2vector's for other
> things than function argument lists nowadays.

Yeah.  So remove the limit in int2vectorin (probably also oidvectorin),
or change logrep to use int2[] not int2vector, or both.

> At the same time, I don't understand why it fails in 16 but not in 15.
> Maybe something changed in the way we process the column lists in 16?

Probably didn't have a dependency on int2vector before.

			regards, tom lane
RE: logrep stuck with 'ERROR: int2vector has too many elements'
On Sunday, January 15, 2023 5:35 PM Erik Rijkers wrote:
>
> I can't find the exact circumstances that cause it but it has something to do
> with many columns (or adding many columns) in combination with perhaps
> generated columns.
>
> This replication test, in a slightly different form, used to work. This is
> also suggested by the fact that the attached runs without errors in
> REL_15_STABLE but gets stuck in HEAD.
>
> What it does: it initdbs and runs two instances, primary and replica. In the
> primary 'pgbench -is1' is done, and many columns, including 1 generated
> column, are added to all 4 pgbench tables. This is then pg_dump/pg_restored
> to the replica, and a short pgbench is run. The result tables on primary and
> replica are compared for the final result.
> (To run it will need some tweaks to directory and connection parms)
>
> I ran it on both v15 and v16 for 25 runs: with the parameters as given,
> 15 has no problem while 16 always got stuck with the int2vector error.
> (15 can actually be pushed up to the max of 1600 columns per table without
> errors)
>
> Both REL_15_STABLE and 16devel built from recent master on Debian 10,
> gcc 12.2.0.
>
> I hope someone understands what's going wrong.

Thanks for reporting.

I think the basic problem is that we try to fetch the column list as an
int2vector when doing table sync, and then if the number of columns is
larger than 100, we will get an ERROR like the $subject.

We can also hit this ERROR by manually specifying a long (>100) column list
in the publication, like:

create publication pub for table test(a1, a2, a3, ... a200);
create subscription xxx ...;

The script didn't reproduce this in PG15, because there we didn't filter out
generated columns when fetching the column list, so it assumes all columns
are replicated and will return NULL for the column list (int2vector) value.
But in PG16 (b7ae039), we started to filter out generated columns (because
generated columns are not replicated in logical replication), so we get a
valid int2vector and get the ERROR.

I will think and work on a fix for this.

Best regards,
Hou zj
Re: logrep stuck with 'ERROR: int2vector has too many elements'
On 1/15/23 12:33, Alvaro Herrera wrote:
> On 2023-Jan-15, Erik Rijkers wrote:
>> Hello,
>>
>> Logical replication sometimes gets stuck with
>>    ERROR:  int2vector has too many elements
>
> Weird.  This error comes from int2vectorin which amusingly only wants to
> read up to FUNC_MAX_ARGS values in the array (100 in the default config,
> but it can be changed in pg_config_manual.h).  I wonder how come we
> haven't noticed this before ... surely we use int2vector's for other
> things than function argument lists nowadays.
>
> At the same time, I don't understand why it fails in 16 but not in 15.
> Maybe something changed in the way we process the column lists in 16?

I wrote as comment in the script, but that's maybe vague so let me be
more explicit: 16 also accepts many columns, up to 1600, without error,
as long as that is not combined with generated column(s) such as in the
script. It seems the combination becomes quickly problematic. Although
adding just 50 columns + a generated column is still ok, 100 is already
too high (see the ADD_COLUMNS variable in my script).

Weird indeed.

Erik
Re: logrep stuck with 'ERROR: int2vector has too many elements'
On 2023-Jan-15, Erik Rijkers wrote:
> Hello,
>
> Logical replication sometimes gets stuck with
>    ERROR:  int2vector has too many elements

Weird.  This error comes from int2vectorin which amusingly only wants to
read up to FUNC_MAX_ARGS values in the array (100 in the default config,
but it can be changed in pg_config_manual.h).  I wonder how come we
haven't noticed this before ... surely we use int2vector's for other
things than function argument lists nowadays.

At the same time, I don't understand why it fails in 16 but not in 15.
Maybe something changed in the way we process the column lists in 16?

-- 
Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
https://twitter.com/gunnarmorling/status/1596080409259003906