Re: Error-safe user functions
On 2022-12-24 Sa 10:42, Tom Lane wrote:
> Andrew Dunstan writes:
>> As I was giving this a final polish I noticed this in jspConvertRegexFlags:
>> /*
>>  * We'll never need sub-match details at execution.  While
>>  * RE_compile_and_execute would set this flag anyway, force it on here to
>>  * ensure that the regex cache entries created by makeItemLikeRegex are
>>  * useful.
>>  */
>> cflags |= REG_NOSUB;
>> Clearly the comment would no longer be true. I guess I should just
>> remove this?

> Yeah, we can just drop that I guess.  I'm slightly worried that we might
> need it again after some future refactoring; but it's not really worth
> devising a re-worded comment to justify keeping it.
>
> Also, I realized that I failed in my reviewerly duty by not noticing
> that you'd forgotten to pg_regfree the regex after successful
> compilation.  Running something like this exposes the memory leak
> very quickly:
>
> select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")',
>                          'jsonpath')
> from generate_series(1,1000);
>
> The attached delta patch takes care of it.  (Per comment at pg_regcomp,
> we don't need this after a failure return.)

Thanks, pushed with those changes.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: ARRNELEMS Out-of-bounds possible errors
Nikita Malakhov writes:
> Even with null context it does not turn to ereport, and returns dummy value

Read the code.  ArrayGetNItems passes NULL for escontext, therefore if
there's a problem the ereturn calls in ArrayGetNItemsSafe will throw
error, *not* return -1.

Not sure how we could persuade Coverity of that, though, if it fails
to understand that for itself.

			regards, tom lane
Re: Error-safe user functions
Ted Yu writes:
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote:
>> Yes, it is.  We don't want a query-cancel transformed into a soft error.

> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.

On what grounds do you claim that?  The timing of arrival of the SIGINT
is basically chance --- it might happen while we're inside backend/regex,
or not.  I mean, sure you could claim that a bad regex might run a long
time and thereby be more likely to cause the user to issue a query
cancel, but that's a stretched line of reasoning.

			regards, tom lane
Re: Error-safe user functions
Andrew Dunstan writes:
> As I was giving this a final polish I noticed this in jspConvertRegexFlags:
> /*
>  * We'll never need sub-match details at execution.  While
>  * RE_compile_and_execute would set this flag anyway, force it on here to
>  * ensure that the regex cache entries created by makeItemLikeRegex are
>  * useful.
>  */
> cflags |= REG_NOSUB;
> Clearly the comment would no longer be true. I guess I should just
> remove this?

Yeah, we can just drop that I guess.  I'm slightly worried that we might
need it again after some future refactoring; but it's not really worth
devising a re-worded comment to justify keeping it.

Also, I realized that I failed in my reviewerly duty by not noticing
that you'd forgotten to pg_regfree the regex after successful
compilation.  Running something like this exposes the memory leak
very quickly:

select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")',
                         'jsonpath')
from generate_series(1,1000);

The attached delta patch takes care of it.  (Per comment at pg_regcomp,
we don't need this after a failure return.)

			regards, tom lane

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 8c3a0c7623..30179408f5 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -560,6 +560,8 @@ makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *pattern,
 						(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 						 errmsg("invalid regular expression: %s", errMsg)));
 	}
+
+	pg_regfree(&re_tmp);
 
 	*result = v;
Re: ARRNELEMS Out-of-bounds possible errors
Hi,

Even with null context it does not turn to ereport, and returns dummy value -

#define errsave_domain(context, domain, ...)	\
	do { \
		struct Node *context_ = (context); \
		pg_prevent_errno_in_scope(); \
		if (errsave_start(context_, domain)) \
			__VA_ARGS__, errsave_finish(context_, __FILE__, __LINE__, __func__); \
	} while(0)

#define errsave(context, ...)	\
	errsave_domain(context, TEXTDOMAIN, __VA_ARGS__)

/*
 * "ereturn(context, dummy_value, ...);" is exactly the same as
 * "errsave(context, ...); return dummy_value;".  This saves a bit
 * of typing in the common case where a function has no cleanup
 * actions to take after reporting a soft error.  "dummy_value"
 * can be empty if the function returns void.
 */
#define ereturn_domain(context, dummy_value, domain, ...)	\
	do { \
		errsave_domain(context, domain, __VA_ARGS__); \
		return dummy_value; \
	} while(0)

#define ereturn(context, dummy_value, ...)	\
	ereturn_domain(context, dummy_value, TEXTDOMAIN, __VA_ARGS__)

On Fri, Dec 23, 2022 at 11:40 AM Kyotaro Horiguchi wrote:
> At Fri, 23 Dec 2022 17:37:55 +0900 (JST), Kyotaro Horiguchi <horikyota@gmail.com> wrote in
>> At Thu, 22 Dec 2022 12:35:58 -0300, Ranier Vilela wrote in
>>> Hi.
>>>
>>> Per Coverity.
>>>
>>> The commit ccff2d2
>>> <https://github.com/postgres/postgres/commit/ccff2d20ed9622815df2a7deffce8a7b14830965>,
>>> changed the behavior function ArrayGetNItems,
>>> with the introduction of the function ArrayGetNItemsSafe.
>>>
>>> Now ArrayGetNItems may return -1, according to the comment.
>>> "instead of throwing an exception. -1 is returned after an error."
>>
>> If I'm reading the code correctly, it's the definition of
>> ArrayGetNItems*Safe*. ArrayGetNItems() calls that function with a NULL
>> escontext and the NULL turns ereturn() into ereport().
>>
>> That doesn't seem to be changed by the commit.
>
> No.. It seems to me that the commit didn't change its behavior in that
> regard.
> Of course teaching Coverity not to issue the false warnings would be
> another actual issue that we should do, maybe.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Re: Error-safe user functions
On Sat, Dec 24, 2022 at 4:38 AM Andrew Dunstan wrote:
>
> On 2022-12-24 Sa 04:51, Ted Yu wrote:
> >
> > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote:
> >
> >     Ted Yu writes:
> >     > In makeItemLikeRegex :
> >     > +            /* See regexp.c for explanation */
> >     > +            CHECK_FOR_INTERRUPTS();
> >     > +            pg_regerror(re_result, &re_tmp, errMsg, sizeof(errMsg));
> >     > +            ereturn(escontext, false,
> >     > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
> >     > still necessary.
> >
> >     Yes, it is.  We don't want a query-cancel transformed into a soft error.
> >
> >                             regards, tom lane
> >
> > Hi,
> > For this case (`invalid regular expression`), the potential user
> > interruption is one reason for stopping execution.
> > I feel surfacing user interruption somehow masks the underlying error.
> >
> > The same regex, without user interruption, would exhibit an `invalid
> > regular expression` error.
> > I think it would be better to surface the error.
>
> All that this patch is doing is replacing a call to
> RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
> code, which gives us the opportunity to call ereturn instead of ereport.
> Note that where escontext is NULL (the common case), ereturn functions
> identically to ereport. So unless you want to argue that the logic in
> RE_compile_and_cache is wrong I don't see what we're arguing about. If
> instead I had altered the API of RE_compile_and_cache to include an
> escontext parameter we wouldn't be having this argument at all. The only
> reason I didn't do that was the point Tom quite properly raised about
> why we're doing any caching here anyway.
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Andrew:
Thanks for the response.
Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
On Sat, Dec 24, 2022 at 12:58 AM David Christensen wrote:
>
> On Fri, Dec 23, 2022 at 12:57 PM David Christensen wrote:
> >
> > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy wrote:
> > > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > > When enabled with allows_streaming, there are a bunch of things that
> > > happen to the node while initializing, I don't think we need all of
> > > them for this.
> >
> > I think the "allows_streaming" was required to ensure the WAL files
> > were preserved properly, and was the approach we ended up taking
> > rather than trying to fail the archive_command or other approaches I'd
> > taken earlier. I'd rather keep this if we can, unless you can propose
> > a different approach that would continue to work in the same way.
>
> Confirmed that we needed this in order to create the replication slot,
> so this /is/ required for the test to work.

The added test needs wal_level to be replica, but the TAP tests set it
to minimal if allows_streaming isn't passed. However, if passed
allows_streaming, it sets a bunch of other parameters which are not
required for this test (see the node->init function in Cluster.pm),
hence we could just set the required parameters wal_level = replica and
max_wal_senders for the replication slot to be created.

> Enclosing v11 with yours and Michael's latest feedback.

Thanks for the patch. I've made the above change as well as renamed the
test file name to be save_fpi.pl, everything else remains the same as
v11. Here's the v12 patch which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data
[PoC] Implementation of distinct in Window Aggregates
Hi,

This is a PoC patch which implements distinct operation in window
aggregates (without order by and for single column aggregation; the
final version may vary wrt these limitations). Purpose of this PoC is
to get feedback on the approach used and corresponding implementation,
any nitpicking as deemed reasonable.

Distinct operation is mirrored from the implementation in nodeAgg.
Existing partitioning logic determines if a row is in the partition,
and when distinct is required, all tuples for the aggregate column are
stored in a tuplesort. When finalize_windowaggregate gets called,
tuples are sorted and duplicates are removed, followed by calling the
transition function on each tuple. When distinct is not required, the
above process is skipped, the transition function gets called directly,
and nothing gets inserted into the tuplesort.

Note: For each partition, tuplesort_begin and tuplesort_end are invoked
to reset the tuplesort, so at any time the max number of tuples in the
tuplesort is equal to the number of tuples in a particular partition.

I have verified it for integer and interval column aggregates (to rule
out obvious issues related to data types).

Sample cases:

create table mytable(id int, name text);
insert into mytable values(1, 'A');
insert into mytable values(1, 'A');
insert into mytable values(5, 'B');
insert into mytable values(3, 'A');
insert into mytable values(1, 'A');

select avg(distinct id) over (partition by name) from mytable;
        avg
--------------------
 2.0000000000000000
 2.0000000000000000
 2.0000000000000000
 2.0000000000000000
 5.0000000000000000

select avg(id) over (partition by name) from mytable;
        avg
--------------------
 1.5000000000000000
 1.5000000000000000
 1.5000000000000000
 1.5000000000000000
 5.0000000000000000

select avg(distinct id) over () from mytable;
        avg
--------------------
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000

select avg(distinct id) from mytable;
        avg
--------------------
 3.0000000000000000

This is my first-time contribution. Please let me know if anything can
be improved as I'm eager to learn.

Regards,
Ankit Kumar Pandey

From 2a3061a95988c39f4654accc06205099713af6cc Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey
Date: Wed, 23 Nov 2022 00:38:01 +0530
Subject: [PATCH] Implement distinct in Window Aggregates.
---
 src/backend/executor/nodeWindowAgg.c | 229 +++
 src/backend/optimizer/util/clauses.c |   2 +
 src/backend/parser/parse_agg.c       |  45 ++++++
 src/backend/parser/parse_func.c      |  19 +--
 src/include/nodes/execnodes.h        |   1 +
 src/include/nodes/primnodes.h        |   2 +
 6 files changed, 258 insertions(+), 40 deletions(-)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4f4aeb2883..1d67ba2c39 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -154,6 +154,14 @@ typedef struct WindowStatePerAggData
 	int64		transValueCount;	/* number of currently-aggregated rows */
 
+	/* For DISTINCT in Aggregates */
+	Datum		lastdatum;		/* used for single-column DISTINCT */
+	FmgrInfo	equalfnOne;		/* single-column comparisons */
+
+	Oid		   *eq_ops;			/* used for equality check in DISTINCT */
+	Oid		   *sort_ops;		/* used for sorting distinct columns */
+	bool		sort_in;		/* FLAG set true if data is stored in tuplesort */
+
 	/* Data local to eval_windowaggregates() */
 	bool		restart;		/* need to restart this agg in this cycle? */
 } WindowStatePerAggData;
 
@@ -163,7 +171,7 @@ static void initialize_windowaggregate(WindowAggState *winstate,
 									   WindowStatePerAgg peraggstate);
 static void advance_windowaggregate(WindowAggState *winstate,
 									WindowStatePerFunc perfuncstate,
-									WindowStatePerAgg peraggstate);
+									WindowStatePerAgg peraggstate, Datum value, bool isNull);
 static bool advance_windowaggregate_base(WindowAggState *winstate,
 										 WindowStatePerFunc perfuncstate,
 										 WindowStatePerAgg peraggstate);
@@ -173,6 +181,9 @@ static void finalize_windowaggregate(WindowAggState *winstate,
 									 Datum *result, bool *isnull);
 static void eval_windowaggregates(WindowAggState *winstate);
+static void process_ordered_windowaggregate_single(WindowAggState *winstate,
+									WindowStatePerFunc perfuncstate,
+									WindowStatePerAgg peraggstate);
 static void eval_windowfunction(WindowAggState *winstate,
 								WindowStatePerFunc perfuncstate,
 								Datum *result, bool *isnull);
@@ -230,6 +241,7 @@ initialize_windowaggregate(WindowAggState *winstate,
 	peraggstate->transValueIsNull = peraggstate->initValueIsNull;
 	peraggstate->transValueCount = 0;
 	peraggstate->resultValue = (Datum) 0;
+	peraggstate->lastdatum = (Datum) 0;
 	peraggstate->resultValueIsNull = true;
 }
 
@@ -240,43 +252,21 @@ initialize_windowaggregate(WindowAggState *winstate,
 static void
 advance_windowaggregate(WindowAggState *winstate,
 						WindowStatePerFunc
Re: Error-safe user functions
On 2022-12-23 Fr 13:53, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 2022-12-22 Th 11:44, Tom Lane wrote:
>>> (I wonder why this is using RE_compile_and_cache at all, really,
>>> rather than some other API.  There doesn't seem to be value in
>>> forcing the regex into the cache at this point.)
>> I agree. The attached uses pg_regcomp instead. I had to lift a couple of
>> lines from regexp.c, but not too many.
> LGTM.  No further comments.

As I was giving this a final polish I noticed this in jspConvertRegexFlags:

	/*
	 * We'll never need sub-match details at execution.  While
	 * RE_compile_and_execute would set this flag anyway, force it on here to
	 * ensure that the regex cache entries created by makeItemLikeRegex are
	 * useful.
	 */
	cflags |= REG_NOSUB;

Clearly the comment would no longer be true. I guess I should just
remove this?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Error-safe user functions
On 2022-12-24 Sa 04:51, Ted Yu wrote:
>
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote:
>
>     Ted Yu writes:
>     > In makeItemLikeRegex :
>     > +            /* See regexp.c for explanation */
>     > +            CHECK_FOR_INTERRUPTS();
>     > +            pg_regerror(re_result, &re_tmp, errMsg, sizeof(errMsg));
>     > +            ereturn(escontext, false,
>     > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
>     > still necessary.
>
>     Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
>                             regards, tom lane
>
> Hi,
> For this case (`invalid regular expression`), the potential user
> interruption is one reason for stopping execution.
> I feel surfacing user interruption somehow masks the underlying error.
>
> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.
> I think it would be better to surface the error.

All that this patch is doing is replacing a call to
RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
code, which gives us the opportunity to call ereturn instead of ereport.
Note that where escontext is NULL (the common case), ereturn functions
identically to ereport. So unless you want to argue that the logic in
RE_compile_and_cache is wrong I don't see what we're arguing about. If
instead I had altered the API of RE_compile_and_cache to include an
escontext parameter we wouldn't be having this argument at all. The only
reason I didn't do that was the point Tom quite properly raised about
why we're doing any caching here anyway.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Force streaming every change in logical decoding
On Sat, Dec 24, 2022 at 8:56 AM Amit Kapila wrote:
>
>>> OK, I removed the modification in 022_twophase_cascade.pl and combine the
>>> two patches.
>>
>> Thank you for updating the patch. The v6 patch looks good to me.
>
> LGTM as well. I'll push this on Monday.

The patch looks good to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Error-safe user functions
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote:
> Ted Yu writes:
> > In makeItemLikeRegex :
> > +            /* See regexp.c for explanation */
> > +            CHECK_FOR_INTERRUPTS();
> > +            pg_regerror(re_result, &re_tmp, errMsg, sizeof(errMsg));
> > +            ereturn(escontext, false,
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
>                         regards, tom lane

Hi,
For this case (`invalid regular expression`), the potential user
interruption is one reason for stopping execution.
I feel surfacing user interruption somehow masks the underlying error.

The same regex, without user interruption, would exhibit an `invalid
regular expression` error.
I think it would be better to surface the error.

Cheers
Re: daitch_mokotoff module
Dag Lem writes:
> Alvaro Herrera writes:
>
>> On 2022-Dec-23, Alvaro Herrera wrote:
>
> [...]
>
>> I tried downloading a list of surnames from here
>> https://www.bibliotecadenombres.com/apellidos/apellidos-espanoles/
>> pasted that in a text file and \copy'ed it into a table.  Then I ran
>> this query
>>
>> select string_agg(a, ' ' order by a), daitch_mokotoff(a), count(*)
>> from apellidos
>> group by daitch_mokotoff(a)
>> order by count(*) desc;
>>
>> so I have a first entry like this
>>
>> string_agg      │ Balasco Balles Belasco Belles Blas Blasco Fallas Feliz
>>                   Palos Pelaez Plaza Valles Vallez Velasco Velez Veliz Veloz Villas
>> daitch_mokotoff │ 784000
>> count           │ 18
>>
>> but then I have a bunch of other entries with the same code 784000 as
>> alternative codes,
>>
>> string_agg      │ Velazco
>> daitch_mokotoff │ 784500 784000
>> count           │ 1
>>
>> string_agg      │ Palacio
>> daitch_mokotoff │ 785000 784000
>> count           │ 1
>>
>> I suppose I need to group these together somehow, and it would make more
>> sense to do that if the values were arrays.
>>
>> If I scroll a bit further down and choose, say, 794000 (a relatively
>> popular one), then I have this
>>
>> string_agg      │ Barraza Barrios Barros Bras Ferraz Frias Frisco Parras
>>                   Peraza Peres Perez Porras Varas Veras
>> daitch_mokotoff │ 794000
>> count           │ 14
>>
>> and looking for that code in the result I also get these three
>>
>> string_agg      │ Barca Barco Parco
>> daitch_mokotoff │ 795000 794000
>> count           │ 3
>>
>> string_agg      │ Borja
>> daitch_mokotoff │ 790000 794000
>> count           │ 1
>>
>> string_agg      │ Borjas
>> daitch_mokotoff │ 794000 794400
>> count           │ 1
>>
>> and then I see that I should also search for possible matches in codes
>> 795000, 790000 and 794400, so that gives me
>>
>> string_agg      │ Baria Baro Barrio Barro Berra Borra Feria Para Parra
>>                   Perea Vera
>> daitch_mokotoff │ 790000
>> count           │ 11
>>
>> string_agg      │ Barriga Borge Borrego Burgo Fraga
>> daitch_mokotoff │ 795000
>> count           │ 5
>>
>> string_agg      │ Borjas
>> daitch_mokotoff │ 794000 794400
>> count           │ 1
>>
>> which look closely related (compare "Veras" in the first to "Vera" in
>> the later set.  If you ignore that pseudo-match, you're likely to miss
>> possible family relationships.)
>>
>> I suppose if I were a genealogy researcher, I would be helped by having
>> each of these codes behave as a separate unit, rather than me having to
>> split the string into the several possible contained values.
>
> It seems to me like you're trying to use soundex coding for something it
> was never designed for.
>
> As stated in my previous mail, soundex algorithms are designed to index
> names on some representation of sound, so that alike sounding names with
> different spellings will match, and as shown in the documentation
> example, that is exactly what the implementation facilitates.
>
> Daitch-Mokotoff Soundex indexes alternative sounds for the same name,
> however if I understand correctly, you want to index names by single
> sounds, linking all alike sounding names to the same soundex code. I
> fail to see how that is useful - if you want to find matches for a name,
> you simply match against all indexed names. If you only consider one
> sound, you won't find all names that match.
>
> In any case, as explained in the documentation, the implementation is
> intended to be a companion to Full Text Search, thus text is the natural
> representation for the soundex codes.
>
> BTW Vera 790000 does not match Veras 794000, because they don't sound
> the same (up to the maximum soundex code length).

I've been sleeping on this, and perhaps the normal use case can just as
well (or better) be covered by the "@>" array operator?

I originally implemented similar functionality using another soundex
algorithm more than a decade ago, and either arrays couldn't be GIN
indexed back then, or I simply missed it.

I'll have to get back to this - now it's Christmas!

Merry Christmas!

Best regards,

Dag Lem