Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-24 Sa 10:42, Tom Lane wrote:
> Andrew Dunstan  writes:
>> As I was giving this a final polish I noticed this in jspConvertRegexFlags:
>>     /*
>>      * We'll never need sub-match details at execution.  While
>>      * RE_compile_and_execute would set this flag anyway, force it on here to
>>      * ensure that the regex cache entries created by makeItemLikeRegex are
>>      * useful.
>>      */
>>     cflags |= REG_NOSUB;
>> Clearly the comment would no longer be true. I guess I should just
>> remove this?
> Yeah, we can just drop that I guess.  I'm slightly worried that we might
> need it again after some future refactoring; but it's not really worth
> devising a re-worded comment to justify keeping it.
>
> Also, I realized that I failed in my reviewerly duty by not noticing
> that you'd forgotten to pg_regfree the regex after successful
> compilation.  Running something like this exposes the memory leak
> very quickly:
>
> select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', 
> 'jsonpath')
>   from generate_series(1,1000);
>
> The attached delta patch takes care of it.  (Per comment at pg_regcomp,
> we don't need this after a failure return.)
>
>   


Thanks, pushed with those changes.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: ARRNELEMS Out-of-bounds possible errors

2022-12-24 Thread Tom Lane
Nikita Malakhov  writes:
> Even with a null context it does not turn into ereport, and returns the dummy value

Read the code.  ArrayGetNItems passes NULL for escontext, therefore
if there's a problem the ereturn calls in ArrayGetNItemsSafe will
throw error, *not* return -1.

Not sure how we could persuade Coverity of that, though,
if it fails to understand that for itself.
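
For reference, the code Tom refers to looks roughly like this (a condensed
paraphrase of arrayutils.c as of commit ccff2d2, not the exact source): the
classic entry point simply passes a NULL escontext, so the ereturn() calls
inside the Safe variant degrade to ordinary ERROR throws and the -1 dummy
value never reaches ArrayGetNItems() callers.

int
ArrayGetNItems(int ndim, const int *dims)
{
	/* NULL escontext: ereturn() inside the Safe variant throws ERROR */
	return ArrayGetNItemsSafe(ndim, dims, NULL);
}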

regards, tom lane




Re: Error-safe user functions

2022-12-24 Thread Tom Lane
Ted Yu  writes:
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
>> Yes, it is.  We don't want a query-cancel transformed into a soft error.

> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.

On what grounds do you claim that?  The timing of arrival of the SIGINT
is basically chance --- it might happen while we're inside backend/regex,
or not.  I mean, sure you could claim that a bad regex might run a long
time and thereby be more likely to cause the user to issue a query
cancel, but that's a stretched line of reasoning.

regards, tom lane




Re: Error-safe user functions

2022-12-24 Thread Tom Lane
Andrew Dunstan  writes:
> As I was giving this a final polish I noticed this in jspConvertRegexFlags:

>     /*
>      * We'll never need sub-match details at execution.  While
>      * RE_compile_and_execute would set this flag anyway, force it on here to
>      * ensure that the regex cache entries created by makeItemLikeRegex are
>      * useful.
>      */
>     cflags |= REG_NOSUB;

> Clearly the comment would no longer be true. I guess I should just
> remove this?

Yeah, we can just drop that I guess.  I'm slightly worried that we might
need it again after some future refactoring; but it's not really worth
devising a re-worded comment to justify keeping it.

Also, I realized that I failed in my reviewerly duty by not noticing
that you'd forgotten to pg_regfree the regex after successful
compilation.  Running something like this exposes the memory leak
very quickly:

select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', 
'jsonpath')
  from generate_series(1,1000);

The attached delta patch takes care of it.  (Per comment at pg_regcomp,
we don't need this after a failure return.)

regards, tom lane

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 8c3a0c7623..30179408f5 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -560,6 +560,8 @@ makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *pattern,
 	(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 	 errmsg("invalid regular expression: %s", errMsg)));
 		}
+
+		pg_regfree(&re_tmp);
 	}
 
 	*result = v;


Re: ARRNELEMS Out-of-bounds possible errors

2022-12-24 Thread Nikita Malakhov
Hi,

Even with a null context it does not turn into ereport, and returns the dummy value
-

#define errsave_domain(context, domain, ...) \
	do { \
		struct Node *context_ = (context); \
		pg_prevent_errno_in_scope(); \
		if (errsave_start(context_, domain)) \
			__VA_ARGS__, errsave_finish(context_, __FILE__, __LINE__, __func__); \
	} while(0)

#define errsave(context, ...) \
	errsave_domain(context, TEXTDOMAIN, __VA_ARGS__)

/*
 * "ereturn(context, dummy_value, ...);" is exactly the same as
 * "errsave(context, ...); return dummy_value;".  This saves a bit
 * of typing in the common case where a function has no cleanup
 * actions to take after reporting a soft error.  "dummy_value"
 * can be empty if the function returns void.
 */
#define ereturn_domain(context, dummy_value, domain, ...) \
	do { \
		errsave_domain(context, domain, __VA_ARGS__); \
		return dummy_value; \
	} while(0)

#define ereturn(context, dummy_value, ...) \
	ereturn_domain(context, dummy_value, TEXTDOMAIN, __VA_ARGS__)
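
For context, the NULL-context case is decided in errsave_start() in elog.c;
paraphrased and heavily simplified (details_wanted handling omitted), it is
roughly:

bool
errsave_start(struct Node *context, const char *domain)
{
	/*
	 * No usable ErrorSaveContext?  Then fall back to ordinary error
	 * reporting: errstart(ERROR) here plus errsave_finish() in the macro
	 * expansion behave just like ereport(ERROR, ...).
	 */
	if (context == NULL || !IsA(context, ErrorSaveContext))
		return errstart(ERROR, domain);

	/* Otherwise record a soft error in the caller-supplied context. */
	((ErrorSaveContext *) context)->error_occurred = true;
	return false;
}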


On Fri, Dec 23, 2022 at 11:40 AM Kyotaro Horiguchi 
wrote:

> At Fri, 23 Dec 2022 17:37:55 +0900 (JST), Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > At Thu, 22 Dec 2022 12:35:58 -0300, Ranier Vilela 
> wrote in
> > > Hi.
> > >
> > > Per Coverity.
> > >
> > > The commit ccff2d2
> > > <
> https://github.com/postgres/postgres/commit/ccff2d20ed9622815df2a7deffce8a7b14830965
> >,
> > > changed the behavior function ArrayGetNItems,
> > > with the introduction of the function ArrayGetNItemsSafe.
> > >
> > > Now ArrayGetNItems may return -1, according to the comment.
> > > " instead of throwing an exception. -1 is returned after an error."
> >
> > If I'm reading the code correctly, it's the definition of
> > ArrayGetNItems*Safe*. ArrayGetNItems() calls that function with a NULL
> > escontext and the NULL turns ereturn() into ereport().
>
> > That doesn't seem to be changed by the commit.
>
> No.. It seems to me that the commit didn't change its behavior in that
> regard.
>
> > Of course, teaching Coverity not to issue the false warnings would be
> > another thing that we should perhaps do.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Sat, Dec 24, 2022 at 4:38 AM Andrew Dunstan  wrote:

>
> On 2022-12-24 Sa 04:51, Ted Yu wrote:
> >
> >
> > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
> >
> > Ted Yu  writes:
> > > In makeItemLikeRegex :
> >
> > > +   /* See regexp.c for explanation */
> > > +   CHECK_FOR_INTERRUPTS();
> > > +   pg_regerror(re_result, &re_tmp, errMsg,
> > > sizeof(errMsg));
> > > +   ereturn(escontext, false,
> >
> > > Since an error is returned, I wonder if the
> > `CHECK_FOR_INTERRUPTS` call is
> > > still necessary.
> >
> > Yes, it is.  We don't want a query-cancel transformed into a soft
> > error.
> >
> > regards, tom lane
> >
> > Hi,
> > For this case (`invalid regular expression`), the potential user
> > interruption is one reason for stopping execution.
> > I feel surfacing user interruption somehow masks the underlying error.
> >
> > The same regex, without user interruption, would exhibit an `invalid
> > regular expression` error.
> > I think it would be better to surface the error.
> >
> >
>
> All that this patch is doing is replacing a call to
> RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
> code, which gives us the opportunity to call ereturn instead of ereport.
> Note that where escontext is NULL (the common case), ereturn functions
> identically to ereport. So unless you want to argue that the logic in
> RE_compile_and_cache is wrong I don't see what we're arguing about. If
> instead I had altered the API of RE_compile_and_cache to include an
> escontext parameter we wouldn't be having this argument at all. The only
> reason I didn't do that was the point Tom quite properly raised about
> why we're doing any caching here anyway.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Andrew:

Thanks for the response.


Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-24 Thread Bharath Rupireddy
On Sat, Dec 24, 2022 at 12:58 AM David Christensen
 wrote:
>
> On Fri, Dec 23, 2022 at 12:57 PM David Christensen
>  wrote:
> >
> > On Wed, Dec 21, 2022 at 5:47 AM Bharath Rupireddy
> >  wrote:
>
> > > 2. +$node->init(extra => ['-k'], allows_streaming => 1);
> > > When enabled with allows_streaming, there are a bunch of things that
> > > happen to the node while initializing, I don't think we need all of
> > > them for this.
> >
> > I think the "allows_streaming" was required to ensure the WAL files
> > were preserved properly, and was the approach we ended up taking
> > rather than trying to fail the archive_command or other approaches I'd
> > taken earlier. I'd rather keep this if we can, unless you can propose
> > a different approach that would continue to work in the same way.
>
> Confirmed that we needed this in order to create the replication slot,
> so this /is/ required for the test to work.

The added test needs wal_level to be replica, but the TAP tests set it
to minimal unless allows_streaming is passed. However, passing
allows_streaming also sets a bunch of other parameters which are not
required for this test (see the init function in Cluster.pm), so we
could instead set just the required parameters: wal_level = replica and
max_wal_senders, so that the replication slot can be created.

> Enclosing v11 with yours and Michael's latest feedback.

Thanks for the patch. I've made the above change and renamed the test
file to save_fpi.pl; everything else remains the same as v11. Here's
the v12 patch, which LGTM. I'll mark it as RfC -
https://commitfest.postgresql.org/41/3628/.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v12-0001-Teach-pg_waldump-to-extract-FPIs-from-the-WAL-st.patch
Description: Binary data


[PoC] Implementation of distinct in Window Aggregates

2022-12-24 Thread Ankit Pandey
Hi,

This is a PoC patch which implements the DISTINCT operation in window
aggregates (without ORDER BY and only for single-column aggregation;
the final version may differ with respect to these limitations). The
purpose of this PoC is to get feedback on the approach and its
implementation, plus any nitpicking deemed reasonable.

The DISTINCT handling mirrors the implementation in nodeAgg. The
existing partitioning logic determines whether a row belongs to the
current partition; when DISTINCT is required, all values of the
aggregate column are stored in a tuplesort. When
finalize_windowaggregate gets called, the values are sorted, duplicates
are removed, and the transition function is then called on each
remaining value. When DISTINCT is not required, this process is
skipped: the transition function is called directly and nothing is
inserted into the tuplesort.
Note: tuplesort_begin and tuplesort_end are invoked for each partition
to rinse the tuplesort, so at any time the tuplesort holds at most the
tuples of a single partition. (A simplified sketch of the finalize-time
pass follows below.)
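
A simplified sketch of the finalize-time pass described above, modeled on
process_ordered_aggregate_single() in nodeAgg.c (the sortstate field and the
fetch_next_distinct_value() helper are illustrative placeholders, not
necessarily the names the patch uses):

static void
process_ordered_windowaggregate_single(WindowAggState *winstate,
									   WindowStatePerFunc perfuncstate,
									   WindowStatePerAgg peraggstate)
{
	Datum		newVal;
	bool		isNull;
	Datum		oldVal = (Datum) 0;
	bool		oldIsNull = true;
	bool		haveOldVal = false;

	/* Sort the values collected for this partition */
	tuplesort_performsort(peraggstate->sortstate);

	while (fetch_next_distinct_value(peraggstate->sortstate, &newVal, &isNull))
	{
		/* Skip values equal to the previous one: this is the DISTINCT */
		if (haveOldVal &&
			((oldIsNull && isNull) ||
			 (!oldIsNull && !isNull &&
			  DatumGetBool(FunctionCall2Coll(&peraggstate->equalfnOne,
											 perfuncstate->winCollation,
											 oldVal, newVal)))))
			continue;

		/* Feed the surviving value to the transition function */
		advance_windowaggregate(winstate, perfuncstate, peraggstate,
								newVal, isNull);

		oldVal = newVal;
		oldIsNull = isNull;
		haveOldVal = true;
	}

	/* Rinse the tuplesort so the next partition starts empty */
	tuplesort_end(peraggstate->sortstate);
}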

I have verified it for integer and interval column aggregates (to rule
out obvious issues related to data types).

Sample cases:

create table mytable(id int, name text);
insert into mytable values(1, 'A');
insert into mytable values(1, 'A');
insert into mytable values(5, 'B');
insert into mytable values(3, 'A');
insert into mytable values(1, 'A');

select avg(distinct id) over (partition by name) from mytable;
        avg
--------------------
 2.0000000000000000
 2.0000000000000000
 2.0000000000000000
 2.0000000000000000
 5.0000000000000000

select avg(id) over (partition by name) from mytable;
        avg
--------------------
 1.5000000000000000
 1.5000000000000000
 1.5000000000000000
 1.5000000000000000
 5.0000000000000000

select avg(distinct id) over () from mytable;
        avg
--------------------
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000
 3.0000000000000000

select avg(distinct id) from mytable;
        avg
--------------------
 3.0000000000000000

This is my first-time contribution. Please let me know if anything can
be improved, as I'm eager to learn.

Regards,
Ankit Kumar Pandey
From 2a3061a95988c39f4654accc06205099713af6cc Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey 
Date: Wed, 23 Nov 2022 00:38:01 +0530
Subject: [PATCH] Implement distinct in Window Aggregates.

---
 src/backend/executor/nodeWindowAgg.c | 229 +++
 src/backend/optimizer/util/clauses.c |   2 +
 src/backend/parser/parse_agg.c   |  45 ++
 src/backend/parser/parse_func.c  |  19 +--
 src/include/nodes/execnodes.h|   1 +
 src/include/nodes/primnodes.h|   2 +
 6 files changed, 258 insertions(+), 40 deletions(-)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4f4aeb2883..1d67ba2c39 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -154,6 +154,14 @@ typedef struct WindowStatePerAggData
 
 	int64		transValueCount;	/* number of currently-aggregated rows */
 
+	/* For DISTINCT in Aggregates */
+	Datum		lastdatum;		/* used for single-column DISTINCT */
+	FmgrInfo	equalfnOne; /* single-column comparisons*/
+
+	Oid			*eq_ops;  /* used for equality check in DISTINCT */
+	Oid			*sort_ops; /* used for sorting distinct columns */
+	bool 		sort_in; /* FLAG set true if data is stored in tuplesort */
+
 	/* Data local to eval_windowaggregates() */
 	bool		restart;		/* need to restart this agg in this cycle? */
 } WindowStatePerAggData;
@@ -163,7 +171,7 @@ static void initialize_windowaggregate(WindowAggState *winstate,
 	   WindowStatePerAgg peraggstate);
 static void advance_windowaggregate(WindowAggState *winstate,
 	WindowStatePerFunc perfuncstate,
-	WindowStatePerAgg peraggstate);
+	WindowStatePerAgg peraggstate, Datum value, bool isNull);
 static bool advance_windowaggregate_base(WindowAggState *winstate,
 		 WindowStatePerFunc perfuncstate,
 		 WindowStatePerAgg peraggstate);
@@ -173,6 +181,9 @@ static void finalize_windowaggregate(WindowAggState *winstate,
 	 Datum *result, bool *isnull);
 
 static void eval_windowaggregates(WindowAggState *winstate);
+static void process_ordered_windowaggregate_single(WindowAggState *winstate, 
+			 WindowStatePerFunc perfuncstate,
+ 			 WindowStatePerAgg peraggstate);
 static void eval_windowfunction(WindowAggState *winstate,
 WindowStatePerFunc perfuncstate,
 Datum *result, bool *isnull);
@@ -230,6 +241,7 @@ initialize_windowaggregate(WindowAggState *winstate,
 	peraggstate->transValueIsNull = peraggstate->initValueIsNull;
 	peraggstate->transValueCount = 0;
 	peraggstate->resultValue = (Datum) 0;
+	peraggstate->lastdatum = (Datum) 0;
 	peraggstate->resultValueIsNull = true;
 }
 
@@ -240,43 +252,21 @@ initialize_windowaggregate(WindowAggState *winstate,
 static void
 advance_windowaggregate(WindowAggState *winstate,
 		WindowStatePerFunc 

Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-23 Fr 13:53, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-22 Th 11:44, Tom Lane wrote:
>>> (I wonder why this is using RE_compile_and_cache at all, really,
>>> rather than some other API.  There doesn't seem to be value in
>>> forcing the regex into the cache at this point.)
>> I agree. The attached uses pg_regcomp instead. I had a lift a couple of
>> lines from regexp.c, but not too many.
> LGTM.  No further comments.
>
>   


As I was giving this a final polish I noticed this in jspConvertRegexFlags:


    /*
     * We'll never need sub-match details at execution.  While
     * RE_compile_and_execute would set this flag anyway, force it on
here to
     * ensure that the regex cache entries created by makeItemLikeRegex are
     * useful.
     */
    cflags |= REG_NOSUB;


Clearly the comment would no longer be true. I guess I should just
remove this?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-24 Sa 04:51, Ted Yu wrote:
>
>
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
>
> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +                       /* See regexp.c for explanation */
> > +                       CHECK_FOR_INTERRUPTS();
> > +                       pg_regerror(re_result, &re_tmp, errMsg,
> > sizeof(errMsg));
> > +                       ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the
> `CHECK_FOR_INTERRUPTS` call is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft
> error.
>
>                         regards, tom lane
>
> Hi,
> For this case (`invalid regular expression`), the potential user
> interruption is one reason for stopping execution.
> I feel surfacing user interruption somehow masks the underlying error.
>
> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.
> I think it would be better to surface the error.
>
>

All that this patch is doing is replacing a call to
RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
code, which gives us the opportunity to call ereturn instead of ereport.
Note that where escontext is NULL (the common case), ereturn functions
identically to ereport. So unless you want to argue that the logic in
RE_compile_and_cache is wrong I don't see what we're arguing about. If
instead I had altered the API of RE_compile_and_cache to include an
escontext parameter we wouldn't be having this argument at all. The only
reason I didn't do that was the point Tom quite properly raised about
why we're doing any caching here anyway.
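
Putting the fragments quoted in this thread together, the replacement code
in makeItemLikeRegex() ends up looking roughly like this (a sketch assembled
from the snippets above plus Tom's delta patch; the wide-char conversion step
and local variable names are paraphrased rather than quoted from the
committed patch):

	regex_t		re_tmp;
	pg_wchar   *wpattern;
	int			wpattern_len;
	int			re_result;

	/* pg_regcomp() works on wide characters */
	wpattern = (pg_wchar *) palloc((pattern->len + 1) * sizeof(pg_wchar));
	wpattern_len = pg_mb2wchar_with_len(pattern->val, wpattern, pattern->len);

	re_result = pg_regcomp(&re_tmp, wpattern, wpattern_len, cflags,
						   DEFAULT_COLLATION_OID);
	if (re_result != REG_OKAY)
	{
		char		errMsg[100];

		/* See regexp.c for explanation */
		CHECK_FOR_INTERRUPTS();
		pg_regerror(re_result, &re_tmp, errMsg, sizeof(errMsg));
		ereturn(escontext, false,
				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
				 errmsg("invalid regular expression: %s", errMsg)));
	}

	/* Per Tom's delta: free the regex after a successful compile */
	pg_regfree(&re_tmp);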


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Force streaming every change in logical decoding

2022-12-24 Thread Dilip Kumar
On Sat, Dec 24, 2022 at 8:56 AM Amit Kapila  wrote:
>
> > >
> > > OK, I removed the modification in 022_twophase_cascade.pl and combined the
> > > two patches.
> >
> > Thank you for updating the patch. The v6 patch looks good to me.
> >
>
> LGTM as well. I'll push this on Monday.
>

The patch looks good to me.



--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:

> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +   /* See regexp.c for explanation */
> > +   CHECK_FOR_INTERRUPTS();
> > +   pg_regerror(re_result, &re_tmp, errMsg,
> > sizeof(errMsg));
> > +   ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call
> is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
> regards, tom lane
>
Hi,
For this case (`invalid regular expression`), the potential user
interruption is one reason for stopping execution.
I feel surfacing user interruption somehow masks the underlying error.

The same regex, without user interruption, would exhibit an `invalid
regular expression` error.
I think it would be better to surface the error.

Cheers


Re: daitch_mokotoff module

2022-12-24 Thread Dag Lem
Dag Lem  writes:

> Alvaro Herrera  writes:
>
>> On 2022-Dec-23, Alvaro Herrera wrote:
>>
>
> [...]
>
>> I tried downloading a list of surnames from here
>> https://www.bibliotecadenombres.com/apellidos/apellidos-espanoles/
>> pasted that in a text file and \copy'ed it into a table.  Then I ran
>> this query
>>
>> select string_agg(a, ' ' order by a), daitch_mokotoff(a), count(*)
>> from apellidos
>> group by daitch_mokotoff(a)
>> order by count(*) desc;
>>
>> so I have a first entry like this
>>
>> string_agg │ Balasco Balles Belasco Belles Blas Blasco Fallas Feliz
>> Palos Pelaez Plaza Valles Vallez Velasco Velez Veliz Veloz Villas
>> daitch_mokotoff │ 784000
>> count   │ 18
>>
>> but then I have a bunch of other entries with the same code 784000 as
>> alternative codes,
>>
>> string_agg  │ Velazco
>> daitch_mokotoff │ 784500 784000
>> count   │ 1
>>
>> string_agg  │ Palacio
>> daitch_mokotoff │ 785000 784000
>> count   │ 1
>>
>> I suppose I need to group these together somehow, and it would make more
>> sense to do that if the values were arrays.
>>
>>
>> If I scroll a bit further down and choose, say, 794000 (a relatively
>> popular one), then I have this
>>
>> string_agg │ Barraza Barrios Barros Bras Ferraz Frias Frisco Parras
>> Peraza Peres Perez Porras Varas Veras
>> daitch_mokotoff │ 794000
>> count   │ 14
>>
>> and looking for that code in the result I also get these three
>>
>> string_agg  │ Barca Barco Parco
>> daitch_mokotoff │ 795000 794000
>> count   │ 3
>>
>> string_agg  │ Borja
>> daitch_mokotoff │ 790000 794000
>> count   │ 1
>>
>> string_agg  │ Borjas
>> daitch_mokotoff │ 794000 794400
>> count   │ 1
>>
>> and then I see that I should also search for possible matches in codes
>> 795000, 790000 and 794400, so that gives me
>>
>> string_agg │ Baria Baro Barrio Barro Berra Borra Feria Para Parra
>> Perea Vera
>> daitch_mokotoff │ 790000
>> count   │ 11
>>
>> string_agg  │ Barriga Borge Borrego Burgo Fraga
>> daitch_mokotoff │ 795000
>> count   │ 5
>>
>> string_agg  │ Borjas
>> daitch_mokotoff │ 794000 794400
>> count   │ 1
>>
>> which look closely related (compare "Veras" in the first to "Vera" in
>> the later set.  If you ignore that pseudo-match, you're likely to miss
>> possible family relationships.)
>>
>>
>> I suppose if I were a genealogy researcher, I would be helped by having
>> each of these codes behave as a separate unit, rather than me having to
>> split the string into the several possible contained values.
>
> It seems to me like you're trying to use soundex coding for something it
> was never designed for.
>
> As stated in my previous mail, soundex algorithms are designed to index
> names on some representation of sound, so that alike sounding names with
> different spellings will match, and as shown in the documentation
> example, that is exactly what the implementation facilitates.
>
> Daitch-Mokotoff Soundex indexes alternative sounds for the same name,
> however if I understand correctly, you want to index names by single
> sounds, linking all alike sounding names to the same soundex code. I
> fail to see how that is useful - if you want to find matches for a name,
> you simply match against all indexed names. If you only consider one
> sound, you won't find all names that match.
>
> In any case, as explained in the documentation, the implementation is
> intended to be a companion to Full Text Search, thus text is the natural
> representation for the soundex codes.
>
> BTW Vera 790000 does not match Veras 794000, because they don't sound
> the same (up to the maximum soundex code length).
>

I've been sleeping on this, and perhaps the normal use case can just as
well (or better) be covered by the "@>" array operator? I originally
implemented similar functionality using another soundex algorithm more
than a decade ago, and either arrays couldn't be GIN indexed back then,
or I simply missed it. I'll have to get back to this - now it's
Christmas!

Merry Christmas!

Best regards,

Dag Lem