Re: remaining sql/json patches

2023-10-01 Thread Amit Langote
On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  wrote:
> Pushed this 30 min ago (no email on -committers yet!) and am looking
> at the following llvm crash reported by buildfarm animal pogona [1]:
>
> #4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
> "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
> assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
> FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
> with a bad signature!\\"", file=file@entry=0x7f5bc1336051
> "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
> line=line@entry=299, function=function@entry=0x7f5bc13362af "void
> llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
> ArrayRef, ArrayRef, const
> llvm::Twine &)") at ./assert/assert.c:92
> #5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
> >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
> && \\"Calling a function with a bad signature!\\"",
> file=0x7f5bc1336051
> "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
> function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
> *, llvm::Value *, ArrayRef,
> ArrayRef, const llvm::Twine &)") at
> ./assert/assert.c:101
> #6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
> FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
> NameStr=...) at
> /home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
> #7  0x7f5bc0fa579d in llvm::CallInst::CallInst
> (this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
> Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
> #8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
> Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
> InsertBefore=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
> #9  0x7f5bc0fa51f9 in llvm::IRBuilder llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
> FPMathTag=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
> #10 0x7f5bc100edda in llvm::IRBuilder llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
> #11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
> Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
> "funccall_iocoerce_in_safe") at
> /home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
> #12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
> /home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
>
> This seems to me to be complaining about the following addition:
>
> +   {
> +       Oid         ioparam = op->d.iocoerce.typioparam;
> +       LLVMValueRef v_params[6];
> +       LLVMValueRef v_success;
> +
> +       v_params[0] = l_ptr_const(op->d.iocoerce.finfo_in,
> +                                 l_ptr(StructFmgrInfo));
> +       v_params[1] = v_output;
> +       v_params[2] = l_oid_const(lc, ioparam);
> +       v_params[3] = l_int32_const(lc, -1);
> +       v_params[4] = l_ptr_const(op->d.iocoerce.escontext,
> +                                 l_ptr(StructErrorSaveContext));
>
> -       LLVMBuildStore(b, v_retval, v_resvaluep);
> +       /*
> +        * InputFunctionCallSafe() will write directly into
> +        * *op->resvalue.
> +        */
> +       v_params[5] = v_resvaluep;
> +
> +       v_success = LLVMBuildCall(b, llvm_pg_func(mod, "InputFunctionCallSafe"),
> +                                 v_params, lengthof(v_params),
> +                                 "funccall_iocoerce_in_safe");
> +
> +       /*
> +        * Return null if InputFunctionCallSafe() encountered
> +        * an error.
> +        */
> +       v_resnullp = LLVMBuildICmp(b, LLVMIntEQ, v_success,
> +                                  l_sbool_const(0), "");
> +   }

Although most animals other than pogona looked fine, I've decided to revert
the patch for now.

IIUC, LLVM is complaining that the code in the above block is not passing
the arguments of InputFunctionCallSafe() using the correct types.  I'm not
exactly sure which particular argument is not handled correctly in the
above code, but perhaps it's:


+   v_params[1] = v_output;

which maps to the char *str argument of InputFunctionCallSafe().  v_output is
set in the code preceding the above block as follows:

/* and call output function (can 
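To make the complaint concrete: LLVMBuildCall() asserts when an argument's IR
type differs from the corresponding parameter type in the callee's declared
signature.  If, as the snippet above suggests, v_output holds the Datum (an
i64) returned by the preceding output-function call while
InputFunctionCallSafe()'s str parameter is a char * (an i8* in IR), one way to
satisfy the verifier would be an explicit conversion before building the call.
The following is only an illustrative sketch, not the actual llvmjit_expr.c fix:

/*
 * Illustrative sketch (not the committed fix): coerce the Datum-typed
 * v_output to the pointer type that InputFunctionCallSafe()'s "str"
 * parameter declares, so the argument types match the callee's signature.
 * Assumes the builder "b", context "lc", and v_params[] from the block above.
 */
LLVMValueRef v_str;

v_str = LLVMBuildIntToPtr(b, v_output,
                          LLVMPointerType(LLVMInt8TypeInContext(lc), 0),
                          "str");
v_params[1] = v_str;        /* now an i8*, matching the declared parameter */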

Re: remaining sql/json patches

2023-10-01 Thread Amit Langote
On Fri, Sep 29, 2023 at 1:57 PM Amit Langote  wrote:
> On Thu, Sep 28, 2023 at 8:04 PM Alvaro Herrera  
> wrote:
> > On 2023-Sep-27, Amit Langote wrote:
> > > Maybe the following is better:
> > >
> > > +   /*
> > > +* For expression nodes that support soft errors.  Should be set to 
> > > NULL
> > > +* before calling ExecInitExprRec() if the caller wants errors thrown.
> > > +*/
> > >
> > > ...as in the attached.
> >
> > That's good.
> >
> > > Alvaro, do you think your concern regarding escontext not being in the
> > > right spot in the ExprState struct is addressed?  It doesn't seem very
> > > critical to me to place it in the struct's 1st cacheline, because
> > > escontext is not accessed in performance critical paths such as during
> > > expression evaluation, especially with the latest version.  (It would
> > > get accessed during evaluation with previous versions.)
> > >
> > > If so, I'd like to move ahead with committing it.
> >
> > Yeah, looks OK to me in v21.
>
> Thanks.  I will push the attached 0001 shortly.

Pushed this 30 min ago (no email on -committers yet!) and am looking
at the following llvm crash reported by buildfarm animal pogona [1]:

#0  __pthread_kill_implementation (threadid=,
signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0  __pthread_kill_implementation (threadid=,
signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
#1  0x7f5bcebcb15f in __pthread_kill_internal (signo=6,
threadid=) at ./nptl/pthread_kill.c:78
#2  0x7f5bceb7d472 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x7f5bceb674b2 in __GI_abort () at ./stdlib/abort.c:79
#4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
"%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
with a bad signature!\\"", file=file@entry=0x7f5bc1336051
"/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
line=line@entry=299, function=function@entry=0x7f5bc13362af "void
llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
ArrayRef, ArrayRef, const
llvm::Twine &)") at ./assert/assert.c:92
#5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
>= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
&& \\"Calling a function with a bad signature!\\"",
file=0x7f5bc1336051
"/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
*, llvm::Value *, ArrayRef,
ArrayRef, const llvm::Twine &)") at
./assert/assert.c:101
#6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
NameStr=...) at
/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
#7  0x7f5bc0fa579d in llvm::CallInst::CallInst
(this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
#8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
InsertBefore=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
#9  0x7f5bc0fa51f9 in llvm::IRBuilder::CreateCall (this=0x557a91f9c6a0,
FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
FPMathTag=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
#10 0x7f5bc100edda in llvm::IRBuilder::CreateCall (this=0x557a91f9c6a0,
Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
#11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
"funccall_iocoerce_in_safe") at
/home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
#12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
#13 0x557a915992db in jit_compile_expr
(state=state@entry=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/jit.c:177
#14 0x557a9123071d in ExecReadyExpr
(state=state@entry=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/executor/execExpr.c:880
#15 0x557a912340d7 in ExecBuildProjectionInfo
(targetList=0x557a91fa6b58, econtext=, slot=, parent=parent@entry=0x557a91f430a8,
inputDesc=inputDesc@entry=0x0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/executor/execExpr.c:484
#16 0x557a9124e61e in ExecAssignProjectionInfo
(planstate=planstate@entry=0x557a91f430a8,
inputDesc=inputDesc@entry=0x0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/executor/execUtils.c:547
#17 

Re: Regression in COPY FROM caused by 9f8377f7a2

2023-10-01 Thread Laurenz Albe
On Sun, 2023-10-01 at 10:55 -0400, Andrew Dunstan wrote:
> Thanks, pushed.

Thanks for taking care of that.

Yours,
Laurenz Albe




Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-10-01 Thread Gurjeet Singh
On Tue, Sep 26, 2023 at 4:02 PM Bruce Momjian  wrote:
>
> Patch applied back to PG 11.

+Peter E. since I received the following automated note:

> Closed in commitfest 2023-09 with status: Moved to next CF (petere)

Just a note that this patch has been committed (3fea854691), so I have
marked the CF item [1] as 'Committed', and specified Bruce as the
committer.

[1]: https://commitfest.postgresql.org/45/4333/


Best regards,
Gurjeet
http://Gurje.et




Re: document the need to analyze partitioned tables

2023-10-01 Thread Laurenz Albe
On Fri, 2023-09-29 at 22:34 -0400, Bruce Momjian wrote:
> Very good point!   Updated patch attached.

Thanks!  Some small corrections:

> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> index 9cf9d030a8..be1c522575 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -861,10 +861,16 @@ analyze threshold = analyze base threshold + analyze 
> scale factor * number of tu
> 
>  
> 
> -Partitioned tables are not processed by autovacuum.  Statistics
> -should be collected by running a manual ANALYZE when 
> it is
> -first populated, and again whenever the distribution of data in its
> -partitions changes significantly.
> +Partitioned tables do not directly store tuples and consequently
> +autovacuum does not VACUUM them.  (Autovacuum does

... does not VACUUM or ANALYZE them.

Perhaps it would be shorter to say "does not process them" like the
original wording.

> +perform VACUUM on table partitions just like other

Just like *on* other tables, right?

> +tables.)  Unfortunately, this also means that autovacuum doesn't
> +run ANALYZE on partitioned tables, and this
> +can cause suboptimal plans for queries that reference partitioned
> +table statistics.  You can work around this problem by manually
> +running ANALYZE on partitioned tables when they
> +are first populated, and again whenever the distribution of data in
> +their partitions changes significantly.
> 
>  
> 

Yours,
Laurenz Albe




Re: pgstatindex vs. !indisready

2023-10-01 Thread Noah Misch
On Mon, Oct 02, 2023 at 09:24:33AM +0900, Michael Paquier wrote:
> On Sun, Oct 01, 2023 at 04:20:42PM -0700, Peter Geoghegan wrote:
> > On Sun, Oct 1, 2023 at 2:00 PM Noah Misch  wrote:
> >> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
> >> gin_clean_pending_list] currently succeed.  I propose to make them emit a
> >> DEBUG1 message and return early, like amcheck does, except on !indisready.
> >> This would allow users to continue running them on all indexes of the
> >> applicable access method.  Doing these operations on an
> >> indisready&&!indisvalid index is entirely reasonable, since they relate to
> >> INSERT/UPDATE/DELETE operations.
> 
> Hmm.  Still slightly incorrect in some cases?  Before being switched
> to indisvalid, an index built concurrently may miss some tuples
> deleted before the reference snapshot used to build the index was
> taken.

The !indisvalid index may be missing tuples, yes.  In what way does that make
one of those four operations incorrect?




Re: pgstatindex vs. !indisready

2023-10-01 Thread Peter Geoghegan
On Sun, Oct 1, 2023 at 5:24 PM Michael Paquier  wrote:
> > FWIW, amcheck's handling of unlogged relations when in hot standby
> > mode is based on the following theory: if recovery were to end, the
> > index would become an empty index, so just treat it as if it was
> > already empty during recovery. Not everybody thought that this
> > behavior was ideal, but ISTM that it has fewer problems than any
> > alternative approach you can think of. The same argument works just as
> > well with any function that accepts a regclass argument IMV.
>
> It depends, I guess, on how "user-friendly" all that should be.  I
> have seen in the past as argument that it may be sometimes better to
> have a function do nothing rather than ERROR when these are used
> across a scan of pg_class, for example, particularly for non-SRFs.
> Even if sometimes errors can be bypassed with more quals on the
> relkind or such (aka less complicated queries with less JOINs to
> write).

I think of recovery as a property of the whole system. So throwing an
error about one particular unlogged index that the user (say) checked
during recovery doesn't seem sensible. After all, the answer that
RecoveryInProgress() gives can change in a way that's observable
within individual transactions.

Again, I wouldn't claim that this is very elegant. Just that it seems
to have the fewest problems.

-- 
Peter Geoghegan




Re: pgstatindex vs. !indisready

2023-10-01 Thread Michael Paquier
On Sun, Oct 01, 2023 at 04:20:42PM -0700, Peter Geoghegan wrote:
> On Sun, Oct 1, 2023 at 2:00 PM Noah Misch  wrote:
> > Okay.  If no other preferences appear, I'll do pgstatindex that way.
> 
> Sounds reasonable.
> 
>> [pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
>> pgstatindex, they should ERROR, including in the back branches.
> 
> Makes sense.

These are already restrictive, makes sense.

>> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
>> gin_clean_pending_list] currently succeed.  I propose to make them emit a
>> DEBUG1 message and return early, like amcheck does, except on !indisready.
>> This would allow users to continue running them on all indexes of the
>> applicable access method.  Doing these operations on an
>> indisready&&!indisvalid index is entirely reasonable, since they relate to
>> INSERT/UPDATE/DELETE operations.

Hmm.  Still slightly incorrect in some cases?  Before being switched
to indisvalid, an index built concurrently may miss some tuples
deleted before the reference snapshot used to build the index was
taken.

> +1 to all that (including the part about these operations being a
> little different to the amcheck functions in one particular respect).

Making them return early sounds sensible here.

> FWIW, amcheck's handling of unlogged relations when in hot standby
> mode is based on the following theory: if recovery were to end, the
> index would become an empty index, so just treat it as if it was
> already empty during recovery. Not everybody thought that this
> behavior was ideal, but ISTM that it has fewer problems than any
> alternative approach you can think of. The same argument works just as
> well with any function that accepts a regclass argument IMV.

It depends, I guess, on how "user-friendly" all that should be.  I
have seen in the past as argument that it may be sometimes better to
have a function do nothing rather than ERROR when these are used
across a scan of pg_class, for example, particularly for non-SRFs.
Even if sometimes errors can be bypassed with more quals on the
relkind or such (aka less complicated queries with less JOINs to
write).
--
Michael




Re: Making the subquery alias optional in the FROM clause

2023-10-01 Thread Tom Lane
Erwin Brandstetter  writes:
> On Mon, 2 Oct 2023 at 00:33, Dean Rasheed  wrote:
>> The only place that exposes the eref's made-up relation name is the
>> existing query deparsing code in ruleutils.c, which uniquifies it and
>> generates SQL spec-compliant output. For example:

> I ran into one other place: error messages.
> SELECT unnamed_subquery.a
> FROM (SELECT 1 AS a)
> ERROR:  invalid reference to FROM-clause entry for table "unnamed_subquery"
> DETAIL:  There is an entry for table "unnamed_subquery", but it cannot be
> referenced from this part of the query.

Yeah, that's exposing more of the implementation than we really want.

> Notably, the same does not happen for "unnamed_subquery_1":
> SELECT unnamed_subquery_1.a
> FROM (SELECT 1 AS a), (SELECT 1 AS a)
> ERROR:  missing FROM-clause entry for table "unnamed_subquery_1"

Actually, that happens because "unnamed_subquery_1" *isn't* in the
parse tree.  As implemented, both RTEs are labeled "unnamed_subquery"
in the parser output, and it's ruleutils that de-duplicates them.

I'm inclined to think we should avoid letting "unnamed_subquery"
appear in the parse tree, too.  It might not be a good idea to
try to leave the eref field null, but could we set it to an
empty string instead, that is

-   eref = alias ? copyObject(alias) : makeAlias("unnamed_subquery", NIL);
+   eref = alias ? copyObject(alias) : makeAlias("", NIL);

and then let ruleutils replace that with "unnamed_subquery"?  This
would prevent accessing the subquery name in the way Erwin shows,
because we don't let you write an empty identifier in SQL:

regression=# select "".a from (select 1 as a);
ERROR:  zero-length delimited identifier at or near """"
LINE 1: select "".a from (select 1 as a);
   ^

However, there might then be some parser error messages that
refer to subquery "", so I'm not sure if this is totally
without surprises either.
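As a purely illustrative sketch of the deparse-time half of that idea (the
function name is hypothetical, not the actual ruleutils.c change):

/*
 * Hypothetical helper: if the parser left the eref alias empty, substitute
 * a display name only when deparsing.  Illustration only.
 */
static const char *
subquery_display_name(const char *refname)
{
    if (refname == NULL || refname[0] == '\0')
        return "unnamed_subquery";
    return refname;
}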

regards, tom lane




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-01 Thread Peter Geoghegan
On Sun, Oct 1, 2023 at 11:46 AM Peter Eisentraut  wrote:
> What is the status of this patch discussion now?  It had been set as
> Ready for Committer for some months.  Do these recent discussions
> invalidate that?  Does it need more discussion?

I don't think that recent discussion invalidated anything. I meant to
follow up on investigating the extent to which anything could hold up
OldestMXact without also holding up OldestXmin/removable cutoff, but
that doesn't seem essential.

This patch does indeed seem "ready for committer". John?

-- 
Peter Geoghegan




Re: pgstatindex vs. !indisready

2023-10-01 Thread Peter Geoghegan
On Sun, Oct 1, 2023 at 2:00 PM Noah Misch  wrote:
> Okay.  If no other preferences appear, I'll do pgstatindex that way.

Sounds reasonable.

> [pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
> pgstatindex, they should ERROR, including in the back branches.

Makes sense.

> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
> gin_clean_pending_list] currently succeed.  I propose to make them emit a
> DEBUG1 message and return early, like amcheck does, except on !indisready.
> This would allow users to continue running them on all indexes of the
> applicable access method.  Doing these operations on an
> indisready&&!indisvalid index is entirely reasonable, since they relate to
> INSERT/UPDATE/DELETE operations.

+1 to all that (including the part about these operations being a
little different to the amcheck functions in one particular respect).

FWIW, amcheck's handling of unlogged relations when in hot standby
mode is based on the following theory: if recovery were to end, the
index would become an empty index, so just treat it as if it was
already empty during recovery. Not everybody thought that this
behavior was ideal, but ISTM that it has fewer problems than any
alternative approach you can think of. The same argument works just as
well with any function that accepts a regclass argument IMV.

--
Peter Geoghegan




Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-10-01 Thread Karl O. Pinc
Version 5, this time with attachments.

Changed word order in a sentence:
v5-0012-Explain-role-management.patch

Added a hyperlink:
v5-0013-Hyperlink-from-CREATE-FUNCTION-reference-page-to-.patch

Added 3 index entries:
v5-0014-Add-index-entries-for-parallel-safety.patch

> On Mon, 25 Sep 2023 23:37:44 -0500
> "Karl O. Pinc"  wrote:
> 
> > > On Mon, 25 Sep 2023 17:55:59 -0500
> > > "Karl O. Pinc"  wrote:
> > > 
> > > > On Mon, 25 Sep 2023 14:14:37 +0200
> > > > Daniel Gustafsson  wrote:  
> > > 
> > > > > Once done you can do "git format-patch origin/master -v 1"
> > > > > which will generate a set of n patches named v1-0001 through
> > > > > v1-000n.  
> 
> > > > I am not particularly confident in the top-line commit
> > > > descriptions.  
> > > 
> > > > The bulk of the commit descriptions are very wordy  
> > > 
> > > > Listing all the attachments here for future discussion:  
v5-0001-Change-section-heading-to-better-reflect-saving-a.patch
v5-0002-Change-section-heading-to-better-describe-referen.patch
v5-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v5-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v5-0005-Improve-sentences-in-overview-of-system-configura.patch
v5-0006-Provide-examples-of-listing-all-settings.patch
v5-0007-Cleanup-summary-of-role-powers.patch
v5-0008-Explain-the-difference-between-role-attributes-an.patch
v5-0009-Document-the-oidvector-type.patch
v5-0010-Improve-sentences-about-the-significance-of-the-s.patch
v5-0011-Add-a-sub-section-to-describe-schema-resolution.patch
v5-0012-Explain-role-management.patch
v5-0013-Hyperlink-from-CREATE-FUNCTION-reference-page-to-.patch
v5-0014-Add-index-entries-for-parallel-safety.patch

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
>From 122665c4155698abe88e2bd17639a991791b94e3 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:49:30 -0500
Subject: [PATCH v5 01/14] Change section heading to better reflect saving a
 result in variable(s)

The current section title of "Executing a Command with a Single-Row
Result" does not reflect what the section is really about.  Other
sections make clear how to _execute_ commands, single-row result or not.
What this section is about is how to _save_ a single row of results into
variable(s).

It would be nice to talk about saving results into variables in the
section heading but I couldn't come up with anything pithy.  "Saving a
Single-Row of a Command's Result" seems good enough, especially since
there's few other places to save results other than in variables.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index f55e901c7e..8747e84245 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1126,7 +1126,7 @@ PERFORM create_mv('cs_session_page_requests_mv', my_query);

 

-Executing a Command with a Single-Row Result
+Saving a Single-Row of a Command's Result
 
 
  SELECT INTO
-- 
2.30.2

>From 4de4a31d41124dfa793cc5cce0516673811ea414 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:52:21 -0500
Subject: [PATCH v5 02/14] Change section heading to better describe reference
 of existing types

The section heading of "Copying Types" does not reflect what the
section is about.  It is not about making copies of data types but
about using the data type of existing columns (or rows) in new type
declarations without having to know what the existing type is.

"Re-Using the Type of Columns and Variables" seems adequate.  Getting
something in there about declarations seems too wordy.  I thought
perhaps "Referencing" instead of "Re-Using", but "referencing" isn't
perfect and "re-using" is generic enough, shorter, and simpler to read.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 8747e84245..874578265e 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -672,7 +672,7 @@ DECLARE

 
   
-   Copying Types
+   Re-Using the Type of Columns and Variables
 
 
 variable%TYPE
-- 
2.30.2

>From 80c2b8ef7ad6e610f5c7bdc61b827983a87110e2 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 16:03:29 -0500
Subject: [PATCH v5 03/14] Better section heading for plpgsql exception
 trapping

The docs seem to use "error" and "exception" interchangeably, perhaps
50% each.  But they never say that they are the same thing, and in the
larger world they are not.  Errors tend to be something that drop on
the floor and usually halt execution whereas exceptions can be trapped
and give the programmer more control over the flow of the program.
(Although, to be fair, exceptions are a subset of errors.)  "Trapping
Errors" is not a good section title for these reasons, and because
when it comes to 

Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-10-01 Thread Karl O. Pinc
Version 5

Changed word order in a sentence:
v5-0012-Explain-role-management.patch

Added a hyperlink:
v5-0013-Hyperlink-from-CREATE-FUNCTION-reference-page-to-.patch

Added 3 index entries:
v5-0014-Add-index-entries-for-parallel-safety.patch

> On Mon, 25 Sep 2023 23:37:44 -0500
> "Karl O. Pinc"  wrote:
> 
> > > On Mon, 25 Sep 2023 17:55:59 -0500
> > > "Karl O. Pinc"  wrote:
> > > 
> > > > On Mon, 25 Sep 2023 14:14:37 +0200
> > > > Daniel Gustafsson  wrote:  
> > > 
> > > > > Once done you can do "git format-patch origin/master -v 1"
> > > > > which will generate a set of n patches named v1-0001 through
> > > > > v1-000n.  
> 
> > > > I am not particularly confident in the top-line commit
> > > > descriptions.  
> > > 
> > > > The bulk of the commit descriptions are very wordy  
> > > 
> > > > Listing all the attachments here for future discussion:  
v5-0001-Change-section-heading-to-better-reflect-saving-a.patch
v5-0002-Change-section-heading-to-better-describe-referen.patch
v5-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v5-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v5-0005-Improve-sentences-in-overview-of-system-configura.patch
v5-0006-Provide-examples-of-listing-all-settings.patch
v5-0007-Cleanup-summary-of-role-powers.patch
v5-0008-Explain-the-difference-between-role-attributes-an.patch
v5-0009-Document-the-oidvector-type.patch
v5-0010-Improve-sentences-about-the-significance-of-the-s.patch
v5-0011-Add-a-sub-section-to-describe-schema-resolution.patch
v5-0012-Explain-role-management.patch
v5-0013-Hyperlink-from-CREATE-FUNCTION-reference-page-to-.patch
v5-0014-Add-index-entries-for-parallel-safety.patch

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Making the subquery alias optional in the FROM clause

2023-10-01 Thread Erwin Brandstetter
On Mon, 2 Oct 2023 at 00:33, Dean Rasheed  wrote:

> The only place that exposes the eref's made-up relation name is the
> existing query deparsing code in ruleutils.c, which uniquifies it and
> generates SQL spec-compliant output. For example:
>

I ran into one other place: error messages.

SELECT unnamed_subquery.a
FROM (SELECT 1 AS a)

> ERROR:  invalid reference to FROM-clause entry for table "unnamed_subquery"
> DETAIL:  There is an entry for table "unnamed_subquery", but it cannot be
> referenced from this part of the query.

Normally, we would find the cited name somewhere in the query. Confusing.
Notably, the same does not happen for "unnamed_subquery_1":

SELECT unnamed_subquery_1.a
FROM (SELECT 1 AS a), (SELECT 1 AS a)

> ERROR:  missing FROM-clause entry for table "unnamed_subquery_1"

That's the message anybody would expect.
Also makes sense, as "uniquification" only happens in the above quoted
case, and all invisible aliases seem to be "unnamed_subquery" at this
point? But a bit confusing on a different level.

Maybe error messages should not be aware of invisible aliases, and just
complain about "missing FROM-clause entry"?
Not sure whether a fix would be easy, nor whether it would be worth the
effort. Just wanted to document the corner case issue in this thread.

Regards
Erwin


Re: pgstatindex vs. !indisready

2023-10-01 Thread Noah Misch
On Sun, Oct 01, 2023 at 04:37:25PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
> > could not read block 0 in file".  This reproduces it:
> > ...
> > Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
> > not a good fit for this scenario.  I propose to have pgstatindex fail early on
> > !indisready indexes.
> 
> +1
> 
> > We could go a step further and also fail on
> > indisready&&!indisvalid indexes, which are complete enough to accept inserts
> > but not complete enough to answer queries.  I don't see a reason to do that,
> > but maybe someone else will.
> 
> Hmm.  Seems like the numbers pgstatindex would produce for a
> not-yet-complete index would be rather irrelevant, even if the case
> doesn't risk any outright problems.  I'd be inclined to be
> conservative and insist on indisvalid being true too.

Okay.  If no other preferences appear, I'll do pgstatindex that way.

> > This made me wonder about functions handling unlogged rels during recovery.  I
> > used the attached hack to test most regclass-arg functions.

I forgot to test the same battery of functions on !indisready indexes.  I've
now done that, using the attached script.  While I didn't get additional
class-XX errors, more should change:

[pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
pgstatindex, they should ERROR, including in the back branches.

[brin_desummarize_range brin_summarize_new_values brin_summarize_range
gin_clean_pending_list] currently succeed.  I propose to make them emit a
DEBUG1 message and return early, like amcheck does, except on !indisready.
This would allow users to continue running them on all indexes of the
applicable access method.  Doing these operations on an
indisready&&!indisvalid index is entirely reasonable, since they relate to
INSERT/UPDATE/DELETE operations.

[pg_freespace pg_indexes_size pg_prewarm] currently succeed, and I propose to
leave them that way.  No undefined behavior arises.  pg_freespace needs to be
resilient to garbage data anyway, given the lack of WAL for the FSM.  One
could argue for a relkind check in pg_indexes_size.  One could argue for
treating pg_prewarm like amcheck (DEBUG1 and return).
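
As a hedged sketch of the DEBUG1-and-return-early behavior proposed above for
the BRIN/GIN maintenance functions (function name and message wording are
illustrative, modeled on amcheck's style; this is not a committed patch):

/*
 * Illustrative guard: skip silently (at DEBUG1) when the index is not ready
 * for inserts, rather than raising an error.  indisready && !indisvalid
 * indexes still pass, since these operations relate to INSERT/UPDATE/DELETE.
 */
static bool
index_maintenance_expected(Relation irel)
{
    if (irel->rd_index->indisready)
        return true;

    ereport(DEBUG1,
            (errmsg("skipping index \"%s\" because it is not ready",
                    RelationGetRelationName(irel))));
    return false;
}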
rollback;
begin;
create extension if not exists btree_gin;
create or replace function error(text) returns text language plpgsql as $$
begin
raise exception 'break index build';
return $1;
end;
$$ immutable;
drop table if exists u;
create table u (c text);
insert into u values ('foo');
commit;
create index concurrently ubtree on u (error(c));
create index concurrently ugin on u using gin (error(c));
create index concurrently uhash on u using hash (error(c));
create index concurrently ubrin on u using brin (error(c));

\set utable '''ubtree''::regclass'
\set useq '''ubtree''::regclass'
\set VERBOSITY verbose
begin;
create or replace function error(text) returns text language sql
  as 'select $1' immutable;

SET log_error_verbosity = verbose;
SET log_min_messages = debug1; -- for amcheck
--SET client_min_messages = debug1;

create extension pg_visibility;
create extension amcheck;
create extension pgstattuple;
create extension pg_surgery;
create extension pg_prewarm;
create extension pg_freespacemap;
create extension pgrowlocks;

SELECT * FROM pgrowlocks(:utable::text);
SELECT brin_desummarize_range('ubrin',1);
SELECT brin_summarize_new_values('ubrin');
SELECT brin_summarize_range('ubrin',1);
SELECT bt_index_check('ubtree');
SELECT bt_index_check('ubtree',true);
SELECT bt_index_parent_check('ubtree');
SELECT bt_index_parent_check('ubtree',true);
SELECT bt_index_parent_check('ubtree',true,true);
SELECT gin_clean_pending_list('ugin');
SELECT heap_force_freeze(:utable,array['(0,1)'::tid]);
SELECT heap_force_kill(:utable,array['(0,1)'::tid]);
SELECT nextval(:useq);
SELECT currval(:useq);
SELECT pg_check_frozen(:utable);
SELECT pg_check_visible(:utable);
SELECT pg_column_is_updatable(:utable,'1',true);
--SELECT pg_extension_config_dump(:utable,'select 1');
SELECT pg_freespace(:utable);
SELECT pg_freespace(:utable,1);
SELECT pg_get_replica_identity_index(:utable);
SELECT pg_index_column_has_property(:utable,1,'asc');
SELECT pg_indexes_size(:utable);
SELECT pg_index_has_property('ubrin','asc');
--SELECT pg_nextoid(:utable,name,:utable);
SELECT pg_partition_ancestors(:utable);
SELECT pg_partition_root(:utable);
SELECT pg_partition_tree(:utable);
SELECT pg_prewarm(:utable);
SELECT pg_relation_filenode(:utable);
SELECT pg_relation_filepath(:utable);
SELECT pg_relation_is_publishable(:utable);
SELECT pg_relation_is_updatable(:utable,true);
SELECT pg_relation_size(:utable);
SELECT pg_relation_size(:utable,'main');
SELECT pg_relpages(:utable);
SELECT pg_sequence_last_value(:useq);
SELECT pgstatginindex('ugin');
SELECT pgstathashindex('uhash');
SELECT pgstatindex('ubtree');
SELECT pgstattuple_approx(:utable);
SELECT pgstattuple(:utable);
SELECT 

Re: pgstatindex vs. !indisready

2023-10-01 Thread Tom Lane
Noah Misch  writes:
> Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
> could not read block 0 in file".  This reproduces it:
> ...
> Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
> not a good fit for this scenario.  I propose to have pgstatindex fail early on
> !indisready indexes.

+1

> We could go a step further and also fail on
> indisready&&!indisvalid indexes, which are complete enough to accept inserts
> but not complete enough to answer queries.  I don't see a reason to do that,
> but maybe someone else will.

Hmm.  Seems like the numbers pgstatindex would produce for a
not-yet-complete index would be rather irrelevant, even if the case
doesn't risk any outright problems.  I'd be inclined to be
conservative and insist on indisvalid being true too.
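
A hedged sketch of the kind of early check this implies for pgstatindex
(error code and message wording are assumptions, not the committed change):

/*
 * Illustrative early check: reject indexes that are not yet valid instead
 * of reading possibly nonexistent blocks and ending up with
 * ERRCODE_DATA_CORRUPTED.
 */
if (!rel->rd_index->indisvalid)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("index \"%s\" is not valid",
                    RelationGetRelationName(rel))));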

regards, tom lane




pgstatindex vs. !indisready

2023-10-01 Thread Noah Misch
Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
could not read block 0 in file".  This reproduces it:

begin;
drop table if exists not_indisready_t;
create table not_indisready_t (c int);
insert into not_indisready_t values (1),(1);
commit;
create unique index concurrently not_indisready_i on not_indisready_t(c);
begin;
create extension pgstattuple;
\set VERBOSITY verbose
select * from pgstatindex('not_indisready_i');
\set VERBOSITY default
rollback;

Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
not a good fit for this scenario.  I propose to have pgstatindex fail early on
!indisready indexes.  We could go a step further and also fail on
indisready&&!indisvalid indexes, which are complete enough to accept inserts
but not complete enough to answer queries.  I don't see a reason to do that,
but maybe someone else will.

This made me wonder about functions handling unlogged rels during recovery.  I
used the attached hack to test most regclass-arg functions.  While some of
these errors from src/test/recovery/tmp_check/log/001_stream_rep_standby_2.log
may deserve improvement, there were no class-XX messages:

2023-10-01 12:24:05.992 PDT [646914:11] 001_stream_rep.pl ERROR:  58P01: could 
not open file "base/5/16862": No such file or directory
2023-10-01 12:24:05.996 PDT [646914:118] 001_stream_rep.pl ERROR:  22023: fork 
"main" does not exist for this relation

Thanks,
nm
Author: Noah Misch 
Commit: Noah Misch 



diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index c60314d..ad54859 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -10,6 +10,14 @@
 #-
 
 EXTRA_INSTALL=contrib/pg_prewarm contrib/pg_stat_statements 
contrib/test_decoding
+EXTRA_INSTALL += \
+  contrib/pg_visibility \
+  contrib/amcheck \
+  contrib/pgstattuple \
+  contrib/btree_gin \
+  contrib/pg_surgery \
+  contrib/pg_freespacemap \
+  contrib/pgrowlocks
 
 subdir = src/test/recovery
 top_builddir = ../../..
diff --git a/src/test/recovery/t/001_stream_rep.pl 
b/src/test/recovery/t/001_stream_rep.pl
index 0c72ba0..e441cb2 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -45,6 +45,24 @@ $node_standby_2->start;
 # Create some content on primary and check its presence in standby nodes
 $node_primary->safe_psql('postgres',
"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
+$node_primary->safe_psql('postgres', "
+create extension pg_visibility;
+create extension amcheck;
+create extension pgstattuple;
+create extension btree_gin;
+create extension pg_surgery;
+create extension pg_prewarm;
+create extension pg_freespacemap;
+create extension pgrowlocks;
+
+create unlogged sequence useq;
+create unlogged table u (c text);
+insert into u values ('foo');
+create index ubtree on u (c);
+create index ugin on u using gin (c);
+create index uhash on u using hash (c);
+create index ubrin on u using brin (c);
+");
 
 # Wait for standbys to catch up
 $node_primary->wait_for_replay_catchup($node_standby_1);
@@ -60,6 +78,88 @@ $result =
 print "standby 2: $result\n";
 is($result, qq(1002), 'check streamed content on standby 2');
 
+# Call regclass-arg functions on unlogged objects.  Function list from:
+#
+# DO $$
+# DECLARE
+#  extname text;
+# BEGIN
+#  FOR extname in SELECT name from pg_available_extensions LOOP
+#  EXECUTE format('CREATE EXTENSION IF NOT EXISTS %I CASCADE', 
extname);
+#  END LOOP;
+# END
+# $$;
+# select oid::regprocedure from pg_proc
+# where proargtypes::oid[] @> array['regclass'::regtype::oid]
+# order by oid::regprocedure::text;
+#
+# Removed pageinspect functions, since the superuser might choose to run them
+# in cases we wouldn't expect.  Added pgrowlocks(text).
+$node_standby_2->psql('postgres', <<'EOSQL', on_error_stop => 0);
+\set utable '''u''::regclass'
+\set VERBOSITY verbose
+SET log_error_verbosity = verbose;
+SET log_min_messages = debug1; -- for amcheck
+SET client_min_messages = debug1;
+
+SELECT * FROM pgrowlocks(:utable::text);
+SELECT brin_desummarize_range('ubrin',1);
+SELECT brin_summarize_new_values('ubrin');
+SELECT brin_summarize_range('ubrin',1);
+SELECT bt_index_check('ubtree');
+SELECT bt_index_check('ubtree',true);
+SELECT bt_index_parent_check('ubtree');
+SELECT bt_index_parent_check('ubtree',true);
+SELECT bt_index_parent_check('ubtree',true,true);
+SELECT gin_clean_pending_list('ugin');
+SELECT heap_force_freeze(:utable,array['(0,1)'::tid]);
+SELECT heap_force_kill(:utable,array['(0,1)'::tid]);
+SELECT nextval('useq');
+SELECT currval('useq');
+SELECT pg_check_frozen(:utable);
+SELECT pg_check_visible(:utable);
+SELECT pg_column_is_updatable(:utable,'1',true);
+--SELECT pg_extension_config_dump(:utable,'select 1');
+SELECT pg_freespace(:utable);
+SELECT pg_freespace(:utable,1);
+SELECT 

Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-10-01 Thread Tom Lane
Peter Eisentraut  writes:
> Is this patch still being worked on?

I thought Andres simply hadn't gotten back to it yet.
It still seems like a worthwhile improvement.

regards, tom lane




Re: stopgap fix for signal handling during restore_command

2023-10-01 Thread Peter Eisentraut

On 22.04.23 01:07, Nathan Bossart wrote:
> On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote:
>> On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:
>>> Here is an attempt at adding a signal safe function for writing to STDERR.
>>
>> Cool.
>
> I'm gently bumping this thread to see if anyone had additional feedback on
> the proposed patches [0].  The intent was to back-patch these as needed and
> to pursue a long-term fix in v17.  Are there any concerns with that?
>
> [0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13

Is this still being contemplated?  What is the status of this?
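
For context on the constraint driving the patch: stdio functions such as
fprintf() are not async-signal-safe, so a handler that must report something
can only use primitives like write(2).  A generic, self-contained illustration
of the technique (not the function proposed in the patch):

/*
 * Generic illustration of signal-safe stderr output: only the
 * async-signal-safe write() is used, no stdio and no malloc.
 */
#include <signal.h>
#include <unistd.h>

static void
handle_sigterm(int signo)
{
    static const char msg[] =
        "received SIGTERM while waiting for restore_command\n";

    (void) signo;
    (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

int
main(void)
{
    signal(SIGTERM, handle_sigterm);
    for (;;)
        pause();
}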




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-01 Thread Peter Eisentraut

On 20.09.23 05:41, Peter Geoghegan wrote:
> On Sun, May 14, 2023 at 6:06 PM Peter Geoghegan  wrote:
>> BTW, Google cloud already just instruct their users to ignore the
>> xidStopLimit HINT about single user mode:
>>
>> https://cloud.google.com/sql/docs/postgres/txid-wraparound
>
> I read this just today, and was reminded of this thread:
>
> https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum
>
> It reads:
>
> "1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are
> 32-bit unsigned integers that are assigned to each transaction and
> also get incremented. When they reach their maximum value, it would
> wrap around to zero (similar to a ring buffer) and can lead to data
> corruption."

What is the status of this patch discussion now?  It had been set as
Ready for Committer for some months.  Do these recent discussions
invalidate that?  Does it need more discussion?






Re: False "pg_serial": apparent wraparound” in logs

2023-10-01 Thread Heikki Linnakangas

On 30/09/2023 02:16, Imseih (AWS), Sami wrote:

The initial idea was to advance the latest_page_number
during SerialSetActiveSerXmin, but the initial approach is
obviously wrong.


That approach at high level could work, a


When SerialSetActiveSerXmin is called for a new active
serializable xmin, and at that point we don't need to keep any
any earlier transactions, should SimpleLruZeroPage be called
to ensure there is a target page for the xid?

I tried something like below, which fixes my repro, by calling
SimpleLruZeroPage at the end of SerialSetActiveSerXmin.

@@ -953,6 +953,8 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
  static void
  SerialSetActiveSerXmin(TransactionId xid)
  {
+   int targetPage = SerialPage(xid);
+
 LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
  
 /*

@@ -992,6 +994,9 @@ SerialSetActiveSerXmin(TransactionId xid)
  
 serialControl->tailXid = xid;
  
+   if (serialControl->headPage != targetPage)

+   SimpleLruZeroPage(SerialSlruCtl, targetPage);
+
 LWLockRelease(SerialSLRULock);
  }


No, that's very wrong too. You are zeroing the page containing the 
oldest XID that's still needed. That page still contains important 
information. It might work if you zero the previous page, but I think 
you need to do a little more than that. (I wish we had tests that would 
catch that.)


The crux of the problem is that 'tailXid' can advance past 'headXid'. I 
was bit surprised by that, but I think it's by design. I wish it was 
called out explicitly in a comment though. The code mostly handles that 
fine, except that it confuses the "apparent wraparound" check.


'tailXid' is the oldest XID that we might still need to look up in the 
SLRU, based on the transactions that are still active, and 'headXid' is 
the newest XID that has been written out to the SLRU. But we only write 
an XID out to the SLRU and advance headXid if the shared memory data 
structure fills up. So it's possible that as old transactions age out, 
we advance 'tailXid' past 'headXid'.


SerialAdd() tolerates tailXid > headXid. It will zero out all the pages 
between the old headXid and tailXid, even though no lookups can occur on 
those pages. That's unnecessary but harmless.


I think the smallest fix here would be to change CheckPointPredicate() 
so that if tailPage > headPage, pass headPage to SimpleLruTruncate() 
instead of tailPage. Or perhaps it should go into the "The SLRU is no 
longer needed" codepath in that case. If tailPage > headPage, the SLRU 
isn't needed at the moment.


In addition to that, we could change SerialAdd() to not zero out the 
pages between old headXid and tailXid unnecessarily, but that's more of 
an optimization than bug fix.
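
A hedged sketch of that smallest fix, reusing the names that appear in the
quoted code above (serialControl, SerialSLRULock, SerialSlruCtl); the
page-comparison helper is written here as SerialPagePrecedesLogically as an
assumption, and this is an illustration of the idea, not the actual
CheckPointPredicate():

/*
 * Illustration only: pick the pg_serial truncation cutoff at checkpoint,
 * never passing a tailPage that is logically ahead of headPage.
 */
int     cutoffPage = -1;

LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
if (serialControl->headPage >= 0)
{
    if (SerialPagePrecedesLogically(serialControl->tailPage,
                                    serialControl->headPage))
        cutoffPage = serialControl->tailPage;   /* normal case */
    else
        cutoffPage = serialControl->headPage;   /* tailXid passed headXid */
}
LWLockRelease(SerialSLRULock);

if (cutoffPage >= 0)
    SimpleLruTruncate(SerialSlruCtl, cutoffPage);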


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-10-01 Thread Peter Eisentraut

Is this patch still being worked on?

On 07.03.23 01:51, Tom Lane wrote:
> Andres Freund  writes:
>> On 2023-03-03 15:09:18 -0500, Tom Lane wrote:
>>> It'd be good to have a header comment for ExecInitExprRec documenting
>>> the arguments, particularly that resv/resnull are where to put the
>>> subplan's eventual result.
>
>> Did you mean ExecInitSubPlanExpr()?
>
> Right, copy-and-pasteo, sorry.
>
>>> You could avoid having to assume ExprState's resvalue/resnull being
>>> safe to use by instead using the target resv/resnull.  This would
>>> require putting those into the EEOP_PARAM_SET step so that
>>> ExecEvalParamSet knows where to fetch from, so maybe it's not an
>>> improvement, but perhaps worth considering.
>
>> I think that'd be a bit worse - we'd have more pointers that can't be handled
>> in a generic way in JIT.
>
> OK.
>
> 			regards, tom lane








Re: Fix receiving large legal tsvector from binary format

2023-10-01 Thread Tom Lane
Denis Erokhin  writes:
> There is a problem on receiving large tsvector from binary format with
> getting error "invalid tsvector: maximum total lexeme length exceeded".

Good catch!  Even without an actual failure, we'd be wasting space
on disk anytime we stored a tsvector received through binary input.

I pushed your 0001 and 0002, but I don't really agree that 0003
is an improvement.  It looks to me like it'll result in one
repalloc per lexeme, instead of the O(log N) behavior we had before.
It's not that important to keep the palloc chunk size small here,
given that we don't allow tsvectors to get anywhere near 1Gb anyway.
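
To illustrate the growth-strategy point with a self-contained example (plain
realloc() stands in for repalloc() here): doubling the buffer reallocates only
about log2(N) times for N lexemes, whereas growing by one element reallocates
on every append.

/*
 * Self-contained illustration of the O(log N) growth argument; realloc()
 * stands in for repalloc().  Error checks omitted for brevity.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    size_t      n = 1000000;
    size_t      cap = 1;
    size_t      grow_calls = 0;
    int        *buf = malloc(cap * sizeof(int));

    for (size_t i = 0; i < n; i++)
    {
        if (i == cap)
        {
            cap *= 2;           /* geometric growth: few realloc calls */
            buf = realloc(buf, cap * sizeof(int));
            grow_calls++;
        }
        buf[i] = (int) i;
    }
    printf("doubling: %zu realloc calls for %zu appends\n", grow_calls, n);
    free(buf);
    return 0;
}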

regards, tom lane




Re: Skip Orderby Execution for Materialized Views

2023-10-01 Thread David G. Johnston
On Sun, Oct 1, 2023 at 8:57 AM Zhang Mingli  wrote:

> And if it’s true, shall we skip the order by clause for Materialized
> View  when executing create/refresh statement?
>

We tend to do precisely what the user writes into their query.  If they
don't want an order by they can remove it.  I don't see any particular
reason we should be second-guessing them here. And what makes the trade-off
worse is the reasonable expectation that we'd provide a way to force an
ordering of the inserts should the user actually want it after we defaulted
to ignoring that part of their query.

But yes, you are correct that adding an order by to a materialized view is
typically pointless.  To the extent it is detrimental varies since even
partially ordered results can save on processing time.

David J.


Re: Skip Orderby Execution for Materialized Views

2023-10-01 Thread Zhang Mingli
HI,

> On Oct 1, 2023, at 22:54, Tom Lane  wrote:
> 
>  For one example,
> you can't just remove the sort clause if the query uses DISTINCT ON


Hi, Tom, got it, thanks, 

Zhang Mingli
HashData https://www.hashdata.xyz



Re: Regression in COPY FROM caused by 9f8377f7a2

2023-10-01 Thread Andrew Dunstan



On 2023-09-26 Tu 04:11, Laurenz Albe wrote:

Here is an improved version of the patch with regression tests.



Thanks, pushed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Skip Orderby Execution for Materialized Views

2023-10-01 Thread Tom Lane
Zhang Mingli  writes:
> When create  or refresh a Materialized View, if the view’s query has order 
> by, we may sort and insert the sorted data into view.

Indeed.

> And if it’s true, shall we skip the order by clause for Materialized View  
> when executing create/refresh statement?

No.  The intent of a materialized view is to execute the query
as presented.  If the user doesn't understand the consequences
of that, it's not our job to think we are smarter than they are.

I think this patch should be rejected.

BTW, I'm pretty certain that this patch breaks some valid cases
even if we take your point of view as correct.  For one example,
you can't just remove the sort clause if the query uses DISTINCT ON.

regards, tom lane




Skip Orderby Execution for Materialized Views

2023-10-01 Thread Zhang Mingli
Hi, all


When create  or refresh a Materialized View, if the view’s query has order by, 
we may sort and insert the sorted data into view.

Create Materialized View mv1 as select c1, c2 from t1 order by c2;
Refresh Materialized View mv1;

And it appears that we could get ordered data when selecting from the
materialized view:

Select * from mv1;

But it’s not true if we use other access methods, or we choose a parallel 
seqscan plan.
A non-parallel seqscan on heap table appears ordered as we always create new 
rel file and swap them, in my opinion, it’s more like a free lunch.

So, conclusion1:  We couldn’t rely on the `ordered-data` even the mv’s sql has 
order by clause, is it right?

And if it’s true, shall we skip the order by clause for Materialized View  when 
executing create/refresh statement?

Materialized View’s order by clause could be skipped if

1. Order by clause is on the top query level
2. There is no real limit clause

The benefit is that the query may be sped up by omitting the sort nodes each
time the materialized view is created or refreshed.


Some simple results:

create table t1 as select i as c1, i/2 as c2, i/5 as c3 from generate_series(1, 10) i;
create materialized view mvt1_order as select c1, c2, c3 from t1 order by c2, c3, c1 asc;

Without this patch:
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 228.548 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 230.374 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 217.079 ms


With this patch:

zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 192.409 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 204.398 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 197.510 ms



Zhang Mingli
www.hashdata.xyz


v1-01-Skip-Orderby-clause-execution-for-Materialized-Views.patch
Description: Binary data


Re: make add_paths_to_append_rel aware of startup cost

2023-10-01 Thread Andy Fan
 Hi David,

> But overall, I'm more inclined to just go with the more simple "add a
> cheap unordered startup append path if considering cheap startup
> plans" version. I see your latest patch does both. So, I'd suggest two
> patches as I do see the merit in keeping this simple and cheap.  If we
> can get the first part in and you still find cases where you're not
> getting the most appropriate startup plan based on the tuple fraction,
> then we can reconsider what extra complexity we should endure in the
> code based on the example query where we've demonstrated the planner
> is not choosing the best startup path appropriate to the given tuple
> fraction.
>

I think this is a fair point.  I agree that your first part is good enough to
be committed first.  Actually I tried a lot to make a test case which can prove
the value of the cheapest fractional cost, but no gain so far :(

-- 
Best Regards
Andy Fan


Re: POC: GROUP BY optimization

2023-10-01 Thread Andrei Lepikhov
New version of the patch. Fixed minor inconsistencies and rebased onto 
current master.


--
regards,
Andrey Lepikhov
Postgres Professional
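
As a purely illustrative, self-contained sketch of the heuristic described in
the commit message of the attached patch below (order grouping keys so that
high-cardinality, cheap-to-compare keys come first); this is not the patch's
planner code:

/*
 * Illustration only: reorder grouping keys by estimated ndistinct (descending)
 * and per-comparison cost (ascending), as the commit message describes.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct GroupKey
{
    const char *name;
    double      ndistinct;      /* estimated number of distinct values */
    double      cmp_cost;       /* estimated cost of one comparison */
} GroupKey;

static int
key_cmp(const void *a, const void *b)
{
    const GroupKey *ka = a;
    const GroupKey *kb = b;

    /* prefer higher cardinality, then cheaper comparisons */
    if (ka->ndistinct != kb->ndistinct)
        return (ka->ndistinct < kb->ndistinct) ? 1 : -1;
    if (ka->cmp_cost != kb->cmp_cost)
        return (ka->cmp_cost > kb->cmp_cost) ? 1 : -1;
    return 0;
}

int
main(void)
{
    GroupKey    keys[] = {
        {"c2", 50000.0, 1.0},   /* cheap comparison, many distinct values */
        {"c3", 20000.0, 1.0},
        {"c1", 10.0, 20.0},     /* e.g. long text: expensive comparison */
    };
    size_t      nkeys = sizeof(keys) / sizeof(keys[0]);

    qsort(keys, nkeys, sizeof(GroupKey), key_cmp);
    for (size_t i = 0; i < nkeys; i++)
        printf("%zu: %s\n", i + 1, keys[i].name);
    return 0;
}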
From 2f5a42c8a53286f830e3376ff4d3f8b7d4215b4b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple interesting orderings based on both local and global
information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last because maybe
the cheaper comparisons will be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings and picks those, minimizing the comparison cost
(for various path keys), and then adds orderings that might be useful for
operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 600 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1769 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT 
DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER 
= SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-   
QUERY PLAN  
 
-
+  QUERY PLAN