xmlserialize bug - extra empty row at the end

2023-04-22 Thread Pavel Stehule
Hi

maybe I found a bug in xmlserialize

SELECT xmlserialize(DOCUMENT '42' AS
varchar INDENT);

(2023-04-23 07:27:53) postgres=# SELECT xmlserialize(DOCUMENT
'42' AS varchar INDENT);
┌─┐
│  xmlserialize   │
╞═╡
│   ↵│
│   ↵│
│ 42↵│
│  ↵│
│  ↵│
│ │
└─┘
(1 row)

It looks like there is an extra empty row at the end.
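
The XML literal in the example was stripped by the archive; a minimal
reconstruction with an assumed two-element document would be:

SELECT xmlserialize(DOCUMENT '<a><b>42</b></a>' AS varchar INDENT);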

Regards

Pavel


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Tatsuo Ishii
> Vik Fearing  writes:
>> On 4/22/23 14:14, Tatsuo Ishii wrote:
>>> Note that RESPECT/IGNORE are not registered as reserved keywords in
>>> this patch (but registered as unreserved keywords). I am not sure if
>>> this is acceptable or not.
> 
>> For me, this is perfectly okay.  Keep them at the lowest level of 
>> reservation as possible.
> 
> Yeah, keep them unreserved if at all possible.  Any higher reservation
> level risks breaking existing applications that might be using these
> words as column or function names.

Agreed.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Tatsuo Ishii
> Excellent.  I was thinking about picking my version of this patch up
> again, but I think this might be better than mine.

Thanks.

> I am curious why set_mark is false in the IGNORE version instead of
> also being const_offset.  Surely the nth non-null in the frame will
> never go backwards.

Initially I thought that too. But when I used const_offset instead of
false, I got an error:

ERROR:  cannot fetch row before WindowObject's mark position

> I do agree that we can have <respect/ignore nulls> without <from first/last> so let's move forward with this and handle the latter later.

Agreed.

>> Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
>> NULLS.
>
> This should not be hard coded.  It should be a new field in pg_proc
> (with a sanity check that it is only true for window functions).  That
> way custom window functions can implement it.

There were some discussions on this in the past.
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

It seems Tom and Andrew thought that "1.1.2. Change the behavior of
the windowapi in some consistent way" is ambitious. If we follow this
direction, I think each window function should check WindowFunc struct
passed by WinGetWindowFunc (added in my patch) to check whether IGNORE
NULLS can be applied or not in the function. If not, error out. This
way, we don't need to add a new field to pg_proc.

>> No document nor test patches are included for now.
> 
> I can volunteer to work on these if you want.

Thanks! I think you can work on top of the last patch posted by Krasiyan 
Andreev:
https://www.postgresql.org/message-id/CAN1PwonAnC-KkRyY%2BDtRmxQ8rjdJw%2BgcOsHruLr6EnF7zSMH%3DQ%40mail.gmail.com

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Bufferless buffered files

2023-04-22 Thread Thomas Munro
"$SUBJECT" may sound like a Zen kōan, but buffile.c does a few useful
things other than buffering (I/O time tracking, inter-process file
exchange through file sets, segmentation, etc).  A couple of places
such as logtape.c, sharedtuplestore.c and gistbuildbuffers.c do
block-oriented I/O in temporary files, but unintentionally also pay
the price of buffile.c's buffering scheme for no good reason, namely:

 * extra memory for the buffer
 * extra memcpy() to/from that buffer
 * I/O chopped up into BLCKSZ-sized system calls

Melanie and I were bouncing some ideas around about the hash join
batch memory usage problem, and I thought it might be good to write
down these thoughts about lower level buffer management stuff in a new
thread.  They don't address the hard algorithmic issues in that topic
(namely: at some point, doubling the number of partition files will
take more buffer space than can be saved by halving the size of the
hash table, parallel or not), but we probably should remove a couple
of the constants that make the parallel case worse.  As a side benefit
it would be nice to also fix some obvious silly efficiency problems
for all users of BufFiles.

First, here is a toy patch to try to skip the buffering layer in
BufFile when possible, without changing any interfaces.  Other ways
would of course be possible, including a new separate module with a
different name.  The approach taken here is to defer allocation of
BufFile::buffer, and use a fast path that just passes I/O straight
through to the OS without touching the sides, until the first
non-block-aligned I/O is seen, at which point we switch to the less
efficient mode.  Don't take this code too seriously, and I'm not sure
what to think about short reads/writes yet; it's just a cheap demo to
accompany this problem statement.

We could go further (and this one is probably more important).
SharedTuplestore works in 32KB chunks, which are the "parallel grain"
for distributing data to workers that read the data back later
(usually the workers try to read from different files to avoid
stepping on each others' toes, but when there aren't enough files left
they gang up and read from one file in parallel, and we need some way
to scatter the data to different workers; these chunks are analogous
to the heap table blocks handed out by Parallel Seq Scan).  With the
bufferless-BufFile optimisation, data is written out directly from
sharedtuplestore.c's buffer to the OS in a 32KB pwrite(), which is a
nice improvement, but that's 32KB of buffer space that non-parallel HJ
doesn't have.  We could fix that by making the chunk size dynamic, and
turning it down as low as 8KB when the number of partitions shoots up
to silly numbers by some heuristics.  (I guess the buffer size and the
chunk size don't have to be strictly the same, but it'd probably make
the code simpler...)  With those changes, PHJ would use strictly no
more buffer memory than HJ, unless npartitions is fairly low when it
should be OK to do so (and we could even consider making it larger
sometimes for bigger I/Os).

(Obviously it's just terrible in every way when we have very high
numbers of partitions, and this just mitigates one symptom.  For
example, once we get to our max fds limit, we're re-opening and
re-closing files at high speed, which is bonkers, and the kernel has
an identical buffer space management problem on its side of the wall.
In the past we've talked about something a bit more logtape.c-like
(multiple streams in one file with our own extent management) which
might help a bit.  But... it feels like what we really need is a
higher level algorithmic solution that lets us cap the number of
partitions, as chronicled elsewhere.)

We could go further again.  The above concerns the phase where we're
writing data out to many partitions, which requires us to have
in-memory state for all of them.  But during the reading back phase,
we read from just a single partition at a time, so we don't have the
memory explosion problem.  For dumb implementation reasons
(essentially because it was styled on the older pre-parallel HJ code),
SharedTuplestore reads one tuple at a time from buffile.c with
BufFileRead(), so while reading it *needs* BufFile's internal
buffering, which means 8KB pread() calls, no optimisation for you.  We
could change it to read in whole 32KB chunks at a time (or dynamic
size as mentioned already), via the above faster bufferless path.  The
next benefit would be that, with whole chunks in *its* buffer, it
could emit MinimalTuples using a pointer directly into that buffer,
skipping another memcpy() (unless the tuple is too big and involves
overflow chunks, a rare case).  (Perhaps there could even be some
clever way to read whole chunks directly into the hash table memory
chunks at once, instead of copying tuples in one-by-one, not
examined.)

Another closely related topic that came up in other recent I/O work:
How can we arrange for all of these buffers to be allocated more

Re: Mark a transaction uncommittable

2023-04-22 Thread Julien Rouhaud
Hi,

On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote:
>
> I have an application for this: creating various dev/test versions of data
> from production.
>
> Start by restoring a copy of production from backup. Then successively
> create several altered versions of the data and save them to a place where
> developers can pick them up. For example, you might have one version which
> has all data older than 1 year deleted, and another where 99% of the
> students/customers/whatever are deleted. Anonymization could also be
> applied. This would give you realistic (because it ultimately originates
> from production) test data.
>
> This could be done by starting a non-committable transaction, making the
> adjustments, then doing a pg_dump in the same transaction (using --snapshot
> to allow it to see that transaction). Then rollback, and repeat for the
> other versions. This saves repeatedly restoring the (probably very large)
> production data each time.

There already are tools to handle those use cases.  Looks for instance at
https://github.com/mla/pg_sample to backup a consistent subset of the data, or
https://github.com/rjuju/pg_anonymize to transparently pg_dump (or
interactively query) anonymized data.

Both tools also work when connected to a physical standby, whereas trying to
update the data before dumping it wouldn't.




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-22 Thread Andres Freund
Hi,

On 2023-04-13 13:18:38 +0200, Alvaro Herrera wrote:
> On 2023-Apr-11, Andres Freund wrote:
> 
> > Updated patch attached. I think we should either apply something like that
> patch, or at least add a <warning> to the docs.
> 
> I gave this patch a look.  The only code change is that
> ComputeXidHorizons() and GetSnapshotData() no longer handle the case
> where vacuum_defer_cleanup_age is different from zero.  It looks good.
> The TransactionIdRetreatSafely() code being removed is pretty weird (I
> spent a good dozen minutes writing a complaint that your rewrite was
> faulty, but it turns out I had misunderstood the function), so I'm glad
> it's being retired.

My rewrite of what? The creation of TransactionIdRetreatSafely() in
be504a3e974?

I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more
things to 64bit xids (lest they end up with the same bug as fixed by
be504a3e974), so it's perhaps worth thinking about how to make it less
confusing.


> > 
> > -Similarly, 
> > -and  provide protection 
> > against
> > -relevant rows being removed by vacuum, but the former provides no
> > -protection during any time period when the standby is not connected,
> > -and the latter often needs to be set to a high value to provide 
> > adequate
> > -protection.  Replication slots overcome these disadvantages.
> > +Similarly,  on its own, 
> > without
> > +also using a replication slot, provides protection against relevant 
> > rows
> > +being removed by vacuum, but provides no protection during any time 
> > period
> > +when the standby is not connected.  Replication slots overcome these
> > +disadvantages.
> 
> I think it made sense to have this paragraph be separate from the
> previous one when it was talking about two separate variables, but now
> that it's just one, it looks a bit isolated.  I would merge it with the
> one above, which is talking about pretty much the same thing, and
> reorder the whole thing approximately like this
> 
>
> In lieu of using replication slots, it is possible to prevent the removal
> of old WAL segments using , or by
> storing the segments in an archive using
>  or  linkend="guc-archive-library"/>.
> However, these methods often result in retaining more WAL segments than
> required.
> Similarly,  without
> a replication slot provides protection against relevant rows
> being removed by vacuum, but provides no protection during any time period
> when the standby is not connected.
>
>
> Replication slots overcome these disadvantages by retaining only the 
> number
> of segments known to be needed.
> On the other hand, replication slots can retain so
> many WAL segments that they fill up the space allocated
> for pg_wal;
>  limits the size of WAL files
> retained by replication slots.
>

It seems a bit confusing now, because "by retaining only the number of
segments ..." now also should cover hs_feedback (due to merging), but doesn't.


> Though the "However," looks a poor fit; I would do this:

I agree, I don't like the however.


I think I'll push the version I had. Then we can separately word-smith the
section? Unless somebody protests I'm gonna do that soon.

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Noah Misch
On Sat, Apr 22, 2023 at 04:15:23PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > - skip if the break-glass "pgindent: no" appears in a commit message
> 
> There are two things that bother me about putting this functionality
> into a server hook, beyond the possible speed issue:
> 
> * The risk of failure.  I don't have a terribly difficult time imagining
> situations where things get sufficiently wedged that the server accepts
> *no* commits, not even ones fixing the problem.  An override such as
> you suggest here could assuage that fear, perhaps.

I agree that deserves some worry.

> * The lack of user-friendliness.  AFAIK, if a pre-receive hook fails
> you learn little except that it failed.

That is incorrect.  The client gets whatever the hook prints.  I'd probably
make it print the first 1 lines of the diff.


I'm okay with a buildfarm animal.  It's going to result in a more-cluttered
git history as people push things, break that animal, and push followup fixes.
While that's sad, I expect the level of clutter will go down pretty quickly
and will soon be no worse than we already get from typo-fix pushes.




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Tom Lane
Noah Misch  writes:
> - skip if the break-glass "pgindent: no" appears in a commit message

There are two things that bother me about putting this functionality
into a server hook, beyond the possible speed issue:

* The risk of failure.  I don't have a terribly difficult time imagining
situations where things get sufficiently wedged that the server accepts
*no* commits, not even ones fixing the problem.  An override such as
you suggest here could assuage that fear, perhaps.

* The lack of user-friendliness.  AFAIK, if a pre-receive hook fails
you learn little except that it failed.  This could be extremely
frustrating to debug, especially in a situation where your local
pgindent is giving you different results than the server gets.

The idea of a buildfarm animal failing if --show-diff isn't empty
is attractive to me mainly because it is far nicer from the
debuggability standpoint.

Maybe, after we get some amount of experience with trying to keep
things always indent-clean, we will decide that it's reliable enough
to enforce in a server hook.  I think going for that right away is
sheer folly though.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Noah Misch
(Given that another commentator is "absolutely against" a hook, this message
is mostly for readers considering this for other projects.)

On Sat, Apr 22, 2023 at 03:23:59PM +0200, Magnus Hagander wrote:
> On Tue, Feb 7, 2023 at 5:43 AM Noah Misch  wrote:
> > On Mon, Feb 06, 2023 at 06:17:02PM +0100, Peter Eisentraut wrote:
> > > Also, pgindent takes tens of seconds to run, so hooking that into the git
> > > push process would slow this down quite a bit.
> >
> > The pre-receive hook would do a full pgindent when you change typedefs.list.
> > Otherwise, it would reindent only the files being changed.  The average push
> > need not take tens of seconds.
> 
It would probably not be tens of seconds, but it would be slow. It
> would need to do a clean git checkout into an isolated environment and
> spawn in there, and just that takes time.

That would be slow, but I wouldn't do it that way.  I'd make "pg_bsd_indent
--pre-receive --show-diff" that, instead of reading from the filesystem, gets
the bytes to check from the equivalent of this Perl-like pseudocode:

while (<>) {
    my ($old_hash, $new_hash, $ref) = split;
    foreach my $filename (split /\n/, `git diff --name-only $old_hash..$new_hash`) {
        my $file_content = `git show $new_hash:$filename`;
    }
}

I just ran pgindent on the file name lists of the last 1000 commits, and
runtime was less than 0.5s for each of 998/1000 commits.  There's more a real
implementation might handle:

- pg_bsd_indent changes
- typedefs.list changes
- skip if the break-glass "pgindent: no" appears in a commit message
- commits changing so many files that a clean "git checkout" would be faster

> And it would have to also
> know to rebuild pg_bsd_indent on demand, which would require a full
> ./configure run (or meson equivalent). etc.
> 
> So while it might not be tens of seconds, it most definitely won't be fast.

A project more concerned about elapsed time than detecting all defects might
even choose to take no synchronous action for pg_bsd_indent and typedefs.list
changes.  When a commit changes either of those, the probability that the
committer already ran pgindent rises substantially.




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-04-22 Sa 11:37, Tom Lane wrote:
>> * I see that there's now a 20230309 release, should we consider that
>> instead?

> A test I just ran gave identical results to those from 20221112

Cool, let's use perltidy 20230309 then.

> The great advantage of not doing this alignment is that there is far 
> less danger of perltidy trying to realign lines that have not in fact 
> changed, because some nearby line has changed. So we'd have a good deal 
> less pointless churn.

Yes, exactly.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Andrew Dunstan


On 2023-04-22 Sa 11:37, Tom Lane wrote:

Andrew Dunstan  writes:

On 2023-04-22 Sa 10:39, Tom Lane wrote:

Another obstacle in the way of (1) is that there was some discussion
of changing perltidy version and/or options.  But I don't believe
we have a final proposal on that, much less committed code.

Well, I posted a fairly concrete suggestion with an example patch
upthread at

I still think that's worth doing.

OK, so plan is (a) update perltidyrc to add --valign-exclusion-list,
(b) adjust pgindent/README to recommend perltidy version 20221112.

Questions:

* I see that there's now a 20230309 release, should we consider that
instead?



A test I just ran gave identical results to those from 20221112




* emacs.samples provides pgsql-perl-style that claims to match
perltidy's rules.  Does that need any adjustment?  I don't see
anything in it that looks relevant, but I'm not terribly familiar
with emacs' Perl formatting options.



At least w.r.t. the vertical alignment issue, AFAICT the emacs style 
does not attempt to align anything vertically except the first non-space 
thing on the line. So if anything, by abandoning a lot of vertical 
alignment it would actually be closer to what the sample emacs style does.


The great advantage of not doing this alignment is that there is far 
less danger of perltidy trying to realign lines that have not in fact 
changed, because some nearby line has changed. So we'd have a good deal 
less pointless churn.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada

2023-04-22 Thread Jonathan S. Katz

On 4/20/23 1:40 PM, Tom Lane wrote:

The Core Team would like to extend our congratulations to
Nathan Bossart, Amit Langote, and Masahiko Sawada, who have
accepted invitations to become our newest Postgres committers.

Please join me in wishing them much success and few reverts.


Congratulations Nathan, Amit, and Masahiko!

Jonathan





Re: pg_stat_io not tracking smgrwriteback() is confusing

2023-04-22 Thread Jonathan S. Katz

On 4/19/23 1:23 PM, Andres Freund wrote:

Hi,

I noticed that the numbers in pg_stat_io dont't quite add up to what I
expected in write heavy workloads. Particularly for checkpointer, the numbers
for "write" in log_checkpoints output are larger than what is visible in
pg_stat_io.

That partially is because log_checkpoints' "write" covers way too many things,
but there's an issue with pg_stat_io as well:

Checkpoints, and some other sources of writes, will often end up doing a lot
of smgrwriteback() calls - which pg_stat_io doesn't track. Nor do any
pre-existing forms of IO statistics.

It seems pretty clear that we should track writeback as well. I wonder if it's
worth doing so for 16? It'd give a more complete picture that way. The
counter-argument I see is that we didn't track the time for it in existing
stats either, and that nobody complained - but I suspect that's mostly because
nobody knew to look.


[RMT hat]

(sorry for slow reply on this, I've been out for a few days).

It does sound generally helpful to track writeback to ensure anyone 
building around pg_stat_io can see the more granular picture. How big
of an effort is this? Do you think this helps to complete the feature 
for v16?
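
As a rough sketch, the existing per-backend-type counters can be pulled out
of the view like this (writeback, once tracked, would presumably be reported
alongside these columns):

SELECT backend_type, object, context, writes, fsyncs
FROM pg_stat_io
WHERE backend_type = 'checkpointer';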


Thanks,

Jonathan




Re: check_strxfrm_bug()

2023-04-22 Thread Jonathan S. Katz

On 4/19/23 9:34 PM, Thomas Munro wrote:

On Wed, Apr 19, 2023 at 2:31 PM Jonathan S. Katz  wrote:

To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?

Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.


Thanks all.  I went ahead and removed check_strxfrm_bug().


Thanks! For housekeeping, I put this into "Open Items" and marked it as 
resolved.



I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?


I believe we'd be qualifying this as an open item too? If so, let's add it.

Thanks,

Jonathan




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Tom Lane
Vik Fearing  writes:
> On 4/22/23 14:14, Tatsuo Ishii wrote:
>> Note that RESPECT/IGNORE are not registered as reserved keywords in
>> this patch (but registered as unreserved keywords). I am not sure if
>> this is acceptable or not.

> For me, this is perfectly okay.  Keep them at the lowest level of 
> reservation as possible.

Yeah, keep them unreserved if at all possible.  Any higher reservation
level risks breaking existing applications that might be using these
words as column or function names.

regards, tom lane




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Vik Fearing

On 4/22/23 14:14, Tatsuo Ishii wrote:

I revisited the thread:
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

and came up with attached POC patch (I used some variable names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users.


Excellent.  I was thinking about picking my version of this patch up 
again, but I think this might be better than mine.


I am curious why set_mark is false in the IGNORE version instead of also 
being const_offset.  Surely the nth non-null in the frame will never go 
backwards.


Dealing with marks was the main reason (I think) that my patch was not 
accepted.



For FIRST/LAST I am not so excited since there are alternatives
as our document states,


I disagree with this.  The point of having FROM LAST is to avoid 
calculating a new window and running a new pass over it.


> so FIRST/LAST are not included in the patch.

I do agree that we can have <respect/ignore nulls> without <from first/last> so let's move forward with this and handle the latter later.



Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS.


This should not be hard coded.  It should be a new field in pg_proc 
(with a sanity check that it is only true for window functions).  That 
way custom window functions can implement it.



I think it's not hard to implement it for others (lead, lag,
first_value and last_value).


It doesn't seem like it should be, no.


No document nor test patches are included for now.


I can volunteer to work on these if you want.


Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.


For me, this is perfectly okay.  Keep them at the lowest level of 
reservation as possible.

--
Vik Fearing





Re: pg_collation.collversion for C.UTF-8

2023-04-22 Thread Daniel Verite
Thomas Munro wrote:

> It looks like for technical reasons
> inside glibc, that couldn't be done before 2.35:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318
> 
> That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied
> by the glibc project) isn't supposed to be versioned, but it's
> extremely unfortunate that a bunch of OSes (Debian and maybe more)
> have been sorting text in some other order under that name for
> years.

Yes. This is consistent with Debian/Ubuntu patches in 
glibc/localedata/locales/C

glibc-2.35 is not patched, and upstream has this:
  LC_COLLATE
  % The keyword 'codepoint_collation' in any part of any LC_COLLATE
  % immediately discards all collation information and causes the
  % locale to use strcmp/wcscmp for collation comparison.  This is
  % exactly what is needed for C (ASCII) or C.UTF-8.
  codepoint_collation
  END LC_COLLATE

But in older versions, glibc doesn't have the locales/C data file.
Debian adds it in debian/patches/localedata/C with that kind of
content:

* glibc 2.31  Debian 11
  LC_COLLATE
  order_start forward
  (per-code-point collation entries stripped by the archive)
  etc...

But as explained in the above-linked bugzilla entry, that did not
result in true byte-comparison semantics, for several reasons
that got fixed in 2.35.

So this looks like a solved problem for anyone starting to use these
collation with glibc 2.35 or newer (or other OSes that don't have a
compatibility issue with them in the first place).
But Debian/Ubuntu users upgrading from the older C.* to 2.35+ will not
get the normal warning about the need to reindex.

I understand that my proposal to version C.* like any other collation
might be erring on the side of caution, but ignoring these collation
changes on at least one major OS does not feel right either.
Maybe we should consider doing platform-dependent checks?
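
As a rough sketch, the recorded and currently-reported versions can be
compared like this, assuming the C.UTF-8 collation has been imported into
pg_collation on the system in question:

SELECT collname, collprovider, collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collname IN ('C', 'C.utf8', 'C.UTF-8');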



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Mark a transaction uncommittable

2023-04-22 Thread Isaac Morland
On Sat, 22 Apr 2023 at 11:01, Gurjeet Singh  wrote:

> This is a proposal for a new transaction characteristic. I haven't
> written any code, yet, and am interested in hearing if others may find
> this feature useful.
>
> Many a times we start a transaction that we never intend to commit;
> for example, for testing, or for EXPLAIN ANALYZE, or after detecting
> unexpected results but still interested in executing more commands
> without risking commit,  etc.
>
> A user would like to declare their intent to eventually abort the
> transaction as soon as possible, so that the transaction does not
> accidentally get committed.
>

I have an application for this: creating various dev/test versions of data
from production.

Start by restoring a copy of production from backup. Then successively
create several altered versions of the data and save them to a place where
developers can pick them up. For example, you might have one version which
has all data older than 1 year deleted, and another where 99% of the
students/customers/whatever are deleted. Anonymization could also be
applied. This would give you realistic (because it ultimately originates
from production) test data.

This could be done by starting a non-committable transaction, making the
adjustments, then doing a pg_dump in the same transaction (using --snapshot
to allow it to see that transaction). Then rollback, and repeat for the
other versions. This saves repeatedly restoring the (probably very large)
production data each time.

What I’m not sure about is how long it takes to rollback a transaction. I'm
assuming that it’s very quick compared to restoring from backup.

It would be nice if pg_basebackup could also have the --snapshot option.


Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-04-22 Thread Gurjeet Singh
On Fri, Apr 21, 2023 at 12:14 AM Nikita Malakhov  wrote:
> This limitation applies not only to wide tables - it also applies to tables 
> where TOASTed values
> are updated very often. You would soon be out of available TOAST value ID 
> because in case of
> high frequency updates autovacuum cleanup rate won't keep up with the 
> updates. It is discussed
> in [1].
>
> On Fri, Apr 21, 2023 at 9:39 AM Jakub Wartak  
> wrote:
>> I would like to ask if it wouldn't be good idea to copy the
>> https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit
>> discussion (out-of-line OID usage per TOAST-ed columns / potential
>> limitation) to the official "Appendix K. PostgreSQL Limits" with also
>> little bonus mentioning the "still searching for an unused OID in
>> relation" notice. Although it is pretty obvious information for some
>> and from commit 7fbcee1b2d5f1012c67942126881bd492e95077e and the
>> discussion in [1], I wonder if the information shouldn't be a little
>> more well known via the limitation (especially to steer people away
>> from designing very wide non-partitioned tables).
>>
>> [1] - 
>> https://www.postgresql.org/message-id/flat/16722-93043fb459a41073%40postgresql.org
>
> [1] 
> https://www.postgresql.org/message-id/CAN-LCVPRvRzxeUdYdDCZ7UwZQs1NmZpqBUCd%3D%2BRdMPFTyt-bRQ%40mail.gmail.com

These 2 discussions show that it's a painful experience to run into
this problem, and that the hackers have ideas on how to fix it, but
those fixes haven't materialized for years. So I would say that, yes,
this info belongs in the hard-limits section, because who knows how
long it'll take this to be fixed.
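
For anyone trying to gauge exposure: the OID counter in question is per TOAST
relation, and a sketch like the following (table name made up) shows which
TOAST table backs a given table:

SELECT c.relname AS base_table, t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'wide_table';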

Please submit a patch.

I anticipate that edits to Appendix K Postgres Limits will prompt
improving the note in there about the maximum column limit. That note
is too wordy and sometimes confusing, especially for the audience
it's written for: newcomers to the Postgres ecosystem.

Best regards,
Gurjeet https://Gurje.et
http://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-04-22 Sa 10:39, Tom Lane wrote:
>> Another obstacle in the way of (1) is that there was some discussion
>> of changing perltidy version and/or options.  But I don't believe
>> we have a final proposal on that, much less committed code.

> Well, I posted a fairly concrete suggestion with an example patch 
> upthread at
> 
> I still think that's worth doing.

OK, so plan is (a) update perltidyrc to add --valign-exclusion-list,
(b) adjust pgindent/README to recommend perltidy version 20221112.

Questions:

* I see that there's now a 20230309 release, should we consider that
instead?

* emacs.samples provides pgsql-perl-style that claims to match
perltidy's rules.  Does that need any adjustment?  I don't see
anything in it that looks relevant, but I'm not terribly familiar
with emacs' Perl formatting options.

regards, tom lane




Re: Improving worst-case merge join performance with often-null foreign key

2023-04-22 Thread Tom Lane
Steinar Kaldager  writes:
> First-time potential contributor here. We recently had an incident due
> to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
> due to a join with a foreign key that was often null. We found that it
> was caused by a merge join with an index scan on one join path --
> whenever the non-null data happened to be such that the merge join
> couldn't be terminated early, the index would proceed to scan all of
> the null rows and filter each one out individually. Since this was an
> inner join, this was pointless; the nulls would never have matched the
> join clause anyway.

Hmm.  I don't entirely understand why the existing stop-at-nulls logic
in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.

Pushing down an IS NOT NULL restriction could possibly be of value
if the join is being done in the nulls-first direction, but that's
an extreme minority use-case.  I'm dubious that it'd be worth the
overhead in general.  It'd probably be more useful to make sure that
the planner's cost model is aware of this effect, so that it's prodded
to use nulls-last not nulls-first sort order when there are enough
nulls to make a difference.

regards, tom lane




Re: [PATCH] Infinite loop while acquiring new TOAST Oid

2023-04-22 Thread Gurjeet Singh
On Thu, Dec 22, 2022 at 10:07 AM Nikita Malakhov  wrote:
> Any suggestions on the previous message (64-bit toast value ID with 
> individual counters)?

Was this patch ever added to CommitFest? I don't see it in the current
Open Commitfest.

https://commitfest.postgresql.org/43/

Best regards,
Gurjeet http://Gurje.et
http://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Andrew Dunstan


On 2023-04-22 Sa 10:39, Tom Lane wrote:


Another obstacle in the way of (1) is that there was some discussion
of changing perltidy version and/or options.  But I don't believe
we have a final proposal on that, much less committed code.



Well, I posted a fairly concrete suggestion with an example patch 
upthread at




I still think that's worth doing.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Mark a transaction uncommittable

2023-04-22 Thread Gurjeet Singh
This is a proposal for a new transaction characteristic. I haven't
written any code, yet, and am interested in hearing if others may find
this feature useful.

Many a times we start a transaction that we never intend to commit;
for example, for testing, or for EXPLAIN ANALYZE, or after detecting
unexpected results but still interested in executing more commands
without risking commit,  etc.

A user would like to declare their intent to eventually abort the
transaction as soon as possible, so that the transaction does not
accidentally get committed.

This feature would allow the user to mark a transaction such that it
can never be committed. We must allow such marker to be placed when
the transaction is being started, or while it's in progress.

Once marked uncommittable, do not allow the marker to be removed.
Hence, once deemed uncommittable, the transaction cannot be committed,
even intentionally. This protects against cases where one script
includes another (e.g. psql's \i command), and the included script may
have statements that would make the transaction committable again.

Any command that ends a transaction (END, COMMIT, ROLLBACK) must
result in a rollback.

All of these properties seem useful for savepoints, too. But I want to
focus on just the top-level transactions, first.

I feel like the BEGIN and SET TRANSACTION commands would be the right
places to introduce this feature.

BEGIN [ work | transaction ] [ [ NOT ] COMMITTABLE ];
SET TRANSACTION [ [ NOT ] COMMITTABLE ];
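
For illustration only (nothing is implemented yet, and the table name is
made up), the intended behaviour might look like:

BEGIN NOT COMMITTABLE;                         -- proposed syntax
DELETE FROM accounts WHERE region = 'test';
EXPLAIN ANALYZE SELECT count(*) FROM accounts;
COMMIT;                                        -- under the proposal, this rolls back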

I'm not yet sure if the COMMIT AND CHAIN command should carry this
characteristic to the next transaction.

Thoughts?

Best regards,
Gurjeet https://Gurje.et
http://aws.amazon.com




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Tom Lane
Jelte Fennema  writes:
> I think there's two things needed to actually start doing this:
> 1. We need to reindent the tree to create an indented baseline

As far as (1) goes, I've been holding off on that because there
are some large patches that still seem in danger of getting
reverted, notably 2489d76c4 and follow-ups.  A pgindent run
would change any such reversions from being mechanical into
possibly a fair bit of work.  We still have a couple of weeks
before it's necessary to make such decisions, so I don't want
to do the pgindent run before that.

Another obstacle in the way of (1) is that there was some discussion
of changing perltidy version and/or options.  But I don't believe
we have a final proposal on that, much less committed code.

regards, tom lane




Re: Move un-parenthesized syntax docs to "compatibility" for few SQL commands

2023-04-22 Thread Melanie Plageman
On Fri, Apr 21, 2023 at 09:44:51PM -0700, Nathan Bossart wrote:
> I've attached two patches.  0001 adds a parenthesized CLUSTER syntax that
> doesn't require a table name.  0002 is your patch with a couple of small
> adjustments.
> 
> On Fri, Apr 21, 2023 at 07:29:59PM -0400, Melanie Plageman wrote:
> > Attached v2 includes changes for CLUSTER syntax docs. I wasn't quite
> > sure that we can move down CLUSTER [VERBOSE] syntax to the compatibility
> > section since CLUSTER (VERBOSE) doesn't work. This seems like it should
> > work (VACUUM and ANALYZE do). I put extra "[ ]" around the main CLUSTER
> > syntax but I don't know that this is actually the same as documenting
> > that CLUSTER VERBOSE does work but CLUSTER (VERBOSE) doesn't.
> 
> I'm not sure about moving CLUSTER [VERBOSE] to the compatibility section,
> either.  At the very least, I don't think what I have in 0002 is accurate.
> It claims that syntax was used before v14, but it's still the only way to
> do a "verbose" CLUSTER without a table name in v15 and v16, too.  Perhaps
> we should break it apart, or maybe we can just say it was used before v17.
> WDYT?

If you are planning to wait and commit the change to support CLUSTER
(VERBOSE) until July, then you can consolidate the two and say before
v17. If you plan to commit it before then (making it available in v16),
I would move CLUSTER [VERBOSE] down to Compatibility and say that both
of the un-parenthesized syntaxes were used before v16. Since all of the
syntaxes still work, I think it is probably more confusing to split them
apart so granularly. The parenthesized syntax was effectively not
"feature complete" without your patch to support CLUSTER (VERBOSE).

Also, isn't this:
  CLUSTER [VERBOSE] [ table_name [ USING index_name ] ]
inclusive of this:
  CLUSTER [VERBOSE]
So, it would have been correct for them to be consolidated in the
existing documentation?
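
To make the distinction concrete (table and index names made up):

CLUSTER VERBOSE;                                -- un-parenthesized, accepted today
CLUSTER (VERBOSE) my_table USING my_table_idx;  -- parenthesized, accepted today
CLUSTER (VERBOSE);                              -- rejected today; allowed with the 0001 patch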

> From 48fff177a8f0096c99c77b4e1368cc73f7e86585 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Fri, 21 Apr 2023 20:16:25 -0700
> Subject: [PATCH v3 1/2] Support parenthesized syntax for CLUSTER without a
>  table name.
> 
> b5913f6 added a parenthesized syntax for CLUSTER, but it requires
> specifying a table name.  This is unlike commands such as VACUUM
> and ANALYZE, which do not require specifying a table in the
> parenthesized syntax.  This change resolves this inconsistency.
> 
> The documentation for the CLUSTER syntax has also been consolidated
> in anticipation of a follow-up change that will move the
> unparenthesized syntax to the "Compatibility" section.

I suppose we should decide if unparenthesized is a word or if we are
sticking with the hyphen.

>  doc/src/sgml/ref/cluster.sgml |  5 ++---
>  src/backend/parser/gram.y | 21 ++---
>  2 files changed, 16 insertions(+), 10 deletions(-)
> 
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
...
> @@ -11593,7 +11592,17 @@ ClusterStmt:
>   n->params = lappend(n->params, 
> makeDefElem("verbose", NULL, @2));
>   $$ = (Node *) n;
>   }
> + | CLUSTER opt_verbose
> + {
> + ClusterStmt *n = makeNode(ClusterStmt);
>  
> + n->relation = NULL;
> + n->indexname = NULL;
> + n->params = NIL;
> + if ($2)
> + n->params = lappend(n->params, 
> makeDefElem("verbose", NULL, @2));
> + $$ = (Node *) n;
> + }

Maybe it is worth moving the un-parenthesized options all to the end and
specifying what version they were needed for.


>   | CLUSTER '(' utility_option_list ')' qualified_name 
> cluster_index_specification
>   {
>   ClusterStmt *n = makeNode(ClusterStmt);
> @@ -11603,15 +11612,13 @@ ClusterStmt:
>   n->params = $3;
>   $$ = (Node *) n;
>   }
> - | CLUSTER opt_verbose
> + | CLUSTER '(' utility_option_list ')'

It is too bad we can't do this the way VACUUM and ANALYZE do -- but
since qualified_name is required if USING is included, I suppose we
can't.

> diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
> index 20c6f9939f..ea42ec30bd 100644
> --- a/doc/src/sgml/ref/analyze.sgml
> +++ b/doc/src/sgml/ref/analyze.sgml
> @@ -22,7 +22,6 @@ PostgreSQL documentation
>   
>  
>  ANALYZE [ ( option [, ...] ) ] 
> [ table_and_columns [, ...] ]
> -ANALYZE [ VERBOSE ] [  class="parameter">table_and_columns [, ...] ]
...
> @@ -346,6 +338,14 @@ ANALYZE [ VERBOSE ] [  class="parameter">table_and_columns  

Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Tom Lane
Magnus Hagander  writes:
> On Sat, Apr 22, 2023 at 1:42 PM Andrew Dunstan  wrote:
>> For 2 the upstream thread listed two approaches:
>> a. Install a pre-receive git hook on the git server that rejects
>> pushes to master that are not indented
>> b. Add a test suite that checks if the code is correctly indented, so
>> the build farm would complain about it. (Suggested by Peter E)
>> 
>> I think both a and b would work to achieve 2. But as Peter E said, b
>> indeed sounds like less of a divergence of the status quo. So my vote
>> would be for b.

I am absolutely against a pre-receive hook on gitmaster.  A buildfarm
check seems far more appropriate.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Andrew Dunstan


On 2023-04-22 Sa 08:47, Magnus Hagander wrote:

On Sat, Apr 22, 2023 at 1:42 PM Andrew Dunstan  wrote:


On 2023-04-22 Sa 04:50, Michael Paquier wrote:

On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote:

For 2 the upstream thread listed two approaches:
a. Install a pre-receive git hook on the git server that rejects
pushes to master that are not indented
b. Add a test suite that checks if the code is correctly indented, so
the build farm would complain about it. (Suggested by Peter E)

I think both a and b would work to achieve 2. But as Peter E said, b
indeed sounds like less of a divergence of the status quo. So my vote
would be for b.

FWIW, I think that there is value for both of them.  Anyway, isn't 'a'
exactly the same as 'b' in design?  Both require a build of
pg_bsd_indent, meaning that 'a' would also need to run an equivalent
of the regression test suite, but it would be actually costly
especially if pg_bsd_indent itself is patched.  I think that getting
more noisy on this matter with 'b' would be enough, but as an extra
PG_TEST_EXTRA for committers to set.

Such a test suite would need a dependency to the 'git' command itself,
which is not something that could be safely run in a release tarball,
in any case.


Perhaps we should start with a buildfarm module, which would run pgindent
--show-diff. That would only need to run on one animal, so a failure wouldn't 
send the whole buildfarm red. This would be pretty easy to do.


Just to be clear, you guys are aware we  already have a git repo
that's supposed to track "head + pg_indent" at
https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=shortlog;h=refs/heads/master-pgindent
right?

I see it is currently not working and this has not been noticed by
anyone, so I guess it kind of indicates nobody is using it today. The
reason appears to be that it uses pg_bsd_indent that's in our apt
repos and that's 2.1.1 and not 2.1.2 at this point. But if this is a
service that would actually be useful, this could certainly be fixed
pretty easily.

But bottom line is that if pgindent is as predictable as it should be,
it might be easier to use that one central place that already does it
rather than have to build a buildfarm module?



Now that pg_bsd_indent is in the core code why not just use that?


Happy if you can make something work without further effort on my part :-)


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Autogenerate some wait events code and documentation

2023-04-22 Thread Drouvot, Bertrand

Hi,

On 4/20/23 3:09 AM, Michael Paquier wrote:

On Wed, Mar 29, 2023 at 02:51:27PM +0200, Drouvot, Bertrand wrote:

Just realized that more polishing was needed.

Done in V2 attached.


That would be pretty cool to get that done in an automated way, I've
wanted that for a few years now.  And I guess that a few others have
the same feeling after missing to update these docs when adding a new
wait event, or just to enforce this alphabetically, so let's do
something about it in v17.


Thanks for the feedback!


About the alphabetical order, could we have the script enforce a sort
of the elements parsed from waiteventnames.txt, based on the second
column?  This now relies on the order of the items in the file, but
my history with this stuff has proved that forcing an ordering rule
would be a very good thing long-term.


Not having the lines in order would not have been a problem for the perl script
(as it populated the hash table based on the category column while reading the
text file).

That said I do agree that enforcing an order is a good idea, as it's "easier" 
to read
the generated output files (their content is now somehow "ordered").

This is done in V3 attached.


Seeing waiteventnames.txt, I think that we should have something
closer to errcodes.txt.  Well, seeing the patch, I assume that this is
inspired by errcodes.txt, but this new file should be able to do more
IMO:
- Supporting the parsing of comments, by ignoring them in
generate-waiteventnames.pl.
- Ignore empty lines.
- Add a proper header, copyright, the output generated from it, etc.
- Document the format lines of the file.



Fully agree, it's done in V3 attached.


It is clear that the format of the file is:
- category
- C symbol in enums.
- Format in the system views.
- Description in the docs.
Or perhaps it would be better to divide this file by sections (like
errcodes.txt) for each category so as we eliminate entirely the first
column?



Yeah, that could be an option. V3 is still using the category as the first 
column
but I'm OK to change it to a section if you prefer (though I don't really see
the need).

Perhaps waiteventnames.c should be named pgstat_wait_event.c? 


Agree, done.


Similarly,
wait_event_types.h would be a better name for the set of enums?



Also agree, done.



   utils/adt/jsonpath_scan.c \
+ utils/activity/waiteventnames.c \
+ utils/activity/waiteventnames.h \
+ utils/adt/jsonpath_scan.c \

Looks like a copy-pasto.


Why do you think so? Both files have to be removed.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From aa359811161198d7a83e2853e22aec10a40f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Sat, 22 Apr 2023 10:37:56 +
Subject: [PATCH v3] Generating wait_event_types.h, pgstat_wait_event.c and
 waiteventnames.sgml

Purpose is to auto-generate those files based on the newly created 
waiteventnames.txt file.

Having auto generated files would help to avoid:

- wait event without description like observed in 
https://www.postgresql.org/message-id/CA%2BhUKGJixAHc860Ej9Qzd_z96Z6aoajAgJ18bYfV3Lfn6t9%3D%2BQ%40mail.gmail.com
- orphaned wait event like observed in 
https://www.postgresql.org/message-id/CA%2BhUKGK6tqm59KuF1z%2Bh5Y8fsWcu5v8%2B84kduSHwRzwjB2aa_A%40mail.gmail.com
---
 doc/src/sgml/.gitignore   |   1 +
 doc/src/sgml/Makefile |   5 +-
 doc/src/sgml/filelist.sgml|   1 +
 doc/src/sgml/meson.build  |  10 +
 doc/src/sgml/monitoring.sgml  | 792 +-
 src/backend/Makefile  |  14 +-
 src/backend/utils/activity/Makefile   |  10 +
 .../utils/activity/generate-waiteventnames.pl | 171 
 src/backend/utils/activity/meson.build|  24 +
 src/backend/utils/activity/wait_event.c   | 567 +
 src/backend/utils/activity/waiteventnames.txt | 188 +
 src/include/Makefile  |   2 +-
 src/include/utils/.gitignore  |   1 +
 src/include/utils/meson.build |  18 +
 src/include/utils/wait_event.h| 214 +
 src/tools/msvc/Solution.pm|  19 +
 src/tools/msvc/clean.bat  |   3 +
 17 files changed, 468 insertions(+), 1572 deletions(-)
  35.9% doc/src/sgml/
  53.1% src/backend/utils/activity/
   8.6% src/include/utils/

diff --git a/doc/src/sgml/.gitignore b/doc/src/sgml/.gitignore
index d8e3dab338..e8cbd687d5 100644
--- a/doc/src/sgml/.gitignore
+++ b/doc/src/sgml/.gitignore
@@ -17,6 +17,7 @@
 /errcodes-table.sgml
 /keywords-table.sgml
 /version.sgml
+/waiteventnames.sgml
 # Assorted byproducts from building the above
 /postgres-full.xml
 /INSTALL.html
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 71cbef230f..01a6aa8187 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -58,7 +58,7 @@ 

Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Magnus Hagander
On Tue, Feb 7, 2023 at 5:43 AM Noah Misch  wrote:
>
> On Mon, Feb 06, 2023 at 06:17:02PM +0100, Peter Eisentraut wrote:
> > Also, pgindent takes tens of seconds to run, so hooking that into the git
> > push process would slow this down quite a bit.
>
> The pre-receive hook would do a full pgindent when you change typedefs.list.
> Otherwise, it would reindent only the files being changed.  The average push
> need not take tens of seconds.

It would probably not be tens of seconds, but it would be slow. It
would need to do a clean git checkout into an isolated environment and
spawn in there, and just that takes time. And it would have to also
know to rebuild pg_bsd_indent on demand, which would require a full
./configure run (or meson equivalent). etc.

So while it might not be tens of seconds, it most definitely won't be fast.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Improving worst-case merge join performance with often-null foreign key

2023-04-22 Thread Steinar Kaldager
Hi,

First-time potential contributor here. We recently had an incident due
to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
due to a join with a foreign key that was often null. We found that it
was caused by a merge join with an index scan on one join path --
whenever the non-null data happened to be such that the merge join
couldn't be terminated early, the index would proceed to scan all of
the null rows and filter each one out individually. Since this was an
inner join, this was pointless; the nulls would never have matched the
join clause anyway.

Test script to reproduce + example explain output:
https://paste.depesz.com/s/VUj

Once you're aware of it for a given index, this is a solvable issue.
We solved it by adding a partial index. Including an IS NOT NULL
clause in the query also seems to solve it.
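
Roughly, with made-up table and column names, the two workarounds look like:

CREATE INDEX orders_customer_id_not_null
    ON orders (customer_id)
    WHERE customer_id IS NOT NULL;              -- partial index

SELECT count(*)
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.customer_id IS NOT NULL;                -- redundant for an inner join,
                                                -- but keeps the scan off the NULLs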

However, I've gotten the notion that it should be possible to solve
this on the Postgres side, with no user action required. When doing an
inner-join merge join with a normal equality operator, we should
already "know" that we don't have to look at rows where the join keys
are null, since any such comparison involving NULL will not return
true. If we can just push that knowledge down to the index scan node,
then we should be able to avoid this entire problem, leaving one less
performance trap to trip people up.

Proof-of-concept patch (please ignore style):
https://github.com/steinarvk-oda/postgres/pull/1/files

I have a few questions:

(1) Does this approach seem reasonable and worthwhile?
(2) Can I determine programmatically at query planning time whether
the binary operator in an OpExpr has the property that all comparisons
involving nulls will be non-true? Or, failing that, can I at least
somehow hardcode and identify the built-in equality operators (which
have this property)? (Integer equality and UUID equality probably
covers a lot when it comes to joins.)
(3) Is there a systematic way to check whether adding an "IS NOT NULL"
condition would be redundant? E.g. if there is such a check in the
query already, adding another is just messy.
(4) Should this sort of thing be made conditional on a high null_frac?
Or is that unnecessary complexity, and the simplicity of just always
doing it would be better?

Thanks!
Steinar Kaldager




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Magnus Hagander
On Sat, Apr 22, 2023 at 1:42 PM Andrew Dunstan  wrote:
>
>
> On 2023-04-22 Sa 04:50, Michael Paquier wrote:
>
> On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote:
>
> For 2 the upstream thread listed two approaches:
> a. Install a pre-receive git hook on the git server that rejects
> pushes to master that are not indented
> b. Add a test suite that checks if the code is correctly indented, so
> the build farm would complain about it. (Suggested by Peter E)
>
> I think both a and b would work to achieve 2. But as Peter E said, b
> indeed sounds like less of a divergence of the status quo. So my vote
> would be for b.
>
> FWIW, I think that there is value for both of them.  Anyway, isn't 'a'
> exactly the same as 'b' in design?  Both require a build of
> pg_bsd_indent, meaning that 'a' would also need to run an equivalent
> of the regression test suite, but it would be actually costly
> especially if pg_bsd_indent itself is patched.  I think that getting
> more noisy on this matter with 'b' would be enough, but as an extra
> PG_TEST_EXTRA for committers to set.
>
> Such a test suite would need a dependency to the 'git' command itself,
> which is not something that could be safely run in a release tarball,
> in any case.
>
>
> Perhaps we should start with a buildfarm module, which would run pgindent
> --show-diff. That would only need to run on one animal, so a failure wouldn't 
> send the whole buildfarm red. This would be pretty easy to do.


Just to be clear, you guys are aware we  already have a git repo
that's supposed to track "head + pg_indent" at
https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=shortlog;h=refs/heads/master-pgindent
right?

I see it is currently not working and this has not been noticed by
anyone, so I guess it kind of indicates nobody is using it today. The
reason appears to be that it uses pg_bsd_indent that's in our apt
repos and that's 2.1.1 and not 2.1.2 at this point. But if this is a
service that would actually be useful, this could certainly be fixed
pretty easily.

But bottom line is that if pgindent is as predictable as it should be,
it might be easier to use that one central place that already does it
rather than have to build a buildfarm module?

//Magnus




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Oliver Ford
On Sat, 22 Apr 2023, 13:14 Tatsuo Ishii,  wrote:

> I revisited the thread:
>
> https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com
>
> and came up with attached POC patch (I used some variable names
> appearing in the Krasiyan Andreev's patch). I really love to have
> RESPECT/IGNORE NULLS because I believe they are convenient for
> users. For FIRST/LAST I am not so excited since there are alternatives
> as our document states, so FIRST/LAST are not included in the patch.
>
> Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
> NULLS. I think it's not hard to implement it for others (lead, lag,
> first_value and last_value).  No document nor test patches are
> included for now.
>

I've actually been looking at this feature again recently as well.
One thing I wondered, but would need consensus, is to take the
SEEK_HEAD/SEEK_TAIL case statements out of WinGetFuncArgInPartition. This
function is only called by leadlag_common, which uses SEEK_CURRENT, so
those case statements are never reached. Taking them out simplifies the
code as it is but means future features might need it re-added (although
I'm not sure the use case for it, as that function is for window funcs that
ignore the frame options).


> Note that RESPECT/IGNORE are not registered as reserved keywords in
> this patch (but registered as unreserved keywords). I am not sure if
> this is acceptable or not.
>
> > The questions of how we interface to the individual window functions
> > are really independent of how we handle the parsing problem.  My
> > first inclination is to just pass the flags down to the window functions
> > (store them in WindowObject and provide some additional inquiry functions
> > in windowapi.h) and let them deal with it.

> I agree with this.  Also I do not change the prototype of
> nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
> parser to parse/analysis and finally to WindowObject.
>

This is a much better option than my older patch which needed to change the
functions.


> > It's also worth wondering if we couldn't just implement the flags in
> > some generic fashion and not need to involve the window functions at
> > all.  FROM LAST, for example, could and perhaps should be implemented
> > by inverting the sort order.  Possibly IGNORE NULLS could be implemented
> > inside the WinGetFuncArgXXX functions?  These behaviors might or might
> > not make much sense with other window functions, but that doesn't seem
> > like it's our problem.
>
> Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
> this direction (not implemented in the patch at this point).
>

+1 for doing it here. Maybe also refactor WinGetFuncArgInFrame, putting the
exclusion checks in a static function as that function is already pretty
big?
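
To make that concrete, here is a rough sketch (entirely illustrative: the
helper name and structure are my own, not from the posted patch; only the
WinGetFuncArgInPartition() signature comes from windowapi.h) of how a
null-skipping variant could sit at that layer:

/*
 * Rough sketch only, not the posted patch: a null-skipping variant of
 * WinGetFuncArgInPartition().  Only that function's signature is taken
 * from windowapi.h; the helper name and structure are assumptions, and
 * mark handling is glossed over (false is always passed).  It also
 * assumes WINDOW_SEEK_CURRENT-style relative addressing, which is the
 * only way WinGetFuncArgInPartition() is called today.
 */
static Datum
WinGetFuncArgInPartitionIgnoreNulls(WindowObject winobj, int argno,
									int relpos, int seektype, bool set_mark,
									bool *isnull, bool *isout)
{
	int			nonnulls = 0;
	int			step = (relpos >= 0) ? 1 : -1;
	int			pos = 0;
	Datum		result = (Datum) 0;

	if (relpos == 0)			/* current row: nothing to skip */
		return WinGetFuncArgInPartition(winobj, argno, 0, seektype,
										set_mark, isnull, isout);

	for (;;)
	{
		pos += step;
		result = WinGetFuncArgInPartition(winobj, argno, pos, seektype,
										  false, isnull, isout);
		if (*isout)
			break;				/* ran off the partition, give up */
		if (!*isnull && ++nonnulls == abs(relpos))
			break;				/* found the relpos'th non-null argument */
	}
	return result;
}

The individual window functions could then keep their existing prototypes and
only call such a helper when whatever inquiry function the patch exposes in
windowapi.h says IGNORE NULLS was specified; first_value/last_value/nth_value
would need the frame-oriented equivalent on top of WinGetFuncArgInFrame, but
the skipping loop would look much the same.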


> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2023-04-22 Thread Tatsuo Ishii
I revisited the thread:
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

and came up with attached POC patch (I used some variable names
appearing in Krasiyan Andreev's patch). I would really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our documentation states, so FIRST/LAST are not included in the patch.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value).  No document nor test patches are
included for now.

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

> The questions of how we interface to the individual window functions
> are really independent of how we handle the parsing problem.  My
> first inclination is to just pass the flags down to the window functions
> (store them in WindowObject and provide some additional inquiry functions
> in windowapi.h) and let them deal with it.

I agree with this.  Also I do not change the prototype of
nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.

> It's also worth wondering if we couldn't just implement the flags in
> some generic fashion and not need to involve the window functions at
> all.  FROM LAST, for example, could and perhaps should be implemented
> by inverting the sort order.  Possibly IGNORE NULLS could be implemented
> inside the WinGetFuncArgXXX functions?  These behaviors might or might
> not make much sense with other window functions, but that doesn't seem
> like it's our problem.

Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 07f01f8859e159c908ada72e8f53daf51e0b8bdf Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Sat, 22 Apr 2023 16:52:50 +0900
Subject: [PATCH v1 1/3] Implement IGNORE or RESPECT NULLS parse/analysis part.

Implement SQL standard's IGNORE/RESPECT NULLS clause for window functions.
For now, only nth_value() can use this option.
---
 src/backend/parser/gram.y   | 22 ++
 src/backend/parser/parse_func.c | 13 +
 src/include/nodes/parsenodes.h  |  8 
 src/include/nodes/primnodes.h   |  2 ++
 src/include/parser/kwlist.h |  2 ++
 5 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index acf6cf4866..2980ecd666 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	MergeWhenClause *mergewhen;
 	struct KeyActions *keyactions;
 	struct KeyAction *keyaction;
+	NullTreatment	nulltreatment;
 }
 
 %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -661,6 +662,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	json_object_constructor_null_clause_opt
 	json_array_constructor_null_clause_opt
 
+%type <nulltreatment>		opt_null_treatment
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -718,7 +721,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	HANDLER HAVING HEADER_P HOLD HOUR_P
 
-	IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+	IDENTITY_P IF_P IGNORE_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
 	INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
 	INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
 	INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -752,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 	RANGE READ REAL REASSIGN RECHECK RECURSIVE REF_P REFERENCES REFERENCING
 	REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
-	RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
+	RESET RESPECT RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
 	ROUTINE ROUTINES ROW ROWS RULE
 
 	SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
@@ -15213,7 +15216,7 @@ func_application: func_name '(' ')'
  * (Note that many of the special SQL functions wouldn't actually make any
  * sense as functional index entries, but we ignore that consideration here.)
  */
-func_expr: func_application within_group_clause filter_clause over_clause
+func_expr: func_application within_group_clause filter_clause opt_null_treatment over_clause
 {
 	FuncCall   *n 

Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Michael Paquier
On Sat, Apr 22, 2023 at 07:42:36AM -0400, Andrew Dunstan wrote:
> Perhaps we should start with a buildfarm module, which would run pg_indent
> --show-diff.

Nice, I didn't know this one and it has been mentioned a bit on this
thread.  Indeed, it is possible to just rely on that.
--
Michael


signature.asc
Description: PGP signature


Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Andrew Dunstan


On 2023-04-22 Sa 04:50, Michael Paquier wrote:

> On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote:
>> For 2 the upstream thread listed two approaches:
>> a. Install a pre-receive git hook on the git server that rejects
>> pushes to master that are not indented
>> b. Add a test suite that checks if the code is correctly indented, so
>> the build farm would complain about it. (Suggested by Peter E)
>>
>> I think both a and b would work to achieve 2. But as Peter E said, b
>> indeed sounds like less of a divergence of the status quo. So my vote
>> would be for b.
>
> FWIW, I think that there is value for both of them.  Anyway, isn't 'a'
> exactly the same as 'b' in design?  Both require a build of
> pg_bsd_indent, meaning that 'a' would also need to run an equivalent
> of the regression test suite, but it would be actually costly
> especially if pg_bsd_indent itself is patched.  I think that getting
> more noisy on this matter with 'b' would be enough, but as an extra
> PG_TEST_EXTRA for committers to set.
>
> Such a test suite would need a dependency to the 'git' command itself,
> which is not something that could be safely run in a release tarball,
> in any case.



Perhaps we should start with a buildfarm module, which would run 
pg_indent --show-diff. That would only need to run on one animal, so a 
failure wouldn't send the whole buildfarm red. This would be pretty easy 
to do.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Logging parallel worker draught

2023-04-22 Thread Amit Kapila
On Fri, Apr 21, 2023 at 6:34 PM Benoit Lobréau
 wrote:
>
> Following my previous mail about adding stats on parallelism[1], this
> patch introduces the log_parallel_worker_draught parameter, which
> controls whether a log message is produced when a backend attempts to
> spawn a parallel worker but fails due to insufficient worker slots.
>

I don't think introducing a GUC for this is a good idea. We can
directly output this message in the server log either at LOG or DEBUG1
level.
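
For the sake of illustration only, the message could be as simple as the
following, emitted where the launch falls short (the placement and wording
are my own sketch, not a concrete proposal; nworkers_launched and nworkers
are the existing ParallelContext fields):

/*
 * Sketch only: one possible shape for such a message, emitted after
 * LaunchParallelWorkers() has registered as many workers as it could.
 * Exact placement, level, and wording are guesses.
 */
if (pcxt->nworkers_launched < pcxt->nworkers)
	ereport(DEBUG1,
			(errmsg("launched only %d of %d planned parallel workers (no free worker slots)",
					pcxt->nworkers_launched, pcxt->nworkers)));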

-- 
With Regards,
Amit Kapila.




Re: The order of queues in row lock is changed (not FIFO)

2023-04-22 Thread Amit Kapila
On Tue, Mar 7, 2023 at 4:49 PM Ryo Yamaji (Fujitsu)
 wrote:
>
> From: Tom Lane 
> > I don't see a bug here, or at least I'm not willing to move the goalposts 
> > to where you want them to be.
> > I believe that we do guarantee arrival-order locking of individual tuple 
> > versions.  However, in the
> > example you show, a single row is being updated over and over.  So, 
> > initially we have a single "winner"
> > transaction that got the tuple lock first and updated the row.  When it 
> > commits, each other transaction
> > serially comes off the wait queue for that tuple lock and discovers that it 
> > now needs a lock on a
> > different tuple version than it has got.
> > So it tries to get lock on whichever is the latest tuple version.
> > That might still appear serial as far as the original 100 sessions go, 
> > because they were all queued on the
> > same tuple lock to start with.
> > But when the new sessions come in, they effectively line-jump because they 
> > will initially try to lock
> > whichever tuple version is committed live at that instant, and thus they 
> > get ahead of whichever remain of
> > the original 100 sessions for the lock on that tuple version (since those 
> > are all still blocked on some older
> > tuple version, whose lock is held by whichever session is performing the 
> > next-to-commit update).
>
> > I don't see any way to make that more stable that doesn't involve requiring 
> > sessions to take locks on
> > already-dead-to-them tuples; which sure seems like a nonstarter, not least 
> > because we don't even have a
> > way to find such tuples.  The update chains only link forward not back.
>
> Thank you for your reply.
> When I was doing this test, I confirmed the following two actions.
> (1) The first 100 sessions are overtaken by the last 10.
> (2) the order of the preceding 100 sessions changes
>
> (1) I was concerned from the user's point of view that the lock order for the 
> same tuple was not preserved.
> However, as you pointed out, in many cases the order of arrival is guaranteed 
> from the perspective of the tuple.
> You understand the PostgreSQL architecture and understand that you need to 
> use it.
>
> (2) This behavior is rare. Typically, the first session gets
> AccessExclusiveLock on the tuple and ShareLock on the
> transaction ID. Subsequent sessions will wait for AccessExclusiveLock on the
> tuple. However, we ignored the AccessExclusiveLock
> on the tuple in the log and observed multiple sessions
> waiting for ShareLock on the
> transaction ID. The log shows that the order of the original 100 sessions has
> been changed due to the above
> movement.
>

I think for (2), the test is hitting the case of walking the update
chain via heap_lock_updated_tuple() where we don't acquire the lock on
the tuple. See comments atop heap_lock_updated_tuple(). You can verify
if that is the case by adding some DEBUG logs in that function.
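
For example, something like the following near the top of that function
would make the chain walk visible (illustrative only; the message text is
mine, while ctid and xid are the function's existing parameters in
src/backend/access/heap/heapam.c):

/*
 * Illustrative only: a DEBUG line to drop into heap_lock_updated_tuple()
 * to confirm that the update chain is being followed without acquiring
 * the per-tuple lock.  The message text is made up.
 */
elog(DEBUG1,
	 "heap_lock_updated_tuple: following update chain to (%u,%u) for xid %u",
	 ItemPointerGetBlockNumber(ctid),
	 ItemPointerGetOffsetNumber(ctid),
	 xid);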

> At first, I thought both (1) and (2) were obstacles. However, I understood 
> from your indication that (1) is not a bug.
> I would be grateful if you could also give me your opinion on (2).
>

If my above observation is correct then it is not a bug as it is
behaving as per the current design.

-- 
With Regards,
Amit Kapila.




Re: Mistake in freespace/README?

2023-04-22 Thread Aleksander Alekseev
Hi,

> Hopefully I didn't miss or misunderstand anything.

And for sure I did. Particularly the fact that the tree inside the FSM
page is not a perfect binary tree:

"""
   0
   1   2
 3   4   5   6
7 8 9 A B
"""

So there are 13 levels and approximately 4K slots per FSM page after all:

```
>>> 8*1024-24-4 - sum([pow(2,n) for n in range(0,12) ])
4069
```

-- 
Best regards,
Aleksander Alekseev




Mistake in freespace/README?

2023-04-22 Thread Aleksander Alekseev
Hi,

I think there could be a mistake in freespace/README. There are
several places where it talks about ~4000 slots per FSM page for the
default BLCKSZ:

"""
For example, assuming each FSM page can hold information about 4 pages (in
reality, it holds (BLCKSZ - headers) / 2, or ~4000 with default BLCKSZ),
"""

and:

"""
To keep things simple, the tree is always constant height. To cover the
maximum relation size of 2^32-1 blocks, three levels is enough with the default
BLCKSZ (4000^3 > 2^32).
"""

Let's determine the number of levels in each FSM page first. I'm going
to use Python for this. Note that range(0,13) returns 13 numbers from
0 to 12:

```
>>> sum([pow(2,n) for n in range(0,13) ])
8191
>>> 8*1024
8192
```

13 levels are not going to fit, since we need an extra 24 bytes for the
PageHeaderData and a few more bytes for the int field fp_next_slot.
That gives us 12 levels, and the number of slots:

```
>>> # there are pow(2,0) == 1 byte on the 1st level of the tree
>>> pow(2,12 - 1)
2048
```

The number of levels in the entire, high-level tree, seems to be correct:

```
>>> pow(2048,3) > pow(2,32) - 1
True
```

Hopefully I didn't miss or misunderstand anything.

Thoughts?

-- 
Best regards,
Aleksander Alekseev




Re: Fix documentation for max_wal_size and min_wal_size

2023-04-22 Thread Fujii Masao




On 2023/04/22 17:39, Michael Paquier wrote:

On Tue, Apr 18, 2023 at 01:46:21AM -0700, sirisha chamarthi wrote:

"The default size is 80MB. However, if you have changed the WAL segment size
  from the default of 16MB with the initdb option --wal-segsize, it will be
five times the segment size."


Yes, I think that something close to that would be OK.  Do others have
any comments or extra ideas to offer?


If the WAL segment size is changed from the default value of 16MB during
initdb, the value of min_wal_size in postgresql.conf is set to five times
the new segment size by initdb. However, pg_settings.boot_val (the value of
min_wal_size used when the parameter is not otherwise set) is still 80MB.
So if pg_settings.boot_val should be documented as the default value, the
current description seems OK.

Or we can clarify that the default value of min_wal_size is 80MB, but when
initdb is run with a non-default segment size, the value in postgresql.conf
is changed to five times the new segment size? This will help users better
understand the behavior of the setting and how it is affected by changes
made during initdb.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: run pgindent on a regular basis / scripted manner

2023-04-22 Thread Michael Paquier
On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote:
> For 2 the upstream thread listed two approaches:
> a. Install a pre-receive git hook on the git server that rejects
> pushes to master that are not indented
> b. Add a test suite that checks if the code is correctly indented, so
> the build farm would complain about it. (Suggested by Peter E)
> 
> I think both a and b would work to achieve 2. But as Peter E said, b
> indeed sounds like less of a divergence of the status quo. So my vote
> would be for b.

FWIW, I think that there is value for both of them.  Anyway, isn't 'a'
exactly the same as 'b' in design?  Both require a build of
pg_bsd_indent, meaning that 'a' would also need to run an equivalent
of the regression test suite, but it would be actually costly
especially if pg_bsd_indent itself is patched.  I think that getting
more noisy on this matter with 'b' would be enough, but as an extra
PG_TEST_EXTRA for committers to set.

Such a test suite would need a dependency to the 'git' command itself,
which is not something that could be safely run in a release tarball,
in any case.
--
Michael


signature.asc
Description: PGP signature


Re: A Question about InvokeObjectPostAlterHook

2023-04-22 Thread Michael Paquier
On Fri, Apr 21, 2023 at 04:16:10PM +0800,  Legs Mansion wrote:
> actually, some locations can be tricky to add.
> it looks like CREATE, but it’s actually ALTER, so should it call
> InvokeObjectPostAlterHook instead
> of InvokeObjectPostCreateHook? e.g., CREATE OR REPLACE, CREATE
> TYPE (perfecting a shell type)

Sure, it could be possible to plaster more of these depending on the
control one may want with OATs.  Coming back to the main point you were
making in this thread, what are the use cases with ALTER TABLE
you were interested in for sepgsql, on top of what the patch I sent
upthread is doing?

Note that it is perfectly fine to do the changes incrementally, though
I'd rather add some proper coverage for each one of them using the
module I've patched (sepgsql's tests are annoying to set up and run).
--
Michael


signature.asc
Description: PGP signature


Re: Fix documentation for max_wal_size and min_wal_size

2023-04-22 Thread Michael Paquier
On Tue, Apr 18, 2023 at 01:46:21AM -0700, sirisha chamarthi wrote:
> "The default size is 80MB. However, if you have changed the WAL segment size
>  from the default of 16MB with the initdb option --wal-segsize, it will be
> five times the segment size."

Yes, I think that something close to that would be OK.  Do others have
any comments or extra ideas to offer?
--
Michael


signature.asc
Description: PGP signature