Re: doc: Fix description of how the default user name is chosen

2022-11-24 Thread Peter Eisentraut

On 01.11.22 22:31, David G. Johnston wrote:
This is the only sentence I claim is factually incorrect, with a 
suggested re-wording.


committed





Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-24 Thread Masahiko Sawada
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila  wrote:
>
> On Tue, Nov 22, 2022 at 10:33 PM Maxim Orlov  wrote:
> >>
> >>
> >> Regarding the tests, the patch includes a new scenario to
> >> reproduce this issue. However, since the issue can be reproduced also
> >> by the existing scenario (with low probability, though), I'm not sure
> >> it's worth adding the new scenario.
> >
> > AFAICS, the test added doesn't 100% reproduce this issue, so maybe it
> > is not worth it. But I do not have a strong opinion here.
> > Let's add the tests in a separate commit and let the actual committer
> > decide what to do, shall we?
> >
>
> +1 to not have a test for this as the scenario can already be tested
> by the existing set of tests.

Agreed not to have a test case for this.

I've attached an updated patch. I've confirmed this patch works for
all supported branches.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


0001-Reset-InitialRunningXacts-array-when-freeing-SnapBui.patch
Description: Binary data


Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-11-24 Thread mahendrakar s
Hi Jacob,

I had validated GitHub by skipping the discovery mechanism and letting
the provider extension pass on the endpoints. This was just for
validation purposes.
If GitHub needs to be supported, we need a way to send the discovery
document from the extension.


Thanks,
Mahendrakar.

On Thu, 24 Nov 2022 at 09:16, Andrey Chudnovsky  wrote:
>
> > How does this differ from the previous proposal? The OAUTHBEARER SASL
> > mechanism already relies on OIDC for discovery. (I think that decision
> > is confusing from an architectural and naming standpoint, but I don't
> > think they really had an alternative...)
> Mostly terminology questions here. OAUTHBEARER SASL appears to be the
> spec about using OAuth2 tokens for authentication.
> While any OAuth2 provider can generally work, we propose to specifically
> highlight that only OIDC providers can be supported, as we need the
> discovery document.
> And we won't be able to support GitHub under that requirement.
> Since the original patch relied on that too, there is no change here, just
> confirmation that we need OIDC compliance.
>
> > 0) The original hook proposal upthread, I thought, was about allowing
> > libpq's flow implementation to be switched out by the application. I
> > don't see that approach taken here. It's fine if that turned out to be a
> > bad idea, of course, but this patch doesn't seem to match what we were
> > talking about.
> We still plan to allow the client to pass the token, which is a
> generic way for it to implement its own OAuth flows.
>
> > 1) I'm really concerned about the sudden explosion of flows. We went
> > from one flow (Device Authorization) to six. It's going to be hard
> > enough to validate that *one* flow is useful and can be securely
> > deployed by end users; I don't think we're going to be able to maintain
> > six, especially in combination with my statement that iddawc is not an
> > appropriate dependency for us.
>
> > I'd much rather give applications the ability to use their own OAuth
> > code, and then maintain within libpq only the flows that are broadly
> > useful. This ties back to (0) above.
> We consider the following set of flows to be the minimum required:
> - Client Credentials - for service-to-service scenarios.
> - Authorization Code with PKCE - for rich clients, including pgAdmin.
> - Device Code - for psql (and possibly other non-GUI clients).
> - Refresh Token (separate discussion)
> Which is pretty much the list described here:
> https://oauth.net/2/grant-types/ and in the OAuth2 specs.
> Client Credentials is very simple, and so is Refresh Token.
> If you prefer to pick one of the richer flows, Authorization Code for
> GUI scenarios is probably much more widely used.
> Plus it's easier to implement too, as the interaction goes through a
> series of callbacks. No polling required.
>
> > 2) Breaking the refresh token into its own pseudoflow is, I think,
> > passing the buck onto the user for something that's incredibly security
> > sensitive. The refresh token is powerful; I don't really want it to be
> > printed anywhere, let alone copy-pasted by the user. Imagine the
> > phishing opportunities.
>
> > If we want to support refresh tokens, I believe we should be developing
> > a plan to cache and secure them within the client. They should be used
> > as an accelerator for other flows, not as their own flow.
> It's considered a separate "grant_type" in the specs / APIs.
> https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens
>
> For the clients, it would be storing the token and using it to authenticate.
> On the question of sensitivity, secure credentials stores are
> different for each platform, with a lot of cloud offerings for this.
> pgAdmin, for example, has its own way to secure credentials to avoid
> asking users for passwords every time the app is opened.
> I believe we should delegate the refresh token management to the clients.
>
> >3) I don't like the departure from the OAUTHBEARER mechanism that's
> > presented here. For one, since I can't see a sample plugin that makes
> > use of the "flow type" magic numbers that have been added, I don't
> > really understand why the extension to the mechanism is necessary.
> I don't think it's much of a departure, but rather a separation of
> responsibilities between libpq and upstream clients.
> As libpq can be used in different apps, the client would need
> different types of flows/grants.
> I.e. those need to be provided to libpq at connection initialization
> or some other point.
> We will change to "grant_type" though and use string to be closer to the spec.
> What do you think is the best way for the client to signal which OAUTH
> flow should be used?
>
> On Wed, Nov 23, 2022 at 12:05 PM Jacob Champion  
> wrote:
> >
> > On 11/23/22 01:58, mahendrakar s wrote:
> > > We validated on  libpq handling OAuth natively with different flows
> > > with different OIDC certified providers.
> > >
> > > Flows: Device Code, Client Credentials and Refresh Token.
> > > Providers: Microsoft, Google 

Re: Non-decimal integer literals

2022-11-24 Thread Peter Eisentraut

On 23.11.22 09:54, David Rowley wrote:

On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
 wrote:

Here is a new patch.


This looks like quite an inefficient way to convert a hex string into an int64:

    while (*ptr && isxdigit((unsigned char) *ptr))
    {
        int8        digit = hexlookup[(unsigned char) *ptr];

        if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
            unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
            goto out_of_range;

        ptr++;
    }

I wonder if you'd be better off with something like:

    while (*ptr && isxdigit((unsigned char) *ptr))
    {
        if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
            goto out_of_range;

        tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
    }

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc.  It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.


My code follows the style used for parsing the decimal integers. 
Keeping that consistent is valuable I think.  I think the proposed 
change makes the code significantly harder to understand.  Also, what 
you are suggesting here would amount to an attempt to make parsing 
hexadecimal integers even faster than parsing decimal integers.  Is that 
useful?
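For anyone who wants to experiment with the two loops, here is a compact
standalone version of the shift-based approach (a sketch only: simplified to
unsigned parsing, with a local hexlookup table standing in for the backend's):

#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* maps '0'..'9', 'a'..'f', 'A'..'F' to 0..15; other entries are unused */
static const int8_t hexlookup[128] = {
    ['0'] = 0, ['1'] = 1, ['2'] = 2, ['3'] = 3, ['4'] = 4,
    ['5'] = 5, ['6'] = 6, ['7'] = 7, ['8'] = 8, ['9'] = 9,
    ['a'] = 10, ['b'] = 11, ['c'] = 12, ['d'] = 13, ['e'] = 14, ['f'] = 15,
    ['A'] = 10, ['B'] = 11, ['C'] = 12, ['D'] = 13, ['E'] = 14, ['F'] = 15,
};

/* Shift-based hex parse: the next shift by 4 overflows iff any of the top
 * four bits are already set, so test that mask instead of using
 * overflow-checked multiplication. */
static bool
parse_hex_u64(const char *ptr, uint64_t *result)
{
    uint64_t    tmp = 0;

    while (*ptr && isxdigit((unsigned char) *ptr))
    {
        if (tmp & UINT64_C(0xF000000000000000))
            return false;       /* out of range */
        tmp = (tmp << 4) | (uint64_t) hexlookup[(unsigned char) *ptr++];
    }
    *result = tmp;
    return true;
}

int
main(void)
{
    uint64_t    v;

    if (parse_hex_u64("deadbeef", &v))
        printf("%llu\n", (unsigned long long) v);   /* prints 3735928559 */
    return 0;
}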





Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-24 Thread Aleksander Alekseev
Hi Chris,

> XID wraparound doesn't happen to healthy databases
> If you disagree, I would like to hear why.

Consider the case when you run a slow OLAP query that takes 12h to
complete and 100K TPS of fast OLTP-type queries on the same system.
The fast queries will consume all 32-bit XIDs in less than 12 hours
(at 100K TPS, the ~2^31 available XIDs are exhausted in roughly 21,500
seconds, i.e. about 6 hours), while the OLAP query that started 12
hours ago hasn't finished yet, so its tuples can't be frozen.

> I agree that 64-bit xids are a good idea.  I just don't think that existing 
> safety measures should be ignored or reverted.

Fair enough.

> The problem isn't just the lack of disk space, but the difficulty that stuck
> autovacuum runs pose in resolving the issue.  Keep in mind that everything
> you can do to reclaim disk space (vacuum full, cluster, pg_repack) will be
> significantly slowed down by extremely bloated tables and indexes.
> The problem is that if you are running out of disk space, and your time to
> recovery is much longer than expected, then you have a major problem.  It's
> not just one or the other, but the combination that poses the real risk here.
>
> Now that's fine if you want to run a bloatless table engine, but to my
> knowledge none of these are production-ready yet.  ZHeap seems mostly
> stalled.  Oriole is still experimental.  So with the current PostgreSQL
> table structure, the problem remains.
>
> A DBA can monitor disk space, but if the DBA is not also monitoring xid lag,
> then by the time corrective action is taken it may be too late.

Good point but I still don't think this is related to the XID
wraparound problem.

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] minor optimization for ineq_histogram_selectivity()

2022-11-24 Thread Frédéric Yhuel




On 11/23/22 16:59, Tom Lane wrote:

=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?=  writes:

On 10/24/22 17:26, Frédéric Yhuel wrote:

When studying the weird planner issue reported here [1], I came up with
the attached patch. It reduces the probability of calling
get_actual_variable_range().



This isn't very useful anymore thanks to this patch:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c


I hadn't looked at this patch before, but now that I have, I'm inclined
to reject it anyway.  It just moves the problem around: now, instead of
possibly doing an unnecessary index probe at the right end, you're
possibly doing an unnecessary index probe at the left end.


Indeed... it seemed to me that both versions would do an unnecessary 
index probe at the left end, but I wasn't careful enough :-/



It also
looks quite weird compared to the normal coding of binary search.



That's right.


I wonder if there'd be something to be said for leaving the initial
probe calculation alone and doing this:

 	else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
+	{
+		/* Don't probe the endpoint until we have to. */
+		if (probe > lobound)
+			probe--;
+		else
 			have_end = get_actual_variable_range(root,
 												 vardata,
 												 sslot.staop,
 												 collation,
 												 NULL,
 												 &sslot.values[probe]);
+	}

On the whole though, it seems like a wart.




Yeah... it's probably wiser not to risk introducing a bug, only to save
an index probe in rare cases (and only 100 reads, thanks to 9c6ad5ea).


Thank you for having had a look at it.

Best regards,
Frédéric





Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-24 Thread Julien Rouhaud
Hi,

On Thu, Nov 24, 2022 at 02:37:23PM +0800, Julien Rouhaud wrote:
> On Thu, Nov 24, 2022 at 02:07:21PM +0900, Michael Paquier wrote:
> > On Wed, Nov 23, 2022 at 03:56:50PM +0800, Julien Rouhaud wrote:
> > > The depth 0 is getting used quite a lot now, maybe we should have a 
> > > define for
> > > it to make it easier to grep, like TOP_LEVEL_AUTH_FILE or something like 
> > > that?
> > > And also add a define for the magical 10 for the max inclusion depth, for 
> > > both
> > > auth files and GUC files while at it?
> > 
> > Sounds like a good idea to me, and it seems to me that this had better
> > be unified between the GUCs (see ParseConfigFp() that hardcodes a
> > depth of 0) and hba.c.  It looks like they could be added to
> > conffiles.h, as of CONF_FILE_START_{LEVEL,DEPTH} and
> > CONF_FILE_MAX_{LEVEL,DEPTH}.  Would you like to send a patch?

So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH.  Attached v22
that fixes it in all the places I found.
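For reference, the new defines in conffiles.h presumably end up looking
something like this (a sketch only; the values follow from the hunks below,
which replace the hardcoded 0 and 10):

/* sketch: additions to src/include/utils/conffiles.h */
#define CONF_FILE_START_DEPTH	0
#define CONF_FILE_MAX_DEPTH		10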

> > Now, to the tests..
> > 
> > > Mmm, I haven't looked deeply so I'm not sure if the perl podules are 
> > > aware of
> > > it or not, but maybe we could somehow detect the used delimiter at the
> > > beginning after normalizing the directory, and use a $DELIM rather than a 
> > > plain
> > > "/"?
> > 
> > I am not sure.  Could you have a look and see if you can get the CI
> > back to green?  The first thing I would test is to switch the error
> > patterns to be regexps based on the basenames rather than the full
> > paths (tweaking the queries on the system views to do that), so as we
> > avoid all this business with slash and backslash transformations.

Apparently just making sure that the $node->data_dir consistently uses forward
slashes is enough to make the CI happy, for VS 2019 [1] and MinGW64 [2], so
done this way with an extra normalization step.

[1] https://cirrus-ci.com/task/4944203946917888
[2] https://cirrus-ci.com/task/6070103853760512
From 879cf469d00d9274b67b80eb5fe47dfccf03022d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 24 Nov 2022 16:57:53 +0800
Subject: [PATCH v22 1/2] Introduce macros for initial/maximum depth levels for
 configuration files

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
---
 src/backend/commands/extension.c  | 4 +++-
 src/backend/libpq/hba.c   | 8 
 src/backend/utils/misc/guc-file.l | 5 +++--
 src/backend/utils/misc/guc.c  | 8 +---
 src/include/utils/conffiles.h | 3 +++
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 806d6056ab..de01b792b9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -60,6 +60,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/conffiles.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -515,7 +516,8 @@ parse_extension_control_file(ExtensionControlFile *control,
 	 * Parse the file content, using GUC's file parsing code.  We need not
 	 * check the return value since any errors will be thrown at ERROR level.
 	 */
-	(void) ParseConfigFp(file, filename, 0, ERROR, &head, &tail);
+	(void) ParseConfigFp(file, filename, CONF_FILE_START_DEPTH, ERROR, &head,
+						 &tail);
 
 	FreeFile(file);
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 862ec18e91..8f1a0c4c73 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -620,7 +620,7 @@ free_auth_file(FILE *file, int depth)
 	FreeFile(file);
 
 	/* If this is the last cleanup, remove the tokenization context */
-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 	{
 		MemoryContextDelete(tokenize_context);
 		tokenize_context = NULL;
@@ -650,7 +650,7 @@ open_auth_file(const char *filename, int elevel, int depth,
 	 * avoid dumping core due to stack overflow if an include file loops back
 	 * to itself.  The maximum nesting depth is pretty arbitrary.
 	 */
-	if (depth > 10)
+	if (depth > CONF_FILE_MAX_DEPTH)
 	{
 		ereport(elevel,
 				(errcode_for_file_access(),
@@ -684,7 +684,7 @@ open_auth_file(const char *filename, int elevel, int depth,
 	 * tokenization.  This will be closed with this file when coming back to
 	 * this level of cleanup.
 	 */
-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 	{
 		/*
 		 * A context may be present, but assume that it has been eliminated
@@ -762,7 +762,7 @@ tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
 
 	initStringInfo(&buf);
 
-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 		*tok_lines = 

Re: Non-decimal integer literals

2022-11-24 Thread David Rowley
On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut
 wrote:
> My code follows the style used for parsing the decimal integers.
> Keeping that consistent is valuable I think.  I think the proposed
> change makes the code significantly harder to understand.  Also, what
> you are suggesting here would amount to an attempt to make parsing
> hexadecimal integers even faster than parsing decimal integers.  Is that
> useful?

Isn't being faster one of the major use cases for this feature?  I
remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?

David




Re: Transparent column encryption

2022-11-24 Thread Jehan-Guillaume de Rorthais
On Wed, 23 Nov 2022 19:45:06 +0100
Peter Eisentraut  wrote:
> On 28.10.22 12:16, Jehan-Guillaume de Rorthais wrote:
[...]

> >* I wonder if encryption related fields in ParameterDescription and
> >  RowDescription could be optional somehow? The former might be quite
> >  large when using a lot of parameters (like, imagine a large and ugly
> >  "IN($1...$N)"). On the other hand, these messages are not sent in high
> >  frequency anyway...  
> 
> They are only used if you turn on the column_encryption protocol option.
> Or did you mean to make them optional even then?

I meant even when column_encryption is turned on.

Regards,




Re: Prefetch the next tuple's memory during seqscans

2022-11-24 Thread David Rowley
On Wed, 23 Nov 2022 at 10:58, David Rowley  wrote:
> My current thoughts are that it might be best to go with 0005 to start
> with.  I know Melanie is working on making some changes in this area,
> so perhaps it's best to leave 0002 until that work is complete.

I tried running TPC-H @ scale 5 with master (@d09dbeb9) vs master +
0001 + 0005 patch. The results look quite promising.  Query 15 seems
to run 15% faster and overall it's 4.23% faster.

Full results are attached.

David
query   master    Master + 0001 + 0005   compare
1       25999.5   25793.6                100.8%
2       1171.0    1152.0                 101.6%
3       6180.5    5456.5                 113.3%
4       1167.1    1107.0                 105.4%
5       4968.3    4604.8                 107.9%
6       3696.6    3306.4                 111.8%
7       5501.4    4905.6                 112.1%
8       1394.8    1345.2                 103.7%
9       10861.2   11159.8                97.3%
10      4354.3    4356.4                 100.0%
11      382.5     386.3                  99.0%
12      3888.6    3838.0                 101.3%
13      6905.0    6622.5                 104.3%
14      3886.1    3429.8                 113.3%
15      8009.7    6927.8                 115.6%
16      2406.2    2363.9                 101.8%
17      14.6      14.9                   98.1%
18      11735.6   11453.9                102.5%
19      44.9      44.7                   100.4%
20      262.8     246.6                  106.6%
21      3014.1    3027.6                 99.6%
22      176.4     179.3                  98.4%
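For anyone unfamiliar with the technique being benchmarked here: the idea is
to issue a CPU prefetch for the next tuple's memory while the current tuple
is being processed. A minimal standalone illustration of the underlying
primitive, GCC/Clang's __builtin_prefetch (a sketch of the idea only, not the
patch itself):

#include <stddef.h>
#include <stdio.h>

typedef struct
{
    double      val;
    char        pad[56];        /* pad each element to a 64-byte cache line */
} Tuple;

/* Sum the values, hinting the CPU to start loading the next tuple's
 * cache line while the current tuple is being processed. */
static double
sum_with_prefetch(const Tuple *tuples, size_t n)
{
    double      sum = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        if (i + 1 < n)
            __builtin_prefetch(&tuples[i + 1]);
        sum += tuples[i].val;
    }
    return sum;
}

int
main(void)
{
    static Tuple tuples[4] = {{1.0}, {2.0}, {3.0}, {4.0}};

    printf("%g\n", sum_with_prefetch(tuples, 4));   /* prints 10 */
    return 0;
}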


Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-24 Thread Alvaro Herrera
On 2022-Nov-23, Peter Geoghegan wrote:

> On Wed, Nov 23, 2022 at 2:54 AM Alvaro Herrera  
> wrote:
> > Something like the attached.  It would result in output like this:
> > WARNING:  new multixact has more than one updating member: 0 2[17378 
> > (keysh), 17381 (nokeyupd)]
> >
> > Then it should be possible to trace (in pg_waldump output) the
> > operations of each of the transactions that have any status in the
> > multixact that includes some form of "upd".
> 
> That seems very useful.

Okay, pushed to all branches.

> Separately, I wonder if it would make sense to add additional
> defensive checks to FreezeMultiXactId() for this. There is an
> assertion that should catch the presence of multiple updaters in a
> single Multi when it looks like we have to generate a new Multi to
> carry the XID members forward (typically something we only need to do
> during a VACUUM FREEZE). We could at least make that
> "Assert(!TransactionIdIsValid(update_xid));" line into a defensive
> "can't happen" ereport(). It couldn't hurt, at least -- we already
> have a similar relfrozenxid check nearby, added after the "freeze the
> dead" bug was fixed.

Hmm, agreed.  I'll see about that separately.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I'm always right, but sometimes I'm more right than other times."
  (Linus Torvalds)




Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-24 Thread Maxim Orlov
>
> Agreed not to have a test case for this.
>
> I've attached an updated patch. I've confirmed this patch works for
> all supported branches.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>
It works for me as well. Thanks!

I've created a commitfest entry for this patch, see
https://commitfest.postgresql.org/41/4024/
Hope it will be committed soon.

-- 
Best regards,
Maxim Orlov.


Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-24 Thread Amit Kapila
On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada  wrote:
>
> On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila  wrote:
>
> Agreed not to have a test case for this.
>
> I've attached an updated patch. I've confirmed this patch works for
> all supported branches.
>

I have slightly changed the checks used in the patch, otherwise looks
good to me. I am planning to push (v11-v15) the attached tomorrow
unless there are more comments.

-- 
With Regards,
Amit Kapila.


v2-0001-Fix-uninitialized-access-to-InitialRunningXacts-d.patch
Description: Binary data


Re: postgres_fdw: batch inserts vs. before row triggers

2022-11-24 Thread Etsuro Fujita
On Fri, Nov 18, 2022 at 8:46 PM Etsuro Fujita  wrote:
> Attached is a patch for fixing these issues.

Here is an updated patch.  In the attached, I added an assertion to
ExecInsert().  Also, I tweaked comments and test cases a little bit,
for consistency.  Also, I noticed a copy-and-pasteo in a comment in
ExecBatchInsert(), so I fixed it as well.

Barring objections, I will commit the patch.

Best regards,
Etsuro Fujita


fix-handling-of-pending-inserts-2.patch
Description: Binary data


Report roles in pg_upgrade pg_ prefix check

2022-11-24 Thread Daniel Gustafsson
Looking at a recent pg_upgrade thread I happened to notice that the check for
roles with a pg_ prefix only reports the error, not the roles it found.  Other
similar checks where the user is expected to alter the old cluster typically
report the found objects in a text file.  The attached adds reporting to make
that class of checks consistent (the check for prepared transactions, which
also isn't reporting, is different IMO as it doesn't expect ALTER commands).

As this check is only executed against the old cluster, the patch removes the
cluster check when printing the error.
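For context, the reporting pattern those other checks follow looks roughly
like this (a sketch from memory of pg_upgrade's check style, with output_path
and script as in the other checks; not the attached patch itself):

	if (script)
		fclose(script);

	if (found)
	{
		pg_log(PG_REPORT, "fatal\n");
		pg_fatal("Your installation contains roles starting with \"pg_\".\n"
				 "A list of roles starting with \"pg_\" is in the file:\n"
				 "    %s\n\n", output_path);
	}
	else
		check_ok();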

--
Daniel Gustafsson   https://vmware.com/



v1-0001-Report-incompatible-roles-in-pg_upgrade-checking.patch
Description: Binary data


Re: Bug in row_number() optimization

2022-11-24 Thread Sergey Shinderuk

On 24.11.2022 06:16, Richard Guo wrote:

Regarding how to fix this problem, firstly I believe we need to evaluate
window functions in the per-tuple memory context, as the HEAD does.
When we decide we need to go into pass-through mode, I'm thinking that
we can just copy out the results of the last evaluation to the per-query
memory context, while still storing their pointers in ecxt_aggvalues.

Does this idea work?

Although I'm not familiar with the code, this makes sense to me.

You proposed:

+#ifdef USE_FLOAT8_BYVAL
+   evalWfuncContext = winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory;
+#else
+   evalWfuncContext = winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;
+#endif

Shouldn't we handle any pass-by-reference type the same way? I suppose a
user-defined window function can return some other pass-by-reference type,
not just int8.
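For illustration, a rough fragment of that more general approach (field names
are assumed from nodeWindowAgg.c and unverified; an untested sketch, not the
actual fix):

	/*
	 * Sketch: when entering pass-through mode, keep the last result of each
	 * window function alive by copying pass-by-reference datums into the
	 * per-query context, keyed on the result type's byval/typlen properties
	 * rather than on USE_FLOAT8_BYVAL.
	 */
	if (!perfuncstate->resulttypeByVal && !econtext->ecxt_aggnulls[wfuncno])
	{
		MemoryContext oldcxt;

		oldcxt = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory);
		econtext->ecxt_aggvalues[wfuncno] =
			datumCopy(econtext->ecxt_aggvalues[wfuncno], false,
					  perfuncstate->resulttypeLen);
		MemoryContextSwitchTo(oldcxt);
	}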


Best regards,

--
Sergey Shinderuk    https://postgrespro.com/





Re: postgres_fdw binary protocol support

2022-11-24 Thread Ilya Gladyshev



> On 22 Nov 2022, at 17:10, Ashutosh Bapat wrote:
> 
> Hi Ilya,
> 
> On Mon, Nov 21, 2022 at 8:50 PM Ilya Gladyshev
>  wrote:
>> 
>> Hi everyone,
>> 
>> I have made a patch that introduces support for libpq binary protocol
>> in postgres_fdw. The idea is simple, when a user knows that the foreign
>> server is binary compatible with the local and his workload could
>> somehow benefit from using binary protocol, it can be switched on for a
>> particular server or even a particular table.
>> 
> 
> Why do we need this feature? If it's for performance then do we have
> performance numbers?
Yes, it is for performance, but I have yet to do the benchmarks. My initial
idea was that the binary protocol must be more efficient than text, because
as I understand it that's the whole point of it. However, the minor tests I
have done so far do not prove this, and I couldn't find any benchmarks for it
online, so I will do further tests to find a use case for it.
> About the patch itself, I see a lot of if (binary) {} else {} block
> which are repeated. It will be good if we can add functions/macros to
> avoid duplication.
Yea, that’s true, I have some ideas about improving it



Re: Lockless queue of waiters in LWLock

2022-11-24 Thread Pavel Borisov
Hi, hackers!
Andres Freund recently committed his nice LWLock optimization
a4adc31f6902f6f. So I've rebased the patch on top of the current
master (PFA v5).

Regards,
Pavel Borisov,
Supabase.


v5-0001-Lockless-queue-of-LWLock-waiters.patch
Description: Binary data


indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
Hi,
I was looking at :

commit d09dbeb9bde6b9faabd30e887eff4493331d6424
Author: David Rowley 
Date:   Thu Nov 24 17:21:44 2022 +1300

Speedup hash index builds by skipping needless binary searches

In _hash_pgaddtup(), it seems the indentation is off for the assertion.

Please take a look at the patch.

Thanks


hash-pgaddtup-indent.patch
Description: Binary data


Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()

2022-11-24 Thread Bharath Rupireddy
Hi,

While working on something else, I noticed that
WaitXLogInsertionsToFinish() goes the LWLockWaitForVar() route even
for a process that's holding a WAL insertion lock. Typically, a
process holding WAL insertion lock reaches
WaitXLogInsertionsToFinish() when it's in need of WAL buffer pages for
its insertion and waits for other older in-progress WAL insertions to
finish. This fact guarantees that the process holding a WAL insertion
lock will never have its insertingAt less than 'upto'.

With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases the wait list lock and so on -
avoiding it may be a good idea IMO.
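To make the idea concrete, here is a toy standalone illustration of the
shortcut (a sketch of the concept only; the real WaitXLogInsertionsToFinish()
blocks in LWLockWaitForVar() and tracks the held lock differently):

#include <stdint.h>
#include <stdio.h>

#define NUM_SLOTS 8

static uint64_t inserting_at[NUM_SLOTS];    /* 0 = slot idle */
static int  my_slot = -1;                   /* slot held by this process */

/* Return the position up to which all *other* insertions have finished. */
static uint64_t
wait_for_insertions(uint64_t upto)
{
    uint64_t    finished = upto;

    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (i == my_slot)
            continue;           /* our own position can never lag 'upto' */

        /* the real code would block here via LWLockWaitForVar() */
        if (inserting_at[i] != 0 && inserting_at[i] < finished)
            finished = inserting_at[i];
    }
    return finished;
}

int
main(void)
{
    my_slot = 3;
    inserting_at[3] = 100;      /* without the skip, we would wait on ourselves */
    printf("%llu\n", (unsigned long long) wait_for_insertions(200));   /* 200 */
    return 0;
}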

I'm attaching a patch herewith. Here's the cirrus-ci tests -
https://github.com/BRupireddy/postgres/tree/avoid_LWLockWaitForVar_for_currently_held_wal_ins_lock_v1.

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Avoid-LWLockWaitForVar-for-currently-held-WAL-ins.patch
Description: Binary data


Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Daniel Gustafsson
> On 24 Nov 2022, at 13:42, Ted Yu  wrote:

> In _hash_pgaddtup(), it seems the indentation is off for the assertion.
> 
> Please take a look at the patch.

Indentation is handled by applying src/tools/pgindent to the code, and
re-running it on this file yields no re-indentation so this is in fact correct
according to the pgindent rules.

--
Daniel Gustafsson   https://vmware.com/





Re: TAP output format in pg_regress

2022-11-24 Thread Daniel Gustafsson
> On 23 Nov 2022, at 00:56, Andres Freund  wrote:
> On 2022-11-22 23:17:44 +0100, Daniel Gustafsson wrote:
>> The attached v10 attempts to address the points raised above.  Notes and
>> diagnostics are printed to stdout/stderr respectively and the TAP emitter is
>> changed to move more of the syntax into it making it less painful on the
>> translators.
> 
> Thanks! I played a bit with it locally and it's nice.

Thanks for testing and reviewing!

> I think it might be worth adding a few more details to the output on stderr,
> e.g. a condensed list of failed tests, as that's then printed in the errorlogs
> summary in meson after running all tests. As we don't do that in the current
> output, that seems more like an optimization for later.

Since this patch already changes the output verbosity it seems within the goal-
posts to do this now rather than later.  I've added a first stab at this in the
attached patch.

> My compiler complains with:
> 
> [6/80 7   7%] Compiling C object src/test/regress/pg_regress.p/pg_regress.c.o
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c: In 
> function ‘emit_tap_output_v’:
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:354:9: 
> warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ 
> format attribute [-Wsuggest-attribute=format]
>  354 | vfprintf(fp, fmt, argp);
>  | ^~~~
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:356:17: 
> warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ 
> format attribute [-Wsuggest-attribute=format]
>  356 | vfprintf(logfile, fmt, argp_logfile);
>  | ^~~~
> 
> Which seems justified and easily remedied via pg_attribute_printf().

Fixed.
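(For anyone following along, the conventional remedy is to attach
pg_attribute_printf() to the declaration so the compiler can check callers'
format strings against the va_list-taking emitter; roughly like this, with the
enum name assumed rather than copied from the patch:

static void emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
			pg_attribute_printf(2, 0);
)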

> I think there might be something slightly wrong with 'tags' - understandable,
> since there's basically no comment explaining what it's supposed to do. I
> added an intentional failure to 'show.pgc', which then yielded the following
> tap output:
> ok 51        sql/quote                     15 ms
> not ok 52    sql/show                       9 ms
> # tags: stdout source ok 53        sql/insupd                    11 ms
> ok 54        sql/parser                    13 ms
> 
> which then subsequently leads meson to complain that
> TAP parsing error: out of order test numbers
> TAP parsing error: Too few tests run (expected 62, got 61)
> 
> Looks like all that's needed is a \n after "tags: %s"

Ugh, yes, I forgot to run my ecpg tests before submitting. Fixed.

> I think the patch is also missing a \n after in log_child_failure().

Fixed.

>> Subject: [PATCH v10 2/2] Experimental: meson: treat regress tests as TAP
> 
> I'd just squash that in I think. Isn't really experimental either IMO ;)

Done.

>> +if (type == DIAG || type == BAIL)
>> +fp = stderr;
> 
> Personally I'd move the initialization of fp to an else branch here, but
> that's a very minor taste thing.

I actually had it like that before, moved it and wasn't sure what I preferred.
Moved back.

>> +fprintf(stdout, "Bail Out!");
>> +if (logfile)
>> +fprintf(logfile, "Bail Out!");
> 
> I think this needs a \n.

Fixed.

> I was wondering whether it's worth having an printf wrapper that does the
> vfprintf(); if (logfile) vfprintf() dance. But it's proably not.

Not sure if the saved lines of code justify the loss of readability.

>> +/* Not using the normal bail() here as we want _exit */
>> +_bail(_("could not exec \"%s\": %s\n"), shellprog, 
>> strerror(errno));
> 
> Not in love with _bail, but I don't immediately have a better idea.

Agreed on both counts.

>> +pg_log_error(_("# could not open file \"%s\" for reading: %s"),
>> + file, strerror(errno));
> 
> Hm. Shouldn't this just use diag()?

It should.  I've changed all uses of pg_log_error to be diag or bail for
consistency in output, except for the one in getopt.

>> +test_status_ok(tests[i], INSTR_TIME_GET_MILLISEC(stoptimes[i]), (num_tests > 1));
>> 
>>  if (statuses[i] != 0)
>>  log_child_failure(statuses[i]);
> 
> Hm. We probably shouldn't treat the test as a success if statuses[i] != 0?
> Although it sure looks like we did so far.

That's a good point, I admittedly hadn't thought about it.  55de145d1cf6f1d1b9
doesn't mention why it's done this way; maybe it's assumed that if the process
died then the test would've failed anyway (which is likely true but not
guaranteed to be so).

As it's unrelated to the question of output format I'll open a new thread with
that question to keep it from being buried here.

>> -printf("%s ", tl->str);
>> +appendStringInfo(&tagbuf, "%s ", tl->str);
> 
> Why does tagbuf exist? Afaict it 

Re: Patch: Global Unique Index

2022-11-24 Thread Thomas Kellerer
On 24.11.2022 at 07:03, Pavel Stehule wrote:
> There are many Oracle users that find global indexes useful despite
> their disadvantages.
>
> I have seen this mostly when the goal was to get the benefits of
> partition pruning at runtime, which turned the full table scans (=Seq Scans)
> on huge tables into partition scans on much smaller partitions.
> Partition-wise joins were also helpful for query performance.
> The substantially slower drop partition performance was accepted in those
> cases.
>
>
> I think it would be nice to have the option in Postgres as well.
>
> I do agree however, that the global index should not be created 
> automatically.
>
> Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better
>
>
> Is it necessary to use special marks like GLOBAL if this index will
> be partitioned, and uniqueness will be ensured by repeated
> evaluations?
>
> Or do you think it should really be forced to be a single
> relation-based index?
>
> I can imagine a unique index on partitions without a special mark,
> which will be partitioned, and a second variant, a classic index created
> over a partitioned table, which will be marked as GLOBAL.


My personal opinion is, that a global index should never be created
automatically.

The user should consciously decide on using a feature
that might have a serious impact on performance in some areas.





Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 24 Nov 2022, at 13:42, Ted Yu  wrote:
>> In _hash_pgaddtup(), it seems the indentation is off for the assertion.

> Indentation is handled by applying src/tools/pgindent to the code, and
> re-running it on this file yields no re-indentation so this is in fact correct
> according to the pgindent rules.

It is one messy bit of code though --- perhaps a little more thought
about where to put line breaks would help?  Alternatively, it could
be split into multiple statements, along the lines of

#ifdef USE_ASSERT_CHECKING
    if (PageGetMaxOffsetNumber(page) > 0)
    {
        IndexTuple  lasttup = PageGetItem(page,
                                          PageGetItemId(page,
                                                        PageGetMaxOffsetNumber(page)));

        Assert(_hash_get_indextuple_hashkey(lasttup) <=
               _hash_get_indextuple_hashkey(itup));
    }
#endif

(details obviously tweakable)

regards, tom lane




Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-24 Thread Takamichi Osumi (Fujitsu)
On Wednesday, October 5, 2022 6:42 PM Peter Smith  wrote:
> Hi Euler, a long time ago you ask me a few questions about my previous review
> [1].
> 
> Here are my replies, plus a few other review comments for patch v7-0001.
Hi, thank you for your comments.


> ==
> 
> 1. doc/src/sgml/catalogs.sgml
> 
> +  
> +   Application delay of changes by a specified amount of time. The
> +   unit is in milliseconds.
> +  
> 
> The wording sounds a bit strange still. How about below
> 
> SUGGESTION
> The length of time (ms) to delay the application of changes.
Fixed.


> ===
> 
> 2. Other documentation?
> 
> Maybe should say something on the Logical Replication Subscription page
> about this? (31.2 Subscription)
Added.

 
> ===
> 
> 3. doc/src/sgml/ref/create_subscription.sgml
> 
> +  synchronized, this may lead to apply changes earlier than expected.
> +  This is not a major issue because a typical setting of this 
> parameter
> +  are much larger than typical time deviations between servers.
> 
> Wording?
> 
> SUGGESTION
> ... than expected, but this is not a major issue because this parameter is
> typically much larger than the time deviations between servers.
Fixed.



> ~~~
> 
> 4. Q/A
> 
> From [2] you asked:
> 
> > Should there also be a big warning box about the impact if using
> > synchronous_commit (like the other streaming replication page has this
> > warning)?
> Impact? Could you elaborate?
> 
> ~
> 
> I noticed the streaming replication docs for recovery_min_apply_delay has a 
> big
> red warning box saying that setting this GUC may block the synchronous
> commits. So I was saying won’t a similar big red warning be needed also for
> this min_apply_delay parameter if the delay is used in conjunction with a
> publisher wanting synchronous commit because it might block everything?
I agree with you. Fixed.


 
> ~~~
> 
> 4. Example
> 
> +
> +CREATE SUBSCRIPTION foo
> + CONNECTION 'host=192.168.1.50 port=5432 user=foo
> dbname=foodb'
> +PUBLICATION baz
> +   WITH (copy_data = false, min_apply_delay = '4h');
> +
> 
> If the example named the subscription/publication as ‘mysub’ and ‘mypub’ I
> think it would be more consistent with the existing examples.
Fixed.


 
> ==
> 
> 5. src/backend/commands/subscriptioncmds.c - SubOpts
> 
> @@ -89,6 +91,7 @@ typedef struct SubOpts
>   bool disableonerr;
>   char*origin;
>   XLogRecPtr lsn;
> + int64 min_apply_delay;
>  } SubOpts;
> 
> I feel it would be better to be explicit about the storage units. So call this
> member ‘min_apply_delay_ms’. E.g. then other code in
> parse_subscription_options will be more natural when you are converting using
> and assigning them to this member.
I don't think we use names with units included explicitly like that.
Could you please point me to a similar example of this?



> ~~~
> 
> 6. - parse_subscription_options
> 
> + /*
> + * If there is no unit, interval_in takes second as unit. This
> + * parameter expects millisecond as unit so add a unit (ms) if
> + * there isn't one.
> + */
> 
> The comment feels awkward. How about below
> 
> SUGGESTION
> If no unit was specified, then explicitly add 'ms' otherwise the interval_in
> function would assume 'seconds'
Fixed.


> ~~~
> 
> 7. - parse_subscription_options
> 
> (This is a repeat of [1] review comment #12)
> 
> + if (opts->min_apply_delay < 0 && IsSet(supported_opts,
> SUBOPT_MIN_APPLY_DELAY))
> + ereport(ERROR,
> + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> + errmsg("option \"%s\" must not be negative", "min_apply_delay"));
> 
> Why is this code here instead of inside the previous code block where the
> min_apply_delay was assigned in the first place?
Changed.


> ==
> 
> 8. src/backend/replication/logical/worker.c  - apply_delay
> 
> + * When min_apply_delay parameter is set on subscriber, we wait long
> + enough to
> + * make sure a transaction is applied at least that interval behind the
> + * publisher.
> 
> "on subscriber" -> "on the subscription"
Fixed.



> ~~~
> 
> 9.
> 
> + * Apply delay only after all tablesync workers have reached READY
> + state. A
> + * tablesync worker are kept until it reaches READY state. If we allow
> + the
> 
> 
> Wording ??
> 
> "A tablesync worker are kept until it reaches READY state." ??
I removed the sentence.


> ~~~
> 
> 10.
> 
> 10a.
> + /* nothing to do if no delay set */
> 
> Uppercase comment
> /* Nothing to do if no delay set */
> 
> ~
> 
> 10b.
> + /* set apply delay */
> 
> Uppercase comment
> /* Set apply delay */
Both are fixed.


 
> ~~~
> 
> 11. - apply_handle_stream_prepare / apply_handle_stream_commit
> 
> The previous concern about incompatibility with the "Parallel Apply"
> work (see [1] review comments #17, #18) is still a pending issue, isn't it?
Yes, I think so.
Kindly have a look at [1].


> ==
> 
> 12. src/backend/utils/adt/timestamp.c interval_to_ms
> 
> +/*
> + * Given an Interval returns the number of milliseconds.
> + */
> +int64
> +

Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-24 Thread Takamichi Osumi (Fujitsu)
On Wednesday, November 16, 2022 12:58 PM Ian Lawrence Barwick 
 wrote:
> 2022年11月14日(月) 10:09 Takamichi Osumi (Fujitsu)
> :
> > I've simply rebased the patch to make it applicable on top of HEAD and
> > make the tests pass. Note there are still open pending comments and
> > I'm going to start to address those.
> Thanks for the updated patch.
> 
> While reviewing the patch backlog, we have determined that this patch adds
> one or more TAP tests but has not added the test to the "meson.build" file.
> 
> To do this, locate the relevant "meson.build" file for each test and add it 
> in the
> 'tests' dictionary, which will look something like this:
> 
>   'tap': {
> 'tests': [
>   't/001_basic.pl',
> ],
>   },
> 
> For some additional details please see this Wiki article:
> 
>   https://wiki.postgresql.org/wiki/Meson_for_patch_authors
> 
> For more information on the meson build system for PostgreSQL see:
> 
>   https://wiki.postgresql.org/wiki/Meson
Hi, thanks for your notification.


You are right. Modified.
The updated patch can be seen in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-24 Thread Takamichi Osumi (Fujitsu)
Hi,


On Tuesday, November 22, 2022 6:15 PM vignesh C  wrote:
> On Mon, 14 Nov 2022 at 12:14, Amit Kapila  wrote:
> >
> > Hi,
> >
> > The thread title doesn't really convey the topic under discussion, so
> > changed it. IIRC, this has been mentioned by others as well in the
> > thread.
> >
> > On Sat, Nov 12, 2022 at 7:21 PM vignesh C  wrote:
> > >
> > > Few comments:
> > > 1) I feel if the user has specified a long delay there is a chance
> > > that replication may not continue if the replication slot falls
> > > behind the current LSN by more than max_slot_wal_keep_size. I feel
> > > we should add this reference in the documentation of min_apply_delay
> > > as the replication will not continue in this case.
> > >
> >
> > This makes sense to me.
Modified accordingly. The updated patch is in [1].


> >
> > > 2) I also noticed that if we have to shut down the publisher server
> > > with a long min_apply_delay configuration, the publisher server
> > > cannot be stopped as the walsender waits for the data to be
> > > replicated. Is this behavior ok for the server to wait in this case?
> > > If this behavior is ok, we could add a log message as it is not very
> > > evident from the log files why the server could not be shut down.
> > >
> >
> > I think for this case, the behavior should be the same as for physical
> > replication. Can you please check what is behavior for the case you
> > are worried about in physical replication? Note, we already have a
> > similar parameter for recovery_min_apply_delay for physical
> > replication.
> 
> In the case of physical replication by setting recovery_min_apply_delay, I
> noticed that both primary and standby nodes were getting stopped successfully
> immediately after the stop server command. In case of logical replication, 
> stop
> server fails:
> pg_ctl -D publisher -l publisher.log stop -c waiting for server to shut
> down...
> failed
> pg_ctl: server does not shut down
> 
> In case of logical replication, the server does not get stopped because the
> walsender process is not able to exit:
> ps ux | grep walsender
> vignesh  1950789 75.3  0.0 8695216 22284 ?   Rs   11:51   1:08
> postgres: walsender vignesh [local] START_REPLICATION
Thanks, I could reproduce this and I'll update this point in a subsequent 
version.



[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi



Re: Patch: Global Unique Index

2022-11-24 Thread Dilip Kumar
On Fri, Nov 18, 2022 at 3:31 AM Cary Huang  wrote:
>
> Patch: Global Unique Index
> - Optimizer, query planning and vacuum -
> Since no major modification is done on global unique index's structure and 
> storage, it works in the same way as a regular partitioned index. No major 
> change is required to be done on optimizer, planner and vacuum process as 
> they should work in the same way as regular index.

It might not need changes in vacuum to make it work.  But this cannot
be really useful without modifying the way vacuum works.  I mean,
currently the indexes are also partitioned along with the table, so
whenever we vacuum a table it's fine to vacuum its indexes, but now you
will have one gigantic index which will be vacuumed every time we
vacuum any of the partitions.  So for example, if you have N
partitions, then by the time you vacuum the whole table (all N
partitions) the global index will be vacuumed N times.

There was some effort in the past (though it was not concluded) toward
decoupling the index and heap vacuuming such that, instead of doing the
index vacuum for each partition, we remember the dead tids and only do
the index vacuum when we think there are enough dead items that the
index vacuum makes sense [1].

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
On Thu, Nov 24, 2022 at 7:11 AM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> >> On 24 Nov 2022, at 13:42, Ted Yu  wrote:
> >> In _hash_pgaddtup(), it seems the indentation is off for the assertion.
>
> > Indentation is handled by applying src/tools/pgindent to the code, and
> > re-running it on this file yields no re-indentation so this is in fact
> correct
> > according to the pgindent rules.
>
> It is one messy bit of code though --- perhaps a little more thought
> about where to put line breaks would help?  Alternatively, it could
> be split into multiple statements, along the lines of
>
> #ifdef USE_ASSERT_CHECKING
> if (PageGetMaxOffsetNumber(page) > 0)
> {
>     IndexTuple lasttup = PageGetItem(page,
>                                      PageGetItemId(page,
>                                                    PageGetMaxOffsetNumber(page)));
>
>     Assert(_hash_get_indextuple_hashkey(lasttup) <=
>            _hash_get_indextuple_hashkey(itup));
> }
> #endif
>
> (details obviously tweakable)
>
> regards, tom lane
>

Thanks Tom for the suggestion.

Here is patch v2.


hash-pgaddtup-indent-v2.patch
Description: Binary data


RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-24 Thread Takamichi Osumi (Fujitsu)
Hi,


On Monday, November 14, 2022 7:15 PM Amit Kapila  
wrote:
> On Wed, Nov 9, 2022 at 12:11 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Wed, 10 Aug 2022 17:33:00 -0300, "Euler Taveira"
> >  wrote in
> > > On Wed, Aug 10, 2022, at 9:39 AM, osumi.takami...@fujitsu.com wrote:
> > > > Minor review comments for v6.
> > > Thanks for your review. I'm attaching v7.
> >
> > Using an interval is not standard for this kind of parameter but it seems
> > convenient. On the other hand, it's not great that the unit month
> > introduces some subtle ambiguity.  This patch translates a month to 30
> > days but I'm not sure it's the right thing to do. Perhaps we shouldn't
> > allow units larger than days.
> >
> 
> Agreed. Doesn't the same thing already apply to recovery_min_apply_delay, for
> which the maximum unit seems to be days? If so, there is no reason to do
> something different here?
The corresponding parameter for physical replication has an upper limit
of INT_MAX milliseconds, which is roughly 24.8 days (meaning 24 days is
OK, but 25 days isn't).
I added this test in the patch posted in [1].


> 
> > apply_delay() chokes the message-receiving path so that a not-so-long
> > delay can cause a replication timeout to fire.  I think we should
> > process walsender pings even while delaying.  Needing to make
> > replication timeout longer than apply delay is not great, I think.
> >
> 
> Again, I think for this case also the behavior should be similar to how we 
> handle
> recovery_min_apply_delay.
Yes, I agree with you.
This feature makes it easier to trigger the publisher's timeout,
something that isn't observed in physical replication.
I'll investigate and address this point in a subsequent version.


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: logical replication restrictions

2022-11-24 Thread Takamichi Osumi (Fujitsu)
Hi,


On Thursday, August 11, 2022 7:33 PM Amit Kapila  
wrote:
> On Tue, Aug 9, 2022 at 3:52 AM Euler Taveira  wrote:
> >
> > On Wed, Aug 3, 2022, at 10:27 AM, Amit Kapila wrote:
> >
> > Your explanation makes sense to me. The other point to consider is
> > that there can be cases where we may not apply operation for the
> > transaction because of empty transactions (we don't yet skip empty
> > xacts for prepared transactions). So, won't it be better to apply the
> > delay just before we apply the first change for a transaction? Do we
> > want to apply the delay during table sync as we sometimes do need to
> > enter apply phase while doing table sync?
> >
> > I thought about the empty transactions but decided to not complicate
> > the code mainly because skipping transactions is not a code path that
> > will slow down this feature. As explained in the documentation, there
> > is no harm in delaying a transaction for more than min_apply_delay; it
> > cannot apply earlier. Having said that I decided to do nothing. I'm
> > also not sure if it deserves a comment or if this email is a possible 
> > explanation
> for this decision.
> >
> 
> I don't know what makes you think it will complicate the code. But anyway
> thinking further about the way apply_delay is used at various places in the 
> patch,
> as pointed out by Peter Smith it seems it won't work for the parallel apply
> feature where we start applying the transaction immediately after start 
> stream.
> I was wondering why don't we apply delay after each commit of the transaction
> rather than at the begin command. We can remember if the transaction has
> made any change and if so then after commit, apply the delay. If we can do 
> that
> then it will alleviate the concern of empty and skipped xacts as well.
I agree with this direction. I'll update this point in a subsequent patch.


> Another thing I was wondering how to determine what is a good delay time for
> tests and found that current tests in replay_delay.pl uses 3s, so should we 
> use
> the same for apply delay tests in this patch as well?
Fixed in the patch posted in [1].



[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi



Re: Patch: Global Unique Index

2022-11-24 Thread Justin Pryzby
On Thu, Nov 24, 2022 at 07:03:24AM +0100, Pavel Stehule wrote:
> I can imagine a unique index on partitions without a special mark, that
> will be partitioned, 

That exists since v11, as long as the index keys include the partition
keys.

> and a second variant classic index created over a partitioned table,
> that will be marked as GLOBAL.

That's not what this patch is about, though.

On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote:
> but now you will have one gigantic index and which will be vacuumed
> every time we vacuum any of the partitions.

This patch isn't implemented as "one gigantic index", though.

-- 
Justin




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-24 Thread Simon Riggs
On Tue, 22 Nov 2022 at 18:44, Tom Lane  wrote:
>
> I wrote:
> > Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere
> > else in this loop.
>
> I did some experimentation using the test case Jakub presented
> to start with, and verified that that loop does respond promptly
> to control-C even in HEAD.  So there are CFI(s) in the loop as
> I thought, and we don't need another.

Thanks for checking. Sorry for not responding earlier.

> What we do need is some more work on nearby comments.  I'll
> see about that and push it.

Thanks; nicely streamlined.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: TAP output format in pg_regress

2022-11-24 Thread Nikolay Shaplov
You guys are really fast... I only start thinking about a problem and it is
already mentioned here... Most issues I've noticed were already fixed before
I was able to say something.

Nevertheless...

 /*
  * Bailing out is for unrecoverable errors which prevents further testing to
  * occur and after which the test run should be aborted. By passing non_rec
  * as true the process will exit with _exit(2) and skipping registered exit
  * handlers.
  */
 static void
 bail_out(bool non_rec, const char *fmt,...)
 {

In the original code, where _exit was called, there was a mention of recursion
(whatever it is) as the reason for using _exit() instead of exit(). Now this
mention is gone:

-   _exit(2);   /* not exit(), that could be recursive */
+   /* Not using the normal bail() as we want _exit */
+   _bail(_("could not stop postmaster: exit code was %d\n"), r);

The only remaining trace of the recursion is the _rec suffix.

I guess we should keep the mention of recursion in the comments. Also, for me
_rec stands for "record", not "recursion"; I would not guess the original
meaning from the _rec suffix. I would suggest changing it to _recurs or
_recursive to keep things clear.

-

+#define _bail(...) bail_out(true, __VA_ARGS__)
+#define bail(...)  bail_out(false, __VA_ARGS__)

I will never guess what the difference is between bail, _bail and bail_out. We
need a really good explanation here, or better, self-explanatory names...

I would start bail_out with _ as it is "internal" and not used directly in the
code.
And for _bail I would try to find some name that shows its nature, like
bail_in_recursion or something.

-

+   if (ok || ignore)
{
-   va_start(ap, fmt);
-   vfprintf(logfile, fmt, ap);
-   va_end(ap);
+   emit_tap_output(TEST_STATUS, "ok %-5i %s%-*s %8.0f ms%s\n",
+   testnumber,
+   (parallel ? PARALLEL_INDENT : ""),
+   padding, testname,
+   runtime,
+   (ignore ? " # SKIP (ignored)" : ""));
+   }
+   else
+   {
+   emit_tap_output(TEST_STATUS, "not ok %-5i %s%-*s %8.0f ms\n",
+   testnumber,
+   (parallel ? PARALLEL_INDENT : ""),
+   padding, testname,
+   runtime);
}

This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat even two
times. I understand the problems with spaces... But maybe it would be better
to somehow narrow it down to one ugly print... Print "ok %-5i"|"not ok %-5i"
to a buffer first, and then have one "%s%-*s %8.0f ms%s\n" print, or something
like that...

I am not strongly insisting on that, but I feel a strong urge to remove the
code duplication here...


-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-24 Thread Simon Riggs
On Thu, 24 Nov 2022 at 00:19, Nathan Bossart  wrote:
>
> On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote:
> > rebased
>
> another rebase for cfbot

0001 seems good to me
* I like that it sleeps forever until requested
* not sure I believe that everything it does can always be aborted out
of and shut down - to achieve that you will need
CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
* not sure why you want immediate execution of custodian tasks - I
feel supporting two modes will be a lot harder. For me, I would run
locally when !IsUnderPostmaster and also in an Assert build, so we can
test it works right - i.e. running in its own process is just a
production optimization for performance (which is the stated reason
for having this)

0005 seems good from what I know
* There is no check to see if it worked in any sane time
* It seems possible that "Old" might change meaning - will that make
it break/fail?

0006 seems good also
* same comments for 5

Rather than explicitly use DEBUG1 everywhere I would have an
#define CUSTODIAN_LOG_LEVEL LOG
so we can run with it in LOG mode and then set it to DEBUG1 with a one
line change in a later phase of Beta

I can't really comment with knowledge on sub-patches 0002 to 0004.

Perhaps you should aim to get 1, 5, 6 committed first and then return
to the others in a later CF/separate thread?

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-24 Thread Alvaro Herrera
On 2022-Nov-24, Dimos Stamatakis wrote:

> Thanks for your feedback!
> I applied the patch to print more information about the error. Here’s what I 
> got:
> 
> 2022-11-23 20:33:03 UTC [638 test_database]: [5458] ERROR:  new multixact has 
> more than one updating member: 0 2[248477 (nokeyupd), 248645 (nokeyupd)]
> 2022-11-23 20:33:03 UTC [638 test_database]: [5459] STATEMENT:  UPDATE 
> warehouse1
>   SET w_ytd = w_ytd + 498
> WHERE w_id = 5
> 
> I then inspected the WAL and I found the log records for these 2 transactions:
> 
> …
> rmgr: MultiXact   len (rec/tot): 54/54, tx: 248477, lsn: 
> 0/66DB82A8, prev 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: 
> 248477 (nokeyupd) 248500 (keysh)
> rmgr: Heaplen (rec/tot): 70/70, tx: 248477, lsn: 
> 0/66DB82E0, prev 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 
> IS_MULTI EXCL_LOCK ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 blk 
> 422
> rmgr: Transaction len (rec/tot): 34/34, tx: 248477, lsn: 
> 0/66DBA710, prev 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC
> …
> rmgr: Transaction len (rec/tot): 34/34, tx: 248645, lsn: 
> 0/66DBB060, prev 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC

Ah, it seems clear enough: the transaction that aborted after having
updated the tuple is still considered live when doing the second
update.  That sounds wrong.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"You can live only once, but if you do it right, once is enough"




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-24 Thread Simon Riggs
On Fri, 21 Oct 2022 at 17:09, Maxim Orlov  wrote:

> Reviews and opinions are very welcome!

I'm wondering whether the safest way to handle this is by creating a
new TAM called "heap64", so that all storage changes happen there.
(Obviously there are still many other changes in core, but they are
more easily fixed).

That would reduce the code churn around "heap", allowing us to keep it
stable while we move to the brave new world.

Many current users see stability as one of the greatest strengths of
Postgres, so while I very much support this move, I wonder if this
gives us a way to have both stability and innovation at the same time?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Questions regarding distinct operation implementation

2022-11-24 Thread Ankit Kumar Pandey


On 23/11/22 23:48, Ankit Kumar Pandey wrote:


Hello,


I have questions regarding distinct operation and would be glad if 
someone could help me out.


Consider the following table (mytable):

id, name
1, A
1, A
2, B
3, A
1, A

If we do select avg(id) over (partition by name) from mytable, the 
partition logic goes like this:


for A: 1, 1, 3, 1

If we want to implement something like select avg(distinct id) over 
(partition by name) from mytable and remove duplicates by storing the 
last datum of the aggregate column (id) and comparing it with the 
current value, it fails because the aggregate column is not sorted 
within the partition.


Questions:

1. Is sorting a prerequisite for finding distinct values?

2. Is it okay to sort the aggregate column (within the partition) for 
distinct to work in the case of a window function?

3. Does an alternative way exist to handle this scenario (because here 
the sort is of no use in the aggregation)?



Thanks


--
Regards,
Ankit Kumar Pandey


Hi,

After a little more digging, I can see that aggregations in window 
functions are of the running type. It would be a bit more effective if 
a lookup hash table were created into which every value in the current 
aggregate column gets inserted. Whenever the frame moves ahead, a 
lookup is performed to check for duplicates.


From a performance standpoint, this might be a bad idea though.

Please let me know any opinions on this.

--
Regards,
Ankit Kumar Pandey


Re: Patch: Global Unique Index

2022-11-24 Thread Cary Huang
  On Thu, 24 Nov 2022 08:00:59 -0700  Thomas Kellerer  wrote --- 
 > Pavel Stehule wrote on 24.11.2022 at 07:03:
 > > There are many Oracle users that find global indexes useful despite
 > > their disadvantages.
 > >
 > > I have seen this mostly when the goal was to get the benefits of
 > > partition pruning at runtime which turned the full table scan (=Seq 
 > > Scan)
 > > on huge tables to partition scans on much smaller partitions.
 > > Partition wise joins were also helpful for query performance.
 > > The substantially slower drop partition performance was accepted in 
 > > those cases
 > >
 > >
 > > I think it would be nice to have the option in Postgres as well.
 > >
 > > I do agree however, that the global index should not be created 
 > > automatically.
 > >
 > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better
 > >
 > >
 > > Is it necessary to use special marks like GLOBAL if this index will
 > > be partitioned, and uniqueness will be ensured by repeated
 > > evaluations?
 > >
 > > Or do you think there should really be one forced relation-based
 > > index?
 > >
 > > I can imagine a unique index on partitions without a special mark,
 > > which will be partitioned, and a second variant, a classic index
 > > created over a partitioned table, which will be marked as GLOBAL.
 > 
 > 
 > My personal opinion is, that a global index should never be created
 > automatically.
 > 
 > The user should consciously decide on using a feature
 > that might have a serious impact on performance in some areas.


Agreed, if a unique index is created on non-partition key columns without 
including the special mark (partition key columns), it may be a mistake from 
the user. (At least I make this mistake all the time.) Current PG will give you 
a warning to include the partition keys, which is good. 

If we were to automatically turn that into a global unique index, the user may 
be using the feature without knowing it and experiencing some performance 
impacts (to account for the extra uniqueness checks in all partitions).





Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-24 Thread Alvaro Herrera
On 2022-Nov-24, Alvaro Herrera wrote:

> On 2022-Nov-24, Dimos Stamatakis wrote:
> 
> > rmgr: MultiXact   len (rec/tot): 54/54, tx: 248477, lsn: 
> > 0/66DB82A8, prev 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: 
> > 248477 (nokeyupd) 248500 (keysh)
> > rmgr: Heaplen (rec/tot): 70/70, tx: 248477, lsn: 
> > 0/66DB82E0, prev 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 
> > IS_MULTI EXCL_LOCK ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 
> > blk 422
> > rmgr: Transaction len (rec/tot): 34/34, tx: 248477, lsn: 
> > 0/66DBA710, prev 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC
> > …
> > rmgr: Transaction len (rec/tot): 34/34, tx: 248645, lsn: 
> > 0/66DBB060, prev 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC
> 
> Ah, it seems clear enough: the transaction that aborted after having
> updated the tuple, is still considered live when doing the second
> update.  That sounds wrong.

Hmm, if a transaction is aborted but its Xid is after latestCompletedXid,
it would be reported as still running.  AFAICS that is only modified
with ProcArrayLock held in exclusive mode, and only read with it held in
share mode, so this should be safe.

I can see nothing else ATM.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code.  Too bad I can't do this at work (Oracle 8/9)."   (Tom Allison)
   http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php




Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()

2022-11-24 Thread Andres Freund
Hi,

On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote:
> With that said, here's a small improvement I can think of, that is, to
> avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
> of WaitXLogInsertionsToFinish() currently holds. Since
> LWLockWaitForVar() does a bunch of things - holds interrupts, does
> atomic reads, acquires and releases wait list lock and so on, avoiding
> it may be a good idea IMO.

That doesn't seem like a big win. We're still going to call LWLockWaitForVar()
for all the other locks.

I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.

The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.

I'd bet that starts to have visible effects in a workload with many small
records.
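
To sketch the shape of that (the function name is made up;
pg_atomic_read_u64() already falls back to an emulated tear-free read
on platforms without native 64bit atomics):

static bool
insert_lock_conflicts(pg_atomic_uint64 *insertingAt, uint64 waitUntil)
{
	/* lock-free read; no wait list lock, no spinlock */
	uint64		current = pg_atomic_read_u64(insertingAt);

	/* conflicts only if the holder hasn't advanced past our target */
	return current < waitUntil;
}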

Greetings,

Andres Freund




Re: TAP output format in pg_regress

2022-11-24 Thread Daniel Gustafsson
> On 24 Nov 2022, at 18:07, Nikolay Shaplov  wrote:
> 
> You guys are really fast... I only think about a problem, and it is already 
> mentioned here... Most issues I've noticed were already fixed before I was 
> able to say something. 

Thanks for looking and reviewing!

> /*
>  * Bailing out is for unrecoverable errors which prevents further testing to
>  * occur and after which the test run should be aborted. By passing non_rec
>  * as true the process will exit with _exit(2) and skipping registered exit
>  * handlers.
>  */
> static void
> bail_out(bool non_rec, const char *fmt,...)
> {
> 
> In the original code, where _exit was called, there was a mention of 
> recursion (whatever it is) as the reason for using _exit() instead of 
> exit(). Now this mention is gone:
> 
> -   _exit(2);   /* not exit(), that could be recursive */
> +   /* Not using the normal bail() as we want _exit */
> +   _bail(_("could not stop postmaster: exit code was %d\n"), r);
> 
> The only remaining trace of recursion is the _rec suffix.
> 
> I guess we should keep the mention of recursion in the comments. For me _rec 
> stands for "record", not "recursion"; I would not guess the original meaning 
> from the _rec suffix. I would suggest changing it to _recurs or _recursive 
> to keep things clear

The other original comment on _exit usage reads:

/* not exit() here... */

I do think that the longer explanation I added in the comment you quoted above
is more explanatory than those.  I can however add a small note on why skipping
registered exit handlers is useful (i.e., not risking recursive calls to exit()).

> +#define _bail(...)   bail_out(true, __VA_ARGS__)
> +#define bail(...)    bail_out(false, __VA_ARGS__)
> 
> I will never guess what the difference between bail, _bail and bail_out is. 
> We need a really good explanation here, or better, use self-explanatory 
> names here... 
> 
> I would start bail_out with _ as it is "internal", not used in the code.
> And for _bail I would try to find some name that shows its nature. Like 
> bail_in_recursion or something.

One option could be to redefine bail() to take the exit function as a parameter
and have the caller pass the preferred exit handler.

-bail_out(bool non_rec, const char *fmt,...)
+bail(void (*exit_func)(int), const char *fmt,...)

The callsites would then look like the below, which puts a reference to the
actual exit handler used in the code where it is called.  I find it a bit
uglier, but also quite self-explanatory:

@@ -409,10 +403,7 @@ stop_postmaster(void)
 		fflush(NULL);
 		r = system(buf);
 		if (r != 0)
-		{
-			/* Not using the normal bail() as we want _exit */
-			_bail(_("could not stop postmaster: exit code was %d\n"), r);
-		}
+			bail(_exit, _("could not stop postmaster: exit code was %d\n"), r);
 
 		postmaster_running = false;
 	}
@@ -469,7 +460,7 @@ make_temp_sockdir(void)
 	temp_sockdir = mkdtemp(template);
 	if (temp_sockdir == NULL)
 	{
-		bail(_("could not create directory \"%s\": %s\n"),
+		bail(exit, _("could not create directory \"%s\": %s\n"),
 			 template, strerror(errno));
 	}

Not sure what the best option is, but I've been unable to think of a name which
documents the code well enough that a comment explaining why isn't required.

> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat even two 
> times. I understand problems with spaces... But maybe it would be better to 
> somehow narrow it down to one ugly print... Print "ok %-5i"|"not ok %-5i" to 
> a buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or something 
> like that...

I'm not convinced that this printf format is that hard to read (which may well
be attributed to Stockholm Syndrome), and I do think that breaking it up and
adding more code to print the line will make it less readable instead.

--
Daniel Gustafsson   https://vmware.com/





Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-24 Thread Peter Geoghegan
On Sun, Nov 20, 2022 at 11:58 PM Chris Travers  wrote:
> I can start by saying I think it would be helpful (if the other issues are 
> approached reasonably) to have 64-bit xids, but there is an important piece 
> of context in preventing xid wraparounds that seems missing from this patch 
> unless I missed something.
>
> XID wraparound is a symptom, not an underlying problem.  It usually occurs 
> when autovacuum or other vacuum strategies have unexpected stalls and 
> therefore fail to work as expected.  Shifting to 64-bit XIDs dramatically 
> changes the sorts of problems that these stalls are likely to pose to 
> operational teams.  -- you can find you are running out of storage rather 
> than facing an imminent database shutdown.  Worse, this patch delays the 
> problem until some (possibly far later!) time, when vacuum will take far 
> longer to finish, and options for resolving the problem are diminished.  As a 
> result I am concerned that merely changing xids from 32-bit to 64-bit will 
> lead to a smaller number of far more serious outages.

This is exactly what I think (except perhaps for the part about having
fewer outages overall). The more transaction ID space you need, the
more space you're likely to need in the near future.

We can all agree that having more runway is better than having less
runway, at least in some abstract sense, but that in itself doesn't
help the patch series very much. The first time the system-level
oldestXid (or database level datminfrozenxid) attains an age of 2
billion XIDs will usually *also* be the first time it attains an age
of (say) 300 million XIDs. Even 300 million is usually a huge amount
of XID space relative to (say) the number of XIDs used every 24 hours.
So I know exactly what you mean about just addressing a symptom.

The whole project seems to just ignore basic, pertinent questions.
Questions like: why are we falling behind like this in the first
place? And: If we don't catch up soon, why should we be able to catch
up later on? Falling behind on freezing is still a huge problem with
64-bit XIDs.

Part of the problem with the current design is that table age has
approximately zero relationship with the true cost of catching up on
freezing -- we are "using the wrong units", in a very real sense. In
general we may have to do zero freezing to advance a table's
relfrozenxid age by a billion XIDs, or we might have to write
terabytes of FREEZE_PAGE records to advance a similar looking table's
relfrozenxid by just one single XID (it could also swing wildly over
time for the same table). Which the system simply doesn't try to
reason about right now.

There are no settings for freezing that use physical units, and frame
the problem as a problem of being behind by this many unfrozen pages
(they are all based on XID age). And so the problem with letting table
age get into the billions isn't even that we'll never catch up -- we
actually might catch up very easily! The real problem is that we have
no way of knowing ahead of time (at least not right now). VACUUM
should be managing the debt, *and* the uncertainty about how much debt
we're really in. VACUUM needs to have a dynamic, probabilistic
understanding of what's really going on -- something much more
sophisticated than looking at table age in autovacuum.c.

One reason why you might want to advance relfrozenxid proactively is
to give the system a better general sense of the true relationship
between logical XID space and physical freezing for a given table and
workload -- it gives a clearer picture about the conditions in the
table. The relationship between SLRU space and physical heap pages and
the work of freezing is made somewhat clearer by a more proactive
approach to advancing relfrozenxid. That's one way that VACUUM can
lower the uncertainty I referred to.

-- 
Peter Geoghegan




Re: TAP output format in pg_regress

2022-11-24 Thread Andres Freund



On November 24, 2022 11:07:43 AM PST, Daniel Gustafsson  wrote:
>> On 24 Nov 2022, at 18:07, Nikolay Shaplov  wrote:
>One option could be to redefine bail() to take the exit function as a parameter
>and have the caller pass the preferred exit handler.
>
>-bail_out(bool non_rec, const char *fmt,...)
>+bail(void (*exit_func)(int), const char *fmt,...)
>
>The callsites would then look like the below, which puts a reference to the
>actual exit handler used in the code where it is called.

I'd just rename _bail to bail_noatexit().
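
I.e. roughly (bail_out() being the internal helper from the patch):

#define bail(...)			bail_out(false, __VA_ARGS__)
#define bail_noatexit(...)	bail_out(true, __VA_ARGS__)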


>> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat even two 
>> times. I understand problems with spaces... But maybe it would be better to 
>> somehow narrow it down to one ugly print... Print "ok %-5i"|"not ok %-5i" 
>> to a buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or 
>> something like that...
>
>I'm not convinced that this printf format is that hard to read (which may well
>be attributed to Stockholm Syndrome), and I do think that breaking it up and
>adding more code to print the line will make it less readable instead.

I don't think it's terrible either. I do think it'd also be ok to switch 
between ok / not ok within a single printf, making it easier to keep them in 
sync.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: indentation in _hash_pgaddtup()

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 04:29, Ted Yu  wrote:
> Here is patch v2.

After running pgindent on v2, I see it still pushes the lines out
quite far. If I add a new line after PageGetItemId(page, and put the
variable assignment away from the variable declaration then it looks a
bit better. It's still 1 char over the limit.

David
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 9db522051e..dfc7a90d68 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -290,12 +290,20 @@ _hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup,
 	{
 		itup_off = PageGetMaxOffsetNumber(page) + 1;
 
+#ifdef USE_ASSERT_CHECKING
 		/* ensure this tuple's hashkey is >= the final existing tuple */
-		Assert(PageGetMaxOffsetNumber(page) == 0 ||
-			   _hash_get_indextuple_hashkey((IndexTuple)
-											PageGetItem(page, PageGetItemId(page,
-																			PageGetMaxOffsetNumber(page)))) <=
-			   _hash_get_indextuple_hashkey(itup));
+		if (PageGetMaxOffsetNumber(page) > 0)
+		{
+			IndexTuple	lasttup;
+
+			lasttup = (IndexTuple) PageGetItem(page,
+											   PageGetItemId(page,
+															 PageGetMaxOffsetNumber(page)));
+
+			Assert(_hash_get_indextuple_hashkey(lasttup) <=
+				   _hash_get_indextuple_hashkey(itup));
+		}
+#endif
 	}
 	else
 	{

Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Tom Lane
David Rowley  writes:
> After running pgindent on v2, I see it still pushes the lines out
> quite far. If I add a new line after PageGetItemId(page, and put the
> variable assignment away from the variable declaration then it looks a
> bit better. It's still 1 char over the limit.

If you wanted to be hard-nosed about 80 character width, you could
pull out the PageGetItemId call into a separate local variable.
I wasn't going to be quite that picky, but I won't object if that
seems better to you.

regards, tom lane




Re: indentation in _hash_pgaddtup()

2022-11-24 Thread Ted Yu
On Thu, Nov 24, 2022 at 12:31 PM Tom Lane  wrote:

> David Rowley  writes:
> > After running pgindent on v2, I see it still pushes the lines out
> > quite far. If I add a new line after PageGetItemId(page, and put the
> > variable assignment away from the variable declaration then it looks a
> > bit better. It's still 1 char over the limit.
>
> If you wanted to be hard-nosed about 80 character width, you could
> pull out the PageGetItemId call into a separate local variable.
> I wasn't going to be quite that picky, but I won't object if that
> seems better to you.
>
> regards, tom lane
>

Patch v4 stores ItemId in a local variable.


hash-pgaddtup-indent-v4.patch
Description: Binary data


Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-24 Thread Peter Smith
On Fri, Nov 25, 2022 at 2:15 AM Takamichi Osumi (Fujitsu)
 wrote:
>
> On Wednesday, October 5, 2022 6:42 PM Peter Smith  
> wrote:
...
>
> > ==
> >
> > 5. src/backend/commands/subscriptioncmds.c - SubOpts
> >
> > @@ -89,6 +91,7 @@ typedef struct SubOpts
> >   bool disableonerr;
> >   char*origin;
> >   XLogRecPtr lsn;
> > + int64 min_apply_delay;
> >  } SubOpts;
> >
> > I feel it would be better to be explicit about the storage units. So call 
> > this
> > member ‘min_apply_delay_ms’. E.g. then other code in
> > parse_subscription_options will be more natural when you are converting 
> > using
> > and assigning them to this member.
> I don't think we use such names that include units explicitly.
> Could you please tell me a similar example of this?
>

Regex search "\..*_ms[e\s]" finds some members where the unit is in
the member name.

e.g. delay_ms (see EnableTimeoutParams in timeout.h)
e.g. interval_in_ms (see timeout_params in timeout.c)

Regex search ".*_ms[e\s]" finds many local variables where the unit is
in the variable name

> > ==
> >
> > 16. src/include/catalog/pg_subscription.h
> >
> > + int64 subapplydelay; /* Replication apply delay */
> > +
> >
> > Consider renaming this as 'subapplydelayms' to make the units perfectly 
> > clear.
> Similar to the 5th comment, I can't find any examples of this.
> I'd like to keep it general, which makes me feel it is more aligned with
> the existing code.
>

As above.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Questions regarding distinct operation implementation

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey  wrote:
> Please let me know any opinions on this.

I think if you're planning on working on this then step 1 would have
to be checking the SQL standard to see which set of rows it asks
implementations to consider for duplicate checks when deciding if the
transition should be performed or not.  Having not looked, I don't
know if this is the entire partition or just the rows in the current
frame.

Depending on what you want, an alternative today would be to run a
subquery to uniquify the rows the way you want and then do the window
function stuff.

David




Re: indentation in _hash_pgaddtup()

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 09:40, Ted Yu  wrote:
> Patch v4 stores ItemId in a local variable.

ok, I pushed that one.  Thank you for working on this.

David




Re: indentation in _hash_pgaddtup()

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 09:31, Tom Lane  wrote:
> If you wanted to be hard-nosed about 80 character width, you could
> pull out the PageGetItemId call into a separate local variable.
> I wasn't going to be quite that picky, but I won't object if that
> seems better to you.

I wasn't too worried about the 1 char, but Ted wrote a patch and it
looked a little nicer.

David




Re: TAP output format in pg_regress

2022-11-24 Thread Daniel Gustafsson
> On 24 Nov 2022, at 20:32, Andres Freund  wrote:
> 
> On November 24, 2022 11:07:43 AM PST, Daniel Gustafsson  
> wrote:
>>> On 24 Nov 2022, at 18:07, Nikolay Shaplov  wrote:
>> One option could be to redefine bail() to take the exit function as a 
>> parameter
>> and have the caller pass the preferred exit handler.
>> 
>> -bail_out(bool non_rec, const char *fmt,...)
>> +bail(void (*exit_func)(int), const char *fmt,...)
>> 
>> The callsites would then look like the below, which puts a reference to the
>> actual exit handler used in the code where it is called.
> 
> I'd just rename _bail to bail_noatexit().

That's probably the best option, done in the attached along with the comment
fixup to mention the recursion issue.

>>> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it even 
>>> two 
>>> times. I understand problems with spaces... But may be it would be better 
>>> somehow narrow it to one ugly print... Print "ok %-5i"|"not ok %-5i" to 
>>> buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or something 
>>> like 
>>> that...
>> 
>> I'm not convinced that this printf format is that hard to read (which may 
>> well
>> be attributed to Stockholm Syndrome), and I do think that breaking it up and
>> adding more code to print the line will make it less readable instead.
> 
> I don't think it's terrible either. I do think it'd also be ok to switch 
> between ok / not ok within a single printf, making it easier to keep them in 
> sync.

I made it into a single printf to see what it would look like, with some
additional comments to make it more readable (I'm not a fan of where pgindent
moves those but..).

--
Daniel Gustafsson   https://vmware.com/



v12-0001-Change-pg_regress-output-format-to-be-TAP-compli.patch
Description: Binary data


Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-11-24 Thread Hamid Akhtar
On Mon, 21 Nov 2022 at 17:34, Naeem Akhter  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> I've tested and verified the documentation.


Rebasing the patch to the tip of the master branch.


pageinspect_btree_multipagestats-v6.patch
Description: Binary data


Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-24 Thread Michael Paquier
On Thu, Nov 24, 2022 at 05:07:24PM +0800, Julien Rouhaud wrote:
> So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH.  Attached v22
> that fixes it in all the places I found.

Sounds fine.  Added one comment, fixed one comment, and applied.
Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Bug in row_number() optimization

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 00:52, Sergey Shinderuk
 wrote:
> Shouldn't we handle any pass-by-reference type the same? I suppose, a
> user-defined window function can return some other type, not int8.

Thanks for reporting this and to you and Richard for working on a fix.

I've just looked at it and it seems that valgrind is complaining
because a tuple formed by an upper-level WindowAgg contains a pointer
to free'd memory due to the byref type and eval_windowaggregates() not
having been executed to fill in ecxt_aggvalues and ecxt_aggnulls on
the lower-level WindowAgg.

Since upper-level WindowAggs cannot reference values calculated in
some lower-level WindowAgg, why can't we just NULLify the pointers
instead? See attached.

It is possible to have a monotonic window function that does not
return int8.  Technically something like MAX(text_col) OVER (PARTITION
BY somecol ORDER BY text_col) is monotonically increasing, it's just
that I didn't add a support function to tell the planner about that.
Someone could come along in the future and suggest we do that and show
us some convincing use case.  So whatever the fix, it cannot assume
the window function's return type is int8.

David
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 81ba024bba..083a3b2386 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2234,6 +2234,21 @@ ExecWindowAgg(PlanState *pstate)
 			if (winstate->numaggs > 0)
 				eval_windowaggregates(winstate);
 		}
+		else if (!winstate->top_window)
+		{
+			/*
+			 * Otherwise, if we're not the top-window, we'd better fill the
+			 * aggregate values in with something, else they might be pointing
+			 * to freed memory.  These don't need to be valid since WindowAggs
+			 * above us cannot reference the result of another WindowAgg.
+			 */
+			numfuncs = winstate->numfuncs;
+			for (i = 0; i < numfuncs; i++)
+			{
+				econtext->ecxt_aggvalues[i] = (Datum) 0;
+				econtext->ecxt_aggnulls[i] = true;
+			}
+		}
 
 		/*
 		 * If we have created auxiliary read pointers for the frame or group


Re: Amcheck verification of GiST and GIN

2022-11-24 Thread Jose Arthur Benetasso Villanova

Hello.

I reviewed this patch and I would like to share some comments.

It compiled with those 2 warnings:

verify_gin.c: In function 'gin_check_parent_keys_consistency':
verify_gin.c:481:38: warning: declaration of 'maxoff' shadows a previous 
local [-Wshadow=compatible-local]
  481 |                 OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
      |                              ^~
verify_gin.c:453:41: note: shadowed declaration is here
  453 |                 maxoff;
      |                 ^~
verify_gin.c:423:25: warning: unused variable 'heapallindexed' 
[-Wunused-variable]
  423 |         bool            heapallindexed = *((bool*)callback_state);
      |                         ^~


Also, I'm not sure about Postgres' header conventions: "miscadmin.h" is 
included inside amcheck.h, while verify_gin.c, verify_gist.h and 
verify_nbtree.c include both amcheck.h and miscadmin.h.


About the documentation, bt_index_parent_check has comments about the 
ShareLock and "SET client_min_messages = DEBUG1;", while both 
gist_index_parent_check and gin_index_parent_check lack them. verify_gin 
uses DEBUG3; I'm not sure if that is on purpose, but it would be nice to 
document it or use DEBUG1 to be consistent.


I lack enough context to do a deep review of the code, so this patch 
needs more eyes in this area.


I did the following test:

postgres=# create table teste (t text, tv tsvector);
CREATE TABLE
postgres=# insert into teste values ('hello', 'hello'::tsvector);
INSERT 0 1
postgres=# create index teste_tv on teste using gist(tv);
CREATE INDEX
postgres=# select pg_relation_filepath('teste_tv');
 pg_relation_filepath
--
 base/5/16441
(1 row)

postgres=#
\q
$ bin/pg_ctl -D data -l log
waiting for server to shut down done
server stopped
$ okteta base/5/16441 # I couldn't figure out the dd syntax to change the 
1FE9 to '0'

$ bin/pg_ctl -D data -l log
waiting for server to start done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# select gist_index_parent_check('teste_tv'::regclass, true);
DEBUG:  verifying that tuples from index "teste_tv" are present in "teste"
ERROR:  heap tuple (0,1) from table "teste" lacks matching index tuple 
within index "teste_tv"

postgres=#

A simple index corruption in gin:

postgres=# CREATE TABLE "gin_check"("Column1" int[]);
CREATE TABLE
postgres=# insert into gin_check values (ARRAY[1]),(ARRAY[2]);
INSERT 0 2
postgres=# CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
CREATE INDEX
postgres=# select pg_relation_filepath('gin_check_idx');
 pg_relation_filepath
--
 base/5/16453
(1 row)

postgres=#
\q
$ bin/pg_ctl -D data -l logfile stop
waiting for server to shut down done
server stopped
$ okteta data/base/5/16453 # edited some bits near 3FCC
$ bin/pg_ctl -D data -l logfile start
waiting for server to start done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# SELECT gin_index_parent_check('gin_check_idx', true);
ERROR:  number of items mismatch in GIN entry tuple, 49 in tuple header, 1 
decoded

postgres=#

There are more code paths to follow to check the entire code, and I had a 
hard time corrupting the indices. Is there any automated code to corrupt an 
index to test such code?



--
Jose Arthur Benetasso Villanova





Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-24 Thread Julien Rouhaud
On Fri, Nov 25, 2022 at 07:41:59AM +0900, Michael Paquier wrote:
> On Thu, Nov 24, 2022 at 05:07:24PM +0800, Julien Rouhaud wrote:
> > So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH.  Attached v22
> > that fixes it in all the places I found.
> 
> Sounds fine.  Added one comment, fixed one comment, and applied.
> Thanks!

Thanks a lot!




Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-24 Thread Peter Smith
Here are some review comments for v51-0001.

==

.../replication/logical/applyparallelworker.c

1. General - Error messages, get_worker_name()

I previously wrote a comment to ask if the get_worker_name() should be
used in more places but the reply [1, #2b] was:

> 2b.
> Consider if maybe all of these ought to be calling get_worker_name()
> which is currently static in worker.c. Doing this means any future
> changes to get_worker_name won't cause more inconsistencies.

Most of the error messages in applyparallelxx.c can only use "xx parallel
worker", so I think it's fine not to call get_worker_name

~

I thought the reply missed the point I was trying to make -- I meant
that if it were arranged now so *every* message goes via
get_worker_name(), then if in the future somebody wanted to change the
names (e.g. from "logical replication parallel apply worker" to "LR PA
worker") it would only need to be changed in one central place
instead of hunting down every hardwired error message.

Anyway, you can do it how you want -- I just was not sure you'd got my
original point.
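
To illustrate, the kind of single central helper I had in mind -- just
a sketch; the am_*_worker() predicates are the ones used by the patch
and the exact strings are illustrative:

static const char *
get_worker_name(void)
{
	if (am_tablesync_worker())
		return _("logical replication table synchronization worker");
	else if (am_parallel_apply_worker())
		return _("logical replication parallel apply worker");
	else
		return _("logical replication apply worker");
}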

~~~

2. HandleParallelApplyMessage

+ case 'X': /* Terminate, indicating clean exit. */
+ shm_mq_detach(winfo->error_mq_handle);
+ winfo->error_mq_handle = NULL;
+ break;
+ default:
+ elog(ERROR, "unrecognized message type received from logical
replication parallel apply worker: %c (message length %d bytes)",
+ msgtype, msg->len);

The case 'X' code indentation is too much.

==

src/backend/replication/logical/origin.c

3. replorigin_session_setup(RepOriginId node, int acquired_by)

@@ -1075,12 +1075,20 @@ ReplicationOriginExitCleanup(int code, Datum arg)
  * array doesn't have to be searched when calling
  * replorigin_session_advance().
  *
- * Obviously only one such cached origin can exist per process and the current
+ * Normally only one such cached origin can exist per process and the current
  * cached value can only be set again after the previous value is torn down
  * with replorigin_session_reset().
+ *
+ * However, we do allow multiple processes to point to the same origin slot if
+ * requested by the caller by passing PID of the process that has already
+ * acquired it as acquired_by. This is to allow multiple parallel apply
+ * processes to use the same origin, provided they maintain commit order, for
+ * example, by allowing only one process to commit at a time. For the first
+ * process requesting this origin, the acquired_by parameter needs to be set to
+ * 0.
  */
 void
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, int acquired_by)

I think the meaning of the acquired_by=0 is not fully described here:
"For the first process requesting this origin, the acquired_by
parameter needs to be set to 0."
IMO that seems to be describing it only from the POV that you are always
going to want to allow multiple processes. But really this is an
optional feature, so you might pass acquired_by=0 not just because
this is the first of multiple, but also because you *never* want to
allow multiple at all. The comment does not convey this meaning.

Maybe something worded like below is better?

SUGGESTION
Normally only one such cached origin can exist per process so the
cached value can only be set again after the previous value is torn
down with replorigin_session_reset(). For this normal case pass
acquired_by=0 (meaning the slot is not allowed to be already acquired
by another process).

However, sometimes multiple processes can safely re-use the same
origin slot (for example, multiple parallel apply processes can safely
use the same origin, provided they maintain commit order by allowing
only one process to commit at a time). For this case the first process
must pass acquired_by=0, and then the other processes sharing that
same origin can pass acquired_by=PID of the first process.

==

src/backend/replication/logical/worker.c

4. GENERAL - get_worker_name()

If you decide it is OK to hardwire some error messages instead of
unconditionally calling the get_worker_name() -- see my #1 review
comment in this post -- then there are some other messages in this
file that also seem like they can be also hardwired because the type
of worker is already known.

Here are some examples:

4a.

+ else if (am_parallel_apply_worker())
+ {
+ if (rel->state != SUBREL_STATE_READY)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ /* translator: first %s is the name of logical replication worker */
+ errmsg("%s for subscription \"%s\" will stop",
+ get_worker_name(), MySubscription->name),
+ errdetail("Cannot handle streamed replication transactions using
parallel apply workers until all tables have been synchronized.")));
+
+ return true;
+ }

In the above code from should_apply_changes_for_rel we already know
this is a parallel apply worker.

~

4b.

+ if (am_parallel_apply_worker())
+ ereport(LOG,
+ /* translator: first %s is the name of logical replication worker */
+ (errmsg("%s for subscription \

Re: Bug in row_number() optimization

2022-11-24 Thread Richard Guo
On Fri, Nov 25, 2022 at 7:34 AM David Rowley  wrote:

> Since upper-level WindowAggs cannot reference values calculated in
> some lower-level WindowAgg, why can't we just NULLify the pointers
> instead? See attached.


Verified the problem is fixed with this patch.  I'm not familiar with
the WindowAgg execution code.  As far as I understand, this patch works
because we set ecxt_aggnulls to true, making it a NULL value.  And the
top-level WindowAgg node's "Filter" is strict so that it can filter out
all the tuples that don't match the intermediate WindowAgg node's run
condition.  So I find the comments about "WindowAggs above us cannot
reference the result of another WindowAgg" confusing.  But maybe I'm
missing something.

Thanks
Richard


Re: Patch: Global Unique Index

2022-11-24 Thread Dilip Kumar
On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby  wrote:
> On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote:
> > but now you will have one gigantic index and which will be vacuumed
> > every time we vacuum any of the partitions.
>
> This patch isn't implemented as "one gigantic index", though.

If this patch is for supporting a global index then I expect that the
global index across all the partitions is going to be big.  Anyway, my
point was that vacuuming the common index every time you vacuum any
of the partitions of the table is not the right way, and that will make
global indexes less usable.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Patch: Global Unique Index

2022-11-24 Thread Dilip Kumar
On Fri, Nov 25, 2022 at 8:49 AM Dilip Kumar  wrote:
>
> On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby  wrote:
> > On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote:
> > > but now you will have one gigantic index and which will be vacuumed
> > > every time we vacuum any of the partitions.
> >
> > This patch isn't implemented as "one gigantic index", though.
>
> If this patch is for supporting a global index then I expect that the
> global index across all the partitions is going to be big.  Anyway, my
> point was that vacuuming the common index every time you vacuum any
> of the partitions of the table is not the right way, and that will make
> global indexes less usable.

Okay, I got your point.  After seeing the details, it seems that instead
of supporting one common index, it is just allowing uniqueness checks
across multiple index partitions.  Sorry for the noise.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Bug in row_number() optimization

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 16:00, Richard Guo  wrote:
> Verified the problem is fixed with this patch.  I'm not familiar with
> the WindowAgg execution code.  As far as I understand, this patch works
> because we set ecxt_aggnulls to true, making it a NULL value.  And the
> top-level WindowAgg node's "Filter" is strict so that it can filter out
> all the tuples that don't match the intermediate WindowAgg node's run
> condition.  So I find the comments about "WindowAggs above us cannot
> reference the result of another WindowAgg" confusing.  But maybe I'm
> missing something.

There are two different pass-through modes that the WindowAgg can move
into when it detects that the run condition is no longer true:

1) WINDOWAGG_PASSTHROUGH and
2) WINDOWAGG_PASSTHROUGH_STRICT

#2 is used when the WindowAgg is the top-level one in this query
level. Remember we'll need multiple WindowAgg nodes when there are
multiple different windows to evaluate.  The reason that we need #1 is
that if there are multiple WindowAggs, then the top-level one (or just
any WindowAgg above it) might need all the rows from a lower-level
WindowAgg.  For example:

SELECT * FROM (SELECT row_number() over(order by id) rn, sum(qty) over
(order by date) qty from t) t where rn <= 10;

if the "order by id" window is evaluated first, we can't stop
outputting rows when rn <= 10 is no longer true as the "order by date"
window might need those.  In this case, once rn <= 10 is no longer
true, the WindowAgg for that window would go into
WINDOWAGG_PASSTHROUGH. This means we can stop window func evaluation
on any additional rows.  The final query will never see rn==11, so we
don't need to generate that.

The problem is that once the "order by id" window stops evaluating the
window funcs, if the window result is byref, then we leave junk in the
aggregate output columns.  Since we continue to output rows from that
WindowAgg for the top-level "order by date" window, we don't want to
form tuples with free'd memory.

Since nothing in the query will ever seen rn==11 and beyond, there's
no need to put anything in that part of the output tuple. We can just
make it an SQL NULL.

What I mean by "WindowAggs above us cannot reference the result of
another WindowAgg" is that the evaluation of sum(qty) over (order by
date) can't see the "rn" column. SQL does not allow it. If it did,
that would have to look something like:

SELECT * FROM (SELECT SUM(row_number() over (order by id)) over (order
by date) qty from t); -- not valid SQL

WINDOWAGG_PASSTHROUGH_STRICT not only does not evaluate window funcs,
it also does not even bother to store tuples in the tuple store. In
this case there's no higher-level WindowAgg that will need these
tuples, so we can just read our sub-node until we find the next
partition, or stop when there's no PARTITION BY clause.

Just thinking of the patch a bit more, what I wrote ends up
continually zeroing the values and marking the columns as NULL. Likely
we can just do this once when we do: winstate->status =
WINDOWAGG_PASSTHROUGH;  I'll test that out and make sure it works.
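
Roughly, i.e. moving the loop from ExecWindowAgg() to the point where
the mode changes (a sketch using the fields from the attached patch):

	winstate->status = WINDOWAGG_PASSTHROUGH;

	/* zero the outputs once here rather than for every remaining row */
	for (i = 0; i < winstate->numfuncs; i++)
	{
		econtext->ecxt_aggvalues[i] = (Datum) 0;
		econtext->ecxt_aggnulls[i] = true;
	}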

David




Re: Add sub-transaction overflow status in pg_stat_activity

2022-11-24 Thread Dilip Kumar
On Thu, Nov 24, 2022 at 2:26 AM Andres Freund  wrote:
>
> Indeed. This is why I was thinking that just alerting for overflowed xact
> isn't particularly helpful. You really want to see how much they overflow and
> how often.

I think the ability to monitor the subtransaction count and overflow
status is helpful at least for troubleshooting purposes.  By monitoring
regularly, the user will know which backend (pid) is using particularly
many subtransactions and is prone to overflow, and which backends are
actually causing sub-overflow frequently.

> I think they're just not always avoidable, even in a very well operated
> system.
>
>
> I wonder if we could lower the impact of suboverflowed snapshots by improving
> the representation in PGPROC and SnapshotData. What if we
>
> a) Recorded the min and max assigned subxid in PGPROC
>
> b) Instead of giving up in GetSnapshotData() once we see a suboverflowed
>PGPROC, store the min/max subxid of the proc in SnapshotData. We could
>reliably "steal" space for that from ->subxip, as we won't need to store
>subxids for that proc.
>
> c) When determining visibility with a suboverflowed snapshot we use the
>ranges from b) to check whether we need to do a subtrans lookup. I think
>that'll often prevent subtrans lookups.
>
> d) If we encounter a subxid whose parent is in progress and not in ->subxid,
>and subxcnt isn't the max, add that subxid to subxip. That's not free
>because we'd basically need to do an insertion sort, but likely still a lot
>cheaper than doing repeated subtrans lookups.
>
> I think we'd just need a one or two additional fields in SnapshotData.

+1

I think this approach will be helpful in many cases, especially when
only some of the backends are creating sub-overflow and impacting
overall system performance.  Most of the xids, especially the top
xids, will not fall in that range (unless the sub-overflowing backend
is constantly generating subxids and increasing its range), and the
lookups for those xids can be done directly in the snapshot's xip
array.

On another thought: in XidInMVCCSnapshot(), in the case of sub-overflow,
why don't we look into the snapshot's xip array first and see if the xid
exists there? If not, then we can look into the pg_subtrans SLRU, fetch
the top xid, and look again in the xip array.  It will be more costly in
cases where we do not find the xid in the xip array, because then we
will have to search the array twice, but I think looking into this
array is much cheaper than directly accessing the SLRU.
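
A sketch of that lookup order, where xid_in_xip() is a hypothetical
stand-in for the existing search over the snapshot's xip array and
SubTransGetTopmostTransaction() is the pg_subtrans lookup:

	TransactionId	topxid;

	/* cheap case first: the xid itself is in the xip array */
	if (xid_in_xip(snapshot, xid))
		return true;

	/* otherwise map a possible subxid to its top-level xid via the SLRU */
	topxid = SubTransGetTopmostTransaction(xid);
	if (!TransactionIdEquals(topxid, xid))
		return xid_in_xip(snapshot, topxid);

	return false;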

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Add peer authentication TAP test

2022-11-24 Thread Anton A. Melnikov

Hello!

On Windows this test fails with an error:
# connection error: 'psql: error: connection to server at "127.0.0.1", port 
x failed:
# FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "buildfarm", database 
"postgres", no encryption'

Maybe disable this test on Windows, like 001_password.pl and 
002_saslprep.pl?


Best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit e4491b91729b307d29ce0205b455936b3a538373
Author: Anton A. Melnikov 
Date:   Fri Nov 25 07:40:11 2022 +0300

Skip test 003_peer.pl for windows like 001_password.pl and 002_saslprep.pl

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index ce8408a4f8c..7454bf4a598 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -9,6 +9,11 @@ use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
+if (!$use_unix_sockets)
+{
+	plan skip_all =>
+	  "authentication tests cannot run without Unix-domain sockets";
+}
 
 # Delete pg_hba.conf from the given node, add a new entry to it
 # and then execute a reload to refresh it.


Re: [PATCH] Add peer authentication TAP test

2022-11-24 Thread Michael Paquier
On Fri, Nov 25, 2022 at 07:56:08AM +0300, Anton A. Melnikov wrote:
> On Windows this test fails with error:
> # connection error: 'psql: error: connection to server at "127.0.0.1", port 
> x failed:
> # FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "buildfarm", 
> database "postgres", no encryption'
> 
> May be disable this test for windows like in 001_password.pl and 
> 002_saslprep.pl?

You are not using MSVC but MinGW, are you?  The buildfarm members with
TAP tests enabled are drongo, fairywren, bowerbird and jacana.  Even
though none of them are running the tests from
src/test/authentication/, this is running on a periodic basis in the
CI, where we are able to skip the test in MSVC already: 
postgresql:authentication / authentication/003_peer SKIP 9.73s

So yes, it is plausible that we are missing more safeguards here.

Your suggestion to skip under !$use_unix_sockets makes sense, as not
having unix sockets is not going to work for peer and WIN32 needs SSPI
to be secure with pg_regress.  Where is your test failing?  On the
first $node->psql('postgres') at the beginning of the test?  Just
wondering..
--
Michael


signature.asc
Description: PGP signature


Re: Questions regarding distinct operation implementation

2022-11-24 Thread Ankit Kumar Pandey



On 25/11/22 02:14, David Rowley wrote:

On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey  wrote:

Please let me know any opinions on this.

I think if you're planning on working on this then step 1 would have
to be checking the SQL standard to see which set of rows it asks
implementations to consider for duplicate checks when deciding if the
transition should be performed or not.  Having not looked, I don't
know if this is the entire partition or just the rows in the current
frame.

Depending on what you want, an alternative today would be to run a
subquery to uniquify the rows the way you want and then do the window
function stuff.

David
Thanks David, these are excellent pointers; I will look into the SQL 
standard first, and so on.


--
Regards,
Ankit Kumar Pandey





Re: Support logical replication of DDLs

2022-11-24 Thread vignesh C
On Sun, 20 Nov 2022 at 09:29, vignesh C  wrote:
>
> On Fri, 11 Nov 2022 at 11:03, Peter Smith  wrote:
> >
> > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith  wrote:
> > >
> > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith  wrote:
> > > >
> > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith  
> > > > wrote:
> > > > >
> > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > >
> > > > > *** NOTE - my review post became too big, so I split it into smaller 
> > > > > parts.
> > > >
> > >
> >
> > THIS IS PART 4 OF 4.
> >
> > ===
> >
> > src/backend/commands/ddl_deparse.c
>
> Thanks for the comments, the attached v39 patch has the changes for the same.

One comment:
While fixing review comments, I found that the default syntax is not
handled for CREATE DOMAIN:
+	/*
+	 * Verbose syntax
+	 *
+	 * CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s %{constraints}s
+	 * %{collation}s
+	 */
+	createDomain = new_objtree("CREATE");
+
+	append_object_object(createDomain,
+						 "DOMAIN %{identity}D AS",
+						 new_objtree_for_qualname_id(TypeRelationId,
+													 objectId));
+	append_object_object(createDomain,
+						 "%{type}T",
+						 new_objtree_for_type(typForm->typbasetype,
+											  typForm->typtypmod));

Regards,
Vignesh




Remove a unused argument from qual_is_pushdown_safe

2022-11-24 Thread Yugo NAGATA
Hello,

I found that qual_is_pushdown_safe() has an argument "subquery"
that is not used in the function.  This argument has not been
referred to since the commit 964c0d0f80e485dd3a4073e073ddfd9bfdda90b2.

I think we can remove this if there is no special reason. 

Regards,
Yugo Nagata 

-- 
Yugo NAGATA 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..5327ea2803 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -136,8 +136,7 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 	pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti,
-  RestrictInfo *rinfo,
+static bool qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
   pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery,
 			   RangeTblEntry *rte, Index rti, Node *qual);
@@ -2527,7 +2526,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
+qual_is_pushdown_safe(rti, rinfo, &safetyInfo))
 			{
 /* Push it down */
 subquery_push_qual(subquery, rte, rti, clause);
@@ -3813,7 +3812,7 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
+qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
 	  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;


Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

2022-11-24 Thread Amit Kapila
On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila  wrote:
>
> On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada  wrote:
> >
> > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila  
> > wrote:
> >
> > Agreed not to have a test case for this.
> >
> > I've attached an updated patch. I've confirmed this patch works for
> > all supported branches.
> >
>
> I have slightly changed the checks used in the patch, otherwise looks
> good to me. I am planning to push (v11-v15) the attached tomorrow
> unless there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add peer authentication TAP test

2022-11-24 Thread Anton A. Melnikov

Hello, thanks for rapid answer!

On 25.11.2022 08:18, Michael Paquier wrote:

You are not using MSVC but MinGW, are you?  The buildfarm members with
TAP tests enabled are drongo, fairywren, bowerbord and jacana.  Even
though none of them are running the tests from
src/test/authentication/, this is running on a periodic basis in the
CI, where we are able to skip the test in MSVC already:
postgresql:authentication / authentication/003_peer SKIP 9.73s


There is MSVC on my PC. The project was build with
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30136 for x64


So yes, it is plausible that we are missing more safeguards here.

Your suggestion to skip under !$use_unix_sockets makes sense, as not
having unix sockets is not going to work for peer and WIN32 needs SSPI
to be secure with pg_regress.  Where is your test failing?  On the
first $node->psql('postgres') at the beginning of the test?  Just
wondering..


The test fails almost at the beginning, in the reset_pg_hba call, after
modifying pg_hba.conf and reloading the node:
#t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200)
#No subtests run
#No subtests run

Logs regress_log_003_peer and 003_peer_node.log are attached.

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company2022-11-25 09:55:49.731 MSK [7648] LOG:  starting PostgreSQL 16devel, compiled by Visual C++ build 1929, 64-bit
2022-11-25 09:55:49.735 MSK [7648] LOG:  listening on IPv4 address "127.0.0.1", port 60161
2022-11-25 09:55:49.773 MSK [2892] LOG:  database system was shut down at 2022-11-25 09:55:49 MSK
2022-11-25 09:55:49.800 MSK [7648] LOG:  database system is ready to accept connections
2022-11-25 09:55:49.890 MSK [7648] LOG:  received SIGHUP, reloading configuration files
2022-11-25 09:55:50.003 MSK [84] [unknown] LOG:  connection received: host=127.0.0.1 port=49822
2022-11-25 09:55:50.007 MSK [84] [unknown] FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", no encryption
2022-11-25 09:55:50.114 MSK [3356] [unknown] LOG:  connection received: host=127.0.0.1 port=49823
2022-11-25 09:55:50.117 MSK [3356] [unknown] FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", no encryption
2022-11-25 09:55:50.201 MSK [7648] LOG:  received immediate shutdown request
2022-11-25 09:55:50.212 MSK [7648] LOG:  database system is shut down
# Checking port 60161
# Found port 60161
Name: node
Data directory: 
c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata
Backup directory: 
c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/backup
Archive directory: 
c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/archives
Connection string: port=60161 host=127.0.0.1
Log file: 
c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/log/003_peer_node.log
# Running: initdb -D 
c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata
 -A trust -N
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "English_United 
States.1252".
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory 
c:/projects/1c-master-7397/postgrespro/src/test/authentication/tmp_check/t_003_peer_node_data/pgdata
 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... windows
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Moscow
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

Sync to disk skipped.
The data directory might become corrupt if the operating system crashes.

Success. You can now start the database server using:

pg_ctl -D ^"c^:^/projects^/1c^-master^-7397^/postgrespro^/src^\test^\authentication^/tmp^_check^/t^_003^_peer^_node^_data^/pgdata^" -l logfile start

# Running: c:/projects/1c-master-7397/postgrespro/Release/pg_regress/pg_regress --config-auth c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata
### Starting node "node"
# Running: pg_ctl -w -D c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata -l c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/log/003_peer_node.log -o --cluster-name=node start
waiting for server to start done
server started
# Postmaster PID for node "node" is 7648
### Reloading node "node"
# Running: pg_ctl -D c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata reload
server signaled

Re: A problem about join ordering

2022-11-24 Thread Richard Guo
On Fri, Nov 11, 2022 at 11:24 PM Tom Lane  wrote:

> Richard Guo  writes:
> > I'm wondering whether we need to insist on being strict for the lower
> > OJ's min_righthand.  What if we instead check strictness for its whole
> > syn_righthand?
>
> Surely not.  What if the only point of strictness is for a rel that
> isn't part of the min_righthand?  Then we could end up re-ordering
> based on a condition that isn't actually strict for what we've
> chosen as the join's RHS.
>
> It might be possible to change the other part of the equation and
> consider the A/B join's min_righthand instead of syn_righthand
> while checking if Pcd uses A/B's RHS; but I'm not real sure about
> that either.


The problem described upthread occurs in the case where the lower OJ is
in our LHS.  For the other case, where the lower OJ is in our RHS, it
seems we also have a join ordering problem.  As an example, consider

 A leftjoin (B leftjoin (C leftjoin D on (Pcd)) on (Pbc)) on (Pac)

 A leftjoin ((B leftjoin C on (Pbc)) leftjoin D on (Pcd)) on (Pac)

The two forms are equal if we assume Pcd is strict for C, according to
outer join identity 3, i.e. A leftjoin (B leftjoin C on (Pbc)) on (Pab)
= (A leftjoin B on (Pab)) leftjoin C on (Pbc) when Pbc is strict for B,
applied here to the inner B/C/D part.

In both forms we decide that we cannot interchange the ordering of the
A/C join and the B/C join, because Pac uses the B/C join's RHS.  So we
add the B/C join's full syntactic relset to the A/C join's
min_righthand to preserve the ordering.  However, in the first form the
B/C join's full syntactic relset includes {B, C, D}, while in the
second form it includes only {B, C}.  As a result, the A/(BC) join is
illegal in the first form and legal in the second, and this determines
whether we can get the third form as below

 (A leftjoin (B leftjoin C on (Pbc)) on (Pac)) leftjoin D on (Pcd)
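
For concreteness, here is a SQL sketch of the three forms; the tables
and join columns are hypothetical, chosen only so that each predicate
is strict for the relation it references:

SELECT *                                    -- form 1
FROM a
LEFT JOIN (b
     LEFT JOIN (c
          LEFT JOIN d ON c.z = d.z)         -- Pcd
     ON b.y = c.y)                          -- Pbc
ON a.x = c.x;                               -- Pac

SELECT *                                    -- form 2
FROM a
LEFT JOIN ((b LEFT JOIN c ON b.y = c.y)
     LEFT JOIN d ON c.z = d.z)
ON a.x = c.x;

SELECT *                                    -- form 3
FROM (a LEFT JOIN (b LEFT JOIN c ON b.y = c.y)
        ON a.x = c.x)
LEFT JOIN d ON c.z = d.z;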

This makes me rethink whether, to preserve ordering in this case, we
should add the lower OJ's full syntactic relset to min_righthand, or
just its min_lefthand + min_righthand.  But I'm not sure about this.

Any thoughts?

Thanks
Richard


Re: [PATCH] Add peer authentication TAP test

2022-11-24 Thread Michael Paquier
On Fri, Nov 25, 2022 at 10:13:29AM +0300, Anton A. Melnikov wrote:
> The test fails almost at the beginning, in the reset_pg_hba call after
> modifying pg_hba.conf and reloading the node:
> #t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200)
> #No subtests run
> 
> Logs regress_log_003_peer and 003_peer_node.log are attached.

Yeah, that's failing exactly at the position I am pointing to.  I'll
go apply what you have..
--
Michael




Re: [PATCH] Add peer authentication TAP test

2022-11-24 Thread Anton A. Melnikov



On 25.11.2022 10:34, Michael Paquier wrote:
> On Fri, Nov 25, 2022 at 10:13:29AM +0300, Anton A. Melnikov wrote:
>> The test fails almost at the beginning, in the reset_pg_hba call after
>> modifying pg_hba.conf and reloading the node:
>> #t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200)
>> #No subtests run
>>
>> Logs regress_log_003_peer and 003_peer_node.log are attached.
>
> Yeah, that's failing exactly at the position I am pointing to.  I'll
> go apply what you have..


Thanks!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Does pg_rman support PG15?

2022-11-24 Thread T Adachi
Hi,

To the developers working on pg_rman: are there any plans to support
PG15, and when might a pg_rman release that supports it be available?
The latest version of pg_rman, V1.3.14, appears to be incompatible with
PG15.
(In PG15, pg_start_backup()/pg_stop_backup() have been renamed to
pg_backup_start()/pg_backup_stop().)
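
For reference, a minimal sketch of the renamed calls (the backup label
is illustrative, and both calls must be issued from the same session):

-- PG 14 and earlier, non-exclusive mode:
SELECT pg_start_backup('pg_rman_backup', false, false);
-- ... copy the data directory ...
SELECT * FROM pg_stop_backup(false);

-- PG 15 and later (exclusive mode removed, functions renamed):
SELECT pg_backup_start('pg_rman_backup', false);
-- ... copy the data directory ...
SELECT * FROM pg_backup_stop();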