Re: pgbench --partitions=0

2022-05-15 Thread Amit Langote
On Mon, May 16, 2022 at 2:41 PM Michael Paquier  wrote:
> On Mon, May 16, 2022 at 11:34:41AM +0900, Amit Langote wrote:
> > Attached a patch to fix with a test added.  cc'ing Michael who
> > authored that commit.
>
> Indeed, 6f164e6d got that incorrectly.  I don't really want to play
> with the master branch at this stage, even if this is trivial, but
> I'll fix it after the version is tagged.

Sounds good to me.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: pgbench --partitions=0

2022-05-15 Thread Michael Paquier
On Mon, May 16, 2022 at 11:34:41AM +0900, Amit Langote wrote:
> Attached a patch to fix with a test added.  cc'ing Michael who
> authored that commit.

Indeed, 6f164e6d got that incorrectly.  I don't really want to play
with the master branch at this stage, even if this is trivial, but
I'll fix it after the version is tagged.  Thanks for the report,
Amit.
--
Michael




Re: Make relfile tombstone files conditional on WAL level

2022-05-15 Thread Dilip Kumar
On Thu, May 12, 2022 at 4:27 PM Amul Sul  wrote:
>
Hi Amul,

Thanks for the review, actually based on some comments from Robert we
have planned to make some design changes.  So I am planning to work on
that for the July commitfest.  I will try to incorporate all your
review comments in the new version.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-05-15 Thread Michael Paquier
On Sat, May 14, 2022 at 01:27:28AM -0700, Noah Misch wrote:
> Here, I requested the rationale for the differences you had just described.
> You made a choice to stop testing one list of database names and start testing
> a different list of database names.  Why?

Because the shape of the new names does not change the test coverage
("regression" prefix or the addition of the double quotes with
backslashes for all the database names), while keeping the code a bit
simpler.  If you think that the older names are better suited, I have
no objections to using them, FWIW; something like the attached patch
would achieve that.

This uses the same convention as vcregress.pl before 322becb, but not
the one of test.sh where "regression" was appended to the database
names.
--
Michael
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 8372a85e6e..33a75991d8 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -13,18 +13,16 @@ use Test::More;
 # Generate a database with a name made of a range of ASCII characters.
 sub generate_db
 {
-	my ($node, $from_char, $to_char) = @_;
+	my ($node, $prefix, $from_char, $to_char, $suffix) = @_;
 
-	my $dbname = '';
+	my $dbname = $prefix;
 	for my $i ($from_char .. $to_char)
 	{
 		next if $i == 7 || $i == 10 || $i == 13;# skip BEL, LF, and CR
 		$dbname = $dbname . sprintf('%c', $i);
 	}
 
-	# Exercise backslashes adjacent to double quotes, a Windows special
-	# case.
-	$dbname = '\\"\\' . $dbname . '"\\';
+	$dbname .= $suffix;
 	$node->command_ok([ 'createdb', $dbname ]);
 }
 
@@ -79,10 +77,12 @@ else
 {
 	# Default is to use pg_regress to set up the old instance.
 
-	# Create databases with names covering most ASCII bytes
-	generate_db($oldnode, 1,  45);
-	generate_db($oldnode, 46, 90);
-	generate_db($oldnode, 91, 127);
+	# Create databases with names covering most ASCII bytes.  The
+	# first name exercises backslashes adjacent to double quotes, a
+	# Windows special case.
+	generate_db($oldnode, "\\\"\\", 1,  45,  "\"\\");
+	generate_db($oldnode, '',   46, 90,  '');
+	generate_db($oldnode, '',   91, 127, '');
 
 	# Grab any regression options that may be passed down by caller.
 	my $extra_opts = $ENV{EXTRA_REGRESS_OPTS} || "";




Re: list of TransactionIds

2022-05-15 Thread Michael Paquier
On Mon, May 16, 2022 at 07:58:37AM +0530, Amit Kapila wrote:
> I prefer to do this for pg16 unless we see some bug due to this.

Agreed.  This does not seem worth taking any risk with after beta1,
and v14 got released this way.
--
Michael




Re: strange slow query - lost lot of time somewhere

2022-05-15 Thread Pavel Stehule
On Mon, May 16, 2022 at 6:11 AM David Rowley  wrote:

> On Fri, 6 May 2022 at 21:27, David Rowley  wrote:
> > I've attached a patch to fix.  I'll look at it in more detail after the
> weekend.
>
> I've now pushed this fix to master and backpatched to 14.
>

Thank you

Pavel


>
> David
>


Re: strange slow query - lost lot of time somewhere

2022-05-15 Thread David Rowley
On Fri, 6 May 2022 at 21:27, David Rowley  wrote:
> I've attached a patch to fix.  I'll look at it in more detail after the 
> weekend.

I've now pushed this fix to master and backpatched to 14.

David




Re: Possible corruption by CreateRestartPoint at promotion

2022-05-15 Thread Michael Paquier
On Mon, May 09, 2022 at 09:24:06AM +0900, Michael Paquier wrote:
> Okay, applied this one on HEAD after going back-and-forth on it for
> the last couple of days.  I have found myself shaping the patch in
> what looks like its simplest form, by applying the check based on an
> older checkpoint to all the fields updated in the control file, with
> the check on DB_IN_ARCHIVE_RECOVERY applying to the addition of
> DB_SHUTDOWNED_IN_RECOVERY (I was initially surprised that this was
> having side effects on pg_rewind) and the minRecoveryPoint
> calculations.

It took me some time to make sure that this would be safe, but done
now for all the stable branches.

> Now, it would be nice to get a test for this stuff, and we are going
> to need something cheaper than what's been proposed upthread.  This
> comes down to the point of being able to put a deterministic stop
> in a restart point while it is processing, meaning that we need to
> interact with one of the internal routines of CheckPointGuts().  One
> fancy way to do so would be to forcibly take a LWLock to stuck the
> restart point until it is released.  Using a SQL function for that
> would be possible, if not artistic.  Perhaps we don't need such a
> function though, if we could stuck arbitrarily the internals of a 
> checkpoint?  Any ideas?

One thing that we could do here is to resurrect the patch that adds
support for stop points in the code.
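
For illustration, a stop-point primitive along those lines might look
roughly like this -- every name here is invented for the sketch, and no
such API exists in the tree at this point:

/*
 * Hypothetical testing hook: a backend reaching an armed stop point
 * blocks until a test releases it, so a restart point can be stalled
 * deterministically inside CheckPointGuts().
 */
#ifdef USE_STOP_POINTS
extern void stop_point(const char *name);           /* blocks while armed */
extern void stop_point_arm(const char *name);       /* SQL-callable */
extern void stop_point_release(const char *name);   /* SQL-callable */
#define STOP_POINT(name) stop_point(name)
#else
#define STOP_POINT(name) ((void) 0)
#endif

A test would then arm a point, trigger a restart point that passes through
STOP_POINT() somewhere in CheckPointGuts(), do its checks, and release it.
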
--
Michael




Re: Backends stunk in wait event IPC/MessageQueueInternal

2022-05-15 Thread Japin Li


On Sat, 14 May 2022 at 11:01, Thomas Munro  wrote:
> On Sat, May 14, 2022 at 10:25 AM Thomas Munro  wrote:
>> Japin, are you able to reproduce the problem reliably?  Did I guess
>> right, that you're on illumos?  Does this help?  I used
>> defined(__sun__) to select the option, but I don't remember if that's
>> the right way to detect that OS family, could you confirm that, or
>> adjust as required?
>
> Better version.  Now you can independently set -DWAIT_USE_{POLL,EPOLL}
> and -DWAIT_USE_{SELF_PIPE,SIGNALFD} for testing, and it picks a
> sensible default.

Thanks for your patch!  Illumos already defines the following macros.

$ gcc -dM -E - 
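
(The command output above was truncated in the archive.)  For context, the
independent defaults Thomas describes could plausibly be selected like this
-- a sketch only, the actual patch logic may differ; HAVE_SYS_EPOLL_H is the
existing configure macro:

/* Allow -DWAIT_USE_POLL / -DWAIT_USE_EPOLL to override the default. */
#if !defined(WAIT_USE_EPOLL) && !defined(WAIT_USE_POLL)
#if defined(HAVE_SYS_EPOLL_H)
#define WAIT_USE_EPOLL
#else
#define WAIT_USE_POLL
#endif
#endif

/* Allow -DWAIT_USE_SELF_PIPE / -DWAIT_USE_SIGNALFD to override, too. */
#if !defined(WAIT_USE_SELF_PIPE) && !defined(WAIT_USE_SIGNALFD)
#if defined(WAIT_USE_EPOLL) && !defined(__sun__)
#define WAIT_USE_SIGNALFD               /* signalfd is Linux-specific */
#else
#define WAIT_USE_SELF_PIPE              /* portable; avoids the illumos issue */
#endif
#endif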

RE: Skipping schema changes in publication

2022-05-15 Thread osumi.takami...@fujitsu.com
On Saturday, May 14, 2022 10:33 PM vignesh C  wrote:
> Thanks for the comments, the attached v5 patch has the changes for the same.
> Also I have made the changes for SKIP Table based on the new syntax, the
> changes for the same are available in
> v5-0002-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patch.
Hi,


Thank you for updating the patch.
I'll share a few minor review comments on v5-0001.


(1) doc/src/sgml/ref/alter_publication.sgml

@@ -73,12 +85,13 @@ ALTER PUBLICATION name RENAME TO
The ADD ALL TABLES IN SCHEMA and
SET ALL TABLES IN SCHEMA to a publication requires the
-   invoking user to be a superuser.  To alter the owner, you must also be a
-   direct or indirect member of the new owning role. The new owner must have
-   CREATE privilege on the database.  Also, the new owner
-   of a FOR ALL TABLES or FOR ALL TABLES IN
-   SCHEMA publication must be a superuser. However, a superuser can
-   change the ownership of a publication regardless of these restrictions.
+   invoking user to be a superuser.  RESET of publication
+   requires the invoking user to be a superuser. To alter the owner, you must
...


I suggest combining the first part of your change with the existing
sentence before it, to make the description concise.

FROM:
"The ADD ALL TABLES IN SCHEMA and
SET ALL TABLES IN SCHEMA to a publication requires the
invoking user to be a superuser.  RESET of publication
requires the invoking user to be a superuser."

TO:
"The ADD ALL TABLES IN SCHEMA,
SET ALL TABLES IN SCHEMA to a publication and
RESET of publication requires the invoking user to be a 
superuser."


(2) typo

+++ b/src/backend/commands/publicationcmds.c
@@ -53,6 +53,13 @@
 #include "utils/syscache.h"
 #include "utils/varlena.h"

+#define PUB_ATION_INSERT_DEFAULT true
+#define PUB_ACTION_UPDATE_DEFAULT true


Kindly change
FROM:
"PUB_ATION_INSERT_DEFAULT"
TO:
"PUB_ACTION_INSERT_DEFAULT"


(3) src/test/regress/expected/publication.out

+-- Verify that only superuser can reset a publication
+ALTER PUBLICATION testpub_reset OWNER TO regress_publication_user2;
+SET ROLE regress_publication_user2;
+ALTER PUBLICATION testpub_reset RESET; -- fail


We have "-- fail" for one case in this patch.
On the other hand, isn't better to add "-- ok" (or "-- success") for
other successful statements,
when we consider the entire tests description consistency ?


Best Regards,
Takamichi Osumi



Re: PostgreSQL 15 Beta 1 release announcement draft

2022-05-15 Thread Amit Kapila
On Sun, May 15, 2022 at 12:22 AM Jonathan S. Katz  wrote:
>
> Please provide feedback no later than 2022-05-19 0:00 AoE[1].
>

> [`recovery_prefetch`](https://www.postgresql.org/docs/15/runtime-config-wal.html#GUC-RECOVERY-PREFETCH)
> that can help speed up all recovery operations by prefetching data blocks.

Is it okay to say that this feature speeds up *all* recovery
operations? See the discussion between Simon and Tomas [1] related to
this.

[1] - 
https://www.postgresql.org/message-id/3f4d65c8-ad61-9f57-d5ad-6c1ea841e471%40enterprisedb.com

-- 
With Regards,
Amit Kapila.




pgbench --partitions=0

2022-05-15 Thread Amit Langote
Hi,

I noticed that $subject causes an error in HEAD:

$ pgbench -i --partitions=0
pgbench: error: --partitions must be in range 1..2147483647

However, it works in v13 and v14, where it simply means no partitioning.

I think the commit 6f164e6d17 may have unintentionally broken it, by
introducing this hunk:

@@ -6135,12 +6116,9 @@ main(int argc, char **argv)
break;
case 11:/* partitions */
initialization_option_set = true;
-   partitions = atoi(optarg);
-   if (partitions < 0)
-   {
-   pg_log_fatal("invalid number of partitions:
\"%s\"", optarg);
+   if (!option_parse_int(optarg, "--partitions", 1, INT_MAX,
+ &partitions))
exit(1);
-   }

Attached a patch to fix with a test added.  cc'ing Michael who
authored that commit.
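
For context, the fix is presumably just to accept 0 again as the lower
bound -- a sketch of the idea, the attached patch may differ in details:

case 11:/* partitions */
initialization_option_set = true;
/* allow 0, meaning "no partitioning", as in v13 and v14 */
if (!option_parse_int(optarg, "--partitions", 0, INT_MAX,
  &partitions))
exit(1);
break;
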

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


pgbench-accept-0-partitions.patch
Description: Binary data


Re: list of TransactionIds

2022-05-15 Thread Amit Kapila
On Sun, May 15, 2022 at 5:05 PM Alvaro Herrera  wrote:
>
> On 2022-May-14, Amit Kapila wrote:
>
> > On Sat, May 14, 2022 at 1:57 AM Alvaro Herrera  
> > wrote:
> > >
> > > We didn't have any use of TransactionId as members of List, until
> > > RelationSyncEntry->streamed_txns was introduced (464824323e57, pg14).
> > > It's currently implemented as a list of int.  This is not wrong at
> > > present, but it may soon be, and I'm sure it rubs some people the wrong
> > > way.
> > >
> > > But is the rubbing way wrong enough to add support for TransactionId in
> > > pg_list.h, including, say, T_XidList?
> >
> > +1. I don't know if we have a need for this at other places but I feel
> > it is a good idea to make its current use better.
>
> I hesitate to add this the day just before beta.  This is already in
> pg14, so maybe it's not a big deal if pg15 remains the same for the time
> being.  Or we can change it for beta2.  Or we could just punt until
> pg16.  Any preferences?
>

I prefer to do this for pg16 unless we see some bug due to this.

-- 
With Regards,
Amit Kapila.




RE: First draft of the PG 15 release notes

2022-05-15 Thread osumi.takami...@fujitsu.com
On Saturday, May 14, 2022 12:07 AM 'Bruce Momjian'  wrote:
> On Fri, May 13, 2022 at 01:36:04AM +, osumi.takami...@fujitsu.com wrote:
> > >   
> > >   This is enabled with the subscriber option "disable_on_error"
> > >   and avoids possible infinite loops during stream application.
> > >   
> > >   
> > Thank you !
> >
> > In this last paragraph, how about replacing "infinite loops"
> > with "infinite error loops"?  I think it makes the situation somewhat
> > clearer for readers.
> 
> Agreed, done.
Thanks !


Best Regards,
Takamichi Osumi





Re: Making JIT more granular

2022-05-15 Thread Andy Fan
>
>
> 2. You calculate the cost to compare with jit_above_cost as:
>
> plan->total_cost * plan->est_loops.
>
> An alternative way might be to consider the rescan cost like
> cost_rescan. This should be closer for a final execution cost.
> However since it is hard to set a reasonable jit_above_cost,
> so I am feeling the current way is OK as well.
>

There are two observations after thinking more about this.  a) Due to the
rescan cost, plan->total_cost * plan->est_loops might be greater
than the whole plan's total_cost.  This may confuse users about why a plan
that was not JITed in the past becomes JITed now.

explain analyze select * from t1, t2 where t1.a = t2.a;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..154.25 rows=100 width=16) (actual time=0.036..2.618 rows=100 loops=1)
   Join Filter: (t1.a = t2.a)
   Rows Removed by Join Filter: 9900
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=8) (actual time=0.015..0.031 rows=100 loops=1)
   ->  Materialize  (cost=0.00..2.50 rows=100 width=8) (actual time=0.000..0.010 rows=100 loops=100)
         ->  Seq Scan on t2  (cost=0.00..2.00 rows=100 width=8) (actual time=0.007..0.023 rows=100 loops=1)
 Planning Time: 0.299 ms
 Execution Time: 2.694 ms
(8 rows)

The overall plan's total_cost is 154.25, but the Materialize's JIT cost is
2.5 * 100 = 250.
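
As a toy model of the comparison being discussed (est_loops is a field
introduced by the proposed patch, not core PostgreSQL; everything below is
simplified for illustration):

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    const char *name;
    double      total_cost;     /* node's cost for a single pass */
    double      est_loops;      /* estimated rescans, from the patch */
} NodeCost;

static bool
node_should_jit(NodeCost n, double jit_above_cost)
{
    return n.total_cost * n.est_loops >= jit_above_cost;
}

int
main(void)
{
    NodeCost    whole = {"Nested Loop", 154.25, 1};
    NodeCost    mat = {"Materialize", 2.50, 100};
    double      threshold = 200.0;  /* illustrative jit_above_cost */

    /*
     * The rescanned Materialize crosses the threshold (250 >= 200) even
     * though the whole plan's total_cost (154.25) does not.
     */
    printf("%s: %d\n", whole.name, node_should_jit(whole, threshold));
    printf("%s: %d\n", mat.name, node_should_jit(mat, threshold));
    return 0;
}
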

b) Since the total_cost for a plan includes the costs of all its children,
if one child plan is JITed, I think all of its parents would be JITed as
well.  Is this by design?

 QUERY PLAN

 Sort
   Sort Key: (count(*))
   ->  HashAggregate
 Group Key: a
 ->  Seq Scan on t1

(If Seq Scan is JITed, both HashAggregate & Sort will be JITed.)

-- 
Best Regards
Andy Fan


Re: gitmaster access

2022-05-15 Thread Tatsuo Ishii
> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Bruce Momjian  writes:
>> > On Wed, May 11, 2022 at 08:59:26PM -0400, Bruce Momjian wrote:
>> >> On Thu, May 12, 2022 at 09:04:38AM +0900, Tatsuo Ishii wrote:
>> >>> The last line should be 
>> >>> "ssh://g...@gitmaster.postgresql.org/postgresql.git"?
>> 
>> > I assume the URL list at:
>> >https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary
>> > is for non-committers.
>> 
>> Yeah, I agree with that.  If we advertise the gitmaster address here,
>> the primary result will be that we get a lot of complaints from random
>> people complaining that they can't access it.  A secondary result
>> is likely to be an increase in attacks against that server.
> 
> I don't think we could change it very easily without some ugly hacking
> of the tool that generates that page too, which is no good...
> 
> We might be able to get rid of the ssh:// URL there though... Will look
> into that.

For postgresql.git, I agree. But for other repositories, I do not agree.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Minor improvements to test log navigability

2022-05-15 Thread Thomas Munro
Hi,

Speaking as someone who regularly trawls through megabytes of build farm output:

1.  It seems a bit useless to have a load of "FATAL:  the database
system is in recovery mode" spam whenever the server crashes under
src/test/regress.  Any reason not to just turn that off, as we do for
the TAP tests?

2.  The TAP test logs are strangely named.  Any reason not to call
them 001_testname.log, instead of regress_log_001_testname, so they
appear next to the corresponding
001_testname_{primary,standby,xxx}.log in directory listings (CI) and
dumps (build farm, presumably), and have a traditional .log suffix?
From 2ddc124126ad0bb635023b805dbb6ad1b30fc863 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 16 May 2022 10:34:04 +1200
Subject: [PATCH 1/2] Use restart_after_crash=off for regression tests.

Continuing to try to connect while the server is not accepting
connections in crash recovery produces a lot of useless extra log
material.  Let's not do that.
---
 src/test/regress/pg_regress.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 982801e029..9a04007651 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2376,6 +2376,7 @@ regression_main(int argc, char *argv[],
 		snprintf(buf, sizeof(buf),
  "\"%s%spostgres\" -D \"%s/data\" -F%s "
  "-c \"listen_addresses=%s\" -k \"%s\" "
+ "-c \"restart_after_crash=off\" "
  "> \"%s/log/postmaster.log\" 2>&1",
  bindir ? bindir : "",
  bindir ? "/" : "",
-- 
2.36.0

From aa24fa3b8b25bfc237ca007f9986fc2b8c9c37d7 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 16 May 2022 10:36:30 +1200
Subject: [PATCH 2/2] Rename TAP test log output files.

Instead of "regress_log_001_testname", use "001_testname.log".  This
will cluster them with the corresponding server logs in sorted directory
listings (CI) and dumps (build farm).
---
 .cirrus.yml| 1 -
 src/test/perl/PostgreSQL/Test/Utils.pm | 2 +-
 src/test/perl/README   | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae55..e7232b7a7c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -33,7 +33,6 @@ on_failure: _failure
 paths:
   - "**/*.log"
   - "**/*.diffs"
-  - "**/regress_log_*"
 type: text/plain
 
 task:
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 1ca2cc5917..8d7e20b0eb 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -201,7 +201,7 @@ INIT
 	# Open the test log file, whose name depends on the test name.
 	$test_logfile = basename($0);
 	$test_logfile =~ s/\.[^.]+$//;
-	$test_logfile = "$log_path/regress_log_$test_logfile";
+	$test_logfile = "$log_path/$test_logfile.log";
 	open my $testlog, '>', $test_logfile
 	  or die "could not open STDOUT to logfile \"$test_logfile\": $!";
 
diff --git a/src/test/perl/README b/src/test/perl/README
index 4b160cce36..1227944132 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -19,7 +19,7 @@ make check-world PROVE_FLAGS='--verbose'
 
 When a test fails, the terminal output from 'prove' is usually not sufficient
 to diagnose the problem.  Look into the log files that are left under
-tmp_check/log/ to get more info.  Files named 'regress_log_XXX' are log
+tmp_check/log/ to get more info.  Files named '001_testname.log' are log
 output from the perl test scripts themselves, and should be examined first.
 Other files are postmaster logs, and may be helpful as additional data.
 
-- 
2.36.0



Re: [RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog

2022-05-15 Thread Tomas Vondra
On 5/15/22 21:55, Matthias van de Meent wrote:
> Note: I am not (currently) planning on implementing this rough idea,
> just putting it up to share and document the idea, on request of Tomas
> (cc-ed).
> 
> The excellent pgconf.de presentation on PostgreSQL's extended
> statistics system by Tomas Vondra [0] talked about how the current
> default statistics assume the MCVs of columns to be fully independent,
> i.e. values of column A do not imply any value of columns B and C, and
> that for accurate data on correllated values the user needs to
> manually create statistics on the combined columns (by either
> STATISTICS or by INDEX).
> 
> This is said to be due to limitations in our statistics collector: to
> determine the fraction of the table that contains the value, we store
> the N most common values with the fraction of their occurrence in the
> table. This value is quite exact, but combining these values proves
> difficult: there is nothing in the stored value that can confidently
> include or exclude parts of the table from a predicate using that MCV,
> so we can only assume that the values of two columns are independent.
> 
> After the presentation it came to me that if we were to add an
> estimator for the number of rows with that value to the MCV lists in
> the form of HLL sketches (in addition to or replacing the current
> most_common_elem_freqs fractions), we would be able to make better
> estimates for multi-column filters by combining the HLL row
> cardinality sketches for filters that filter on these MCVs. This would
> remove the immediate need for manual statistics with a Cartesian
> product of the MCVs of those columns with their occurrence fractions,
> which significantly reduces the need for the creation of manual
> statistics - the need that exists due to planner mis-estimates in
> correlated columns. Custom statistics will still be required for
> expression statistics, but column correlation estimations _within
> MCVs_ is much improved.
> 
> How I imagine this would work is that for each value in the MCV, an
> HLL is maintained that estimates the amount of distinct tuples
> containing that value. This can be h(TID) or h(PK), or anything else
> that would uniquely identify returned tuples. Because the keyspace of
> all HLLs that are generated are on the same table, you can apply join
> and intersection operations on the HLLs of the MCVs (for OR and
> AND-operations respectively), and provide fairly accurate estimates
> for the amount of tuples that would be returned by the filter on that
> table.
>
> The required size of the HLL sketches can be determined by the amount
> of tuples scanned during analyze, potentially reducing the size
> required to store these HLL sketches from the usual 1.5kB per sketch
> to something smaller - we'll only ever need to count nTuples distinct
> values, so low values for default_statistics_target would allow for
> smaller values for m in the HLL sketches, whilst still providing
> fairly accurate result estimates.
> 

I think it's an interesting idea. In principle it allows deducing the
multi-column MCV for an arbitrary combination of columns, not determined in
advance. We'd have the MCV with HLL instead of frequencies for columns
A, B and C:

(a1, hll(a1))
(a2, hll(a2))
(...)
(aK, hll(aK))


(b1, hll(b1))
(b2, hll(b2))
(...)
(bL, hll(bL))

(c1, hll(c1))
(c2, hll(c2))
(...)
(cM, hll(cM))

and from this we'd be able to build MCV for any combination of those
three columns.

And in some sense it might even be more efficient/accurate, because the
> MCV on (A,B,C) might have up to K*L*M items.  If there are 100 items in
each column, that'd be 1,000,000 combinations, which we can't really
store (target is up to 10k). And even if we could, it'd be 1M
combinations with frequencies (so ~8-16B per combination).

While with the MCV/HLL, we'd have 300 items and HLL. Assuming 256-512B
HLL would be enough, that's still way smaller than the multi-column MCV.

Even with target=10k it'd still be cheaper to store the separate MCV
with HLL values, if I count right, and there'd be no items omitted from
the MCV.

> Kind regards,
> 
> Matthias van de Meent
> 
> PS: Several later papers correctly point out that HLL can only count
> up to 2^32 due to the use of a hash function that outputs only 32
> bits; which is not enough for large tables. HLL++ solves this by using
> a hash function that outputs 64 bits, and can thus be considered a
> better alternative which provides the same features. But, any other
> sketch that provides an accurate (though not necessarily perfect)
> count-distinct whose results can be combined should be fine as
> well.
> 

I don't think the 32-bit limitation is a problem for us, because we'd
only ever build the HLL on a sample, not the whole table.  And the samples
are limited to 3M rows (with statistics target = 10k), so we're nowhere
near the scale requiring 64-bit hashes.

Presumably the statistics target value would determine the necessary HLL
parameters (and 

[RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog

2022-05-15 Thread Matthias van de Meent
Note: I am not (currently) planning on implementing this rough idea,
just putting it up to share and document the idea, on request of Tomas
(cc-ed).

The excellent pgconf.de presentation on PostgreSQL's extended
statistics system by Tomas Vondra [0] talked about how the current
default statistics assume the MCVs of columns to be fully independent,
i.e. values of column A do not imply any value of columns B and C, and
that for accurate data on correlated values the user needs to
manually create statistics on the combined columns (by either
STATISTICS or by INDEX).

This is said to be due to limitations in our statistics collector: to
determine the fraction of the table that contains the value, we store
the N most common values with the fraction of their occurrance in the
table. This value is quite exact, but combining these values proves
difficult: there is nothing in the stored value that can confidently
include or exclude parts of the table from a predicate using that MCV,
so we can only assume that the values of two columns are independent.

After the presentation it came to me that if we were to add an
estimator for the number of rows with that value to the MCV lists in
the form of HLL sketches (in addition to or replacing the current
most_common_elem_freqs fractions), we would be able to make better
estimates for multi-column filters by combining the HLL row
cardinality sketches for filters that filter on these MCVs. This would
remove the immediate need for manual statistics with a Cartesian
product of the MCVs of those columns with their occurrence fractions,
which significantly reduces the need for the creation of manual
statistics - the need that exists due to planner mis-estimates in
correlated columns. Custom statistics will still be required for
expression statistics, but column correlation estimations _within
MCVs_ is much improved.

How I imagine this would work is that for each value in the MCV, an
HLL is maintained that estimates the amount of distinct tuples
containing that value. This can be h(TID) or h(PK), or anything else
that would uniquely identify returned tuples. Because the keyspace of
all HLLs that are generated are on the same table, you can apply join
and intersection operations on the HLLs of the MCVs (for OR and
AND-operations respectively), and provide fairly accurate estimates
for the amount of tuples that would be returned by the filter on that
table.

The required size of the HLL sketches can be determined by the amount
of tuples scanned during analyze, potentially reducing the size
required to store these HLL sketches from the usual 1.5kB per sketch
to something smaller - we'll only ever need to count nTuples distinct
values, so low values for default_statistics_target would allow for
smaller values for m in the HLL sketches, whilst still providing
fairly accurate result estimates.
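
To make the combination step concrete, here is a self-contained toy HLL in
C -- illustrative only, every name is made up and PostgreSQL would use its
own hashing: OR of two per-value filters maps to a register-wise max
(union), and AND falls out of inclusion-exclusion on the union estimate.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HLL_BITS 8
#define HLL_M    (1 << HLL_BITS)        /* 256 registers */

typedef struct { uint8_t reg[HLL_M]; } hll;

/* splitmix64 finalizer as a stand-in 64-bit hash */
static uint64_t
hash64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void
hll_add(hll *h, uint64_t key)
{
    uint64_t    v = hash64(key);
    uint32_t    idx = v >> (64 - HLL_BITS); /* top bits pick a register */
    uint64_t    rest = v << HLL_BITS;
    uint8_t     rank = rest ? __builtin_clzll(rest) + 1 : 64 - HLL_BITS + 1;

    if (rank > h->reg[idx])
        h->reg[idx] = rank;
}

static double
hll_estimate(const hll *h)
{
    double      sum = 0.0;
    int         zeros = 0;

    for (int i = 0; i < HLL_M; i++)
    {
        sum += ldexp(1.0, -h->reg[i]);
        if (h->reg[i] == 0)
            zeros++;
    }
    double      alpha = 0.7213 / (1.0 + 1.079 / HLL_M);
    double      est = alpha * HLL_M * HLL_M / sum;

    if (est <= 2.5 * HLL_M && zeros > 0)    /* small-range correction */
        est = HLL_M * log((double) HLL_M / zeros);
    return est;
}

/* OR of two filters: union is just the register-wise max. */
static hll
hll_union(const hll *a, const hll *b)
{
    hll         u;

    for (int i = 0; i < HLL_M; i++)
        u.reg[i] = a->reg[i] > b->reg[i] ? a->reg[i] : b->reg[i];
    return u;
}

/* AND of two filters: |A & B| ~= |A| + |B| - |A or B| (inclusion-exclusion). */
static double
hll_intersect_estimate(const hll *a, const hll *b)
{
    hll         u = hll_union(a, b);
    double      est = hll_estimate(a) + hll_estimate(b) - hll_estimate(&u);

    return est > 0.0 ? est : 0.0;
}

int
main(void)
{
    hll         a, b;

    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));
    /* tuples 0..9999 match value a1; 5000..14999 match b1; overlap 5000 */
    for (uint64_t t = 0; t < 10000; t++) hll_add(&a, t);
    for (uint64_t t = 5000; t < 15000; t++) hll_add(&b, t);
    printf("|A|~%.0f |B|~%.0f |A AND B|~%.0f\n",
           hll_estimate(&a), hll_estimate(&b),
           hll_intersect_estimate(&a, &b));
    return 0;
}

(Build with -lm; with only 256 registers the estimates are deliberately
rough, which also illustrates why the intersection estimate gets noisy when
the overlap is small relative to the union.)
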

Kind regards,

Matthias van de Meent

PS: Several later papers correctly point out that HLL can only count
up to 2^32 due to the use of a hash function that outputs only 32
bits; which is not enough for large tables. HLL++ solves this by using
a hash function that outputs 64 bits, and can thus be considered a
better alternative which provides the same features. But, any other
sketch that provides an accurate (though not necessarily perfect)
count-distinct whose results can be combined should be fine as
well.

[0] 
https://www.postgresql.eu/events/pgconfde2022/schedule/session/3704-an-overview-of-extended-statistics-in-postgresql/




Re: Reproducible coliisions in jsonb_hash

2022-05-15 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > Here, that doesn't seem too likely. You could have a column that
> > contains 'tom' and ['tom'] and [['tom']] and [[['tom']]] and so forth
> > and they all get mapped onto the same bucket and you're sad. But
> > probably not.
> 
> Yeah, that might be a more useful way to think about it: is this likely
> to cause performance-critical collisions in practice?  I agree that
> that doesn't seem like a very likely situation, even given that you
> might be using json for erratically-structured data.

Particularly for something like jsonb (but maybe other things?) having a
hash function that could be user-defined or at least have some options
seems like it would be quite nice (similar to compression...).  If we
were to go in the direction of changing this, I'd suggest that we try to
make it something where the existing function could still be used while
also allowing a new one to be used.  More flexibility would be even
better, of course (column-specific hash functions come to mind...).
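
To see why the container structure falls out of the hash, here is a toy in
C (not jsonb_hash itself; all names are invented): a structure-blind
wrapper makes 'tom', ['tom'] and [['tom']] collide, while folding an
arbitrary per-level container tag into the hash keeps them apart.

#include <stdint.h>
#include <stdio.h>

/* stand-in scalar hash (FNV-1a) */
static uint32_t
hash_string(const char *s)
{
    uint32_t    h = 2166136261u;

    for (; *s; s++)
    {
        h ^= (uint8_t) *s;
        h *= 16777619u;
    }
    return h;
}

/* structure-blind: hashing a one-element array returns the element's hash */
static uint32_t
hash_array_blind(uint32_t elem_hash)
{
    return elem_hash;
}

/* one conceivable alternative: mix in a tag at each container level */
static uint32_t
hash_array_tagged(uint32_t elem_hash)
{
    return elem_hash * 31u + 0x9e3779b9u;
}

int
main(void)
{
    uint32_t    tom = hash_string("tom");

    printf("blind : tom=%08x [tom]=%08x [[tom]]=%08x\n",
           tom, hash_array_blind(tom),
           hash_array_blind(hash_array_blind(tom)));
    printf("tagged: tom=%08x [tom]=%08x [[tom]]=%08x\n",
           tom, hash_array_tagged(tom),
           hash_array_tagged(hash_array_tagged(tom)));
    return 0;
}
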

Agreed with the general conclusion here also, just wanted to share some
thoughts on possible future directions to go in.

Thanks,

Stephen




Re: gitmaster access

2022-05-15 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Bruce Momjian  writes:
> > On Wed, May 11, 2022 at 08:59:26PM -0400, Bruce Momjian wrote:
> >> On Thu, May 12, 2022 at 09:04:38AM +0900, Tatsuo Ishii wrote:
> >>> The last line should be 
> >>> "ssh://g...@gitmaster.postgresql.org/postgresql.git"?
> 
> > I assume the URL list at:
> >https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary
> > is for non-committers.
> 
> Yeah, I agree with that.  If we advertise the gitmaster address here,
> the primary result will be that we get a lot of complaints from random
> people complaining that they can't access it.  A secondary result
> is likely to be an increase in attacks against that server.

I don't think we could change it very easily without some ugly hacking
of the tool that generates that page too, which is no good...

We might be able to get rid of the ssh:// URL there though... Will look
into that.

> The onboarding process for new committers should include explaining
> about the separate master repo and how they can access it, but that
> is absolutely not something we should advertise widely.

It does.

Thanks,

Stephen




Re: Asynchronous and "direct" IO support for PostgreSQL.

2022-05-15 Thread Bharath Rupireddy
On Wed, Sep 1, 2021 at 11:27 AM Andres Freund  wrote:
>
> Hi,
>
> Attached is an updated patch AIO series. The major changes are:

Hi Andres, is there a plan to get the fallocate changes in on their own
first?  I think the fallocate API can help parallel inserts (bulk relation
extension currently writes zero-filled pages) and speed up the pre-padding
done while allocating WAL files.
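
For context, the pre-padding idea could in principle look like this -- a
minimal standalone sketch (file name and error handling simplified; this is
not how xlog.c does it today):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)     /* default segment size */

int
main(void)
{
    int         fd;
    int         rc;

    fd = open("000000010000000000000001.tmp",
              O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* one call reserves the blocks; no loop writing zero-filled pages */
    rc = posix_fallocate(fd, 0, WAL_SEGMENT_SIZE);
    if (rc != 0)
    {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
        return 1;
    }
    close(fd);
    return 0;
}
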

Regards,
Bharath Rupireddy.


Re: Make name optional in CREATE STATISTICS

2022-05-15 Thread Pavel Stehule
On Sun, May 15, 2022 at 2:20 PM Simon Riggs 
wrote:

> Currently, CREATE STATS requires you to think of a name for each stats
> object, which is fairly painful, so users would prefer an
> automatically assigned name.
>
> Attached patch allows this, which turns out to be very simple, since a
> name assignment function already exists.
>
> The generated name is simple, but that's exactly what users do anyway,
> so it is not too bad.
>
>
good idea

Pavel


> Tests, docs included.
>
> --
> Simon Riggs                http://www.EnterpriseDB.com/
>


Re: make MaxBackends available in _PG_init

2022-05-15 Thread Julien Rouhaud
On Fri, May 13, 2022 at 09:49:54AM -0400, Robert Haas wrote:
>
> Committed.

Thanks!

For the record, I submitted patches or pull requests this weekend for all
the repositories I was aware of to fix compatibility with this patch,
hoping that it will save the packagers some time when they take care of
the beta.




Re: [PATCH] New [relation] option engine

2022-05-15 Thread Alvaro Herrera
I'm sorry if you've already said this elsewhere, but can you please
state what is the *intention* of this patchset?  If it's a pure
refactoring (but I don't think it is), then it's a net loss, because
after pgindent it summarizes as:

 58 files changed, 2714 insertions(+), 2368 deletions(-)

so we end up 500+ lines worse than where we started.  However, I
suspect that's not the final situation, since I saw a comment somewhere
that opclass options have to be rewritten to modify this mechanism, and
I suspect that will remove a few lines.  And you maybe have a more
ambitious goal.  But what is it?

Please pgindent your code for the next submission, making sure to add
your new typedef(s) to typedefs.list so that it doesn't generate stupid
spaces.  After pgindenting you'll notice the argument lists of some
functions look bad (cf. commit c4f113e8fef9).  Please fix that too.

I notice that you kept the commentary about lock levels in the place
where they were previously defined.  This is not good.  Please move each
explanation next to the place where each option is defined.

For next time, please use "git format-patch" for submission, and write a
tentative commit message.  The committer may or may not use your
proposed commit message, but with it they will know what you're trying
to achieve.

The translatability marker for detailmsg for enums is wrong AFAICT.  You
need gettext_noop() around the strings themselves IIRC.  I think you
need to get rid of the _() call around the variable that receives that
value and use errdetail() instead of errdetail_internal(), to avoid
double-translating it; but I'm not 100% sure.  Please experiment with
"make update-po" until you get the messages in the .po file.  

You don't need braces around single-statement blocks.

Thanks

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)




Make name optional in CREATE STATISTICS

2022-05-15 Thread Simon Riggs
Currently, CREATE STATS requires you to think of a name for each stats
object, which is fairly painful, so users would prefer an
automatically assigned name.

Attached patch allows this, which turns out to be very simple, since a
name assignment function already exists.

The generated name is simple, but that's exactly what users do anyway,
so it is not too bad.

Tests, docs included.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


create_stats_opt_name.v3.patch
Description: Binary data


Re: list of TransactionIds

2022-05-15 Thread Alvaro Herrera
On 2022-May-14, Amit Kapila wrote:

> On Sat, May 14, 2022 at 1:57 AM Alvaro Herrera  
> wrote:
> >
> > We didn't have any use of TransactionId as members of List, until
> > RelationSyncEntry->streamed_txns was introduced (464824323e57, pg14).
> > It's currently implemented as a list of int.  This is not wrong at
> > present, but it may soon be, and I'm sure it rubs some people the wrong
> > way.
> >
> > But is the rubbing way wrong enough to add support for TransactionId in
> > pg_list.h, including, say, T_XidList?
> 
> +1. I don't know if we have a need for this at other places but I feel
> it is a good idea to make its current use better.

I hesitate to add this the day just before beta.  This is already in
pg14, so maybe it's not a big deal if pg15 remains the same for the time
being.  Or we can change it for beta2.  Or we could just punt until
pg16.  Any preferences?

(Adding this to pg14 seems out of the question.  It's probably okay
ABI-wise to add a new node tag at the end of the list, but I'm not sure
it's warranted.)
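
For reference, support along the lines being discussed would presumably
mirror the existing int/Oid List variants in pg_list.h -- a rough sketch
with assumed names (nothing like this is committed at the time of this
thread):

typedef union ListCell
{
    void           *ptr_value;
    int             int_value;
    Oid             oid_value;
    TransactionId   xid_value;      /* new, for T_XidList */
} ListCell;

#define lfirst_xid(lc)      ((lc)->xid_value)
#define linitial_xid(l)     lfirst_xid(list_head(l))

extern List *lappend_xid(List *list, TransactionId xid);
extern bool list_member_xid(const List *list, TransactionId xid);
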

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)