Re: Compressed TOAST Slicing

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 08:38:56PM +, Regina Obe wrote:
> I tested on windows mingw64 (as of a week ago) and confirmed the
> patch applies cleanly and significantly faster for left, substr
> tests than head. 

int32
pglz_decompress(const char *source, int32 slen, char *dest,
-   int32 rawsize)
+   int32 rawsize, bool is_slice)
The performance improvements are nice, but breaking a published API is
less nice, particularly since some work has been done to make pglz more
pluggable (see 60838df9, guess who wrote that).  Could it be possible
to rework this part, please?  It's been some time since I touched this
code, but it would be really nice if we didn't need an extra parameter
and just did not bypass the sanity checks at the end.  Using a parameter
to bypass those checks may cause problems for future callers of it.
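
For illustration only, one way to avoid the API break would be to keep the
published four-argument entry point as a thin wrapper; the function name
pglz_decompress_internal below is hypothetical, and this does not address the
separate concern about bypassing the end-of-buffer checks:

    int32
    pglz_decompress(const char *source, int32 slen, char *dest, int32 rawsize)
    {
        /*
         * Hypothetical sketch: preserve the historical contract, including
         * the end-of-buffer sanity checks, for existing external callers.
         */
        return pglz_decompress_internal(source, slen, dest, rawsize, false);
    }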
--
Michael




Re: Compressed TOAST Slicing

2019-03-11 Thread Andrey Borodin
Hi!

> On 21 Feb 2019, at 23:50, Paul Ramsey wrote:
> 
> Merci! Attached are updated patches.
> 

As noted before, patches are extremely useful.
So, I've looked into the code too.

I've got some questions about pglz_decompress() changes:

1.
+   if (dp >= destend)  /* check for buffer overrun */
+   break;  /* do not clobber memory */
This is done inside the byte loop. But can we just calculate
len = min(len, destend - dp) beforehand?
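
(For illustration, a rough sketch of that idea, using the usual
pglz_decompress() variable names dp, destend, len and off; this is only a
sketch of the suggestion, not the patch itself:)

    /* Clamp the match length once instead of testing dp on every byte. */
    if (len > destend - dp)
        len = (int32) (destend - dp);   /* never write past the output buffer */

    while (len--)
    {
        *dp = dp[-off];     /* byte-wise copy; overlapping ranges are intentional */
        dp++;
    }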

2. The function argument is_slice only prevents an error.

+* If we are slicing, then we won't necessarily
+* be at the end of the source or dest buffers
+* when we hit a stop, so we don't test then.

But I do not get why we should get that error. If we have a limited dest,
why should we not fill it entirely?

3. And I'd use memmove despite the comment explaining why we do not do that:
it is SSE-optimized and cache-optimized nowadays.
But this is not the point of this patch, so let's set that aside.

Best regards, Andrey Borodin.


Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 02:35:49PM +, Georgios Kokolatos wrote:
> To be honest, I have not checked closely on the failure, still it is
> the only test failing which by itself should be worthwhile
> mentioning. 

This works for me, as well as a plain installcheck and check from the
subpath.  Are you sure that contrib/pageinspect is installed in the
set of binaries your tests are using?  What do the logs of the server
tell you?  These are located in src/test/recovery/tmp_check/.  Please
note that when running installcheck we assume that all the needed
components are installed.
--
Michael




Re: Suggestions on message transfer among backends

2019-03-11 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 11 Mar 2019 21:37:32 +0800, Andy Fan  wrote 
in 
> notes on the shared hash map:   it needs multi writers and multi readers.
> 
> On Mon, Mar 11, 2019 at 9:36 PM Andy Fan  wrote:
> 
> > Hi:
> >   I need some function which requires some message exchange among
> > different back-ends (connections).
> > specially I need a shared hash map and a message queue.
> >
> > Message queue:  it should be many writers,  1 reader.   Looks POSIX
> > message queue should be OK, but postgre doesn't use it.  is there any
> > equivalent in PG?
> >
> > shared hash map:  the number of items can be fixed and the value can be
> > fixed as well.
> >
> > any keywords or explanation will be extremely helpful.

I suppose that you are writing an extension or tweaking the core
code in C. dshash (dynamic shared hash) would work for you
as a shared hash map, and is shm_mq usable as the message queue?
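
(For what it's worth, a very rough sketch of the sender side of a shm_mq in a
DSM segment; error handling and the receiver side are omitted, and the helper
name is made up.  Note that shm_mq is strictly one sender and one receiver, so
a many-writers design would need one queue per writer, the way parallel query
does:)

    #include "postgres.h"

    #include "storage/dsm.h"
    #include "storage/proc.h"
    #include "storage/shm_mq.h"

    #define MY_QUEUE_SIZE   16384   /* arbitrary size for this sketch */

    /* Create a DSM segment holding one shm_mq and attach as the sender. */
    static shm_mq_handle *
    setup_sender_queue(dsm_segment **segp)
    {
        dsm_segment *seg = dsm_create(MY_QUEUE_SIZE, 0);
        shm_mq     *mq = shm_mq_create(dsm_segment_address(seg), MY_QUEUE_SIZE);

        shm_mq_set_sender(mq, MyProc);
        *segp = seg;
        /* The returned handle is what shm_mq_send()/shm_mq_receive() take. */
        return shm_mq_attach(mq, seg, NULL);
    }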

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-11 Thread Tom Lane
I wrote:
> I've successfully done check-world after renumbering every OID above
> 4000 to somewhere else.  I also tried renumbering everything below
> 4000, which unsurprisingly blew up because there are various catalog
> columns we haven't fixed to use symbolic OIDs.  (The one that initdb
> immediately trips over is pg_database.dattablespace.)  I'm not sure
> if it's worth the trouble to make that totally clean, but I suspect
> we ought to at least mop up text-search references to be symbolic.
> That's material for a separate patch though.

So I couldn't resist poking at that, and after a couple hours' work
I have the attached patch, which removes all remaining hard-wired
OID references in the .dat files.

Using this, I renumbered all the OIDs in include/catalog, and behold
things pretty much worked.  I got through check-world after hacking
up these points:

* Unsurprisingly, there are lots of regression tests that have object
OIDs hard-wired in queries and/or expected output.

* initdb.c has a couple of places that know that template1 has OID 1.

* information_schema.sql has several SQL-language functions that
hard-wire the OIDs of assorted built-in types.

I'm not particularly fussed about the first two points, but the
last is a bit worrisome.  It's not too hard to imagine somebody
adding knowledge of their new type to those functions, and the
code getting broken by a renumbering pass, and us not noticing
if the point isn't stressed by a regression test (which mostly
those functions aren't).

We could imagine fixing those functions along the lines of

   CASE WHEN $2 = -1 /* default typmod */
THEN null
-   WHEN $1 IN (1042, 1043) /* char, varchar */
+   WHEN $1 IN ('pg_catalog.bpchar'::pg_catalog.regtype,
+   'pg_catalog.varchar'::pg_catalog.regtype)
THEN $2 - 4

which would add some parsing overhead, but I'm not sure if anyone
would notice that.

regards, tom lane

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 10c2b24..ce159ae 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -174,6 +174,20 @@ foreach my $row (@{ $catalog_data{pg_am} })
 	$amoids{ $row->{amname} } = $row->{oid};
 }
 
+# class (relation) OID lookup (note this only covers bootstrap catalogs!)
+my %classoids;
+foreach my $row (@{ $catalog_data{pg_class} })
+{
+	$classoids{ $row->{relname} } = $row->{oid};
+}
+
+# collation OID lookup
+my %collationoids;
+foreach my $row (@{ $catalog_data{pg_collation} })
+{
+	$collationoids{ $row->{collname} } = $row->{oid};
+}
+
 # language OID lookup
 my %langoids;
 foreach my $row (@{ $catalog_data{pg_language} })
@@ -243,6 +257,41 @@ foreach my $row (@{ $catalog_data{pg_proc} })
 	}
 }
 
+# tablespace OID lookup
+my %tablespaceoids;
+foreach my $row (@{ $catalog_data{pg_tablespace} })
+{
+	$tablespaceoids{ $row->{spcname} } = $row->{oid};
+}
+
+# text search configuration OID lookup
+my %tsconfigoids;
+foreach my $row (@{ $catalog_data{pg_ts_config} })
+{
+	$tsconfigoids{ $row->{cfgname} } = $row->{oid};
+}
+
+# text search dictionary OID lookup
+my %tsdictoids;
+foreach my $row (@{ $catalog_data{pg_ts_dict} })
+{
+	$tsdictoids{ $row->{dictname} } = $row->{oid};
+}
+
+# text search parser OID lookup
+my %tsparseroids;
+foreach my $row (@{ $catalog_data{pg_ts_parser} })
+{
+	$tsparseroids{ $row->{prsname} } = $row->{oid};
+}
+
+# text search template OID lookup
+my %tstemplateoids;
+foreach my $row (@{ $catalog_data{pg_ts_template} })
+{
+	$tstemplateoids{ $row->{tmplname} } = $row->{oid};
+}
+
 # type lookups
 my %typeoids;
 my %types;
@@ -287,14 +336,21 @@ close $ef;
 
 # Map lookup name to the corresponding hash table.
 my %lookup_kind = (
-	pg_am   => \%amoids,
-	pg_language => \%langoids,
-	pg_opclass  => \%opcoids,
-	pg_operator => \%operoids,
-	pg_opfamily => \%opfoids,
-	pg_proc => \%procoids,
-	pg_type => \%typeoids,
-	encoding=> \%encids);
+	pg_am  => \%amoids,
+	pg_class   => \%classoids,
+	pg_collation   => \%collationoids,
+	pg_language=> \%langoids,
+	pg_opclass => \%opcoids,
+	pg_operator=> \%operoids,
+	pg_opfamily=> \%opfoids,
+	pg_proc=> \%procoids,
+	pg_tablespace  => \%tablespaceoids,
+	pg_ts_config   => \%tsconfigoids,
+	pg_ts_dict => \%tsdictoids,
+	pg_ts_parser   => \%tsparseroids,
+	pg_ts_template => \%tstemplateoids,
+	pg_type=> \%typeoids,
+	encoding   => \%encids);
 
 
 # Open temp files
@@ -730,8 +786,8 @@ sub morph_row_for_pgattr
 	$row->{attndims} = $type->{typcategory} eq 'A' ? '1' : '0';
 
 	# collation-aware catalog columns must use C collation
-	$row->{attcollation} = $type->{typcollation} != 0 ?
-	$C_COLLATION_OID : 0;
+	$row->{attcollation} =
+	  $type->{typcollation} ne '0' ? $C_COLLATION_OID : 0;
 
 	if (defined $attr->{forcenotnull})
 	{
diff --git a/src/include/catalog/pg_class.dat b/src/include/catalog/pg_class.dat
index 5a1f3aa..511d876 100644
--- 

Re: [Suspect SPAM] Re: Should we increase the default vacuum_cost_limit?

2019-03-11 Thread Kyotaro HORIGUCHI
Sorry, I sent the wrong patch. The attached is the right one.

At Mon, 11 Mar 2019 13:57:21 +0100, Julien Rouhaud  wrote 
in 
> On Mon, Mar 11, 2019 at 10:03 AM David Rowley
>  wrote:
> >
> > On Mon, 11 Mar 2019 at 09:58, Tom Lane  wrote:
> > > The second patch is a delta that rounds off to the next smaller unit
> > > if there is one, producing a less noisy result:
> > >
> > > regression=# set work_mem = '30.1GB';
> > > SET
> > > regression=# show work_mem;
> > >  work_mem
> > > --
> > >  30822MB
> > > (1 row)
> > >
> > > I'm not sure if that's a good idea or just overthinking the problem.
> > > Thoughts?
> >
> > I don't think you're over thinking it.  I often have to look at such
> > settings and I'm probably not unique in when I glance at 30822MB I can
> > see that's roughly 30GB, whereas when I look at 31562138kB, I'm either
> > counting digits or reaching for a calculator.  This is going to reduce
> > the time it takes for a human to process the pg_settings output, so I
> > think it's a good idea.
> 
> Definitely, rounding up will spare people from wasting time to check
> what's the actual value.

+1. I don't think it is overthinking either.

Anyone who specifies memory size in GB won't care about a sub-MB
fraction. I don't think '0.01GB' is a sane setting, but its being
10MB doesn't matter.  However, I don't think that '0.1d' becoming
'2h' is reasonable: "10 times per day" gets "rounded" to "12 times
per day" by that.

Is it worth showing values with at most two or three fraction
digits instead of rounding the value on setting? The attached
PoC patch - offered instead of the 'roundoff-fractions-harder' patch -
shows values in the shortest exact representation.

work_mem:
   31562138  => '30.1 GB'
   31562137  => '31562137 kB'
   '0.1GB'   => '0.1 GB'
   '0.01GB'  => '0.01 GB'
   '0.001GB' => '1049 kB'

lock_timeout:
   '0.1h'=> '6 min'
   '90 min'  => '90 min'
   '120 min' => '2 h'
   '0.1 d'   => '0.1 d'

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d374f5372c..21e0807728 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6005,16 +6005,19 @@ convert_to_base_unit(double value, const char *unit,
 /*
  * Convert an integer value in some base unit to a human-friendly unit.
  *
- * The output unit is chosen so that it's the greatest unit that can represent
- * the value without loss.  For example, if the base unit is GUC_UNIT_KB, 1024
- * is converted to 1 MB, but 1025 is represented as 1025 kB.
+ * The output unit is chosen so that it's the shortest representation that can
+ * represent the value without loss.  For example, if the base unit is
+ * GUC_UNIT_KB, 1024 is converted to 1 MB, but 1025 is represented as 1025 kB.
+ * Also 104858 is converted to '0.1 GB', which is shorter than other
+ * representations.
  */
 static void
 convert_int_from_base_unit(int64 base_value, int base_unit,
-		   int64 *value, const char **unit)
+		   double *value, const char **unit, int *digits)
 {
 	const unit_conversion *table;
 	int			i;
+	int			len = 0;
 
 	*unit = NULL;
 
@@ -6027,17 +6030,49 @@ convert_int_from_base_unit(int64 base_value, int base_unit,
 	{
 		if (base_unit == table[i].base_unit)
 		{
+			const double frac_digits = 2;
+			double rval;
+
 			/*
-			 * Accept the first conversion that divides the value evenly.  We
-			 * assume that the conversions for each base unit are ordered from
-			 * greatest unit to the smallest!
+			 * Round off the representation at the digit where it is exactly
+			 * the same with base_value.
 			 */
-			if (table[i].multiplier <= 1.0 ||
-base_value % (int64) table[i].multiplier == 0)
+			rval = (double)base_value / table[i].multiplier;
+			rval = rint(rval * pow(10, frac_digits)) *
+pow(10, -frac_digits);
+
+			/* Try the unit if it is exact representation */
+			if ((int64)rint(rval * table[i].multiplier) == base_value)
 			{
-*value = (int64) rint(base_value / table[i].multiplier);
-*unit = table[i].unit;
-break;
+int nfrac = 0;
+int newlen = 1;
+
+/* Count fraction digits */
+for (nfrac = 0 ; nfrac < frac_digits ; nfrac++)
+{
+	double p = pow(10, nfrac);
+	if (rval * p - floor(rval * p) < 0.1)
+		break;
+}
+
+/*  Calculate width of the representation */
+if (rval >= 1.0)
+	newlen += floor(log10(rval));
+newlen += nfrac;
+if (nfrac > 0)
+	newlen++; /* for decimal point */
+
+if (len == 0 || newlen < len)
+{
+	*digits = nfrac;
+	*value = rval;
+	*unit = table[i].unit;
+	len = newlen;
+}
+
+/* We found the integer representation, exit. */
+if (nfrac == 0)
+	break;
 			}
 		}
 	}
@@ -9359,18 +9394,19 @@ _ShowOption(struct config_generic *record, bool use_units)
 	 * Use int64 arithmetic to avoid overflows in units
 	 * conversion.
 	 */
-	int64		result = *conf->variable;
+	double		result = *conf->variable;
 		

Re: [Suspect SPAM] Re: Should we increase the default vacuum_cost_limit?

2019-03-11 Thread Kyotaro HORIGUCHI
At Mon, 11 Mar 2019 13:57:21 +0100, Julien Rouhaud  wrote 
in 
> On Mon, Mar 11, 2019 at 10:03 AM David Rowley
>  wrote:
> >
> > On Mon, 11 Mar 2019 at 09:58, Tom Lane  wrote:
> > > The second patch is a delta that rounds off to the next smaller unit
> > > if there is one, producing a less noisy result:
> > >
> > > regression=# set work_mem = '30.1GB';
> > > SET
> > > regression=# show work_mem;
> > >  work_mem
> > > --
> > >  30822MB
> > > (1 row)
> > >
> > > I'm not sure if that's a good idea or just overthinking the problem.
> > > Thoughts?
> >
> > I don't think you're over thinking it.  I often have to look at such
> > settings and I'm probably not unique in when I glance at 30822MB I can
> > see that's roughly 30GB, whereas when I look at 31562138kB, I'm either
> > counting digits or reaching for a calculator.  This is going to reduce
> > the time it takes for a human to process the pg_settings output, so I
> > think it's a good idea.
> 
> Definitely, rounding up will spare people from wasting time to check
> what's the actual value.

+1. I don't think it is overthinking either.

Anyone who specifies memory size in GB won't care about a sub-MB
fraction. I don't think '0.01GB' is a sane setting, but its being
10MB doesn't matter.  However, I don't think that '0.1d' becoming
'2h' is reasonable: "10 times per day" gets "rounded" to "12 times
per day" by that.

Is it worth showing values with at most two or three fraction
digits instead of rounding the value on setting? The attached
PoC patch - offered instead of the 'roundoff-fractions-harder' patch -
shows values in the shortest exact representation.

work_mem:
   31562138  => '30.1 GB'
   31562137  => '31562137 kB'
   '0.1GB'   => '0.1 GB'
   '0.01GB'  => '0.01 GB'
   '0.001GB' => '1049 kB'

lock_timeout:
   '0.1h'=> '6 min'
   '90 min'  => '90 min'
   '120 min' => '2 h'
   '0.1 d'   => '0.1 d'

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d374f5372c..3ca2d6dec4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6011,7 +6011,7 @@ convert_to_base_unit(double value, const char *unit,
  */
 static void
 convert_int_from_base_unit(int64 base_value, int base_unit,
-		   int64 *value, const char **unit)
+		   double *value, const char **unit, int *digits)
 {
 	const unit_conversion *table;
 	int			i;
@@ -6027,6 +6027,9 @@ convert_int_from_base_unit(int64 base_value, int base_unit,
 	{
 		if (base_unit == table[i].base_unit)
 		{
+			const double frac_digits = 2;
+			double rval;
+
 			/*
 			 * Accept the first conversion that divides the value evenly.  We
 			 * assume that the conversions for each base unit are ordered from
@@ -6035,7 +6038,42 @@ convert_int_from_base_unit(int64 base_value, int base_unit,
 			if (table[i].multiplier <= 1.0 ||
 base_value % (int64) table[i].multiplier == 0)
 			{
-*value = (int64) rint(base_value / table[i].multiplier);
+*digits = 0;
+*value = rint(base_value / table[i].multiplier);
+*unit = table[i].unit;
+break;
+			}
+
+			/*
+			 * If base_value is exactly represented by a number with at most
+			 * two fraction digits, we prefer it over lower units.
+			 */
+			rval = (double)base_value / table[i].multiplier;
+			rval = rint(rval * pow(10, frac_digits)) *
+pow(10, -frac_digits);
+
+			/*
+			 * Acceptable range for rval is quite arbitrary.
+			 */
+			if ((rval >= 1.0 ||
+ (table[i + 1].unit &&
+  table[i].multiplier / table[i + 1].multiplier < 1000) &&
+(int64)rint(rval * table[i].multiplier) == base_value)
+			{
+int frac;
+
+/* count required fraction digits */
+for (frac = 1 ; frac < frac_digits ; frac++)
+{
+	if (rval * 10.0 - floor(rval * 10.0) < 0.1)
+	{
+		*digits = frac;
+		break;
+	}
+}
+if (frac == frac_digits)
+	*digits = frac_digits;
+*value = rval;
 *unit = table[i].unit;
 break;
 			}
@@ -9359,18 +9397,19 @@ _ShowOption(struct config_generic *record, bool use_units)
 	 * Use int64 arithmetic to avoid overflows in units
 	 * conversion.
 	 */
-	int64		result = *conf->variable;
+	double		result = *conf->variable;
 	const char *unit;
+	int			digits = 0;
 
 	if (use_units && result > 0 && (record->flags & GUC_UNIT))
-		convert_int_from_base_unit(result,
+		convert_int_from_base_unit(*conf->variable,
    record->flags & GUC_UNIT,
-   , );
+   , , );
 	else
 		unit = "";
 
-	snprintf(buffer, sizeof(buffer), INT64_FORMAT "%s",
-			 result, unit);
+	snprintf(buffer, sizeof(buffer), "%.*f %s",
+			 digits, result, unit);
 	val = buffer;
 }
 			}


Re: Offline enabling/disabling of data checksums

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 11:19:50AM +0100, Michael Banck wrote:
> One thing: you (Michael) should be co-author for patch #3 as I took some
> of your code from https://github.com/michaelpq/pg_plugins/tree/master/pg
> _checksums

OK, thanks for the notice.  I was not sure, as we actually developed
the same fork.
--
Michael




Re: Special role for subscriptions

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 06:32:10PM -0700, Jeff Davis wrote:
>  * Is the original idea of a special role still viable?

In my opinion, that part may be valuable.  The latest proposed patches
change the way tables are filtered and listed on the subscription
side, lowering the permission needed to spawn a new thread and to connect
to a publication server to just being a database owner instead of a
superuser, and that's quite a gap.
--
Michael




Re: Unaccent extension python script Issue in Windows

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 09:54:45PM +0530, Ramanarayana wrote:
> I went through the python script and found that the stdout encoding is set
> to utf-8 only if python version is <=2. The same needs to be done for
> python 3

If you send a patch for that, what would it look like?  Could you also
register any patch produced in the next commit fest?  It is here:
https://commitfest.postgresql.org/23/
--
Michael




Re: Fwd: Add tablespace tap test to pg_rewind

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 07:49:11PM +0800, Shaoqi Bai wrote:
> Thanks, will work on it as you suggested
> Add pg_basebackup --T olddir=newdir to support check the consistency of a
> tablespace created before promotion
> Add run_test('remote');

Thanks for considering my input.  Why don't you register your patch in
the next commit fest then, so that it goes through a formal review once
you are able to provide a new version?  The commit fest is here:
https://commitfest.postgresql.org/23/

We are currently in the process of wrapping up the last commit fest
for v12, so this stuff will have to wait a bit :(

It could be an idea to split the patch in two pieces:
- One patch which refactors the code for the new option in
PostgresNode.pm
- Second patch for the new test with integration in RewindTest.pm.
This should touch different parts of the code, so combining both would
be fine as well for me :)
--
Michael




Re: pgsql: Removed unused variable, openLogOff.

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 03:30:22PM -0400, Robert Haas wrote:
> Sorry that I didn't get to this before you did -- I was on PTO on
> Friday and did not work on the weekend.

My apologies, Robert.  It seems that I have been too hasty
then.  There are so many things going around lately, it is hard to
keep track of everything...
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 02:11:11PM +, Sergei Kornilov wrote:
> I review latest patchset.

Thanks, I have committed the refactoring of src/common/ as a first
step.

> I have one big question: Is pg_checksums
> safe for cross-versions operations? Even with update_controlfile
> call? Currently i am able to enable checksums in pg11 cluster with
> pg_checksums compiled on HEAD. Is this expected? I didn't notice any
> version-specific check in code.

This depends on the version of the control file, and it happens that
we don't check for it, so that's a good catch from your side.  Not
doing the check is a bad idea as ControlFileData should be compatible
between the binary and the data read.  I am attaching a fresh 0001
which should be back-patched down to v11 as a bug fix.  An advantage
of that, which is similar to pg_rewind, is that if the control file
version does not change in a major version, then the tool can be
used.  And the data folder layout is unlikely to change anyway.

>> pg_checksums checks, enables or disables data checksums
> 
> maybe better is , not ?

Fixed, as part of the renaming patch.

>> +printf(_("%s enables, disables or verifies data checksums in a 
>> PostgreSQL\n"), progname);
>> +printf(_("database cluster.\n\n"));
> 
> I doubt this is good line formatting for translation purposes.
>
>> +printf(_("  -c, --checkcheck data checksums.  This is the 
>> default\n"));
>> +printf(_(" mode if nothing is specified.\n"));
> 
> same. For example pg_basebackup uses different multiline style:

Oh, good points.  I forgot about that point of view.

>> +command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
>> +  "fails when relfilnodes are requested and action is --disable");
> 
> action is "--enable" here ;-)

s/relfilnodes/relfilenodes/ while on it.

>> if (badblocks > 0)
>>  return 1;
> 
> Small question: why return 1 instead of exit(1)?

OK, let's fix that on the way as part of the renaming.

>> 
>>  
> 
> How about use "app-pgchecksums" similar to other applications?

Yes, I was wondering about that one when doing the renaming, but did
not bother much for consistency.  Anyway switched, you are right.

Attached is an updated patch set, minus the refactoring for
src/common/.
--
Michael
From ef3e4f9b0dbff7deaaf19cb303f71c7883193a72 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 12 Mar 2019 10:43:47 +0900
Subject: [PATCH v2 1/4] Ensure version compatibility of pg_verify_checksums

pg_verify_checksums performs a read of the control file, and the data it
fetches should be from a data folder compatible with the major version
of Postgres the binary has been compiled with.

Reported-by: Sergei Kornilov
Author: Michael Paquier
Discussion: https://postgr.es/m/155231347133.16480.11453587097036807558.p...@coridan.postgresql.org
Backpatch-through: 11, where the tool has been introduced.
---
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..4c7c055b31 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -316,6 +316,13 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
+	{
+		fprintf(stderr, _("%s: cluster is not compatible with this version of pg_verify_checksums\n"),
+progname);
+		exit(1);
+	}
+
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-- 
2.20.1

From deee88e2d95f69b360580fbdd5300d2c34cda58b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 12 Mar 2019 11:02:05 +0900
Subject: [PATCH v2 2/4] Rename pg_verify_checksums to pg_checksums

The current tool name is too restrictive and focuses only on verifying
checksums.  As more options to control checksums for an offline cluster
are planned to be added, switch to a more generic name.  Documentation
as well as all past references to the tool are updated.

Author: Michael Paquier
Reviewed-by: Michael Banck, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.gd4...@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/allfiles.sgml|  2 +-
 ...erify_checksums.sgml => pg_checksums.sgml} | 24 +--
 doc/src/sgml/reference.sgml   |  2 +-
 src/backend/replication/basebackup.c  |  2 +-
 src/bin/Makefile  |  2 +-
 src/bin/initdb/t/001_initdb.pl|  8 +++
 src/bin/pg_checksums/.gitignore   |  3 +++
 .../Makefile  | 20 
 src/bin/pg_checksums/nls.mk   |  4 
 .../pg_checksums.c}   | 20 +---
 src/bin/pg_checksums/t/001_basic.pl   |  8 +++
 .../t/002_actions.pl

RE: pg_upgrade: Pass -j down to vacuumdb

2019-03-11 Thread Jamison, Kirk
Hi Jesper,

Sorry I almost completely forgot to get back to you on this.
Actually your patch works when I tested it before,
and I understand the intention.
Although a point was raised by other developers about making
--jobs optional in the suggested line by using an environment variable instead.

> > Kirk, can you provide a delta patch to match what you are thinking ?

I was thinking of something like the attached, though the user needs
to edit the variable value.
I am not sure how to implement the suggestion of having the
script absorb a value from the environment,
so that it doesn't need to be edited at all.

Regards,
Kirk Jamison


v8_0001-Use-vacuumdb-opt-env-variable-to-pass-vacuumdb-options.patch
Description: v8_0001-Use-vacuumdb-opt-env-variable-to-pass-vacuumdb-options.patch


analyze_new_cluster.sh
Description: analyze_new_cluster.sh


RE: Timeout parameters

2019-03-11 Thread Nagaura, Ryohei
Hello, Mikalai-san.

Thank you for your mail.

> From: mikalaike...@ibagroup.eu 
> I'm with Fabien that "client-side timeout" seems unsafe. Also I agree with 
> Fabien
> that quire can take much time to be processed by the PosgtreSQL server and it 
> is
> a normal behavior. There is possible that performance of the PostgreSQL server
> machine can be low temporary or continuously, especially during upgrading
> procedure.
I'm very poor at English, so I couldn't grasp your intention.
In conclusion, you think this feature should do some processing, e.g., sending
a cancel request. Don't you?
If yes, is it hard for you to accept my thinking, as follows?
1. The "end-user" I mentioned is an application/system user.
2. Such end-users don't want to wait for responses from the system or
application, for whatever reason.
3. End-users don't care where the failure in the system is when it occurs.
4. They just don't want to wait long.
5. If I made the server process something when this parameter is triggered,
they would have to wait for the time it takes to process it.
6. Therefore it is not preferable to make the server process anything in this
feature.

> Such a situation looks to be covered by TCP_USER_TIMEOUT and keep_alive
> mechanisms. 
i.e., you mean that an OS hang with the network still alive cannot happen.
Don't you?

These comments are helpful for documenting. Thank you.
> I think it is important to add some more information into the description of 
> this
> parameter which informs end-users that this parameter has to be used very
> carefully because it can impact as on the client application as on the server.
>...
> May be it is better to warn in documentation or prohibit in the source
> code to set "client-side timeout" less than TCP_USER_TIMEOUT to avoid handling
> "possible" logical problems ahead to the network problems.
> Keep in mind that "client-side timeout" can abort a connection which uses 
> UNIX-domain sockets too.
I didn't take that into account. I'll investigate it. Thanks.

Best regards,
-
Ryohei Nagaura





Re: Special role for subscriptions

2019-03-11 Thread Jeff Davis
On Fri, 2018-11-09 at 15:24 +0300, Evgeniy Efimkin wrote:
> Hi!
> In order to support create subscription from non-superuser, we need
> to make it possible to choose tables on the subscriber side:

I'd like to know more about the reasoning here. This thread started out
by suggesting a new special role that can be used for subscriptions,
but now it's changing the way subscription happens.

 * What are the most important use cases here? Are we just trying to
avoid the unnecessary use of superuser, or is there a real use case for
subscribing to a subset of a publication?
 * What are all the reasons CREATE SUBSCRIPTION currently requires
superuser?
 * Is the original idea of a special role still viable?

Regards,
Jeff Davis





Re: Suggestions on message transfer among backends

2019-03-11 Thread Chapman Flack
On 03/11/19 19:53, Euler Taveira wrote:
> Em seg, 11 de mar de 2019 às 10:36, Andy Fan
>  escreveu:
>>
>>   I need some function which requires some message exchange among different 
>> back-ends (connections).
>> specially I need a shared hash map and a message queue.
>>
> It seems you are looking for LISTEN/NOTIFY. However, if it is part of

My own recollection from looking at LISTEN/NOTIFY is that, yes, it
offers a mechanism for message passing among sessions, but the message
/reception/ part is very closely bound to the frontend/backend protocol.

That is, a message sent in session B can be received in session A, but
it pretty much goes flying straight out the network connection to
/the connected client associated with session A/.

If you're actually working /in the backend/ of session A (say, in a
server-side PL), it seemed to be unexpectedly difficult to find a way
to hook those notifications. But I looked at it only briefly, and
some time ago.

Regards,
-Chap



Use nanosleep(2) in pg_usleep, if available?

2019-03-11 Thread Tom Lane
In the thread about vacuum_cost_delay vs vacuum_cost_limit,
I wondered whether nanosleep(2) would provide any better
timing resolution than select(2).  Some experimentation
suggests that it doesn't, but nonetheless I see a good reason
why we should consider making pg_usleep use nanosleep() if
possible: it's got much better-defined semantics for what
happens if a signal arrives.  The comments for pg_usleep
lay out the problems with relying on select():

 * CAUTION: the behavior when a signal arrives during the sleep is platform
 * dependent.  On most Unix-ish platforms, a signal does not terminate the
 * sleep; but on some, it will (the Windows implementation also allows signals
 * to terminate pg_usleep).  And there are platforms where not only does a
 * signal not terminate the sleep, but it actually resets the timeout counter
 * so that the sleep effectively starts over!  It is therefore rather hazardous
 * to use this for long sleeps; a continuing stream of signal events could
 * prevent the sleep from ever terminating.  Better practice for long sleeps
 * is to use WaitLatch() with a timeout.

While the WaitLatch alternative avoids the problem, I doubt
we're ever going to remove pg_usleep entirely, so it'd be
good if it had fewer sharp edges.  nanosleep() has the
same behavior as Windows, ie, the sleep is guaranteed to be
terminated by a signal.  So if we used nanosleep() where available
we'd have that behavior on just about every interesting platform.
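
For illustration, a minimal sketch of what a nanosleep()-based pg_usleep()
could look like (not a concrete proposal; the WaitLatch() advice above would
still apply for long sleeps):

    #include <time.h>

    void
    pg_usleep(long microsec)
    {
        if (microsec > 0)
        {
            struct timespec delay;

            delay.tv_sec = microsec / 1000000L;
            delay.tv_nsec = (microsec % 1000000L) * 1000;

            /*
             * Unlike select(), nanosleep() is specified to return early with
             * EINTR when a signal is caught; the remaining time is ignored
             * here on purpose, matching the Windows behavior described above.
             */
            (void) nanosleep(&delay, NULL);
        }
    }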

nanosleep() does exist pretty far back: it's in SUSv2, though
that version of the standard allows it to fail with ENOSYS.
Not sure if we'd need to teach configure to check for that
possibility.

Thoughts?

regards, tom lane



Re: Suggestions on message transfer among backends

2019-03-11 Thread Euler Taveira
Em seg, 11 de mar de 2019 às 10:36, Andy Fan
 escreveu:
>
>   I need some function which requires some message exchange among different 
> back-ends (connections).
> specially I need a shared hash map and a message queue.
>
It seems you are looking for LISTEN/NOTIFY. However, if it is part of
a complex solution, a background worker with shared memory access is
the way to go.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread Andrew Dunstan


On 3/11/19 6:07 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I'm going to mark this as rejected. Here's a possible doc patch
> Maybe s/strictly/ordinary/, or some other word?  "strictly"
> doesn't convey much to me.  Otherwise seems fine.


OK, done.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread David G. Johnston
On Mon, Mar 11, 2019 at 3:07 PM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > I'm going to mark this as rejected. Here's a possible doc patch
>
> Maybe s/strictly/ordinary/, or some other word?  "strictly"
> doesn't convey much to me.  Otherwise seems fine.
>

How about:

While the COALESCE, GREATEST, and LEAST functions below accept a
variable number of arguments, they do not support the passing of a
VARIADIC array.

?

David J.



Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread Tom Lane
Andrew Dunstan  writes:
> I'm going to mark this as rejected. Here's a possible doc patch

Maybe s/strictly/ordinary/, or some other word?  "strictly"
doesn't convey much to me.  Otherwise seems fine.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-03-11 Thread Tom Lane
John Naylor  writes:
> Now it looks like:
> perl  renumber_oids.pl  --first-mapped-oid 8000 --last-mapped-oid 8999
>  --first-target-oid 2000  *.dat
> To prevent a maintenance headache, I didn't copy any of the formatting
> logic over. You'll also have to run reformat_dat_files.pl afterwards
> to restore that. It seems to work, but I haven't tested thoroughly.

I didn't like the use of Data::Dumper, because it made it quite impossible
to check what the script had done by eyeball.  After some thought
I concluded that we could probably just apply the changes via
search-and-replace, which is pretty ugly and low-tech but it leads to
easily diffable results, whether or not the initial state is exactly
what reformat_dat_files would produce.

I also changed things so that the OID mapping is computed before we start
changing any files, because as it stood the objects would get renumbered
in a pretty random order; and I renamed one of the switches so they all
have unique one-letter abbreviations.

Experimenting with this, I realized that it couldn't renumber OIDs that
are defined in .h files rather than .dat files, which is a serious
deficiency, but given the search-and-replace implementation it's not too
hard to fix up the .h files as well.  So I did that, and removed the
expectation that the target files would be listed on the command line;
that seems more likely to be a foot-gun than to do anything useful.

I've successfully done check-world after renumbering every OID above
4000 to somewhere else.  I also tried renumbering everything below
4000, which unsurprisingly blew up because there are various catalog
columns we haven't fixed to use symbolic OIDs.  (The one that initdb
immediately trips over is pg_database.dattablespace.)  I'm not sure
if it's worth the trouble to make that totally clean, but I suspect
we ought to at least mop up text-search references to be symbolic.
That's material for a separate patch though.

This seems committable from my end --- any further comments?

regards, tom lane

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 3c5830b..b6038b9 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -388,8 +388,25 @@
 to see which ones do not appear.  You can also use
 the duplicate_oids script to check for mistakes.
 (genbki.pl will assign OIDs for any rows that
-didn't get one hand-assigned to them and also detect duplicate OIDs
-at compile time.)
+didn't get one hand-assigned to them, and it will also detect duplicate
+OIDs at compile time.)
+   
+
+   
+When choosing OIDs for a patch that is not expected to be committed
+immediately, best practice is to use a group of more-or-less
+consecutive OIDs starting with some random choice in the range
+8000-9999.  This minimizes the risk of OID collisions with other
+patches being developed concurrently.  To keep the 8000-9999
+range free for development purposes, after a patch has been committed
+to the master git repository its OIDs should be renumbered into
+available space below that range.  Typically, this will be done
+near the end of each development cycle, moving all OIDs consumed by
+patches committed in that cycle at the same time.  The script
+renumber_oids.pl can be used for this purpose.
+If an uncommitted patch is found to have OID conflicts with some
+recently-committed patch, renumber_oids.pl may
+also be useful for recovering from that situation.

 

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 3bf308f..368b1de 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -87,6 +87,8 @@ sub ParseHeader
 		}
 
 		# Push the data into the appropriate data structure.
+		# Caution: when adding new recognized OID-defining macros,
+		# also update src/include/catalog/renumber_oids.pl.
 		if (/^DECLARE_TOAST\(\s*(\w+),\s*(\d+),\s*(\d+)\)/)
 		{
 			push @{ $catalog{toasting} },
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 833fad1..f253613 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -4,6 +4,9 @@
  *	  This file provides some definitions to support indexing
  *	  on system catalogs
  *
+ * Caution: all #define's with numeric values in this file had better be
+ * object OIDs, else renumber_oids.pl might change them inappropriately.
+ *
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
diff --git a/src/include/catalog/renumber_oids.pl b/src/include/catalog/renumber_oids.pl
new file mode 100755
index 000..8a07340
--- /dev/null
+++ b/src/include/catalog/renumber_oids.pl
@@ -0,0 +1,291 @@
+#!/usr/bin/perl
+#--
+#
+# renumber_oids.pl
+#Perl script that shifts a range of OIDs in the Postgres 

Re: Pluggable Storage - Andres's take

2019-03-11 Thread Andres Freund
On 2019-03-11 13:31:26 -0700, Andres Freund wrote:
> On 2019-03-11 12:37:46 -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-03-08 19:13:10 -0800, Andres Freund wrote:
> > > Changes:
> > > - I've added comments to all the callbacks in the first commit / the
> > >   scan commit
> > > - I've renamed table_gimmegimmeslot to table_slot_create
> > > - I've made the callbacks and their wrappers more consistently named
> > > - I've added asserts for necessary callbacks in scan commit
> > > - Lots of smaller cleanup
> > > - Added a commit message
> > > 
> > > While 0001 is pretty bulky, the interesting bits concentrate on a
> > > comparatively small area. I'd appreciate if somebody could give the
> > > comments added in tableam.h a read (both on callbacks, and their
> > > wrappers, as they have different audiences). It'd make sense to first
> > > read the commit message, to understand the goal (and I'd obviously also
> > > appreciate suggestions for improvements there as well).
> > > 
> > > I'm pretty happy with the current state of the scan patch. I plan to do
> > > two more passes through it (formatting, comment polishing, etc. I don't
> > > know of any functional changes needed), and then commit it, lest
> > > somebody objects.
> > 
> > Here's a further polished version. Pretty boring changes:
> > - newlines
> > - put tableam.h into the correct place
> > - a few comment improvements, including typos
> > - changed reorderqueue_push() to accept the slot. That avoids an
> >   unnecessary heap_copytuple() in some cases
> > 
> > No meaningful changes in later patches.
> 
> I pushed this.  There's a failure on 32bit machines, unfortunately. The
> problem comes from the ParallelTableScanDescData embedded in BTShared -
> after the change the compiler can't see that that actually needs more
> alignment, because ParallelTableScanDescData doesn't have any 8byte
> members. That's a problem for just about all such "struct inheritance"
> type tricks in postgres, but we normally just allocate them separately,
> guaranteeing maxalign. Given that we here already allocate enough space
> after the BTShared struct, it's probably easiest to just not embed the
> struct anymore.

I've pushed an attempt to fix this, which locally fixes 32bit
builds. It's copying the alignment logic for shm_toc_allocate, namely
using BUFFERALIGN for alignment.  We should probably invent a more
appropriate define for this at some point...
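
For illustration, a rough sketch of the padding idea (the function name below
is made up; the real change is in the nbtree parallel build code):

    /*
     * Pad the fixed-size shared struct with BUFFERALIGN, as shm_toc_allocate
     * does, so that whatever is stored after it in the same shared-memory
     * chunk starts at a safely aligned offset, even though the struct itself
     * has no 8-byte members.
     */
    static Size
    parallel_btbuild_shared_size(Size scan_data_size)
    {
        Size        size = BUFFERALIGN(sizeof(BTShared));   /* padded header */

        return add_size(size, scan_data_size);  /* parallel scan data follows */
    }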

Greetings,

Andres Freund



Re: Compressed TOAST Slicing

2019-03-11 Thread Regina Obe
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

No need for documentation as this is a performance improvement patch

I tested on Windows mingw64 (as of a week ago) and confirmed that the patch applies
cleanly and is significantly faster for the left and substr tests than head.

Re: insensitive collations

2019-03-11 Thread Peter Eisentraut
On 2019-03-08 11:09, Peter Eisentraut wrote:
> On 2019-03-07 20:04, Daniel Verite wrote:
>> With previous versions, we'd need to call ucol_setAttribute(),
>> with the attributes and values defined here:
>> http://icu-project.org/apiref/icu4c/ucol_8h.html
>> for instance to get colStrength=secondary:
>>   ucol_setAttribute(coll, UCOL_STRENGTH , UCOL_SECONDARY, );
>> which I've just checked gives the expected result with ICU-4.2.
> 
> I see.  I'm thinking about adding some ad hoc code to
> pg_newlocale_from_collation() to parse these keywords ourselves, so we
> can provide the same interface for old ICU versions.  I'll send a
> separate patch for that.

Patches here.  This will allow all the existing collation customization
options as well as the ones being proposed in this thread to work in
older ICU versions.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From dc21791adee43a854247ff007b3757ab480e509d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Mar 2019 16:10:59 +0100
Subject: [PATCH 1/2] Add tests for ICU collation customization

---
 .../regress/expected/collate.icu.utf8.out | 39 +++
 src/test/regress/sql/collate.icu.utf8.sql | 21 ++
 2 files changed, 60 insertions(+)

diff --git a/src/test/regress/expected/collate.icu.utf8.out 
b/src/test/regress/expected/collate.icu.utf8.out
index f95d165288..4b94921cf8 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1100,6 +1100,45 @@ select textrange_en_us('A','Z') @> 'b'::text;
 
 drop type textrange_c;
 drop type textrange_en_us;
+-- test ICU collation customization
+CREATE COLLATION testcoll_ignore_accents (provider = icu, locale = 
'@colStrength=primary;colCaseLevel=yes');
+SELECT 'aaá' > 'AAA' COLLATE "und-x-icu", 'aaá' < 'AAA' COLLATE 
testcoll_ignore_accents;
+ ?column? | ?column? 
+--+--
+ t| t
+(1 row)
+
+CREATE COLLATION testcoll_backwards (provider = icu, locale = 
'@colBackwards=yes');
+SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE 
testcoll_backwards;
+ ?column? | ?column? 
+--+--
+ t| t
+(1 row)
+
+CREATE COLLATION testcoll_lower_first (provider = icu, locale = 
'@colCaseFirst=lower');
+CREATE COLLATION testcoll_upper_first (provider = icu, locale = 
'@colCaseFirst=upper');
+SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE 
testcoll_upper_first;
+ ?column? | ?column? 
+--+--
+ t| t
+(1 row)
+
+CREATE COLLATION testcoll_shifted (provider = icu, locale = 
'@colAlternate=shifted');
+SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE 
testcoll_shifted;
+ ?column? | ?column? 
+--+--
+ t| t
+(1 row)
+
+CREATE COLLATION testcoll_numeric (provider = icu, locale = '@colNumeric=yes');
+SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE 
testcoll_numeric;
+ ?column? | ?column? 
+--+--
+ t| t
+(1 row)
+
+CREATE COLLATION testcoll_error1 (provider = icu, locale = 
'@colNumeric=lower');
+ERROR:  could not open collator for locale "@colNumeric=lower": 
U_ILLEGAL_ARGUMENT_ERROR
 -- cleanup
 SET client_min_messages TO warning;
 DROP SCHEMA collate_tests CASCADE;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql 
b/src/test/regress/sql/collate.icu.utf8.sql
index 0aeba3e202..73fb1232a7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -425,6 +425,27 @@ CREATE INDEX collate_dep_test4i ON collate_dep_test4t (b 
COLLATE test0);
 drop type textrange_en_us;
 
 
+-- test ICU collation customization
+
+CREATE COLLATION testcoll_ignore_accents (provider = icu, locale = 
'@colStrength=primary;colCaseLevel=yes');
+SELECT 'aaá' > 'AAA' COLLATE "und-x-icu", 'aaá' < 'AAA' COLLATE 
testcoll_ignore_accents;
+
+CREATE COLLATION testcoll_backwards (provider = icu, locale = 
'@colBackwards=yes');
+SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE 
testcoll_backwards;
+
+CREATE COLLATION testcoll_lower_first (provider = icu, locale = 
'@colCaseFirst=lower');
+CREATE COLLATION testcoll_upper_first (provider = icu, locale = 
'@colCaseFirst=upper');
+SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE 
testcoll_upper_first;
+
+CREATE COLLATION testcoll_shifted (provider = icu, locale = 
'@colAlternate=shifted');
+SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE 
testcoll_shifted;
+
+CREATE COLLATION testcoll_numeric (provider = icu, locale = '@colNumeric=yes');
+SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE 
testcoll_numeric;
+
+CREATE COLLATION testcoll_error1 (provider = icu, locale = 
'@colNumeric=lower');
+
+
 -- cleanup
 SET client_min_messages TO warning;
 DROP SCHEMA collate_tests CASCADE;
-- 
2.21.0

>From 

Re: Patch to document base64 encoding

2019-03-11 Thread Karl O. Pinc
Hi Fabien,

On Sun, 10 Mar 2019 08:15:35 +0100 (CET), Fabien COELHO wrote:

> I registered as a reviewer in the CF app.

Thanks.

What's causing problems here is that the encode and decode
functions are listed in both the string functions section
and the binary functions section.  A related but not-relevant
problem is that there are functions listed in the string
function section which take binary input.

I asked about this on IRC and the brief reply was
unflattering to the existing documentation.

So I'm going to fix this also.  3 patches attached:

doc_base64_part1_v9.patch

  This moves functions taking bytea and other non-string
  input into the binary string section, and vice versa.
  Eliminates duplicate encode() and decode() documentation.

  Affects: convert(bytea, name, name)
   convert_from(bytea, name)
   encode(bytea, text)
   length(bytea, name)
   quote_nullable(anytype)
   to_hex(int or bigint)
   decode(text, text)

  Only moves, eliminates duplicates, and adjusts indentation.


doc_base64_part2_v9.patch

  Cleanup wording after moving functions between sections.


doc_base64_part3_v9.patch

  Documents base64, hex, and escape encode() and decode()
  formats.

> >> "The string and the binary encode and decode functions..." sentence
> >> looks strange to me, especially with the English article that I do
> >> not really master, so maybe it is ok. I'd have written something
> >> more straightforward, eg: "Functions encode and decode support the
> >> following encodings:",  
> >
> > It is an atypical construction because I want to draw attention
> > that this is documentation not only for the encode() and decode()
> > in section 9.4. String Functions and Operators but also for the
> > encode() and decode in section 9.5. Binary String Functions and
> > Operators. Although I can't think of a better approach it makes me
> > uncomfortable that documentation written in one section applies
> > equally to functions in a different section.  
> 
> People coming from the binary doc would have no reason to look at the 
> string paragraph anyway.
> 
> > Do you think it would be useful to hyperlink the word "binary"
> > to section 9.5?  
> 
> Hmmm... I think that the link is needed in the other direction.

I'm not sure what you mean here or if it's still relevant.

> I'd suggest (1) to use a simpler and direct sentence in the string 
> section, (2) to simplify/shorten the in cell description in the
> binary section, and (3) to add an hyperlink from the binary section
> which would point to the expanded explanation in the string section.
> 
> > The idiomatic phrasing would be "Both the string and the binary
> > encode and decode functions..." but the word "both" adds
> > no information.  Shorter is better.  
> 
> Possibly, although "Both" would insist on the fact that it applies to
> the two variants, which was your intention.

I think this is no longer relevant.  Although I'm not sure what
you mean by 3.  The format names already hyperlink back to the
string docs.

> >> and also I'd use a direct "Function
> >> <...>decode ..." rather than "The decode
> >> function ..." (twice).  
> >
> > The straightforward English would be "Decode accepts...".  The
> > problem is that this begins the sentence with the name of a
> > function. This does not work very well when the function name is
> > all lower case, and can have other problems where clarity is lost
> > depending on documentation output formatting.  
> 
> Yep.
> 
> > I don't see a better approach.  
> 
> I suggested "Function <>decode ...", which is the kind of thing we
> do in academic writing to improve precision, because I thought it
> could be better:-)

"Function <>decode ..." just does not work in English.

> >> Maybe I'd use the exact same grammatical structure for all 3 cases,
> >> starting with "The <>whatever encoding converts bla bla bla"
> >> instead of varying the sentences.  
> >
> > Agreed.  Good idea.  The first paragraph of each term has to
> > do with encoding and the second with decoding.  
> 
> 
> > Uniformity in starting the second paragraphs helps make
> > this clear, even though the first paragraphs are not uniform.
> > With this I am not concerned that the first paragraphs
> > do not have a common phrasing that's very explicit about
> > being about encoding.
> >
> > Adjusted.  
> 
> Cannot see it fully in the v8 patch:
> 
>   - The base64 encoding is
>   - hex represents
>   - escape converts

I did only the decode paras.  I guess no reason not to make
the first paras uniform as well.   Done.

I also alphabetized by format name.

I hope that 3 patches will make review easier.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a78ea..3d748b660f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1692,48 +1692,6 @@
   

 
- 

Re: Pluggable Storage - Andres's take

2019-03-11 Thread Andres Freund
On 2019-03-11 12:37:46 -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-03-08 19:13:10 -0800, Andres Freund wrote:
> > Changes:
> > - I've added comments to all the callbacks in the first commit / the
> >   scan commit
> > - I've renamed table_gimmegimmeslot to table_slot_create
> > - I've made the callbacks and their wrappers more consistently named
> > - I've added asserts for necessary callbacks in scan commit
> > - Lots of smaller cleanup
> > - Added a commit message
> > 
> > While 0001 is pretty bulky, the interesting bits concentrate on a
> > comparatively small area. I'd appreciate if somebody could give the
> > comments added in tableam.h a read (both on callbacks, and their
> > wrappers, as they have different audiences). It'd make sense to first
> > read the commit message, to understand the goal (and I'd obviously also
> > appreciate suggestions for improvements there as well).
> > 
> > I'm pretty happy with the current state of the scan patch. I plan to do
> > two more passes through it (formatting, comment polishing, etc. I don't
> > know of any functional changes needed), and then commit it, lest
> > somebody objects.
> 
> Here's a further polished version. Pretty boring changes:
> - newlines
> - put tableam.h into the correct place
> - a few comment improvements, including typos
> - changed reorderqueue_push() to accept the slot. That avoids an
>   unnecessary heap_copytuple() in some cases
> 
> No meaningful changes in later patches.

I pushed this.  There's a failure on 32bit machines, unfortunately. The
problem comes from the ParallelTableScanDescData embedded in BTShared -
after the change the compiler can't see that that actually needs more
alignment, because ParallelTableScanDescData doesn't have any 8byte
members. That's a problem for just about all such "struct inheritance"
type tricks in postgres, but we normally just allocate them separately,
guaranteeing maxalign. Given that we here already allocate enough space
after the BTShared struct, it's probably easiest to just not embed the
struct anymore.

- Andres



Re: Compressed TOAST Slicing

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Darafei Praliaskouski wrote:

> The feature is super valuable for complex PostGIS-enabled databases.

After having to debug a perf problem in this area, I agree, +1 for the patch.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:43 PM Alvaro Herrera  
> wrote:
> > > Huh.  Well, that's another option, but then what do we do if the
> > > number of phases is not a constant?
> >
> > Well, why do we care?  "Some phases might be skipped".
> 
> It seems pretty confusing.  I mean, in the case of the CLUSTER patch,
> you're either going to seq-scan the table or index-scan the table.
> Those are (at last check) reported using different phase numbers, but
> they are mutually exclusive.  Generally, if you are going to do either
> foo -> bar -> baz -> quux or foo -> bletch -> quux, how many phases
> are there total?  5?  4?

Hmm.  Your argument is not entirely devoid of merit, but I'm not 100%
convinced either.

So, in CLUSTER, the phases in the middle section are exclusive of one
another.  You do bar and baz, or you do bletch.  But you never do bar
and bletch, or bletch and baz, or bar in isolation or baz in isolation.
Furthermore, the counting of phases depends on internal system state
(optimizer output), not on the user's input.

In CREATE INDEX, it's not exactly the same.  You either have a
complicated 8-phase system (CREATE INDEX CONCURRENTLY) or just a
two-phase system.  The phases for the second case are a strict subset of
the cases in the first case.  Whether to use one or the other phase
sequence is entirely up to the user.

On the other hand, the subphase numbers vary per AM (but I expect
they're always the same for any one AM.)

To me, it's not a big deal, but if we don't put the number in the phase
name, then we force users to keep the reference documentation open every
time they create an index.

I'm not wed to anything in this area, TBH.  My first patch had no phase
numbers and I added them because of reviewer feedback.  I do agree we
should be consistent ... but on the other hand, each case is a bit
different: consider VACUUM, which goes back to phase 2 after doing phase
3 for a while.  You don't have that behavior for either CLUSTER or
CREATE INDEX.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Compressed TOAST Slicing

2019-03-11 Thread Darafei Praliaskouski
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

I have read the patch and have no problems with it.

The feature is super valuable for complex PostGIS-enabled databases.

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas
On Mon, Mar 11, 2019 at 3:43 PM Alvaro Herrera  wrote:
> > Huh.  Well, that's another option, but then what do we do if the
> > number of phases is not a constant?
>
> Well, why do we care?  "Some phases might be skipped".

It seems pretty confusing.  I mean, in the case of the CLUSTER patch,
you're either going to seq-scan the table or index-scan the table.
Those are (at last check) reported using different phase numbers, but
they are mutually exclusive.  Generally, if you are going to do either
foo -> bar -> baz -> quux or foo -> bletch -> quux, how many phases
are there total?  5?  4?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread Pavel Stehule
On Mon, Mar 11, 2019 at 18:04, David Steele  wrote:

> On 3/11/19 6:43 PM, Andrew Dunstan wrote:
> >
> > On 3/6/19 10:24 AM, Chapman Flack wrote:
> >> On 3/6/19 10:12 AM, Andrew Dunstan wrote:
> >>
> >>> Having reviewed the thread, I'm with Andres and Tom. Maybe though we
> >>> should have a note somewhere to the effect that you can't use VARIADIC
> >>> with these.
> >> Perhaps such a note belongs hoisted into the functions-conditional
> >> section of the manual, making a general observation that these things
> >> are conditional *expressions* that may resemble functions, but in
> >> particular, COALESCE, GREATEST, and LEAST cannot be called with
> >> keyword VARIADIC and an array argument, as they could if they were
> >> ordinary functions.
> >>
> >
> >
> > I'm going to mark this as rejected. Here's a possible doc patch
>
> +1
>
>
ok

Pavel

-- 
> -David
> da...@pgmasters.net
>


Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:26 PM Alvaro Herrera  
> wrote:
> > Oh.  That's easily removed.  Though I have to say that other people said
> > that they liked it so much that they would have liked to have it in the
> > original VACUUM one too 
> > (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).
> 
> Huh.  Well, that's another option, but then what do we do if the
> number of phases is not a constant?

Well, why do we care?  "Some phases might be skipped".

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Removed unused variable, openLogOff.

2019-03-11 Thread Robert Haas
On Thu, Mar 7, 2019 at 8:27 PM Michael Paquier  wrote:
> After sleeping on it, let's live with just switching to nleft in the
> message, without openLogOff as that's the second time folks complain
> about the previous code.  So I just propose the attached.  Robert,
> others, any objections?  Perhaps you would prefer fixing it yourself?

Sorry that I didn't get to this before you did -- I was on PTO on
Friday and did not work on the weekend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas
On Mon, Mar 11, 2019 at 3:26 PM Alvaro Herrera  wrote:
> Oh.  That's easily removed.  Though I have to say that other people said
> that they liked it so much that they would have liked to have it in the
> original VACUUM one too (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).

Huh.  Well, that's another option, but then what do we do if the
number of phases is not a constant?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Robert Haas wrote:

> On Mon, Mar 11, 2019 at 3:18 PM Alvaro Herrera  
> wrote:
> > On 2019-Mar-11, Robert Haas wrote:
> > > I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> > > inconsistent with what we've got already and it doesn't add much.
> > > Someone who wants to know which phase it is can look at the underlying
> > > numbers directly instead of going through the view, but most people
> > > probably won't care, and given that the phases may be of dramatically
> > > unequal length, I don't think it's adding much.
> > >
> > > Another reason why I think this is a bad idea is that there may be
> > > some operations where we don't transit all the phases in all cases;
> > > the pending patch for CLUSTER progress reporting works like that.
> >
> > What part of it don't you like?  Is it the fact that we have phase
> > numbers in the phase name?  Is it the fact that we count total phases?
> > Is it that we have two numbers being current (phase + subphase)?
> 
> that you have phase numbers in the phase name

Oh.  That's easily removed.  Though I have to say that other people said
that they liked it so much that they would have liked to have it in the
original VACUUM one too (5ba2b281-9c84-772a-cf37-17780d782...@lab.ntt.co.jp).

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas
On Mon, Mar 11, 2019 at 3:18 PM Alvaro Herrera  wrote:
> On 2019-Mar-11, Robert Haas wrote:
> > I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> > inconsistent with what we've got already and it doesn't add much.
> > Someone who wants to know which phase it is can look at the underlying
> > numbers directly instead of going through the view, but most people
> > probably won't care, and given that the phases may be of dramatically
> > unequal length, I don't think it's adding much.
> >
> > Another reason why I think this is a bad idea is that there may be
> > some operations where we don't transit all the phases in all cases;
> > the pending patch for CLUSTER progress reporting works like that.
>
> What part of it don't you like?  Is it the fact that we have phase
> numbers in the phase name?  Is it the fact that we count total phases?
> Is it that we have two numbers being current (phase + subphase)?

that you have phase numbers in the phase name

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Robert Haas wrote:

> I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
> inconsistent with what we've got already and it doesn't add much.
> Someone who wants to know which phase it is can look at the underlying
> numbers directly instead of going through the view, but most people
> probably won't care, and given that the phases may be of dramatically
> unequal length, I don't think it's adding much.
> 
> Another reason why I think this is a bad idea is that there may be
> some operations where we don't transit all the phases in all cases;
> the pending patch for CLUSTER progress reporting works like that.

What part of it don't you like?  Is it the fact that we have phase
numbers in the phase name?  Is it the fact that we count total phases?
Is it that we have two numbers being current (phase + subphase)?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Robert Haas
On Mon, Mar 11, 2019 at 8:41 AM Alvaro Herrera  wrote:
> > I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
> > the blocks_done count is increasing. Shouldn't it have
> > changed to reflect  PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
> > index(3 of 8): table scan(2/5) ?
> > Although I think it has been rectified in the latest patch as I now get
> > 'table scan' phase in output as I do CREATE INDEX on table with 100
> > records
>
> Yeah, this was a bug that I fixed in v5.  (It was a misunderstanding
> about how parallel scanning is set up, IIRC).  For v5, I tested both
> parallel and non-parallel builds, with and without sync seqscans, and
> everything seemed to behave correctly.
>
>
> Thanks for looking!  I intend to post a new version later this week.

I don't think that I much like this (3 of 8) and (2 of 5) stuff.  It's
inconsistent with what we've got already and it doesn't add much.
Someone who wants to know which phase it is can look at the underlying
numbers directly instead of going through the view, but most people
probably won't care, and given that the phases may be of dramatically
unequal length, I don't think it's adding much.

Another reason why I think this is a bad idea is that there may be
some operations where we don't transit all the phases in all cases;
the pending patch for CLUSTER progress reporting works like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] kNN for btree

2019-03-11 Thread Alexander Korotkov
Hi!

I've some questions regarding this patchset.

1) This comment needs to be revised.  Now, B-tree supports both
ammatchorderby and amcanbackward.  How do we guarantee that kNN is not
backwards scan?

/*
 * Only forward scan is supported with reordering.  Note: we can get away
 * with just Asserting here because the system will not try to run the
 * plan backwards if ExecSupportsBackwardScan() says it won't work.
 * Currently, that is guaranteed because no index AMs support both
 * ammatchorderby and amcanbackward; if any ever do,
 * ExecSupportsBackwardScan() will need to consider indexorderbys
 * explicitly.
 */

2) Why this should be so?

EXPLAIN (COSTS OFF)
SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23)
ORDER BY thousand DESC, tenthous <-> 3500;
 QUERY PLAN
-
 Sort
   Sort Key: thousand DESC, ((tenthous <-> 3500))
   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
 Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[]))
(4 rows)

EXPLAIN (COSTS OFF)
SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23)
ORDER BY thousand, tenthous <-> 3500;
  QUERY PLAN
---
 Index Only Scan using tenk1_thous_tenthous on tenk1
   Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[]))
   Order By: (thousand AND (tenthous <-> 3500))
(3 rows)

It seems that we restart bidirectional scan for each value specified
in IN clause.  Then why does it matter whether it is forward or
backward scan?

3) What is idea of 8th patch?  It doesn't seem to be really needed for
subsequent 9th patch, because we anyway ignore partial match pathkeys.
Then why bother producing them?  Is it stub for further incremental
sort?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: Timeout parameters

2019-03-11 Thread Mikalai Keida
Hello Ryohei-san,

I understand the main aim of your suggestion: a client application has
to do a lot of work besides making queries to the database. I agree with
you that a "client-side timeout" has to be integrated into the PostgreSQL
server and libpq.
I'm with Fabien that a "client-side timeout" seems unsafe. I also agree
with Fabien that a query can take a long time to be processed by the
PostgreSQL server and that this is normal behavior. It is possible that
the performance of the PostgreSQL server machine is temporarily or
continuously low, especially during an upgrade procedure.
I think it is important to add some more information to the description
of this parameter, informing end-users that it has to be used very
carefully because it can impact both the client application and the
server.

> You mentioned about when a SQL query is heavy, but I want to talk about 
when OS hang.
> If OS hang occurs, the cost of cancel request processing is so high.

Such a situation looks to be covered by the TCP_USER_TIMEOUT and
keep_alive mechanisms. Maybe it is better to warn in the documentation,
or to prohibit in the source code, setting the "client-side timeout" to
less than TCP_USER_TIMEOUT, to avoid handling "possible" logical problems
ahead of the network problems. Keep in mind that a "client-side timeout"
can abort a connection which uses UNIX-domain sockets too.

What do you think about it?

Best regards,
Mikalai Keida

Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2019-03-11 Thread Georgios Kokolatos
Hi,

I applied the patch on current master and ran the tests, but I am afraid that 
the newly introduced test failed on installcheck-world:

```t/016_min_consistency.pl . # Looks like your test exited with 29 
before it could output anything.
t/016_min_consistency.pl . Dubious, test returned 29 (wstat 7424, 
0x1d00)
Failed 2/2 subtests

Test Summary Report
---
t/016_min_consistency.pl   (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 2 tests but ran 0.
Files=16, Tests=143, 65 wallclock secs ( 0.04 usr  0.04 sys +  6.53 cusr  5.08 
csys = 11.69 CPU)
Result: FAIL```

To be honest, I have not looked closely at the failure; still, it is the only 
failing test, which by itself seemed worth mentioning.

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Rahila Syed
Hi Alvaro,

On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera 
wrote:

> Hi Rahila,
>
> Thanks for looking.
>
> On 2019-Mar-04, Rahila wrote:
>
> > 1. Thank you for incorporating review comments.
> > Can you please rebase the latest 0001-Report-progress-of-
> > CREATE-INDEX-operations.patch on master? Currently it does not apply on
> > 754b90f657bd54b482524b73726dae4a9165031c
>
> Hmm, rebased to current master.  Didn't conflict at all when rebasing,
> so it's strange that it didn't apply.


Thanks for updating the patch. Sorry, I think it wasn't that the patch
needed rebasing but
I failed to apply it correctly last time. I can apply it now.


> +extern char *btbuildphasename(int64 phasenum);


1. I think int64 is too large a datatype for phasenum.
Also int32 is used for phasenum in  pg_indexam_progress_phasename().
Can we have it as int8?

2.

>     if ((previous_blkno == InvalidBlockNumber) ||
> +       (scan->rs_cblock != previous_blkno))
> +   {
> +       pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
> +                                    scan->rs_cblock);
> +       previous_blkno = scan->rs_cblock;
> +   }


In validate_index_heapscan, we don't calculate blocks_done the way
IndexBuildHeapScan does, i.e.
blocks_done += scan->rs_cblock - previous_blkno, which IMO is more accurate.
Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
results, as we are still processing that block.

3. There is no if(progress) check in the validate_index() /
validate_index_heapscan() code. Won't it be a problem if it is called from
other index methods which don't support reporting progress at the moment?

4.  Just to clarify my understanding, can you please see the comment below?

Quoting your following comment from the cluster command progress monitor
thread, referring to progress reporting from IndexBuildHeapScan:

"One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command."

Does that mean that no other INDEX progress monitoring can use the
PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
parameter numbers to report anything but the metrics they report now?

5.

> 15:56:44.682 | building index (3 of 8): initializing (1/5)            | 442478 | 442399 | 0 | 0 | 0 | 0
> 15:56:44.694 | building index (3 of 8): initializing (1/5)            | 442478 | 442399 | 0 | 0 | 0 | 0
> 15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 1 | 0 | 0 | 0
> 15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 1 | 0 | 0 | 0


I wonder why the phase is 'building index(3 of 8): initializing(1/5)' while
the blocks_done count is increasing. Shouldn't it have
changed to reflect PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
index(3 of 8): table scan(2/5)?
Although I think this has been rectified in the latest patch, as I now get
the 'table scan' phase in the output when I do CREATE INDEX on a table with 100
records

Thank you,
-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgbench MAX_ARGS

2019-03-11 Thread Andrew Dunstan


On 3/11/19 1:04 PM, Dagfinn Ilmari Mannsåker wrote:
> Andrew Dunstan  writes:
>
>> I think we've spent enough time on this. Committed with minor changes.
> Thanks for committing it. However, I can't see it in git. Did you forget
> to push?
>
>


Ooops, yes, done now.


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread David Steele

On 3/11/19 6:43 PM, Andrew Dunstan wrote:
> On 3/6/19 10:24 AM, Chapman Flack wrote:
>> On 3/6/19 10:12 AM, Andrew Dunstan wrote:
>>
>>> Having reviewed the thread, I'm with Andres and Tom. Maybe though we
>>> should have a note somewhere to the effect that you can't use VARIADIC
>>> with these.
>> Perhaps such a note belongs hoisted into the functions-conditional
>> section of the manual, making a general observation that these things
>> are conditional *expressions* that may resemble functions, but in
>> particular, COALESCE, GREATEST, and LEAST cannot be called with
>> keyword VARIADIC and an array argument, as they could if they were
>> ordinary functions.
>>
>
> I'm going to mark this as rejected. Here's a possible doc patch

+1

--
-David
da...@pgmasters.net



Re: pgbench MAX_ARGS

2019-03-11 Thread Dagfinn Ilmari Mannsåker
Andrew Dunstan  writes:

> I think we've spent enough time on this. Committed with minor changes.

Thanks for committing it. However, I can't see it in git. Did you forget
to push?

> cheers
>
>
> andrew

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen



Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-03-11 Thread Ashutosh Bapat
On Mon, Mar 11, 2019 at 10:40 AM amul sul  wrote:

>
> All the places from where this handle_missing_partition() get called
> have the following code to decide the value for missing_side_outer/_inner
> which
> I yet to understand. Do you think this has some flaw?
>
> /*
>  * For a FULL join, inner relation acts as both OUTER and INNER
>  * relation.  For LEFT and ANTI join the inner relation acts as
>  * INNER relation. For INNER and SEMI join OUTER and INNER
>  * differentiation is immaterial.
>  */
> missing_side_inner = (jointype == JOIN_FULL ||
>   jointype == JOIN_LEFT ||
>   jointype == JOIN_ANTI);
> missing_side_outer = (jointype == JOIN_FULL);
>

I was wrong, sorry. The comment says it all.


>
>
>
>> argument value which fails to set merged_index.
>>>
>>> In the attached patch, I tried to fix this case by setting merged_index
>>> explicitly which fixes the reported crash.
>>>
>>
>> I expect handle_missing_partition() to set the merged_index always. In
>> your patches, I don't see that function in your patches is setting it
>> explicitly. If we are setting merged_index explicitly somewhere else, other
>> places may miss that explicit assignment. So it's better to move it inside
>> this function.
>>
>>
>
> Ok, that can be fixed.
>
> Similarly, I think merge_null_partitions should set null_index instead of
> asserting when null partitions missing from both the side, make sense?
>

I think not. null_index, once set, shouldn't change and hence does not
change with each pair of partitions being matched. So it makes sense to
make sure that null_index remains invalid if none of the tables has a null
partition.

--
Best Wishes,
Ashutosh Bapat


Re: proposal: variadic argument support for least, greatest function

2019-03-11 Thread Andrew Dunstan

On 3/6/19 10:24 AM, Chapman Flack wrote:
> On 3/6/19 10:12 AM, Andrew Dunstan wrote:
>
>> Having reviewed the thread, I'm with Andres and Tom. Maybe though we
>> should have a note somewhere to the effect that you can't use VARIADIC
>> with these.
> Perhaps such a note belongs hoisted into the functions-conditional
> section of the manual, making a general observation that these things
> are conditional *expressions* that may resemble functions, but in
> particular, COALESCE, GREATEST, and LEAST cannot be called with
> keyword VARIADIC and an array argument, as they could if they were
> ordinary functions.
>


I'm going to mark this as rejected. Here's a possible doc patch


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a78ea..7fbcdfeae5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12481,6 +12481,15 @@ SELECT setval('foo', 42, false);Next nextval
   
 
+   
+
+ Although COALESCE, GREATEST, and
+ LEAST are syntactically similar to functions, they are
+ not strictly functions, and thus cannot be used with explicit
+ VARIADIC array arguments.
+
+   
+
   
CASE
 


Re: move hash_any to utils/hash/hashfn.c

2019-03-11 Thread Alvaro Herrera
On 2019-Jan-25, Alvaro Herrera wrote:

> Would anybody object too hard if I move hash_any() and friends to
> src/backend/utils/hash/hashfn.c and the declarations to
> src/include/utils/hashutils.h?

Pushed this.  I ended up adding an #include of utils/hashutils.h to
access/hash.h, so that any third-party code using hash_any() from the
latter continues to build unbroken.  I don't think this is terribly
problematic -- the only issue is that those projects won't become
updated to use the leaner file, but I don't think we care too much about
that.  If I do get pushback about this change I can definitely remove
that line and watch the fireworks.

Anyway, let's see what *else* breaks now ...

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Unaccent extension python script Issue in Windows

2019-03-11 Thread Ramanarayana
Hi Hackers,

In the master branch, the unaccent extension has an issue with the Python
script below. This issue occurs only on Windows 10 with Python 3.

python generate_unaccent_rules.py --unicode-data-file UnicodeData.txt
--latin-ascii-file Latin-ASCII.xml > unaccent.rules

I am getting the following error

UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in
position 0: character maps to 

I went through the Python script and found that the stdout encoding is set
to utf-8 only if the Python version is <= 2. The same needs to be done for
Python 3.
-- 
Cheers
Ram 4.0


Re: pgbench MAX_ARGS

2019-03-11 Thread Andrew Dunstan


On 3/11/19 6:07 AM, David Rowley wrote:
> On Mon, 11 Mar 2019 at 12:37, Dagfinn Ilmari Mannsåker
>  wrote:
>> David Rowley  writes:
>>> I think some comments in the area to explain the 0th is for the sql
>>> would be a good idea too, that might stop any confusion in the
>>> future. I see that's documented in the struct header comment, but
>>> maybe worth a small note around that error message just to confirm the
>>> - 1 is not a mistake, and neither is the >= MAX_ARGS.
>> I have done this in the updated version of the patch, attached.
>> Setting back to NR.
> The patch looks good to me. I'm happy for it to be marked as ready for
> committer.  Fabien, do you want to have another look?
>



I think we've spent enough time on this. Committed with minor changes.


cheers


andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ECPG regression with DECLARE STATEMENT support

2019-03-11 Thread Michael Meskes
> I attached a simple bug-fixing patch.

I'm not happy with the situation, but don't see a better solution
either. Therefore I committed the change to get rid of the regression.

Thanks.

Michael 
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: Portability of strtod (was Re: pgsql: Include GUC's unit, if it has one, in out-of-range error message)

2019-03-11 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Mar 11, 2019 at 8:45 AM Tom Lane  wrote:
>> I can think of three plausible responses.  In decreasing order of
>> amount of work:
>> 
>> 1. Decide that we'd better wrap strtod() with something that ensures
>> platform-independent behavior for all our uses of strtod (and strtof?)
>> rather than only float8in_internal.

> This sounds like a good approach, but won't it has the risk of change
> in behaviour?

Well, the point would be to produce consistent behavior across platforms
where it's not consistent now.  So yeah, some platforms would necessarily
see a behavior change.  But I think your point is that changing this
everywhere is solving a problem that hasn't been complained about,
and that's a valid concern.

>> 2. Put in a hack in guc.c to make it ignore ERANGE as long as the result
>> satisfies isinf().  This would ensure GUC cases would go through the
>> value-out-of-range path rather than the syntax-error path.  We've got
>> a bunch of other strtod() calls that are potentially subject to similar
>> platform dependencies though ...

> Yeah, this won't completely fix the symptom.

It would fix things in an area where we're changing the behavior anyway,
so maybe that's the right scope to work at.  After thinking about this
a little, it seems like simply ignoring ERANGE from strtod might get the
behavior we want: per POSIX strtod's result should be infinity for overflow
or zero for underflow, and proceeding with either of those should give
better behavior than treating the case as a syntax error.  Anyway
I think I'll try that and see what the buildfarm says.
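
For concreteness, a rough sketch of that idea (the function name and the
error handling here are illustrative only, not the actual guc.c code):

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Illustrative sketch: parse a double, treating ERANGE from strtod() as an
 * out-of-range value rather than a syntax error.  Per POSIX, strtod()
 * returns +/-HUGE_VAL on overflow and a value near zero on underflow, so
 * keeping that result routes the input to the value-out-of-range path.
 */
static bool
parse_double(const char *str, char **endptr, double *result)
{
	errno = 0;
	*result = strtod(str, endptr);

	if (*endptr == str)
		return false;			/* no digits at all: genuine syntax error */

	/* errno == ERANGE is deliberately ignored here */
	return true;
}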

>> 3. Decide this isn't worth avoiding platform dependencies for, and just
>> take out the new regression test case.  I'd only put in that test on
>> the spur of the moment anyway, so it's hard to argue that it's worth
>> much.

> For the time being option-3 sounds like a reasonable approach to fix
> buildfarm failures and then later if we want to do some bigger surgery
> based on option-1 or some other option, we can anyways do it.

Yeah, if I can't fix it pretty easily then I'll just remove the test
case.  But the behavior shown in the expected result is a bit nicer than
what we're actually getting from these buildfarm animals, so ideally
we'd fix it.

regards, tom lane



Re: GiST VACUUM

2019-03-11 Thread Heikki Linnakangas

On 10/03/2019 18:40, Andrey Borodin wrote:

> Here's new version of the patch for actual page deletion.
> Changes:
>
> 1. Fixed possible concurrency issue:
>
> We were locking child page while holding lock on internal page. Notes
> in GiST README recommend locking child before parent. Thus now we
> unlock internal page before locking children for deletion, and lock
> it again, check that everything is still on it's place and delete
> only then.


Good catch. The implementation is a bit broken, though. This segfaults:

create table gist_point_tbl(id int4, p point);
create index gist_pointidx on gist_point_tbl using gist(p);

insert into gist_point_tbl (id, p)
  select g,  point((1+g) % 100, (1+g) % 100) from generate_series(1, 
1) g;

insert into gist_point_tbl (id, p)
  select -g, point(1000, 1000) from generate_series(1, 1) g;

delete from gist_point_tbl where id >= 0;
vacuum verbose gist_point_tbl;

BTW, we don't seem to have test coverage for this in the regression 
tests. The tests that delete some rows, where you updated the comment, 
don't actually seem to produce any empty pages that could be deleted.



> One thing still bothers me. Let's assume that we have internal page
> with 2 deletable leaves. We lock these leaves in order of items on
> internal page. Is it possible that 2nd page have follow-right link on
> 1st and someone will lock 2nd page, try to lock 1st and deadlock with
> VACUUM?


Hmm. If the follow-right flag is set on a page, it means that its right 
sibling doesn't have a downlink in the parent yet. Nevertheless, I think 
I'd sleep better, if we acquired the locks in left-to-right order, to be 
safe.


An easy cop-out would be to use LWLockConditionalAcquire, and bail out 
if we can't get the lock. But if it's not too complicated, it'd be nice 
to acquire the locks in the correct order to begin with.
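
For illustration, a rough sketch of that cop-out (for GiST pages the
per-page lock is the buffer content lock, so the conditional variant
would presumably be ConditionalLockBuffer(); the variable names and the
surrounding loop are hypothetical):

	/* Try each deletable child; skip any whose lock is contended. */
	for (int i = 0; i < nchildren; i++)
	{
		Buffer		childbuf = ReadBuffer(rel, childblknos[i]);

		if (!ConditionalLockBuffer(childbuf))
		{
			/* Someone else holds the lock; retry in a later VACUUM pass. */
			ReleaseBuffer(childbuf);
			continue;
		}

		/* ... recheck that the leaf is still empty, then delete it ... */

		UnlockReleaseBuffer(childbuf);
	}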


I did a round of cleanup and moving things around, before I bumped into 
the above issue. I'm including them here as separate patches, for easier 
review, but they should of course be squashed into yours. For 
convenience, I'm including your patches here as well, so that all the 
patches you need are in the same place, but they are identical to what 
you posted.


- Heikki
From a09f14de32f3040b19238d239b7f025f345e940b Mon Sep 17 00:00:00 2001
From: Andrey 
Date: Fri, 8 Mar 2019 21:19:32 +0500
Subject: [PATCH 1/6] Add block set data structure

Currently we have Bitmapset, which works only with 32-bit ints.
Furthermore it is not very space-efficient in the case of a sparse
bitmap.

In this commit we invent the block set: a statically typed radix tree
for keeping BlockNumbers. This can still be improved in many ways
applicable to bitmaps; most notably, in-pointer lists could be used.
But for the sake of code simplicity it is now plain in its
structure.

The block set is introduced to support efficient page deletion in
GiST's VACUUM.
---
 src/backend/lib/Makefile  |   4 +-
 src/backend/lib/blockset.c| 201 ++
 src/include/lib/blockset.h|  24 +++
 src/test/modules/test_blockset/.gitignore |   4 +
 src/test/modules/test_blockset/Makefile   |  21 ++
 src/test/modules/test_blockset/README |   8 +
 .../test_blockset/expected/test_blockset.out  |   7 +
 .../test_blockset/sql/test_blockset.sql   |   3 +
 .../test_blockset/test_blockset--1.0.sql  |   8 +
 .../modules/test_blockset/test_blockset.c | 182 
 .../test_blockset/test_blockset.control   |   4 +
 11 files changed, 464 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/blockset.c
 create mode 100644 src/include/lib/blockset.h
 create mode 100644 src/test/modules/test_blockset/.gitignore
 create mode 100644 src/test/modules/test_blockset/Makefile
 create mode 100644 src/test/modules/test_blockset/README
 create mode 100644 src/test/modules/test_blockset/expected/test_blockset.out
 create mode 100644 src/test/modules/test_blockset/sql/test_blockset.sql
 create mode 100644 src/test/modules/test_blockset/test_blockset--1.0.sql
 create mode 100644 src/test/modules/test_blockset/test_blockset.c
 create mode 100644 src/test/modules/test_blockset/test_blockset.control

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 191ea9bca26..9601894f07c 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
-   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
+OBJS = binaryheap.o bipartite_match.o blockset.o bloomfilter.o dshash.o \
+   hyperloglog.o ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/blockset.c b/src/backend/lib/blockset.c
new file mode 100644
index 000..c07b2974b1e
--- /dev/null
+++ b/src/backend/lib/blockset.c

Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-11 Thread Tom Lane
Etsuro Fujita  writes:
> (2019/03/11 14:14), Tom Lane wrote:
>> Seems to me it's the other way around: the final target would include
>> all functions invoked in the grouping target plus maybe some more.
>> So a non-parallel-safe grouping target implies a non-parallel-safe
>> final target, but not vice versa.

> I mean the final *scan/join* target, not the final target.

Oh, of course.  Yup, I was too tired last night :-(.  So this is
just a plan-quality problem not a wrong-answer problem.

However, I'd still argue for back-patching into v11, on the grounds
that this is a regression from v10.  The example you just gave does
produce the desired plan in v10, and I think it's more likely that
people would complain about a regression from v10 than that they'd
be unhappy because we changed it between 11.2 and 11.3.

regards, tom lane



Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance

2019-03-11 Thread Tom Lane
Dean Rasheed  writes:
> +1 for including the inverse functions. However, it looks to me like
> the inverse functions are C99-specific, so they might not be available
> on all supported platforms. If they're not, we may need to provide our
> own implementations.

FWIW, I'm pretty sure they're available everywhere.  It's true C89
doesn't mention them, but POSIX has had them for a long time.  The
SUSv2 version of POSIX has them, and so does my pet dinosaur HPUX 10.20,
which has this to say about their origin:

$ man asinh
...
STANDARDS CONFORMANCE
  asinh(): SVID3, XPG4.2

Windows, as usual, is a wild card, but as far as I can tell by googling
they exist in Windows too (at least recent versions).

It's definitely possible that there are substandard implementations
out there, though.  Hopefully the buildfarm will alert us to any
problems.

> Of course that may all be moot -- those functions may in fact be
> available everywhere we care about, but it was interesting to play
> around with them anyway.

Yeah, math functions are fun to play around with ... and we could end
up needing the code.  We'll see.

regards, tom lane



Re: Offline enabling/disabling of data checksums

2019-03-11 Thread Sergei Kornilov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Hello

I reviewed the latest patchset. I have one big question: is pg_checksums safe for 
cross-version operations, even with the update_controlfile call? Currently I am 
able to enable checksums in a pg11 cluster with pg_checksums compiled on HEAD. Is 
this expected? I didn't notice any version-specific check in the code.

And few small notes:

> pg_checksums checks, enables or disables data checksums

maybe better is , not ?

> + printf(_("%s enables, disables or verifies data checksums in a 
> PostgreSQL\n"), progname);
> + printf(_("database cluster.\n\n"));

I doubt this is good line formatting for translation purposes.

> + printf(_("  -c, --checkcheck data checksums.  This is the 
> default\n"));
> + printf(_(" mode if nothing is specified.\n"));

same. For example pg_basebackup uses different multiline style:

>   printf(_("  -r, --max-rate=RATEmaximum transfer rate to transfer 
> data directory\n"
>" (in kB/s, or use suffix 
> \"k\" or \"M\")\n"));

> +command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
> +   "fails when relfilnodes are requested and action is --disable");

action is "--enable" here ;-)

> if (badblocks > 0)
>   return 1;

Small question: why return 1 instead of exit(1)?

> 
>  

How about use "app-pgchecksums" similar to other applications?

regards, Sergei

Re: Suggestions on message transfer among backends

2019-03-11 Thread Andy Fan
A note on the shared hash map: it needs multiple writers and multiple readers.

On Mon, Mar 11, 2019 at 9:36 PM Andy Fan  wrote:

> Hi:
>   I need some function which requires some message exchange among
> different back-ends (connections).
> specially I need a shared hash map and a message queue.
>
> Message queue:  it should be many writers,  1 reader.   Looks POSIX
> message queue should be OK, but postgre doesn't use it.  is there any
> equivalent in PG?
>
> shared hash map:  the number of items can be fixed and the value can be
> fixed as well.
>
> any keywords or explanation will be extremely helpful.
>
> Thanks
>


Suggestions on message transfer among backends

2019-03-11 Thread Andy Fan
Hi:
  I need some functionality which requires message exchange among different
back-ends (connections); specifically I need a shared hash map and a message
queue.

Message queue: it should have many writers and 1 reader.  A POSIX message
queue looks like it would be OK, but Postgres doesn't use it.  Is there any
equivalent in PG?

Shared hash map: the number of items can be fixed and the value size can be
fixed as well.

Any keywords or explanation will be extremely helpful.

Thanks
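
For what it's worth, the closest in-core facilities are probably shm_mq (a
single-producer, single-consumer queue in dynamic shared memory) and the
shared hash tables provided by dshash.c or ShmemInitHash().  Below is a
minimal sketch of a fixed-size shared hash table set up with
ShmemInitHash(); the struct, sizes, and names are purely illustrative, and
concurrent writers still need their own LWLock (or a partitioned table):

#include "postgres.h"

#include "storage/shmem.h"
#include "utils/hsearch.h"

/* Illustrative entry type: fixed-size key and value. */
typedef struct MyEntry
{
	Oid			key;			/* hash key; must be the first field */
	int64		value;
} MyEntry;

static HTAB *my_shared_map = NULL;

/*
 * To be called from a shmem_startup_hook (with the space requested via
 * RequestAddinShmemSpace() in _PG_init()).  ShmemInitHash() either creates
 * the table or attaches to the existing one, so every backend sees the
 * same shared map.
 */
static void
my_shmem_startup(void)
{
	HASHCTL		info;

	memset(&info, 0, sizeof(info));
	info.keysize = sizeof(Oid);
	info.entrysize = sizeof(MyEntry);

	my_shared_map = ShmemInitHash("my shared map",
								  128,	/* initial size */
								  128,	/* maximum size (fixed) */
								  &info,
								  HASH_ELEM | HASH_BLOBS);
}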


Best way to keep track of a sliced TOAST

2019-03-11 Thread Bruno Hass
Hi,

I've been reading about TOASTing and would like to modify how the slicing works
by taking into consideration the type of the varlena field. These changes would
support future implementations of type-specific optimized TOASTing functions.
The first step would be to add information to the TOAST so we know whether it is
sliced or not and by which function it was sliced and TOASTed. This information
should not break the current on-disk format of TOASTs. I had the idea of putting
this information in the varattrib struct's va_header, perhaps adding more bit
layouts to represent sliced TOASTs. This idea, however, was pointed out to me to
be a rather naive approach. What would be the best way to do this?

Bruno Hass


Re: Should we increase the default vacuum_cost_limit?

2019-03-11 Thread Julien Rouhaud
On Mon, Mar 11, 2019 at 10:03 AM David Rowley
 wrote:
>
> On Mon, 11 Mar 2019 at 09:58, Tom Lane  wrote:
> > The second patch is a delta that rounds off to the next smaller unit
> > if there is one, producing a less noisy result:
> >
> > regression=# set work_mem = '30.1GB';
> > SET
> > regression=# show work_mem;
> >  work_mem
> > --
> >  30822MB
> > (1 row)
> >
> > I'm not sure if that's a good idea or just overthinking the problem.
> > Thoughts?
>
> I don't think you're over thinking it.  I often have to look at such
> settings and I'm probably not unique in when I glance at 30822MB I can
> see that's roughly 30GB, whereas when I look at 31562138kB, I'm either
> counting digits or reaching for a calculator.  This is going to reduce
> the time it takes for a human to process the pg_settings output, so I
> think it's a good idea.

Definitely, rounding up will spare people from wasting time to check
what's the actual value.



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-11 Thread Alvaro Herrera
Hi Rahila

On 2019-Mar-11, Rahila Syed wrote:

> On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera 
> wrote:

> > +extern char *btbuildphasename(int64 phasenum);
> 
> 1. I think int64 is too large a datatype for phasenum.
> Also int32 is used for phasenum in  pg_indexam_progress_phasename().
> Can we have it as int8?

It does look strange, I agree, and the first code I wrote had it using a
smaller type.  However, I later realized that since the value comes
directly from pg_stat_get_progress_info(), which returns int8 values, it
was pointless to only accept a small fraction of the possible values for
no good reason, so I widened it to int64 as you see now.

> 2.
> . In validate_index_heapscan, we dont calculate blocks_done similar to
> IndexBuildHeapScan i.e
> block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
> Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
> results as we are
> still processing that block.

Thanks for pointing out that there's an off-by-one bug there (should be
cblock-1).  However, IndexBuildHeapScan uses more complicated code
because it needs to cover for two additional things that
validate_index_heapscan doesn't: parallel heapscans and synchronized
seqscans.  We could do that, I just saw no point in it.

> 3. There is no if(progress) check in validate_index()/
> validate_index_heapscan() code. Wont it be a problem if it is called from
> other index methods which dont support reporting progress at the
> moment?

Good question.  I'll have a look.  Most likely, I'll end up having
things so that building an index using an unsupported index AM reports
progress based on IndexBuildHeapScan / validate_index /
validate_index_heapscan ... which might mean I should remove the
'progress' parameter from there and have them report unconditionally.

> 4.  Just to clarify my understanding can you please see below comment
> 
> Quoting your following comment in cluster command progress monitor thread
> while referring to progress reporting from IndexBuildHeapScan,
> 
> "One, err, small issue with that idea is that we need the param numbers
> not to conflict for any "progress update providers" that are to be used
> simultaneously by any command."
> 
> Does that mean that we can't have any other INDEX progress monitoring, use
> PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
> parameter numbers to report anything but the metrics they report now.

What I mean is that the literal parameter numbers defined as
PROGRESS_SCAN_BLOCKS_DONE/TOTAL may not be used for other parameters by
commands that call IndexBuildHeapScan, if those other parameters are
used by the same commands simultaneously with IndexBuildHeapScan.  So
those parameter numbers become "reserved".

> 5.
> 
> > 15:56:44.682 | building index (3 of 8): initializing (1/5)|
> >442478 |  442399 |0 |   0 |0 |
> 
> I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
> the blocks_done count is increasing. Shouldn't it have
> changed to reflect  PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
> index(3 of 8): table scan(2/5) ?
> Although I think it has been rectified in the latest patch as I now get
> 'table scan' phase in output as I do CREATE INDEX on table with 100
> records

Yeah, this was a bug that I fixed in v5.  (It was a misunderstanding
about how parallel scanning is set up, IIRC).  For v5, I tested both
parallel and non-parallel builds, with and without sync seqscans, and
everything seemed to behave correctly.


Thanks for looking!  I intend to post a new version later this week.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance

2019-03-11 Thread Dean Rasheed
On Sun, 3 Feb 2019 at 15:12, Tom Lane  wrote:
>
> Andrew Gierth  writes:
> > The spec doesn't require the inverse functions (asinh, acosh, atanh),
> > but surely there is no principled reason to omit them?
>
> +1 --- AFAICS, the C library has offered all six since C89.
>

+1 for including the inverse functions. However, it looks to me like
the inverse functions are C99-specific, so they might not be available
on all supported platforms. If they're not, we may need to provide our
own implementations.

I did a bit of research and had a play. It looks like it might not be
too hard to provide our own implementations, but it does take a bit of
care to avoid rounding and overflow errors. Attached are some
standalone C programs where I tested various algorithms. A decent
approach seems to be to either use log1p() (which is itself
C99-specific, hence that's the first thing I played with) or to use a
single round of Newton-Raphson to correct rounding errors, in a
similar way to how we implement cbrt() on platforms that don't have
that.

Of course that may all be moot -- those functions may in fact be
available everywhere we care about, but it was interesting to play
around with them anyway.

Regards,
Dean
#include 
#include 
#include 

/*
 * A commonly used algorithm, due to Kahan (citation needed).
 *
 * Max error: 1 ulp.
 * Avg error: 0.158 ulp.
 *
 * It's not obvious, but this does appear to be monotonic at the cutover point
 * (at least on my machine). Can that be relied upon on all platforms?
 *
 * -5.5511151231257851673e-17 -> -5.5511151231257851673e-17 (u != 1 branch)
 * -5.5511151231257839347e-17 -> -5.5511151231257839347e-17 (u != 1 branch)
 * -5.5511151231257827021e-17 -> -5.5511151231257827021e-17 (u == 1 branch)
 * -5.5511151231257820858e-17 -> -5.5511151231257820858e-17 (u == 1 branch)
 *
 * 1.1102230246251564172e-16 -> 1.1102230246251564172e-16 (u == 1 branch)
 * 1.1102230246251565404e-16 -> 1.1102230246251565404e-16 (u == 1 branch)
 * 1.1102230246251567869e-16 -> 1.1102230246251565404e-16 (u != 1 branch)
 * 1.1102230246251570335e-16 -> 1.1102230246251567869e-16 (u != 1 branch)
 *
 * However, it doesn't appear to be monotonic at various other points.
 */
double kahan_log1p(double x)
{
	volatile double u = 1+x;
	return u == 1 ? x : x * (log(u) / (u-1));
}

/*
 * Algorithm used in the GNU Scientific Library.
 *
 * Max error: 1 ulp.
 * Avg error: 0.086 ulp.
 *
 * Where does this formula come from? Licensing?
 * Doesn't return -0 when x is -0, but that could be fixed up.
 * Looks to be monotonic everywhere.
 */
double gsl_log1p(double x)
{
	volatile double y, z;
	y = 1 + x;
	z = y - 1;
	return log(y) - (z-x)/y;
}

/*
 * Alternative algorithm using one round of Newton-Raphson.
 *
 * Max error: 1 ulp.
 * Avg error: 0.094 ulp.
 *
 * Works well for all inputs.
 * Looks to be monotonic everywhere.
 */
double nr_log1p(double x)
{
	double y, e;

	if (x == 0)
		return x;

	y = log(1+x);
	e = exp(y);

	return y - (e-1-x)/e;
}

/* === TESTS === */

int num_tests = 0;
double max_kahan_err = 0.0;
double max_gsl_err = 0.0;
double max_nr_err = 0.0;
double total_kahan_err = 0.0;
double total_gsl_err = 0.0;
double total_nr_err = 0.0;

double err(double x, double y)
{
	if (x < y)
	{
		double diff = y - x;
		double ulp = nextafter(x, y) - x;
		return diff / ulp;
	}
	if (x > y)
	{
		double diff = x - y;
		double ulp = nextafter(y, x) - y;
		return diff / ulp;
	}
	return 0.0;
}

void test_log1p(double x)
{
	double y = log1p(x);
	double kahan_y = kahan_log1p(x);
	double gsl_y = gsl_log1p(x);
	double nr_y = nr_log1p(x);
	double kahan_err = err(y, kahan_y);
	double gsl_err = err(y, gsl_y);
	double nr_err = err(y, nr_y);

	double prev_x = nextafter(x, -DBL_MAX);
	double next_x = nextafter(x, DBL_MAX);
#define monotonic(fn) \
	( prev_x == -1 || (fn)(prev_x) <= (fn)(x) ) && \
	( next_x == DBL_MAX || (fn)(next_x) >= (fn)(x) ) ? \
	"Monotonic" : "Not monotonic"

	printf("x = %.20g\n", x);
	printf("Standard result: %.20g %s\n", y, monotonic(log1p));
	printf("Kahan log1p():   %.20g,  err=%f %s\n", kahan_y, kahan_err,
		   monotonic(kahan_log1p));
	printf("GSL log1p(): %.20g,  err=%f %s\n", gsl_y, gsl_err,
		   monotonic(gsl_log1p));
	printf("NR log1p():  %.20g,  err=%f %s\n", nr_y, nr_err,
		   monotonic(nr_log1p));

	if (kahan_err > max_kahan_err) max_kahan_err = kahan_err;
	if (gsl_err > max_gsl_err) max_gsl_err = gsl_err;
	if (nr_err > max_nr_err) max_nr_err = nr_err;

	total_kahan_err += kahan_err;
	total_gsl_err += gsl_err;
	total_nr_err += nr_err;
	num_tests++;
}

int main()
{
	test_log1p(nextafter(-1, 0));
	test_log1p(3.141e-16-1);
	test_log1p(3.141e-15-1);
	test_log1p(3.141e-14-1);
	test_log1p(3.141e-13-1);
	test_log1p(3.141e-12-1);
	test_log1p(3.141e-11-1);
	test_log1p(3.141e-10-1);
	test_log1p(3.141e-9-1);
	test_log1p(3.141e-8-1);
	test_log1p(3.141e-7-1);
	test_log1p(3.141e-6-1);
	test_log1p(3.141e-5-1);
	test_log1p(3.141e-4-1);
	test_log1p(3.141e-3-1);
	test_log1p(3.141e-2-1);
	test_log1p(-6.859e-1);
	

Re: Fix optimization of foreign-key on update actions

2019-03-11 Thread Peter Eisentraut
On 2019-02-06 23:15, Peter Eisentraut wrote:
> On 05/02/2019 17:20, Tom Lane wrote:
>> What I *don't* like about the proposed patch is that it installs a
>> new, different comparison rule for the ON UPDATE CASCADE case only.
>> If we were to go in this direction, I'd think we should try to use
>> the same comparison rule for all FK row comparisons.
> 
> That's easy to change.  I had it like that in earlier versions of the
> patch.  I agree it would be better for consistency, but it would create
> some cases where we do unnecessary extra work.

Updated patch with this change made, and some conflicts resolved.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d8615fe224eed4747682e88202faad1251c317f6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 5 Feb 2019 16:27:00 +0100
Subject: [PATCH v2 1/2] Test case for keys that "look" different but compare
 as equal

---
 src/test/regress/expected/foreign_key.out | 34 +++
 src/test/regress/sql/foreign_key.sql  | 20 +
 2 files changed, 54 insertions(+)

diff --git a/src/test/regress/expected/foreign_key.out 
b/src/test/regress/expected/foreign_key.out
index bf2c91d9f0..a68d055e40 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1474,6 +1474,40 @@ delete from pktable2 where f1 = 1;
 alter table fktable2 drop constraint fktable2_f1_fkey;
 ERROR:  cannot ALTER TABLE "pktable2" because it has pending trigger events
 commit;
+drop table pktable2, fktable2;
+--
+-- Test keys that "look" different but compare as equal
+--
+create table pktable2 (a float8, b float8, primary key (a, b));
+create table fktable2 (x float8, y float8, foreign key (x, y) references 
pktable2 (a, b) on update cascade);
+insert into pktable2 values ('-0', '-0');
+insert into fktable2 values ('-0', '-0');
+select * from pktable2;
+ a  | b  
++
+ -0 | -0
+(1 row)
+
+select * from fktable2;
+ x  | y  
++
+ -0 | -0
+(1 row)
+
+update pktable2 set a = '0' where a = '-0';
+select * from pktable2;
+ a | b  
+---+
+ 0 | -0
+(1 row)
+
+-- buggy: did not update fktable2.x
+select * from fktable2;
+ x  | y  
++
+ -0 | -0
+(1 row)
+
 drop table pktable2, fktable2;
 --
 -- Foreign keys and partitioned tables
diff --git a/src/test/regress/sql/foreign_key.sql 
b/src/test/regress/sql/foreign_key.sql
index c8d1214d02..c1fc7d30b1 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -1106,6 +1106,26 @@ CREATE TEMP TABLE tasks (
 
 drop table pktable2, fktable2;
 
+--
+-- Test keys that "look" different but compare as equal
+--
+create table pktable2 (a float8, b float8, primary key (a, b));
+create table fktable2 (x float8, y float8, foreign key (x, y) references 
pktable2 (a, b) on update cascade);
+
+insert into pktable2 values ('-0', '-0');
+insert into fktable2 values ('-0', '-0');
+
+select * from pktable2;
+select * from fktable2;
+
+update pktable2 set a = '0' where a = '-0';
+
+select * from pktable2;
+-- buggy: did not update fktable2.x
+select * from fktable2;
+
+drop table pktable2, fktable2;
+
 
 --
 -- Foreign keys and partitioned tables

base-commit: f2d84a4a6b4ec891a0a52f583ed5aa081c71acc6
-- 
2.21.0

From f9f05aceaed972147719345c65cce9222c50ef5e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Mar 2019 12:55:22 +0100
Subject: [PATCH v2 2/2] Fix optimization of foreign-key on update actions

In RI_FKey_pk_upd_check_required(), we check among other things
whether the old and new key are equal, so that we don't need to run
cascade actions when nothing has actually changed.  This was using the
equality operator.  But the effect of this is that if a value in the
primary key is changed to one that "looks" different but compares as
equal, the update is not propagated.  (Examples are float -0 and 0 and
case-insensitive text.)  This appears to violate the SQL standard, and
it also behaves inconsistently if in a multicolumn key another key is
also updated that would cause the row to compare as not equal.

To fix, if we are looking at the PK table in ri_KeysEqual(), then do a
bytewise comparison similar to record_image_eq() instead of using the
equality operators.  This only makes a difference for ON UPDATE
CASCADE, but for consistency we treat all changes to the PK the same.  For
the FK table, we continue to use the equality operators.

Discussion: 
https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e...@2ndquadrant.com
---
 src/backend/utils/adt/datum.c | 57 +++
 src/backend/utils/adt/ri_triggers.c   | 40 ++--
 src/backend/utils/adt/rowtypes.c  | 41 +---
 src/include/utils/datum.h |  9 
 src/test/regress/expected/foreign_key.out |  8 ++--
 src/test/regress/sql/foreign_key.sql  |  2 +-
 6 files changed, 100 insertions(+), 57 

Re: Fix volatile vs. pointer confusion

2019-03-11 Thread Alvaro Herrera
On 2019-Mar-11, Peter Eisentraut wrote:

> Variables used after a longjmp() need to be declared volatile.  In
> case of a pointer, it's the pointer itself that needs to be declared
> volatile, not the pointed-to value.  So we need
> 
> PyObject *volatile items;
> 
> instead of
> 
> volatile PyObject *items;  /* wrong */

Looking at recently committed 2e616dee9e60, we have introduced this:

+   volatile xmlBufferPtr buf = NULL;
+   volatile xmlNodePtr cur_copy = NULL;

where the pointer-ness nature of the object is inside the typedef.  I
*suppose* that this is correct as written.  There are a few occurrences
of this pattern in eg. contrib/xml2.
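
For illustration, a minimal sketch of why the typedef'd form should be
correct as written (the libxml2 typedefs are abridged here):

/* libxml2 (abridged): the pointer-ness lives inside the typedef */
typedef struct _xmlBuffer xmlBuffer;
typedef xmlBuffer *xmlBufferPtr;

/*
 * "volatile xmlBufferPtr buf" qualifies the pointer variable itself,
 * i.e. it is equivalent to "xmlBuffer *volatile buf", which is what
 * longjmp() safety requires.  By contrast, "volatile xmlBuffer *buf"
 * would qualify the pointed-to object rather than the pointer.
 */
volatile xmlBufferPtr buf = NULL;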


-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: Add tablespace tap test to pg_rewind

2019-03-11 Thread Shaoqi Bai
Thanks, will work on it as you suggested:
- Add pg_basebackup -T olddir=newdir to support checking the consistency of a
  tablespace created before promotion
- Add run_test('remote');

On Mon, Mar 11, 2019 at 6:50 AM Michael Paquier  wrote:

> On Sat, Mar 09, 2019 at 09:09:24AM +0900, Michael Paquier wrote:
> > When working on the first version of pg_rewind for VMware with Heikki,
> > tablespace support has been added only after, so what you propose is
> > sensible I think.
> >
> > +# Run the test in both modes.
> > +run_test('local');
> > Small nit: we should test for the remote mode here as well.
>
> I got to think more about this one a bit more, and I think that it
> would be good to also check the consistency of a tablespace created
> before promotion.  If you copy the logic from 002_databases.pl, this
> is not going to work because the primary and the standby would try to
> create the tablespace in the same path, stepping on each other's
> toes.  So what you need to do is to create the tablespace on the
> primary before creating the standby.  This requires a bit more work
> than what you propose here though as you basically need to extend
> RewindTest::create_standby so as it is possible to pass extra
> arguments to $node_master->backup.  And in this case the extra option
> to use is --tablespace-mapping to make sure that the primary and the
> standby have the same tablespace, but defined on different paths.
> --
> Michael
>


Re: I have some troubles to run test_shm_mq;

2019-03-11 Thread Andy Fan
 Thanks for the clarification!

On Mon, Mar 11, 2019 at 5:02 PM Thomas Munro  wrote:

> On Mon, Mar 11, 2019 at 9:35 PM Andy Fan  wrote:
> > and whenever I run a simple query  "SELECT test_shm_mq(1024, 'a');"
> >
> > I see the following log
> >
> > 2019-03-11 16:33:17.800 CST [65021] LOG:  background worker
> "test_shm_mq" (PID 65052) exited with exit code 1
>
> Hmm, I don't know actually know why test_shm_mq_main() ends with
> proc_exit(1) instead of 0.  It's possible that it was written when the
> meaning of bgworker exit codes was still being figured out, but I'm
> not sure...
>
> >> Works, thank you Thomas!  I have spent more than 2 hours on this.   do
> you know which document I miss for this question?
>
> There is probably only src/test/modules/README, which explains that
> these modules are tests and examples and not part of a server
> installation.
>
> --
> Thomas Munro
> https://enterprisedb.com
>


Re: Feature: temporary materialized views

2019-03-11 Thread Andreas Karlsson

On 3/8/19 2:38 AM, Michael Paquier wrote:

On Thu, Mar 07, 2019 at 10:45:04AM +0200, David Steele wrote:

I think a new patch is required here so I have marked this Waiting on
Author.  cfbot is certainly not happy and anyone trying to review is going
to have hard time trying to determine what to review.


I would recommend to mark this patch as returned with feedback as we
already know that we need to rethink a bit harder the way relations
are created in CTAS, not to mention that the case of EXPLAIN CTAS IF
NOT EXISTS is not correctly handled.  This requires more than the three
weeks of work which is what remains until the end of this CF, so v12 is not a
sane target.


Agreed. Even if I could find the time to write a patch for this there is 
no way it would make it into v12.


Andreas




Re: Offline enabling/disabling of data checksums

2019-03-11 Thread Michael Banck
Hi,

Am Montag, den 11.03.2019, 11:11 +0100 schrieb Michael Banck:
> I had a quick look over the patch and your changes and it LGTM.

One thing: you (Michael) should be co-author for patch #3 as I took some
of your code from https://github.com/michaelpq/pg_plugins/tree/master/pg
_checksums


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Offline enabling/disabling of data checksums

2019-03-11 Thread Michael Banck
Hi Michael,

Am Montag, den 11.03.2019, 13:53 +0900 schrieb Michael Paquier:
> On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:
> > Hallo Michael,
> 
> Okay, let's move on with these patches!

Wow cool. I was going to go back to these and split them up similar to
how you did it now that the online verification patch seems to be
done/stuck for v12, but great that you beat me to it.

I had a quick look over the patch and your changes and it LGTM.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: pgbench MAX_ARGS

2019-03-11 Thread David Rowley
On Mon, 11 Mar 2019 at 12:37, Dagfinn Ilmari Mannsåker
 wrote:
>
> David Rowley  writes:
> > I think some comments in the area to explain that the 0th element is for
> > the SQL would be a good idea too; that might stop any confusion in the
> > future. I see that's documented in the struct header comment, but it may
> > be worth a small note around that error message just to confirm that the
> > - 1 is not a mistake, and neither is the >= MAX_ARGS.
>
> I have done this in the updated version of the patch, attached.

> Setting back to NR.

The patch looks good to me. I'm happy for it to be marked as ready for
committer.  Fabien, do you want to have another look?
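
For reference, the guard being discussed boils down to something like the
following (a simplified sketch with illustrative names, not the exact
pgbench code):

/*
 * Sketch only: argv[0] holds the command word or the SQL itself, so a
 * script line can carry at most MAX_ARGS - 1 user-supplied arguments.
 */
if (my_command->argc >= MAX_ARGS)
{
    fprintf(stderr, "too many arguments (maximum is %d)\n", MAX_ARGS - 1);
    exit(1);
}
my_command->argv[my_command->argc++] = pg_strdup(word);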

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Should we increase the default vacuum_cost_limit?

2019-03-11 Thread David Rowley
On Mon, 11 Mar 2019 at 09:58, Tom Lane  wrote:
> The second patch is a delta that rounds off to the next smaller unit
> if there is one, producing a less noisy result:
>
> regression=# set work_mem = '30.1GB';
> SET
> regression=# show work_mem;
>  work_mem
> --
>  30822MB
> (1 row)
>
> I'm not sure if that's a good idea or just overthinking the problem.
> Thoughts?

I don't think you're overthinking it.  I often have to look at such
settings, and I'm probably not unique in this: when I glance at 30822MB I can
see that's roughly 30GB, whereas when I look at 31562138kB, I'm either
counting digits or reaching for a calculator.  This is going to reduce
the time it takes for a human to process the pg_settings output, so I
think it's a good idea.
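
For illustration, the rounding rule under discussion amounts to something
like this (a sketch only, not the actual guc.c code):

#include <stdio.h>

/*
 * Sketch: print a kB-valued setting in GB only when it is an exact
 * multiple; otherwise round off to the next smaller unit (MB) rather
 * than emitting a long string of kB digits.
 */
static void
format_memory_kb(long long value_kb, char *buf, size_t bufsz)
{
    if (value_kb % (1024 * 1024) == 0)
        snprintf(buf, bufsz, "%lldGB", value_kb / (1024 * 1024));
    else if (value_kb % 1024 == 0 || value_kb >= 1024 * 1024)
        snprintf(buf, bufsz, "%lldMB", value_kb / 1024);   /* 31562138kB -> 30822MB */
    else
        snprintf(buf, bufsz, "%lldkB", value_kb);
}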

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: I have some troubles to run test_shm_mq;

2019-03-11 Thread Thomas Munro
On Mon, Mar 11, 2019 at 9:35 PM Andy Fan  wrote:
> and whenever I run a simple query  "SELECT test_shm_mq(1024, 'a');"
>
> I see the following log
>
> 2019-03-11 16:33:17.800 CST [65021] LOG:  background worker "test_shm_mq" 
> (PID 65052) exited with exit code 1

Hmm, I don't actually know why test_shm_mq_main() ends with
proc_exit(1) instead of 0.  It's possible that it was written when the
meaning of bgworker exit codes was still being figured out, but I'm
not sure...

>> Works, thank you Thomas!  I have spent more than 2 hours on this.   Do you
>> know which document I missed for this question?

There is probably only src/test/modules/README, which explains that
these modules are tests and examples and not part of a server
installation.

-- 
Thomas Munro
https://enterprisedb.com



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-11 Thread Dean Rasheed
On Sun, 10 Mar 2019 at 22:28, David Rowley  wrote:
>
> On Mon, 11 Mar 2019 at 06:36, Tomas Vondra  
> wrote:
> >
> > On 3/9/19 7:33 PM, Dean Rasheed wrote:
> > > I wonder if it's possible to write smaller, more targeted tests.
> > > Currently "stats_ext" is by far the slowest test in its group, and I'm
> > > not sure that some of those tests add much. It ought to be possible to
> > > write a function that calls EXPLAIN and returns a query's row
> > > estimate, and then you could write tests to confirm the effect of the
> > > new stats by verifying the row estimates change as expected.
> >
> > Sure, if we can write more targeted tests, that would be good. But it's
> > not quite clear to me how wrapping EXPLAIN in a function makes those
> > tests any faster?
>
> I've not looked at the tests in question, but if they're executing an
> inferior plan when no extended stats exist, then maybe that's
> why they're slow.
>
> I think Dean might mean to create a function similar to
> explain_parallel_append() in partition_prune.sql then write tests that
> check the row estimate with EXPLAIN (COSTS ON) but strip out the other
> costing stuff instead of validating that the poor plan was chosen.
>

Yeah that's the sort of thing I was thinking of. I think it might be
possible to write simpler and faster tests by inserting far fewer rows
and relying on ANALYSE having sampled everything, so the row estimates
should be predictable. It may be the case that, with just a handful of
rows, the extended stats don't affect the plan, but you'd still see a
difference in the row estimates, and that could be a sufficient test I
think.

> > On 3/10/19 2:09 PM, Dean Rasheed wrote:
> > > 12). bms_member_index() should surely be in bitmapset.c. It could be
> > > more efficient by just traversing the bitmap words and making use of
> > > bmw_popcount(). Also, its second argument should be of type 'int' for
> > > consistency with other bms_* functions.
> >
> > Yes, moving to bitmapset.c definitely makes sense. I don't see how it
> > could use bms_popcount() though.
>
> I think it could be done by first checking if the parameter is a
> member of the set, and then if so, count all the bits that come on and
> before that member. You can use bmw_popcount() for whole words before
> the specific member's word, then bitwise-AND a mask of a
> bitmapword that has all bits set for all bits on and before your
> parameter's BITNUM(), and add the bmw_popcount() of that final masked
> word. bms_add_range() has some masking code you
> could copy.

Yep, that's what I was imagining. Except I think that to get a 0-based
index result you'd want the mask to have all bits set for bits
*before* the parameter's BITNUM(), rather than on and before. So I
think the mask would simply be

  ((bitmapword) 1 << bitnum) - 1
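
Putting those two pieces together, a minimal sketch (assuming the
WORDNUM/BITNUM macros and bmw_popcount() from bitmapset.c; not the actual
patch code) could look like this:

/*
 * Return the 0-based index of 'x' among the members of 'a', or -1 if
 * 'x' is not a member.
 */
int
bms_member_index(Bitmapset *a, int x)
{
    int         wordnum = WORDNUM(x);
    int         bitnum = BITNUM(x);
    bitmapword  mask;
    int         result = 0;
    int         i;

    if (!bms_is_member(x, a))
        return -1;

    /* count the set bits in all whole words before the target word */
    for (i = 0; i < wordnum; i++)
        result += bmw_popcount(a->words[i]);

    /* ... plus the set bits before 'x' within its own word */
    mask = ((bitmapword) 1 << bitnum) - 1;
    result += bmw_popcount(a->words[wordnum] & mask);

    return result;
}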

Regards,
Dean



Re: I have some troubles to run test_shm_mq;

2019-03-11 Thread Andy Fan
and whenever I run a simple query  "SELECT test_shm_mq(1024, 'a');"

I see the following log

2019-03-11 16:33:17.800 CST [65021] LOG:  background worker "test_shm_mq"
(PID 65052) exited with exit code 1


does it indicate something wrong?

On Mon, Mar 11, 2019 at 4:30 PM Andy Fan  wrote:

> Works, thank you Thomas!  I have spent more than 2 hours on this.   Do you
> know which document I missed for this question?
>
> Thanks
>
> On Mon, Mar 11, 2019 at 4:05 PM Thomas Munro 
> wrote:
>
>> On Mon, Mar 11, 2019 at 8:59 PM Andy Fan 
>> wrote:
>> > 4.   CREATE EXTENSION test_shm_mq;  ==> .  could not open extension
>> control file "/.../share/postgresql/extension/test_shm_mq.control": No such
>> file or directory
>> >
>> > how can I get it work?  Thanks
>>
>> Hi Andy,
>>
>> Try this first:
>>
>> cd src/test/modules/test_shm_mq
>> make install
>>
>> --
>> Thomas Munro
>> https://enterprisedb.com
>>
>


Re: Fix volatile vs. pointer confusion

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 08:23:39AM +0100, Peter Eisentraut wrote:
> Attached patch fixes a couple of cases of that.  Most instances were
> already correct.

It seems to me that you should look at that:
https://www.postgresql.org/message-id/20190308055911.gg4...@paquier.xyz
They deal with the same subject, and a patch has been sent for this
CF.
--
Michael


signature.asc
Description: PGP signature


Re: I have some troubles to run test_shm_mq;

2019-03-11 Thread Andy Fan
Works, thank you Thomas!  I have spent more than 2 hours on this.   Do you
know which document I missed for this question?

Thanks

On Mon, Mar 11, 2019 at 4:05 PM Thomas Munro  wrote:

> On Mon, Mar 11, 2019 at 8:59 PM Andy Fan  wrote:
> > 4.   CREATE EXTENSION test_shm_mq;  ==> .  could not open extension
> control file "/.../share/postgresql/extension/test_shm_mq.control": No such
> file or directory
> >
> > how can I get it work?  Thanks
>
> Hi Andy,
>
> Try this first:
>
> cd src/test/modules/test_shm_mq
> make install
>
> --
> Thomas Munro
> https://enterprisedb.com
>


Re: Problems with plan estimates in postgres_fdw

2019-03-11 Thread Etsuro Fujita

(2019/03/09 1:25), Antonin Houska wrote:

Etsuro Fujita  wrote:

(2019/03/01 20:16), Antonin Houska wrote:

Etsuro Fujita   wrote:



Conversely, it appears that add_foreign_ordered_paths() added by the patchset
would generate such pre-sorted paths *redundantly* when the input_rel is the
final scan/join relation.  Will look into that.


Currently I have no idea how to check the plan created by FDW at the
UPPERREL_ORDERED stage, except for removing the sort from the
UPPERREL_GROUP_AGG stage as I proposed here:

https://www.postgresql.org/message-id/11807.1549564431%40localhost


I don't understand your words "how to check the plan created by FDW". Could
you elaborate on that a bit more?


I meant that I don't know how to verify that the plan that sends the ORDER BY
clause to the remote server was created at the UPPERREL_ORDERED stage and not at
UPPERREL_GROUP_AGG. However, if the ORDER BY clause really should not be added
at the UPPERREL_GROUP_AGG stage and if we ensure that it no longer happens,
then the mere presence of ORDER BY in the (remote) plan means that the
UPPERREL_ORDERED stage works fine.


I don't think we need to consider pushing down the query's ORDER BY to
the remote in GetForeignUpperPaths() for each of the upper relations below
UPPERREL_ORDERED (ie, UPPERREL_GROUP_AGG, UPPERREL_WINDOW, and
UPPERREL_DISTINCT); I think that's a job for GetForeignUpperPaths() for
UPPERREL_ORDERED, though the division of labor would be arbitrary.
However, I think it's a good thing that there is room for considering
remote sorts even in GetForeignUpperPaths() for UPPERREL_GROUP_AGG,
because some remote sorts might be useful for processing its upper relation.
Consider this example using postgres_fdw:


postgres=# explain verbose select distinct count(a) from ft1 group by b;
   QUERY PLAN

 Unique  (cost=121.47..121.52 rows=10 width=12)
   Output: (count(a)), b
   ->  Sort  (cost=121.47..121.49 rows=10 width=12)
 Output: (count(a)), b
 Sort Key: (count(ft1.a))
 ->  Foreign Scan  (cost=105.00..121.30 rows=10 width=12)
   Output: (count(a)), b
   Relations: Aggregate on (public.ft1)
   Remote SQL: SELECT count(a), b FROM public.t1 GROUP BY 2
(9 rows)

For this query it might be useful to push down the sort on top of the 
foreign scan.  To allow that, we should allow GetForeignUpperPaths() for 
UPPERREL_GROUP_AGG to consider that sort pushdown.  (We would soon 
implement the SELECT DISTINCT pushdown for postgres_fdw, and if so, we 
would no longer need to consider that sort pushdown in 
postgresGetForeignUpperPaths() for UPPERREL_GROUP_AGG, but I don't think 
all FDWs can have the ability to push down SELECT DISTINCT to the remote...)
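
To make that division of labor concrete, the dispatch in
postgresGetForeignUpperPaths() could look roughly like the following
sketch (add_foreign_ordered_paths() being the function added by the
patchset; this is illustrative only, not the actual postgres_fdw code):

static void
postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
                             RelOptInfo *input_rel, RelOptInfo *output_rel,
                             void *extra)
{
    /* sketch: the real function also bails out early in various cases */
    switch (stage)
    {
        case UPPERREL_GROUP_AGG:
            /* may also build remote sorts useful for later upper steps */
            add_foreign_grouping_paths(root, input_rel, output_rel, extra);
            break;
        case UPPERREL_ORDERED:
            /* handles the query's final ORDER BY */
            add_foreign_ordered_paths(root, input_rel, output_rel, extra);
            break;
        default:
            break;
    }
}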


Best regards,
Etsuro Fujita




Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-11 Thread Dean Rasheed
On Sun, 10 Mar 2019 at 17:36, Tomas Vondra  wrote:
> On 3/10/19 2:09 PM, Dean Rasheed wrote:
> > 14). The attnums Bitmapset passed to
> > statext_is_compatible_clause_internal() is an input/output argument
> > that it updates. That should probably be documented. When it calls
> > itself recursively for AND/OR/NOT clauses, it could just pass the
> > original Bitmapset through to be updated (rather than creating a new
> > one and merging it), as it does for other types of clause.
>
> I don't think it's really possible, because the AND/OR/NOT clause is
> considered compatible only when all the pieces are compatible. So we
> can't tweak the original bitmapset directly in case the incompatible
> clause is not the very first one.
>

In the case where the overall clause is incompatible, you don't
actually care about the attnums returned. Right now it will return an
empty set (NULL). With this change it would return all the attnums
encountered before the incompatible piece, but that wouldn't matter.
In fact, you could easily preserve the current behaviour just by
having the outer statext_is_compatible_clause() function set attnums
back to NULL if the result is false.
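
That is, something along these lines (the signatures are approximate; this
is only to illustrate the suggestion, not code from the patch):

static bool
statext_is_compatible_clause(Node *clause, Index relid, Bitmapset **attnums)
{
    /* let the recursion add attnums to the caller's set directly */
    if (!statext_is_compatible_clause_internal(clause, relid, attnums))
    {
        /* keep the current behaviour: no attnums for incompatible clauses */
        bms_free(*attnums);
        *attnums = NULL;
        return false;
    }
    return true;
}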

Regards,
Dean



Re: psql show URL with help

2019-03-11 Thread Peter Eisentraut
On 2019-03-08 16:11, David Fetter wrote:
>> The outcome of that is exactly what my patch does, but the inputs are
>> different.  We have PG_MAJORVERSION, which is always a single integer,
>> and PG_VERSION, which could be 10.9.8 or 11beta5 or 12devel.  The patch does
>>
>> if (PG_VERSION ends with 'devel')
>>   return /docs/devel/
>> else
>>  return /docs/$PG_MAJORVERSION/
>>
>> There is no third case.  Your third case of not-numeric-and-not-devel is
>> correctly covered by the else branch.
> 
> Thanks for helping me understand.

Committed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-03-11 Thread Masahiko Sawada
On Thu, Feb 21, 2019 at 3:05 PM Pavan Deolasee  wrote:
>
> Hi,
>
> Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly 
> while loading data via COPY FREEZE and had also posted a draft patch.
>
> I now have what I think is a more complete patch. I took a slightly different 
> approach and instead of setting PD_ALL_VISIBLE bit initially and then not 
> clearing it during insertion, we now recheck the page for all-frozen, 
> all-visible tuples just before switching to a new page. This allows us to 
> then also set the visibility map bit, like we do in vacuumlazy.c

I might be missing something but why do we need to recheck whether
each page is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
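
(For reference, the recheck being discussed would essentially do the
following just before the loader moves to a new page -- a sketch using
names from bufpage.h/visibilitymap.h, not the actual patch:

/* all_visible/all_frozen come from rechecking the tuples on the page */
if (all_visible && all_frozen)
{
    PageSetAllVisible(page);
    visibilitymap_set(relation, blkno, buffer, InvalidXLogRecPtr,
                      vmbuffer, InvalidTransactionId,
                      VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
}
)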

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: I have some troubles to run test_shm_mq;

2019-03-11 Thread Thomas Munro
On Mon, Mar 11, 2019 at 8:59 PM Andy Fan  wrote:
> 4.   CREATE EXTENSION test_shm_mq;  ==> .  could not open extension control 
> file "/.../share/postgresql/extension/test_shm_mq.control": No such file or 
> directory
>
> how can I get it work?  Thanks

Hi Andy,

Try this first:

cd src/test/modules/test_shm_mq
make install

-- 
Thomas Munro
https://enterprisedb.com



I have some troubles to run test_shm_mq;

2019-03-11 Thread Andy Fan
My code is based on commit

zhifan@zhifandeMacBook-Pro ~/g/polardb_clean> git log
commit d06fe6ce2c79420fd19ac89ace81b66579f08493
Author: Tom Lane 
Date:   Tue Nov 6 18:56:26 2018 -0500

what I did includes:
1.   ./configure --enable-debug
2.   make world  // didn't see test_shm_mq in the output
3.   make install-world // didn't see test_shm_mq.control under the
install directory.
4.   CREATE EXTENSION test_shm_mq;  ==> .  could not open extension control
file "/.../share/postgresql/extension/test_shm_mq.control": No such file or
directory

how can I get it work?  Thanks


Fix volatile vs. pointer confusion

2019-03-11 Thread Peter Eisentraut
Variables used after a longjmp() need to be declared volatile.  In
case of a pointer, it's the pointer itself that needs to be declared
volatile, not the pointed-to value.  So we need

PyObject *volatile items;

instead of

volatile PyObject *items;  /* wrong */

Attached patch fixes a couple of cases of that.  Most instances were
already correct.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1bb6acf63cf796d0659283c7ea5220385f7de181 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Mar 2019 08:18:55 +0100
Subject: [PATCH] Fix volatile vs. pointer confusion

Variables used after a longjmp() need to be declared volatile.  In
case of a pointer, it's the pointer itself that needs to be declared
volatile, not the pointed-to value.  So we need

PyObject *volatile items;

instead of

volatile PyObject *items;  /* wrong */
---
 contrib/hstore_plpython/hstore_plpython.c | 9 ++++-----
 contrib/jsonb_plpython/jsonb_plpython.c   | 9 +++------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/contrib/hstore_plpython/hstore_plpython.c 
b/contrib/hstore_plpython/hstore_plpython.c
index 2f24090ff3..93c39d294d 100644
--- a/contrib/hstore_plpython/hstore_plpython.c
+++ b/contrib/hstore_plpython/hstore_plpython.c
@@ -128,7 +128,7 @@ Datum
 plpython_to_hstore(PG_FUNCTION_ARGS)
 {
PyObject   *dict;
-   volatile PyObject *items_v = NULL;
+   PyObject *volatile items = NULL;
int32   pcount;
HStore *out;
 
@@ -139,14 +139,13 @@ plpython_to_hstore(PG_FUNCTION_ARGS)
 errmsg("not a Python mapping")));
 
pcount = PyMapping_Size(dict);
-   items_v = PyMapping_Items(dict);
+   items = PyMapping_Items(dict);
 
PG_TRY();
{
int32   buflen;
int32   i;
Pairs  *pairs;
-   PyObject   *items = (PyObject *) items_v;
 
pairs = palloc(pcount * sizeof(*pairs));
 
@@ -177,14 +176,14 @@ plpython_to_hstore(PG_FUNCTION_ARGS)
pairs[i].isnull = false;
}
}
-   Py_DECREF(items_v);
+   Py_DECREF(items);
 
 	pcount = hstoreUniquePairs(pairs, pcount, &buflen);
out = hstorePairs(pairs, pcount, buflen);
}
PG_CATCH();
{
-   Py_DECREF(items_v);
+   Py_DECREF(items);
PG_RE_THROW();
}
PG_END_TRY();
diff --git a/contrib/jsonb_plpython/jsonb_plpython.c 
b/contrib/jsonb_plpython/jsonb_plpython.c
index f44d364c97..1bc984d5c4 100644
--- a/contrib/jsonb_plpython/jsonb_plpython.c
+++ b/contrib/jsonb_plpython/jsonb_plpython.c
@@ -237,17 +237,14 @@ PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState 
**jsonb_state)
JsonbValue *out = NULL;
 
/* We need it volatile, since we use it after longjmp */
-   volatile PyObject *items_v = NULL;
+   PyObject *volatile items = NULL;
 
pcount = PyMapping_Size(obj);
-   items_v = PyMapping_Items(obj);
+   items = PyMapping_Items(obj);
 
PG_TRY();
{
Py_ssize_t  i;
-   PyObject   *items;
-
-   items = (PyObject *) items_v;
 
pushJsonbValue(jsonb_state, WJB_BEGIN_OBJECT, NULL);
 
@@ -279,7 +276,7 @@ PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState 
**jsonb_state)
}
PG_CATCH();
{
-   Py_DECREF(items_v);
+   Py_DECREF(items);
PG_RE_THROW();
}
PG_END_TRY();
-- 
2.21.0



Re: BUG #15668: Server crash in transformPartitionRangeBounds

2019-03-11 Thread Michael Paquier
On Mon, Mar 11, 2019 at 03:44:39PM +0900, Amit Langote wrote:
> We could make the error message more meaningful depending on the context,
> but maybe it'd be better to pursue it as a separate project.

Yeah, I noticed that stuff when working on it this afternoon.  The
error message does not completely feel right even in your produced
tests.  Out of curiosity I have been working on this thing myself,
and it is possible to have a context-related message.  Please see
attached, that's in my opinion less confusing, and of course
debatable.  Still this approach does not feel completely right either
as that means hijacking the code path which generates a generic
message for missing RTEs. :(
--
Michael
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index f3b6d193aa..6b248f116f 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -3259,6 +3259,9 @@ errorMissingRTE(ParseState *pstate, RangeVar *relation)
 	 * If we found a match that doesn't meet those criteria, assume the
 	 * problem is illegal use of a relation outside its scope, as in the
 	 * MySQL-ism "SELECT ... FROM a, b LEFT JOIN c ON (a.x = c.y)".
+	 *
+	 * Also, in the context of parsing a partition bound, produce a more
+	 * helpful error message.
 	 */
 	if (rte && rte->alias &&
 		strcmp(rte->eref->aliasname, relation->relname) != 0 &&
@@ -3267,7 +3270,12 @@ errorMissingRTE(ParseState *pstate, RangeVar *relation)
 			 _up) == rte)
 		badAlias = rte->eref->aliasname;
 
-	if (rte)
+	if (pstate->p_expr_kind == EXPR_KIND_PARTITION_BOUND)
+		ereport(ERROR,
+(errcode(ERRCODE_UNDEFINED_TABLE),
+ errmsg("invalid reference in partition bound expression for table \"%s\"",
+		relation->relname)));
+	else if (rte)
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_TABLE),
  errmsg("invalid reference to FROM-clause entry for table \"%s\"",
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a37d1f18be..ee129ba18f 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -3750,8 +3750,16 @@ transformPartitionRangeBounds(ParseState *pstate, List *blist,
 IsA(linitial(cref->fields), String))
 cname = strVal(linitial(cref->fields));
 
-			Assert(cname != NULL);
-			if (strcmp("minvalue", cname) == 0)
+			if (cname == NULL)
+			{
+/*
+ * No field names have been found, meaning that there
+ * is not much to do with special value handling.  Instead
+ * let the expression transformation handle any errors and
+ * limitations.
+ */
+			}
+			else if (strcmp("minvalue", cname) == 0)
 			{
 prd = makeNode(PartitionRangeDatum);
 prd->kind = PARTITION_RANGE_DATUM_MINVALUE;
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index d51e547278..0b3c9fb1ff 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -494,6 +494,10 @@ CREATE TABLE part_bogus_expr_fail PARTITION OF list_parted FOR VALUES IN (somena
 ERROR:  column "somename" does not exist
 LINE 1: ...expr_fail PARTITION OF list_parted FOR VALUES IN (somename);
  ^
+CREATE TABLE part_bogus_expr_fail PARTITION OF list_parted FOR VALUES IN (somename.somename);
+ERROR:  invalid reference in partition bound expression for table "somename"
+CREATE TABLE part_bogus_expr_fail PARTITION OF list_parted FOR VALUES IN (somename.somename.somename);
+ERROR:  invalid reference in partition bound expression for table "somename"
 CREATE TABLE part_bogus_expr_fail PARTITION OF list_parted FOR VALUES IN (a);
 ERROR:  cannot use column references in partition bound expression
 LINE 1: ..._bogus_expr_fail PARTITION OF list_parted FOR VALUES IN (a);
@@ -558,6 +562,33 @@ DROP TABLE bigintp;
 CREATE TABLE range_parted (
 	a date
 ) PARTITION BY RANGE (a);
+-- forbidden expressions for partition bounds
+CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
+  FOR VALUES FROM (a) TO (1);
+ERROR:  cannot use column references in partition bound expression
+LINE 2:   FOR VALUES FROM (a) TO (1);
+   ^
+CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
+  FOR VALUES FROM (somename) TO (1);
+ERROR:  column "somename" does not exist
+LINE 2:   FOR VALUES FROM (somename) TO (1);
+   ^
+CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
+  FOR VALUES FROM (somename.somename) TO (1);
+ERROR:  invalid reference in partition bound expression for table "somename"
+CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
+  FOR VALUES FROM (somename.somename.somename) TO (1);
+ERROR:  invalid reference in partition bound expression for table "somename"
+CREATE TABLE part_bogus_expr_fail PARTITION OF range_parted
+  FOR VALUES FROM ((select 1)) TO (10);
+ERROR:  cannot use subquery in partition bound
+LINE 2:   FOR VALUES 

Re: BUG #15668: Server crash in transformPartitionRangeBounds

2019-03-11 Thread Amit Langote
Hi,

On 2019/03/06 17:27, Michael Paquier wrote:
> On Wed, Mar 06, 2019 at 04:00:42PM +0900, Amit Langote wrote:
>> Thanks for looking at this.  Your patch seems better, because it allows us
>> to keep the error message consistent with the message one would get with
>> list-partitioned syntax.
> 
> Thanks for confirming.  I think that it would be nice as well to add
> more test coverage for such error patterns with all the strategies.
> It would be good to fix that first, so I can take care of that.

I've added some tests to your patch.  Also improved the comments a bit.

I noticed another issue with the code -- it's using strcmp() to compare
the specified string against "minvalue" and "maxvalue", which causes the
following silly error:

create table q2 partition of q for values from ("MINVALUE") to (maxvalue);
ERROR:  column "MINVALUE" does not exist
LINE 1: create table q2 partition of q for values from ("MINVALUE") ...

It should be using pg_strncasecmp().

> Now I don't really find the error "missing FROM-clause entry for
> table" quite convincing when this is applied to a partition bound when
> using a column defined in the relation.  Adding more error classes in
> the set of CRERR_NO_RTE would be perhaps nice, still I am not sure how
> elegant it could be made when looking at expressions for partition
> bounds.

Note that this is not just a problem for partition bounds.  You can see it
with default expressions too.

create table foo (a int default (aa.a));
ERROR:  missing FROM-clause entry for table "aa"
LINE 1: create table foo (a int default (aa.a));

create table foo (a int default (a.a.aa.a.a.a.a.aa));
ERROR:  improper qualified name (too many dotted names): a.a.aa.a.a.a.a.aa
LINE 1: create table foo (a int default (a.a.aa.a.a.a.a.aa));

We could make the error message more meaningful depending on the context,
but maybe it'd be better to pursue it as a separate project.

Thanks,
Amit
diff --git a/src/backend/parser/parse_utilcmd.c 
b/src/backend/parser/parse_utilcmd.c
index a37d1f18be..67b05b8039 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -3746,18 +3746,28 @@ transformPartitionRangeBounds(ParseState *pstate, List 
*blist,
ColumnRef *cref = (ColumnRef *) expr;
char *cname = NULL;
 
+   /*
+* There should be a single field named "minvalue" or 
"maxvalue".
+*/
if (list_length(cref->fields) == 1 &&
IsA(linitial(cref->fields), String))
cname = strVal(linitial(cref->fields));
 
-   Assert(cname != NULL);
-   if (strcmp("minvalue", cname) == 0)
+   if (cname == NULL)
+   {
+   /*
+* ColumnRef is not in the desired 
single-field-name form; for
+* consistency, let transformExpr() report the 
error rather
+* than doing it ourselves.
+*/
+   }
+   else if (pg_strncasecmp(cname, "minvalue", 6) == 0)
{
prd = makeNode(PartitionRangeDatum);
prd->kind = PARTITION_RANGE_DATUM_MINVALUE;
prd->value = NULL;
}
-   else if (strcmp("maxvalue", cname) == 0)
+   else if (pg_strncasecmp(cname, "maxvalue", 6) == 0)
{
prd = makeNode(PartitionRangeDatum);
prd->kind = PARTITION_RANGE_DATUM_MAXVALUE;
diff --git a/src/test/regress/expected/create_table.out 
b/src/test/regress/expected/create_table.out
index d51e547278..503dca5866 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -709,6 +709,66 @@ ERROR:  modulus for hash partition must be a positive 
integer
 -- remainder must be greater than or equal to zero and less than modulus
 CREATE TABLE fail_part PARTITION OF hash_parted2 FOR VALUES WITH (MODULUS 8, 
REMAINDER 8);
 ERROR:  remainder for hash partition must be less than modulus
+-- more tests for generalized expression syntax for list/range partition bounds
+CREATE TABLE list_parted_bound_exprs(a int) PARTITION BY LIST (a);
+CREATE TABLE list_parted_bound_exprs1 PARTITION OF list_parted_bound_exprs FOR 
VALUES IN ("UNKNOWN");
+ERROR:  column "UNKNOWN" does not exist
+LINE 1: ...RTITION OF list_parted_bound_exprs FOR VALUES IN ("UNKNOWN")...
+ ^
+CREATE TABLE list_parted_bound_exprs1 PARTITION OF list_parted_bound_exprs FOR 
VALUES IN (a);
+ERROR:  cannot use column references in partition bound expression
+LINE 1: ...prs1 

Re: A separate table level option to control compression

2019-03-11 Thread Masahiko Sawada
On Wed, Feb 6, 2019 at 4:32 PM Pavan Deolasee  wrote:
>
> Hello,
>
> Currently either the table level option `toast_tuple_target` or the compile 
> time default `TOAST_TUPLE_TARGET` is used to decide whether a new tuple 
> should be compressed or not. While this works reasonably well for most 
> situations, at times the user may not want to pay the overhead of toasting, 
> yet take benefits of inline compression.
>
> I would like to propose a new table level option, compress_tuple_target, 
> which can be set independently of toast_tuple_target, and is checked while 
> deciding whether to compress the new tuple or not.
>
> For example,
>
> CREATE TABLE compresstest250 (a int, b text) WITH (compress_tuple_target = 
> 250);
> CREATE TABLE compresstest2040 (a int, b text) WITH (compress_tuple_target = 
> 2040);
>
> -- shouldn't get compressed nor toasted
> INSERT INTO compresstest250 VALUES (1, repeat('1234567890',20));
>
> -- should get compressed, but not toasted
> INSERT INTO compresstest250 VALUES (2, repeat('1234567890',30));
>
> -- shouldn't get compressed nor toasted
> INSERT INTO compresstest2040 VALUES (1, repeat('1234567890',20));
> INSERT INTO compresstest2040 VALUES (2, repeat('1234567890',30));
>
> Without this patch, the second INSERT will not compress the tuple since its 
> length is less than the toast threshold. With the patch and after setting 
> table level option, one can compress such tuples.
>
> The attached patch implements this idea.
>

I like this idea.

The patch seems to need to update the part describing on-disk toast
storage in storage.sgml.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center