Re: Finer grain log timestamps

2022-06-13 Thread David Fetter
On Mon, Jun 13, 2022 at 04:22:42PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Sun, May 8, 2022 at 4:45 PM David Fetter  wrote:
> >> Please find attached a patch to change the sub-second granularity of
> >> log timestamps from milliseconds to microseconds.
> 
> > Why is this a good idea?
> 
> I can imagine that some people would have a use for microsecond
> resolution in log files, and I can also imagine that as machines
> get faster more people will want that.

Your imagination matches situations I've seen in production where
there was some ambiguity as to what happened when inside a millisecond
boundary, and I'm sure I'm not alone in this.  I've gotten this
request from at least three people who to my knowledge knew nothing
about each other, and as I recall, the first time someone brought it
up to me was over five years back.

> As against that, this will bloat log files by a non-microscopic
> amount, and it's pretty likely to break some log-scanning tools too.

Three bytes per line, and log-scanning parsers that finicky are
already breaking all the time, respectively.

> It's unclear to me that that's a tradeoff we should force on
> everyone.

The tradeoff we're forcing on people at the moment is a loss of
precision they didn't ask for, implemented by some extra instructions
they didn't ask us to execute in a part of the code that's a hot path
at exactly the times when the machine is busiest.

> I think a proposal less likely to have push-back would be to invent
> a different log_line_prefix %-escape to produce microseconds.
> Sadly, "%u" is already taken, but perhaps we could use "%U"?
> 
> A different line of thought is to extend %t to provide a precision
> field a la sprintf, so that for example "%.3t" is equivalent to
> "%m" and "%.6t" does what David wants, and we won't have to
> search for a new escape letter when the day arrives that
> somebody wants nanosecond resolution.  The same could be done
> with %n, avoiding the need to find a different escape letter
> for that.

I'll build this more sprintf-like thing if not doing so prevents the
change from happening, but frankly, I don't really see a point in it
because the next "log timestamps at some random negative power of 10
second granularity" requirement I see will be the first.
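That said, a rough sketch of what a precision-driven formatter along Tom's
"%.Nt" lines could look like (names and structure invented here, not taken
from the posted patch):

```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical sketch of a "%.Nt"-style escape: format a timestamp with a
 * caller-chosen number of fractional-second digits (0..6), all derived
 * from the same microsecond value.  This is not the posted patch, just
 * the shape such a formatter could take.
 */
static void
format_log_time(char *buf, size_t len, const struct timeval *tv, int precision)
{
	struct tm	tm;
	time_t		sec = tv->tv_sec;
	size_t		n;

	localtime_r(&sec, &tm);
	n = strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);

	if (precision > 0)
	{
		long		frac = tv->tv_usec;	/* 0..999999 */
		int			i;

		/* truncate to the number of digits asked for */
		for (i = 6; i > precision; i--)
			frac /= 10;
		snprintf(buf + n, len - n, ".%0*ld", precision, frac);
	}
}
```

With precision 3 this produces what %m does today, and with precision 6 what
the patch does; precision 0 matches %t.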

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Parse CE and BCE in dates and times

2022-06-13 Thread David Fetter
On Mon, Jun 13, 2022 at 09:11:56AM +0200, Erik Rijkers wrote:
> Op 13-06-2022 om 07:51 schreef David Fetter:
> > Folks,
> > 
> > Please find attached a patch to do $Subject. As dates in a fair number
> > of fields of endeavor are expressed this way, it seems reasonable to
> > ensure that we can parse them on input. Making it possible to use them
> > in output is a more invasive patch, and would involve changes to
> > to_date and similar that would require careful consideration.
> 
> Hi David,
> 
> I find some unexpected results:
> 
> # select '112-04-30 BC'::date;
>  date
> ---
>  0112-04-30 BC
> (1 row)
> 
> but the same with the ' BCE' suffix seems broken:
> 
> # select '112-04-30 BCE'::date;
> ERROR:  invalid input syntax for type date: "112-04-30 BCE"
> LINE 1: select '112-04-30 BCE'::date;
> 
> The same goes for '112-04-30 AD' (works) and its CE version (errors out).
> 
> Or is this as expected?

It's not, and thanks for looking at this.  Will check to see what's
going on here.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Finer grain log timestamps

2022-06-13 Thread David Fetter
On Mon, May 09, 2022 at 11:21:26AM +0100, Dagfinn Ilmari Mannsåker wrote:
> David Fetter  writes:
> 
> > diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c
> > index 55ee5423af..4698e32ab7 100644
> > --- src/backend/utils/error/elog.c
> > +++ src/backend/utils/error/elog.c
> > @@ -2295,7 +2295,7 @@ char *
> >  get_formatted_log_time(void)
> >  {
> > pg_time_t   stamp_time;
> > -   charmsbuf[13];
> > +   charmsbuf[16];
> 
> Now that it holds microseconds (µs), not milliseconds (ms), should it
> not be renamed to `usbuf`?

Good point.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 9283d9ae4d8d10876aee1da0753e7db4257e7a11 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 13 Jun 2022 00:01:04 -0700
Subject: [PATCH v3] Change log timestamps from milli- to microseconds

---
 doc/src/sgml/config.sgml   | 12 ++--
 doc/src/sgml/file-fdw.sgml |  2 +-
 src/backend/utils/error/elog.c | 14 +++---
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index 5b7ce6531d..adc4fc090a 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql
 
 
  %t
- Time stamp without milliseconds
+ Time stamp at second resolution
  no
 
 
  %m
- Time stamp with milliseconds
+ Time stamp with microseconds
  no
 
 
  %n
- Time stamp with milliseconds (as a Unix epoch)
+ Time stamp with microseconds (as a Unix epoch)
  no
 
 
@@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 This option emits log lines in comma-separated-values
 (CSV) format,
 with these columns:
-time stamp with milliseconds,
+time stamp with microseconds,
 user name,
 database name,
 process ID,
@@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 
 CREATE TABLE postgres_log
 (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
@@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 
  timestamp
  string
- Time stamp with milliseconds
+ Time stamp with microseconds
 
 
  user
diff --git doc/src/sgml/file-fdw.sgml doc/src/sgml/file-fdw.sgml
index 5b98782064..c3efcbd679 100644
--- doc/src/sgml/file-fdw.sgml
+++ doc/src/sgml/file-fdw.sgml
@@ -242,7 +242,7 @@ CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
 
 
 CREATE FOREIGN TABLE pglog (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c
index 55ee5423af..1ee49ce2be 100644
--- src/backend/utils/error/elog.c
+++ src/backend/utils/error/elog.c
@@ -2295,7 +2295,7 @@ char *
 get_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
-	char		msbuf[13];
+	char		usbuf[16];
 
 	/* leave if already computed */
 	if (formatted_log_time[0] != '\0')
@@ -2315,13 +2315,13 @@ get_formatted_log_time(void)
 	 * nonempty or CSV mode can be selected.
 	 */
 	pg_strftime(formatted_log_time, FORMATTED_TS_LEN,
-	/* leave room for milliseconds... */
-	"%Y-%m-%d %H:%M:%S     %Z",
+	/* leave room for microseconds... */
+	"%Y-%m-%d %H:%M:%S        %Z",
 	pg_localtime(&stamp_time, log_timezone));
 
 	/* 'paste' milliseconds into place... */
-	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
-	memcpy(formatted_log_time + 19, msbuf, 4);
+	sprintf(usbuf, ".%06d", (int) saved_timeval.tv_usec);
+	memcpy(formatted_log_time + 19, usbuf, 7);
 
 	return formatted_log_time;
 }
@@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 		saved_timeval_set = true;
 	}
 
-	snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d",
+	snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d",
 			 (long) saved_timeval.tv_sec,
-			 (int) (saved_timeval.tv_usec / 1000));
+			 (int) saved_timeval.tv_usec);
 
 	if (padding != 0)
 		appendStringInfo(buf, "%*s", padding, strfbuf);
-- 
2.36.1



Parse CE and BCE in dates and times

2022-06-12 Thread David Fetter
Folks,

Please find attached a patch to do $Subject. As dates in a fair number
of fields of endeavor are expressed this way, it seems reasonable to
ensure that we can parse them on input. Making it possible to use them
in output is a more invasive patch, and would involve changes to
to_date and similar that would require careful consideration.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 00654ad8707fef0c9bcb7a8b6a8073d1f143368a Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 8 Jun 2021 23:44:19 -0700
Subject: [PATCH v1] Common Era for dates

---
 doc/src/sgml/datatype.sgml | 25 
 doc/src/sgml/datetime.sgml |  4 +-
 doc/src/sgml/func.sgml | 16 +++---
 src/backend/utils/adt/formatting.c | 80 +++---
 src/test/regress/expected/date.out | 72 +++
 src/test/regress/expected/horology.out |  4 +-
 src/test/regress/sql/date.sql  |  4 +-
 7 files changed, 134 insertions(+), 71 deletions(-)

diff --git doc/src/sgml/datatype.sgml doc/src/sgml/datatype.sgml
index 8e30b82273..d0ca9e500f 100644
--- doc/src/sgml/datatype.sgml
+++ doc/src/sgml/datatype.sgml
@@ -1740,24 +1740,24 @@ SELECT 'abc \153\154\155 \052\251\124'::bytea;
 timestamp [ (p) ] [ without time zone ]
 8 bytes
 both date and time (no time zone)
-4713 BC
-294276 AD
+4713 BC[E]
+294276 AD/CE
 1 microsecond


 timestamp [ (p) ] with time zone
 8 bytes
 both date and time, with time zone
-4713 BC
-294276 AD
+4713 BC[E]
+294276 AD/CE
 1 microsecond


 date
 4 bytes
 date (no time of day)
-4713 BC
-5874897 AD
+4713 BC[E]
+5874897 AD/CE
 1 day


@@ -2168,23 +2168,24 @@ MINUTE TO SECOND
  
   Valid input for the time stamp types consists of the concatenation
   of a date and a time, followed by an optional time zone,
-  followed by an optional AD or BC.
-  (Alternatively, AD/BC can appear
-  before the time zone, but this is not the preferred ordering.)
+  followed by an optional AD, CE,
+  BC, or BCE.
+  (Alternatively, AD/CE/BC/BCE
+  can appear before the time zone, but this is not the preferred ordering.)
   Thus:
 
 
-1999-01-08 04:05:06
+2021-01-08 04:05:06
 
   and:
 
-1999-01-08 04:05:06 -8:00
+2021-01-08 04:05:06 -8:00
 
 
   are valid values, which follow the ISO 8601
   standard.  In addition, the common format:
 
-January 8 04:05:06 1999 PST
+January 8 04:05:06 2021 PST
 
   is supported.
  
diff --git doc/src/sgml/datetime.sgml doc/src/sgml/datetime.sgml
index ecc3245a46..6266328107 100644
--- doc/src/sgml/datetime.sgml
+++ doc/src/sgml/datetime.sgml
@@ -167,8 +167,8 @@
 
   

-Gregorian years AD 1–99 can be entered by using 4 digits with leading
-zeros (e.g., 0099 is AD 99).
+Gregorian years AD/CE 1–99 can be entered by using 4 digits with leading
+zeros (e.g., 0099 is AD/CE 99).

   
  
diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 7b652460a1..3fcb9793e7 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -7864,13 +7864,13 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 last digit of ISO 8601 week-numbering year


-BC, bc,
-AD or ad
+BC, bc, BCE, bce
+AD, ad, CE, ce
 era indicator (without periods)


-B.C., b.c.,
-A.D. or a.d.
+B.C., b.c., B.C.E., b.c.e.
+A.D., a.d., C.E., or c.e.
 era indicator (with periods)


@@ -9849,13 +9849,15 @@ SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40');
 
 

-The first century starts at 0001-01-01 00:00:00 AD, although
-they did not know it at the time. This definition applies to all
+The first century starts at 0001-01-01 00:00:00 AD/CE, although
+the calendar did not exist at the time. This definition applies to all
 Gregorian calendar countries. There is no century number 0,
 you go from -1 century to 1 century.
 
 If you disagree with this, please write your complaint to:
-Pope, Cathedral Saint-Peter of Roma, Vatican.
+His Holiness, Pope Francis
+Apostolic Palace
+00120 Vatican City

   
  
diff --git src/backend/utils/adt/formatting.c src/backend/utils/adt/formatting.c
index e909c1a200..11876dbd19 100644
--- src/backend/utils/adt/formatting.c
+++ src/backend/utils/adt/formatting.c
@@ -238,11 +238,19 @@ static const char *const days_short[] = {
 #define a_d_STR		"a.d."
 #define AD_STR		"AD"
 #define ad_STR

Re: Finer grain log timestamps

2022-05-08 Thread David Fetter
On Sun, May 08, 2022 at 04:12:27PM -0500, Justin Pryzby wrote:
> On Sun, May 08, 2022 at 08:44:51PM +0000, David Fetter wrote:
> >  CREATE TABLE postgres_log
> >  (
> > -  log_time timestamp(3) with time zone,
> > +  log_time timestamp(6) with time zone,
> 
> Please also update the corresponding thing in doc/src/sgml/file-fdw.sgml

Thanks for looking this over, and please find attached the next
version.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From f6772fc0378879fd5c7fbf8cb320ee68f66341d2 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 8 May 2022 13:24:33 -0700
Subject: [PATCH v2] Change log timestamps from milli- to microseconds

---
 doc/src/sgml/config.sgml   | 12 ++--
 doc/src/sgml/file-fdw.sgml |  2 +-
 src/backend/utils/error/elog.c | 14 +++---
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index 03986946a8..f00db09547 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql
 
 
  %t
- Time stamp without milliseconds
+ Time stamp at second resolution
  no
 
 
  %m
- Time stamp with milliseconds
+ Time stamp with microseconds
  no
 
 
  %n
- Time stamp with milliseconds (as a Unix epoch)
+ Time stamp with microseconds (as a Unix epoch)
  no
 
 
@@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 This option emits log lines in comma-separated-values
 (CSV) format,
 with these columns:
-time stamp with milliseconds,
+time stamp with microseconds,
 user name,
 database name,
 process ID,
@@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 
 CREATE TABLE postgres_log
 (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
@@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 
  timestamp
  string
- Time stamp with milliseconds
+ Time stamp with microseconds
 
 
  user
diff --git doc/src/sgml/file-fdw.sgml doc/src/sgml/file-fdw.sgml
index 5b98782064..c3efcbd679 100644
--- doc/src/sgml/file-fdw.sgml
+++ doc/src/sgml/file-fdw.sgml
@@ -242,7 +242,7 @@ CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
 
 
 CREATE FOREIGN TABLE pglog (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c
index 55ee5423af..4698e32ab7 100644
--- src/backend/utils/error/elog.c
+++ src/backend/utils/error/elog.c
@@ -2295,7 +2295,7 @@ char *
 get_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
-	char		msbuf[13];
+	char		msbuf[16];
 
 	/* leave if already computed */
 	if (formatted_log_time[0] != '\0')
@@ -2315,13 +2315,13 @@ get_formatted_log_time(void)
 	 * nonempty or CSV mode can be selected.
 	 */
 	pg_strftime(formatted_log_time, FORMATTED_TS_LEN,
-	/* leave room for milliseconds... */
-	"%Y-%m-%d %H:%M:%S     %Z",
+	/* leave room for microseconds... */
+	"%Y-%m-%d %H:%M:%S        %Z",
 	pg_localtime(&stamp_time, log_timezone));
 
 	/* 'paste' milliseconds into place... */
-	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
-	memcpy(formatted_log_time + 19, msbuf, 4);
+	sprintf(msbuf, ".%06d", (int) saved_timeval.tv_usec);
+	memcpy(formatted_log_time + 19, msbuf, 7);
 
 	return formatted_log_time;
 }
@@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 		saved_timeval_set = true;
 	}
 
-	snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d",
+	snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d",
 			 (long) saved_timeval.tv_sec,
-			 (int) (saved_timeval.tv_usec / 1000));
+			 (int) saved_timeval.tv_usec);
 
 	if (padding != 0)
 		appendStringInfo(buf, "%*s", padding, strfbuf);
-- 
2.35.1



Finer grain log timestamps

2022-05-08 Thread David Fetter
Folks,

Please find attached a patch to change the sub-second granularity of
log timestamps from milliseconds to microseconds.

I started out working on a longer patch that will give people
more choices than whole seconds and microseconds, but there were a lot
of odd corner cases, including what I believe might have been a
requirement for C11, should we wish to get sub-microsecond
granularity.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From f4881d1669526597bdf608c7c59858f88314f8d1 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 8 May 2022 13:24:33 -0700
Subject: [PATCH v1] Change log timestamps from milli- to microseconds

---
 doc/src/sgml/config.sgml   | 12 ++--
 src/backend/utils/error/elog.c | 14 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index 03986946a8..f00db09547 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql
 
 
  %t
- Time stamp without milliseconds
+ Time stamp at second resolution
  no
 
 
  %m
- Time stamp with milliseconds
+ Time stamp with microseconds
  no
 
 
  %n
- Time stamp with milliseconds (as a Unix epoch)
+ Time stamp with microseconds (as a Unix epoch)
  no
 
 
@@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 This option emits log lines in comma-separated-values
 (CSV) format,
 with these columns:
-time stamp with milliseconds,
+time stamp with microseconds,
 user name,
 database name,
 process ID,
@@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 
 CREATE TABLE postgres_log
 (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
@@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 
  timestamp
  string
- Time stamp with milliseconds
+ Time stamp with microseconds
 
 
  user
diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c
index 55ee5423af..4698e32ab7 100644
--- src/backend/utils/error/elog.c
+++ src/backend/utils/error/elog.c
@@ -2295,7 +2295,7 @@ char *
 get_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
-	char		msbuf[13];
+	char		msbuf[16];
 
 	/* leave if already computed */
 	if (formatted_log_time[0] != '\0')
@@ -2315,13 +2315,13 @@ get_formatted_log_time(void)
 	 * nonempty or CSV mode can be selected.
 	 */
 	pg_strftime(formatted_log_time, FORMATTED_TS_LEN,
-	/* leave room for milliseconds... */
-	"%Y-%m-%d %H:%M:%S     %Z",
+	/* leave room for microseconds... */
+	"%Y-%m-%d %H:%M:%S        %Z",
 	pg_localtime(&stamp_time, log_timezone));
 
 	/* 'paste' milliseconds into place... */
-	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
-	memcpy(formatted_log_time + 19, msbuf, 4);
+	sprintf(msbuf, ".%06d", (int) saved_timeval.tv_usec);
+	memcpy(formatted_log_time + 19, msbuf, 7);
 
 	return formatted_log_time;
 }
@@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 		saved_timeval_set = true;
 	}
 
-	snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d",
+	snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d",
 			 (long) saved_timeval.tv_sec,
-			 (int) (saved_timeval.tv_usec / 1000));
+			 (int) saved_timeval.tv_usec);
 
 	if (padding != 0)
 		appendStringInfo(buf, "%*s", padding, strfbuf);
-- 
2.35.1



Re: Should we support new definition for Identity column : GENERATED BY DEFAULT ON NULL?

2021-11-03 Thread David Fetter
On Wed, Nov 03, 2021 at 11:07:58AM +0100, Vik Fearing wrote:
> On 11/2/21 12:19 PM, Himanshu Upadhyaya wrote:
> > Hi,
> > 
> > Trying to insert NULL value to the Identity column defined by "GENERATED BY
> > DEFAULT" is disallowed, but there can be use cases where the user would
> > like to have an identity column where manual NULL insertion is required(and
> > it should not error-out by Postgres).
> 
> What could possibly be the use case for this?

Unfortunately, the PREPARE/EXECUTE infrastructure, and not just for
PostgreSQL, has no way of passing along DEFAULT explicitly, i.e. it
can only explicitly pass literals and NULL, so people come up with
workarounds like this. I keep saying "explicitly" because one of the
workarounds is to pass DEFAULT by omission in the target list, which
is not a great way to handle anything, being a communication by
silence and all. I'm thinking that lack is the problem we should
actually address.
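
To make the asymmetry concrete, here is a minimal sketch (hypothetical
table, not from the thread) of why people reach for this workaround:

```sql
CREATE TABLE t (id int GENERATED BY DEFAULT AS IDENTITY, v text);

-- Plain SQL can pass DEFAULT explicitly:
INSERT INTO t (id, v) VALUES (DEFAULT, 'a');

-- A prepared statement can bind only literals and NULL, so there is
-- no way to say "use the default" via $1:
PREPARE ins(int, text) AS INSERT INTO t (id, v) VALUES ($1, $2);
EXECUTE ins(NULL, 'b');  -- fails: the identity column rejects NULL

-- DEFAULT "by omission": leave the column out of the target list.
INSERT INTO t (v) VALUES ('c');
```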

> > Thoughts?
> 
> I don't like it.

Neither do I, but I do get some of the motivation behind it. I don't
suppose the standard actually says much about PREPARE, or whether we
should care about anything it happens to say that gets in the way of
doing something more helpful.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: PATCH: psql tab completion for SELECT

2021-10-08 Thread David Fetter
On Sat, Sep 22, 2018 at 06:53:55PM +1200, Edmund Horner wrote:
> Hi all,
> 
> Here are some rebased versions of the last two patches.  No changes in
> functionality, except a minor case sensitivity fix in the "completion
> after commas" patch.
> 
> Edmund

I've rebased and updated these to be more principled about what
functions could be tab completed.

Still missing: tests.

What precisely is this supposed to do?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 87fb63d70f5e08f853a2b8aa94e03c0d665f5000 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Thu, 7 Oct 2021 15:59:06 -0700
Subject: [PATCH v9 1/2] Infrastructure for tab completion in the SELECT list

---
 src/bin/psql/tab-complete.c | 65 ++---
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c
index ecae9df8ed..5db143523b 100644
--- src/bin/psql/tab-complete.c
+++ src/bin/psql/tab-complete.c
@@ -1017,6 +1017,45 @@ static const VersionedQuery Query_for_list_of_subscriptions[] = {
 	{0, NULL}
 };
 
+static const SchemaQuery Query_for_list_of_selectable_functions[] = {
+	{
+		/* min_server_version */
+		70400,
+		/* catname */
+		"pg_catalog.pg_proc p",
+		/* Disallow functions which take or return pseudotypes which are other than all* or *record */
+		"NOT((ARRAY[p.prorettype] || coalesce(p.proallargtypes, p.proargtypes)) "
+		"&& ARRAY(SELECT oid FROM pg_catalog.pg_type WHERE typtype='p' AND typname !~ '(^all|record$)'))"
+		/* Disallow stored procedures XXX this may change later. */
+		" AND p.prokind <> 'p'"
+		/* viscondition */
+		"pg_catalog.pg_function_is_visible(p.oid)",
+		/* namespace */
+		"p.pronamespace",
+		/* result */
+		"pg_catalog.quote_ident(p.proname)||'('",
+		/* qualresult */
+		NULL
+	},
+	{0, NULL}
+};
+
+/*
+ * This addon is used to find (unqualified) column names to include
+ * alongside the function names from the query above.
+ */
+static const VersionedQuery Query_addon_for_list_of_selectable_attributes[] = {
+	{70400,
+		"UNION ALL "
+		"   SELECT DISTINCT pg_catalog.quote_ident(attname) "
+		" FROM pg_catalog.pg_attribute "
+		"WHERE attnum > 0 "
+		"  AND NOT attisdropped "
+		"  AND substring(pg_catalog.quote_ident(attname),1,%d)='%s'"
+	},
+	{0, NULL}
+};
+
 /*
  * This is a list of all "things" in Pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
@@ -3768,7 +3807,9 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("IS");
 
 /* SELECT */
-	/* naah . . . */
+	else if (TailMatches("SELECT") || TailMatches("SELECT", "ALL|DISTINCT"))
+		COMPLETE_WITH_VERSIONED_SCHEMA_QUERY(Query_for_list_of_selectable_functions,
+			 Query_addon_for_list_of_selectable_attributes);
 
 /* SET, RESET, SHOW */
 	/* Complete with a variable name */
@@ -4432,7 +4473,8 @@ complete_from_versioned_schema_query(const char *text, int state)
  * where %d is the string length of the text and %s the text itself.
  *
  * If both simple_query and schema_query are non-NULL, then we construct
- * a schema query and append the (uninterpreted) string simple_query to it.
+ * a schema query and append the simple_query to it, replacing the %d and %s
+ * as described above.
  *
  * It is assumed that strings should be escaped to become SQL literals
  * (that is, what is in the query is actually ... '%s' ...)
@@ -4579,21 +4621,22 @@ _complete_from_query(const char *simple_query,
 			  " WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,%d) ="
 			  " substring('%s',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) = 1",
 			  char_length, e_text);
-
-			/* If an addon query was provided, use it */
-			if (simple_query)
-				appendPQExpBuffer(&query_buffer, "\n%s", simple_query);
 		}
 		else
 		{
 			Assert(simple_query);
-			/* simple_query is an sprintf-style format string */
-			appendPQExpBuffer(&query_buffer, simple_query,
-			  char_length, e_text,
-			  e_info_charp, e_info_charp,
-			  e_info_charp2, e_info_charp2);
 		}
 
+		/*
+		 * simple_query is an sprintf-style format string (or it could be NULL, but
+		 * only if this is a schema query with no addon).
+		 */
+		if (simple_query)
+			appendPQExpBuffer(&query_buffer, simple_query,
+							  char_length, e_text,
+							  e_info_charp, e_info_charp,
+							  e_info_charp2, e_info_charp2);
+
 		/* Limit the number of records in the result */
		appendPQExpBuffer(&query_buffer, "\nLIMIT %d",
 		  completion_max_records);
-- 
2.31.1

From 3764b9fe211db1fa322fa83103d39356928942ae Mon Sep 17 00:00:

Re: When is int32 not an int32?

2021-09-26 Thread David Fetter
On Sun, Sep 26, 2021 at 05:32:11PM -0400, David E. Wheeler wrote:
> Hell Hackers, long time no email!
> 
> I got a bug report for the semver extension:
> 
>   https://github.com/theory/pg-semver/issues/58
> 
> It claims that a test unexpectedly passes. That is, Test #31 is expected to
> fail, because it intentionally tests a version in which its parts overflow 
> the int32[3] they’re stored in, with the expectation that one day we can 
> refactor the type to handle larger version parts.
> 
> I can’t imagine there would be any circumstance under which int32 would 
> somehow be larger than a signed 32-bit integer, but perhaps there is?
> 
> Scroll to the bottom of these pages to see the unexpected passes on i386 and 
> armhf:
> 
>   
> https://ci.debian.net/data/autopkgtest/unstable/i386/p/postgresql-semver/15208658/log.gz
>   
> https://ci.debian.net/data/autopkgtest/unstable/armhf/p/postgresql-semver/15208657/log.gz
> 
> Here’s the Postgres build output for those two platforms, as well, though 
> nothing jumps out at me:
> 
>   
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=i386=13.4-3=1630408269=0
>   
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=armhf=13.4-3=1630412028=0

I noticed that in i386, configure finds none of (int8, uint8, int64,
uint64), and I wonder whether we're actually testing whatever
alternative we provide when we don't have them.

I also noticed that the first of the long sequences of 9s doesn't even
fit inside a uint64. The other two fit inside an int64, so if
promotion were somehow happening, that wouldn't be a great test.

999.99.9
 over 2^72   over 2^59  over 2^56

These two observations, taken together, lead me to a first guess:
the machinery we provide when we see non-working 64-bit integers
is totally broken.
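
A quick way to see the kind of breakage suspected here is to parse those
digit strings with overflow detection instead of letting them wrap.  A
sketch (not the extension's code; strtoull just stands in for whatever
the configure fallback path ends up using):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one dotted version component, reporting overflow instead of
 * silently wrapping -- the failure mode suspected above.
 */
static int
parse_part(const char *s, uint64_t *out)
{
	char	   *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE || end == s || *end != '\0')
		return -1;				/* overflow, or not a plain number */
	*out = (uint64_t) v;
	return 0;
}
```

With this, "18446744073709551615" (2^64 - 1) parses cleanly, while a run
of nines wider than 64 bits is rejected rather than wrapped, which is the
behavior the failing-to-fail test depends on.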

If that's right, we should at least discuss reversing our claim that
we support such systems, seeing as it doesn't appear that people will
be deploying new versions of PostgreSQL on them.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: psql: tab completion differs on semicolon placement

2021-09-20 Thread David Fetter
On Mon, Sep 20, 2021 at 08:26:51PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Daniel Gustafsson  writes:
> 
> > While testing a patch I fat-fingered a CREATE DATABASE statement by tab
> > completing *after* the semicolon, with no space between the objname and
> > semicolon.  The below options were presented, which at this point aren't 
> > really
> > applicable:
> >
> > db=# create database foo;
> > ALLOW_CONNECTIONS  ENCODING     LC_COLLATE  LOCALE  TABLESPACE
> > CONNECTION LIMIT   IS_TEMPLATE  LC_CTYPE    OWNER   TEMPLATE
> >
> > DROP DATABASE has a similar tab completion which makes about as much sense:
> >
> > db=# drop database foo;WITH (
> >
> > Checking prev_wd for not ending with ';' as per the attached makes 
> > "objname;"
> > behave like "objname ;".  Is there a reason for not doing that which I'm
> > missing?  I didn't check for others, but if this seems reasonable I'll go
> > through to find any other similar cases.
> 
> The same applies to any completion after a MatchAny that ends in a any
> of the WORD_BREAKS characters (except whitespace and () which are
> handled specially).
> 
> #define WORD_BREAKS "\t\n@$><=;|&{() "
> 
> IMO a fix should be more principled than just special-casing semicolon
> and CREATE TABLE.  Maybe get_previous_words() should stop when it sees
> an unquoted semicolon?

Is there some reason get_previous_words() shouldn't stop for
everything that's WORD_BREAKS? If not, making that the test might make the
general rule a little simpler to write, and if WORD_BREAKS ever
changed, for example to include all space, or all breaking space, or
similar, the consequences would at least not propagate through
seemingly unrelated code.

At the moment, get_previous_words() does look for everything in
WORD_BREAKS, and then accounts for double quotes (") and then does
something clever to account for double quotes and the quoting behavior
that doubling them ("") accomplishes. Anyhow, that looks like it
should work in this case, but clearly it's not.

Would it be less error prone to do these checks and maybe push or pop
one or more stacks holding state as each character came in? I suspect
the overhead would be unnoticeable even on the slowest* client.
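
As a sketch of the "stop at an unquoted semicolon" idea (invented helper,
not the psql code), tracking just the double-quote state is enough for the
case Daniel hit:

```c
#include <stdbool.h>

/*
 * Return a pointer to the start of the last statement on the line,
 * i.e. just past the last semicolon that is not inside a double-quoted
 * identifier.  A sketch of the "stop at an unquoted semicolon" idea;
 * the real get_previous_words() has to cope with the rest of
 * WORD_BREAKS too.
 */
static const char *
last_statement_start(const char *line)
{
	bool		in_quotes = false;
	const char *start = line;
	const char *p;

	for (p = line; *p; p++)
	{
		if (*p == '"')
			in_quotes = !in_quotes;	/* "" just toggles twice, harmless */
		else if (*p == ';' && !in_quotes)
			start = p + 1;
	}
	return start;
}
```

Completion would then only ever look at the text after the last unquoted
semicolon, so `drop database foo;WITH` completes as if the input were
just `WITH`.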

Best,
David.

* One possible exception would be a gigantic paste, a case where psql
  can be prevented from attempting tab completion, although the
  prevention measures involve a pretty obscure terminal setting:
  https://cirw.in/blog/bracketed-paste

-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Hook for extensible parsing.

2021-09-15 Thread David Fetter
On Wed, Sep 15, 2021 at 02:25:17PM +0100, Simon Riggs wrote:
> On Sat, 1 May 2021 at 08:24, Julien Rouhaud  wrote:
> 
> > Being able to extend core parser has been requested multiple times, and 
> > AFAICT
> > all previous attempts were rejected not because this isn't wanted but 
> > because
> > the proposed implementations required plugins to reimplement all of the core
> > grammar with their own changes, as bison generated parsers aren't 
> > extensible.
> >
> > I'd like to propose an alternative approach, which is to allow multiple 
> > parsers
> > to coexist, and let third-party parsers optionally fallback on the core
> > parsers.
> 
> Yes, that approach has been discussed by many people, most recently
> around the idea to create a fast-path grammar to make the most
> frequently used SQL parse faster.
> 
> > 0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only 
> > the
> > ability to produce "select [col, ] col FROM table" parsetree, for testing
> > purpose.  I chose it to ensure that everything works properly even with a
> > totally different grammar that has different keywords, which doesn't even
> > end statements with a semicolon but a plain keyword.
> 
> The general rule has always been that we don't just put hooks in, we
> always require an in-core use for those hooks. I was reminded of that
> myself recently.
> 
> What we need is something in core that actually makes use of this. The
> reason for that is not politics, but a simple test of whether the
> feature makes sense AND includes all required bells and whistles to be
> useful in the real world.
> 
> Core doesn't need a LOL parser and I don't think we should commit that.

It doesn't, but it very likely needs something people can use when
they create a new table AM, and we should use the hook in core to
implement the heap* table AM to make sure the thing works at DDL
time.

> If we do this, I think it should have CREATE LANGUAGE support, so
> that each plugin can be seen as an in-core object and allow security
> around which users can execute which language types, allow us to
> switch between languages and have default languages for specific
> users or databases.

That's a great idea, but I must be missing something important as it
relates to parser hooks. Could you connect those a little more
explicitly?

Best,
David.

* It's not actually a heap in the sense that the term is normally used
in computing. I'd love to find out how it got to have this name and
document same so others aren't also left wondering.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-09-15 Thread David Fetter
On Fri, Sep 03, 2021 at 08:27:55PM +0200, Daniel Gustafsson wrote:
> > On 19 May 2021, at 09:53, Michael Paquier  wrote:
> > 
> > On Tue, Apr 27, 2021 at 12:58:52PM +0300, Aleksander Alekseev wrote:
> >> I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here
> >> is an alternative version of the patch that fixes this as well. Not
> >> sure if this should be in the same commit though.
> > 
> > -   /* If we have ALTER TABLE  DROP, provide COLUMN or CONSTRAINT */
> > -   else if (Matches("ALTER", "TABLE", MatchAny, "DROP"))
> > +   /* If we have ALTER TABLE  ADD|DROP, provide COLUMN or CONSTRAINT 
> > */
> > +   else if (Matches("ALTER", "TABLE", MatchAny, "ADD|DROP"))
> > Seems to me that the behavior to not complete with COLUMN or
> > CONSTRAINT for ADD is intentional, as it is possible to specify a
> > constraint or column name without the object type first.  This
> > introduces a inconsistent behavior with what we do for columns with
> > ADD, for one.  So a more consistent approach would be to list columns,
> > constraints, COLUMN and CONSTRAINT in the list of options available
> > after ADD.
> > 
> > +   else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
> > +   {
> > +   completion_info_charp = prev3_wd;
> > +   COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table);
> > +   }
> > Specifying valid constraints is an authorized grammar, so it does not
> > seem that bad to keep things as they are, either.  I would leave that
> > alone.
> 
> This has stalled being marked Waiting on Author since May, and reading the
> above it sounds like marking it Returned with Feedback is the logical next 
> step
> (patch also no longer applies).

Please find attached the next revision :)

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 716e9dde96ef9973fc6b0fc9ca9934df153af9da Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 14 Sep 2021 15:28:34 -0700
Subject: [PATCH v4] psql: Fix tab completion for ALTER TABLE...

... VALIDATE CONSTRAINT.

Completions now include only constraints which have not yet been
validated.
---
 src/bin/psql/tab-complete.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c
index 5cd5838668..b55166aa64 100644
--- src/bin/psql/tab-complete.c
+++ src/bin/psql/tab-complete.c
@@ -788,6 +788,15 @@ Query_for_index_of_table \
 "   and pg_catalog.quote_ident(c1.relname)='%s'"\
 "   and pg_catalog.pg_table_is_visible(c1.oid)"
 
+/* the silly-looking length condition is just to eat up the current word */
+#define Query_for_constraint_of_table_not_validated \
+"SELECT pg_catalog.quote_ident(conname) "\
+"  FROM pg_catalog.pg_class c1, pg_catalog.pg_constraint con "\
+" WHERE c1.oid=conrelid and (%d = pg_catalog.length('%s'))"\
+"   and pg_catalog.quote_ident(c1.relname)='%s'"\
+"   and pg_catalog.pg_table_is_visible(c1.oid)" \
+"   and NOT con.convalidated"
+
 #define Query_for_all_table_constraints \
 "SELECT pg_catalog.quote_ident(conname) "\
 "  FROM pg_catalog.pg_constraint c "\
@@ -2145,14 +2154,22 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_ATTR(prev3_wd, "");
 
 	/*
-	 * If we have ALTER TABLE  ALTER|DROP|RENAME|VALIDATE CONSTRAINT,
+	 * If we have ALTER TABLE  ALTER|DROP|RENAME CONSTRAINT,
 	 * provide list of constraints
 	 */
-	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME|VALIDATE", "CONSTRAINT"))
+	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME", "CONSTRAINT"))
 	{
 		completion_info_charp = prev3_wd;
 		COMPLETE_WITH_QUERY(Query_for_constraint_of_table);
 	}
+	/*
+	 * ALTER TABLE  VALIDATE CONSTRAINT 
+	 */
+	else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_constraint_of_table_not_validated);
+	}
 	/* ALTER TABLE ALTER [COLUMN]  */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny) ||
 			 Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny))
-- 
2.31.1



Re: Mark all GUC variable as PGDLLIMPORT

2021-08-23 Thread David Fetter
On Mon, Aug 23, 2021 at 10:56:52AM -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 10:15 AM Tom Lane  wrote:
> > And yes, I absolutely would prohibit extensions from accessing many
> > of them, if there were a reasonable way to do it.  It would be a good
> > start towards establishing a defined API for extensions.
> 
> Mostly, it would make extension development more difficult for no
> discernable benefit to the project.
> 
> You've made this argument many times over the years ... but we're no
> closer to having an extension API than we ever were, and we continue
> to get complaints about breaking stuff on Windows on a pretty regular
> basis.
> 
> Honestly, it seems unimaginable that an API is ever really going to be
> possible. It would be a ton of work to maintain, and we'd just end up
> breaking it every time we discover that there's a new feature we want
> to implement which doesn't fit into the defined API now. That's what
> we do *now* with functions that third-party extensions actually call,
> and with variables that they access, and it's not something that, in
> my experience, is any great problem in maintaining an extension.
> You're running code that is running inside somebody else's executable
> and sometimes you have to adjust it for upstream changes. That's life,
> and I don't think that complaints about that topic are nearly as
> frequent as complaints about extensions breaking on Windows because of
> missing PGDLLIMPORT markings.
> 
> And more than that, I'm pretty sure that you've previously taken the
> view that we shouldn't document all the hook functions that only exist
> in the backend for the purpose of extension use.

As the person on the receiving end of that one, I was nonplussed, so I
took a step back to think it over.  I recognized at that time that I
didn't have a great answer for a legitimate concern, namely that any
change, however trivial, that goes into the core and doesn't go
directly to core database functionality, represents a long-term
maintenance burden.

The thing I'm coming to is that the key architectural feature
PostgreSQL has that other RDBMSs don't is its extensibility. Because
that's been a stable feature over time, I'm pretty sure we need to
treat documenting it as something that does go directly to core
database functionality. Yes, there are resources involved in doing a
thing like this, but I don't think they require constant or even
frequent attention from committers or even from seasoned DB hackers.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Default to TIMESTAMP WITH TIME ZONE?

2021-08-15 Thread David Fetter
On Sat, Aug 14, 2021 at 10:03:01AM +0200, Peter Eisentraut wrote:
> On 13.08.21 19:07, Tom Lane wrote:
> > "David G. Johnston"  writes:
> > > On Fri, Aug 13, 2021 at 9:28 AM Simon Riggs 
> > > wrote:
> > > > The only hope is to eventually change the default, so probably
> > > > the best thing is to apply pressure via the SQL Std process.
> > 
> > > Then there is no hope because this makes the situation worse.
> > 
> > Agreed; the points I made upthread are just as valid if the change
> > is made in the standard.  But I'd be astonished if the SQL
> > committee would consider such a change anyway.
> 
> AFAIU, our timestamp with time zone type doesn't really do what the
> SQL standard specifies anyway, as it doesn't actually record the
> time zone, but it's more of a "timestamp with time zone aware
> formatting".  For SQL, it might make sense to add a (third) time
> stamp type that behaves more like that.

The way the standard defines time zones, namely as fixed offsets from
UTC, is just ludicrous and has been at odds with reality for much
longer than SQL has existed.  If we're going to capture input time
zones, we should come up with a way to capture the time zone as it
existed when the write occurred, i.e. both its name and the UTC offset
it represented at the time of the write.
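The kind of capture described above can be sketched outside of SQL. This is a hypothetical illustration, not a proposed implementation: record both the zone name and the UTC offset that name resolved to at the moment of the write, since named zones change their offsets over time (DST, legislation):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical sketch: capture (zone name, offset-at-write-time) so the
# original meaning survives later rule changes for that zone.
def capture(ts: datetime):
    assert ts.tzinfo is not None, "naive timestamps capture nothing"
    return ts.tzinfo.key, ts.utcoffset().total_seconds() / 3600

name, offset_hours = capture(
    datetime(2021, 7, 1, 12, 0, tzinfo=ZoneInfo("America/New_York")))
```

The same wall-clock string written six months apart would capture the same name but different offsets, which is exactly the ambiguity a bare offset loses.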

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Slim down integer formatting

2021-07-27 Thread David Fetter
On Wed, Jul 28, 2021 at 01:17:43PM +1200, David Rowley wrote:
> On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera  wrote:
> > So how much faster is it than the original?
> 
> I only did some very quick tests.  They're a bit noisey. The results
> indicate an average speedup of 1.7%, but the noise level is above
> that, so unsure.
> 
> create table a (a int);
> insert into a select a from generate_series(1,100)a;
> vacuum freeze a;
> 
> bench.sql: copy a to '/dev/null';
> 
> master @ 93a0bf239
> drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 153.815 ms
> latency average = 152.955 ms
> latency average = 147.491 ms
> 
> master + v2 patch
> drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 144.749 ms
> latency average = 151.525 ms
> latency average = 150.392 ms

Thanks for testing this!  I got a few promising results early on with
-O0, and the technique seemed like a neat way to do things.

I generated a million int4s intended to be uniformly distributed
across the range of int4, and similarly across int8.

int4:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  362.149 ms 359.933 ms
latency stddev:      3.44 ms     3.40 ms

int8:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  434.944 ms 422.270 ms
latency stddev:      3.23 ms     4.02 ms

when compiled with -O2:

int4:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average: 167.262 ms 148.673 ms
latency stddev:      6.26 ms     1.28 ms

i.e. it was actually slower, at least over the 10 runs I did.

I assume that "uniform distribution across the range" is a worst-case
scenario for ints, but I was still a little surprised to measure worse
performance.  Interestingly, what I got for int8s generated to be
uniform across their range was

int8:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average: 171.737 ms 174.013 ms
latency stddev:      1.94 ms     6.84 ms

which doesn't look like a difference to me.

Intuitively, I'd expect us to get things in the neighborhood of 1 a
lot more often than things in the neighborhood of 1 << (30 or 60).  Do
we have some idea of the distribution, or at least of the distribution
family, that we should expect for ints?
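The worst-case intuition can be made concrete without sampling: for a draw that is uniform over the positive int4 range, the longest values dominate. A quick sketch counting exactly:

```python
# Exact digit-length distribution for a uniform draw over
# [1, 2**31 - 1]: uniform inputs land overwhelmingly in the longest
# buckets, the expensive case for any digit-counting loop.
INT4_MAX = 2**31 - 1

def share_with_digits(d):
    lo, hi = 10**(d - 1), min(10**d - 1, INT4_MAX)
    return (hi - lo + 1) / INT4_MAX

ten_digit_share = share_with_digits(10)   # values in [10**9, INT4_MAX]
```

Over half of the uniform range has the maximum 10 digits, which is quite unlike real-world columns where small values near 1 dominate.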

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Slim down integer formatting

2021-07-26 Thread David Fetter
Folks,

Please find attached a patch to do $subject. It's down to one table
lookup and 3 instructions.

In covering the int64 versions, I swiped a lightweight division from
the Ryu stuff.  I'm pretty sure that what I did is not how to do
#includes, but it's a PoC.  What would be a better way to do this?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 1cf7202facd9fee161865d90304e5ede1e3c65cf Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 26 Jul 2021 16:43:43 -0700
Subject: [PATCH v1] PoC: Slim down integer printing some more

---
 src/backend/utils/adt/numutils.c | 62 +++-
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git src/backend/utils/adt/numutils.c src/backend/utils/adt/numutils.c
index b93096f288..c4f2d5cfb1 100644
--- src/backend/utils/adt/numutils.c
+++ src/backend/utils/adt/numutils.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+#include "../common/d2s_intrinsics.h"
 #include "common/int.h"
 #include "utils/builtins.h"
 #include "port/pg_bitutils.h"
@@ -39,50 +40,45 @@ static const char DIGIT_TABLE[200] =
 "90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
 
 /*
- * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ * Adapted from
+ * https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/
  */
 static inline int
 decimalLength32(const uint32 v)
 {
-	int			t;
-	static const uint32 PowersOfTen[] = {
-		1, 10, 100,
-		1000, 10000, 100000,
-		1000000, 10000000, 100000000,
-		1000000000
+	/*
+	 * Each element in the array, when added to a number with MSB that is the
+	 * array index, will produce a number whose top 32 bits are its decimal
+	 * length, so that can now be had using:
+	 *
+	 * 1 CLZ instruction,
+	 * 1 table lookup,
+	 * 1 add, and
+	 * 1 shift
+	 *
+	 */
+	static const uint64 digit_transitions[32] = {
+		4294967295,  8589934582,  8589934582,  8589934582,  12884901788,
+		12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
+		21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
+		25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
+		34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
+		38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
+		42949672960, 42949672960
 	};
 
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
+	return (v + digit_transitions[pg_leftmost_one_pos32(v)]) >> 32;
 }
 
 static inline int
 decimalLength64(const uint64 v)
 {
-	int			t;
-	static const uint64 PowersOfTen[] = {
-		UINT64CONST(1), UINT64CONST(10),
-		UINT64CONST(100), UINT64CONST(1000),
-		UINT64CONST(10000), UINT64CONST(100000),
-		UINT64CONST(1000000), UINT64CONST(10000000),
-		UINT64CONST(100000000), UINT64CONST(1000000000),
-		UINT64CONST(10000000000), UINT64CONST(100000000000),
-		UINT64CONST(1000000000000), UINT64CONST(10000000000000),
-		UINT64CONST(100000000000000), UINT64CONST(1000000000000000),
-		UINT64CONST(10000000000000000), UINT64CONST(100000000000000000),
-		UINT64CONST(1000000000000000000), UINT64CONST(10000000000000000000)
-	};
-
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos64(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
+	if (v >> 32 == 0)
+		return decimalLength32(v);
+	else if (div1e8(v) >> 32 == 0)
+		return 8 + decimalLength32(div1e8(v));
+	else
+		return 16 + decimalLength32(div1e8(div1e8(v)));
 }
 
 /*
-- 
2.31.1
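The CLZ-plus-table trick in the patch above can be transcribed and sanity-checked outside of C. A sketch, with the table values copied from the patch and `int.bit_length() - 1` standing in for pg_leftmost_one_pos32:

```python
# The patch's decimalLength32, transcribed: adding a per-MSB-position
# constant makes the top 32 bits of the sum equal the decimal length.
DIGIT_TRANSITIONS = [
    4294967295,  8589934582,  8589934582,  8589934582,  12884901788,
    12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
    21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
    25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
    34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
    38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
    42949672960, 42949672960,
]

def decimal_length32(v):
    assert 0 < v < 2**32
    # one "CLZ" (bit_length), one table lookup, one add, one shift
    return (v + DIGIT_TRANSITIONS[v.bit_length() - 1]) >> 32
```

Each table entry is (digits << 32) minus the smallest value with that many digits sharing the MSB position, so the carry into the top 32 bits happens exactly at each power of ten.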



Re: [PATCH] Allow multiple recursive self-references

2021-07-18 Thread David Fetter
On Thu, Jul 15, 2021 at 09:18:00AM +0200, Denis Hirn wrote:
> > The patch does not apply on Head anymore, could you rebase and post a patch.
> 
> Sure thing. Here's the new patch.
> 
> Best wishes,

Thanks for updating this.

In the next version of the patch, would you be so kind as to include
updated documentation of the feature and at least one regression test
of same?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use singular number when appropriate

2021-06-15 Thread David Fetter
On Tue, Jun 15, 2021 at 09:37:11AM +0200, Laurenz Albe wrote:
> On Tue, 2021-06-15 at 04:59 +0000, David Fetter wrote:
> > I thought about using the dual, but wasn't sure how many languages
> > support it.
> 
> I think none of the languages for which we cater uses the dual.  But
> I guess you were joking, since the tests are not translated ...

I was.

> > if (fail_count == 0 && fail_ignore_count == 0)
> > snprintf(buf, sizeof(buf),
> > -_(" All %d tests passed. "),
> > -success_count);
> > +_(" %s %d test%s passed. "),
> > +success_count == 1 ? "The" : "All",
> > +success_count,
> > +    success_count == 1 ? "" : "s");
> 
> ... and that wouldn't be translatable.

Thanks, will rearrange.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Use singular number when appropriate

2021-06-14 Thread David Fetter
Hi,

I thought about using the dual, but wasn't sure how many languages
support it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 1fd595576c5972d8604adf77d1959008daebacb0 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 14 Jun 2021 21:51:20 -0700
Subject: [PATCH v1] Singular number when appropriate

---
 src/test/regress/pg_regress.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git src/test/regress/pg_regress.c src/test/regress/pg_regress.c
index 05296f7ee1..d6f9e829e8 100644
--- src/test/regress/pg_regress.c
+++ src/test/regress/pg_regress.c
@@ -2641,25 +2641,31 @@ regression_main(int argc, char *argv[],
 	 */
 	if (fail_count == 0 && fail_ignore_count == 0)
 		snprintf(buf, sizeof(buf),
- _(" All %d tests passed. "),
- success_count);
+ _(" %s %d test%s passed. "),
+ success_count == 1 ? "The" : "All",
+ success_count,
+ success_count == 1 ? "" : "s");
 	else if (fail_count == 0)	/* fail_count=0, fail_ignore_count>0 */
 		snprintf(buf, sizeof(buf),
- _(" %d of %d tests passed, %d failed test(s) ignored. "),
+ _(" %d of %d test%s passed, %d failed test%s ignored. "),
  success_count,
  success_count + fail_ignore_count,
- fail_ignore_count);
+ success_count == 1 ? "" : "s",
+ fail_ignore_count,
+ fail_ignore_count == 1 ? "" : "s");
 	else if (fail_ignore_count == 0)	/* fail_count>0 && fail_ignore_count=0 */
 		snprintf(buf, sizeof(buf),
- _(" %d of %d tests failed. "),
+ _(" %d of %d test%s failed. "),
  fail_count,
- success_count + fail_count);
+ success_count + fail_count,
+ fail_count == 1 ? "" : "s");
 	else
 		/* fail_count>0 && fail_ignore_count>0 */
 		snprintf(buf, sizeof(buf),
- _(" %d of %d tests failed, %d of these failures ignored. "),
+ _(" %d of %d test%s failed, %d of these failures ignored. "),
  fail_count + fail_ignore_count,
  success_count + fail_count + fail_ignore_count,
+ fail_count + fail_ignore_count == 1 ? "" : "s",
  fail_ignore_count);
 
 	putchar('\n');
-- 
2.31.1



Re: pg_receivewal makes a bad daemon

2021-05-05 Thread David Fetter
t anything with an incompatible license will
upset people who are accustomed to ensuring that we have what legally
amounts to an MIT license clean distribution, but I'm thinking that
option is at least worth discussing, even if the immediate consensus
is, "libreadline is bad enough. We went to a lot of trouble to purge
that other stuff back in the bad old days. Let's not make that mistake
again."

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Make some column descriptions easier to distinguish visually

2021-05-05 Thread David Fetter
Folks,

I was writing up a query on pg_constraint, and the columns whose
descriptions I've changed here were pretty hard to puzzle out, as they
were only distinct up to the difference between F and P, which isn't
always easy to see.  Please find attached a patch to disambiguate
them.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 4202229a9319fa2f307b2761696775a2c5905fcd Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 5 May 2021 15:48:53 -0700
Subject: [PATCH v1] Clarify some column descriptions in pg_constraint
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.31.1"

This is a multi-part message in MIME format.
--2.31.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git doc/src/sgml/catalogs.sgml doc/src/sgml/catalogs.sgml
index 492ed348b3..2c501f21f7 100644
--- doc/src/sgml/catalogs.sgml
+++ doc/src/sgml/catalogs.sgml
@@ -2664,7 +2664,7 @@ SCRAM-SHA-256$iteration count:
(references pg_operator.oid)
   
   
-   If a foreign key, list of the equality operators for PK = FK comparisons
+   If a foreign key, list of the equality operators for establishing whether primary key columns are equal to the corresponding foreign key columns (PK = FK)
   
  
 
@@ -2674,7 +2674,7 @@ SCRAM-SHA-256$iteration count:
(references pg_operator.oid)
   
   
-   If a foreign key, list of the equality operators for PK = PK comparisons
+   If a foreign key, list of the equality operators for establishing whether primary key columns are equal (PK = PK)
   
  
 
@@ -2684,7 +2684,7 @@ SCRAM-SHA-256$iteration count:
(references pg_operator.oid)
   
   
-   If a foreign key, list of the equality operators for FK = FK comparisons
+   If a foreign key, list of the equality operators for establishing whether foreign key columns are equal (FK = FK)
   
  
 

--2.31.1--




Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-04-27 Thread David Fetter
On Tue, Apr 27, 2021 at 12:33:25PM +0300, Aleksander Alekseev wrote:
> Hi David,
> 
> > I noticed that $subject completes with already valid constraints,
> > please find attached a patch that fixes it. I noticed that there are
> > other places constraints can be validated, but didn't check whether
> > similar bugs exist there yet.
> 
> There was a typo in the patch ("... and and not convalidated"). I've fixed
> it. Otherwise the patch passes all the tests and works as expected:
> 
> eax=# create table foo (x int);
> CREATE TABLE
> eax=# alter table foo add constraint bar check (x < 3) not valid;
> ALTER TABLE
> eax=# alter table foo add constraint baz check (x <> 5) not valid;
> ALTER TABLE
> eax=# alter table foo validate constraint ba
> bar baz
> eax=# alter table foo validate constraint bar;
> ALTER TABLE

Sorry about that typo, and thanks for poking at this!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-04-26 Thread David Fetter
Folks,

I noticed that $subject completes with already valid constraints,
please find attached a patch that fixes it. I noticed that there are
other places constraints can be validated, but didn't check whether
similar bugs exist there yet.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From e191928111a7500c3a3bc84558797fe86451c24b Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 26 Apr 2021 17:17:15 -0700
Subject: [PATCH v1] tab-completion of ALTER TABLE...VALIDATE CONSTRAINT
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.30.2"

This is a multi-part message in MIME format.
--2.30.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


...now chooses only among constraints that aren't already valid.

diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c
index 7c493b..2252b7d3ac 100644
--- src/bin/psql/tab-complete.c
+++ src/bin/psql/tab-complete.c
@@ -785,6 +785,15 @@ static const SchemaQuery Query_for_list_of_collations = {
 "   and pg_catalog.quote_ident(c1.relname)='%s'"\
 "   and pg_catalog.pg_table_is_visible(c1.oid)"
 
+/* the silly-looking length condition is just to eat up the current word */
+#define Query_for_nonvalid_constraint_of_table \
+"SELECT pg_catalog.quote_ident(conname) "\
+"  FROM pg_catalog.pg_class c1, pg_catalog.pg_constraint con "\
+" WHERE c1.oid=conrelid and and not convalidated"\
+"   and (%d = pg_catalog.length('%s'))"\
+"   and pg_catalog.quote_ident(c1.relname)='%s'"\
+"   and pg_catalog.pg_table_is_visible(c1.oid)"
+
 #define Query_for_all_table_constraints \
 "SELECT pg_catalog.quote_ident(conname) "\
 "  FROM pg_catalog.pg_constraint c "\
@@ -2118,11 +2127,16 @@ psql_completion(const char *text, int start, int end)
 	 * If we have ALTER TABLE  ALTER|DROP|RENAME|VALIDATE CONSTRAINT,
 	 * provide list of constraints
 	 */
-	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME|VALIDATE", "CONSTRAINT"))
+	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME", "CONSTRAINT"))
 	{
 		completion_info_charp = prev3_wd;
 		COMPLETE_WITH_QUERY(Query_for_constraint_of_table);
 	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table);
+	}
 	/* ALTER TABLE ALTER [COLUMN]  */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny) ||
 			 Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny))

--2.30.2--




Re: popcount

2021-03-24 Thread David Fetter
On Tue, Mar 23, 2021 at 10:51:08AM +0100, Peter Eisentraut wrote:
> On 21.03.21 02:31, David Fetter wrote:
> > > I have now read the entire internet on what a suitable name for this
> > > function could be.  I think the emerging winner is BIT_COUNT(), which
> > > already exists in MySQL, and also in Python (int.bit_count()) and Java
> > > (Integer.bitCount()).
> > 
> > Thanks for doing this tedious work. Please find attached the next
> > version of the patch.
> 
> committing, with some polishing

Thanks!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: popcount

2021-03-20 Thread David Fetter
On Sat, Mar 20, 2021 at 09:01:25PM +0100, Peter Eisentraut wrote:
> On 18.03.21 13:51, John Naylor wrote:
> > Hi David,
> > 
> > Just a nitpick:
> > 
> > +SET bytea_output TO hex;
> > 
> > Since we don't see the string in the output, I don't immediately see the
> > reason to change the output format here?

That's how I got it to work. If there's a way to make it go without
that, I'd be delighted to learn what it is :)

> > Aside from that, this patch works as expected, and is ready for committer.
> 
> I have now read the entire internet on what a suitable name for this
> function could be.  I think the emerging winner is BIT_COUNT(), which
> already exists in MySQL, and also in Python (int.bit_count()) and Java
> (Integer.bitCount()).

Thanks for doing this tedious work. Please find attached the next
version of the patch.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 9483d41941b5daa44e46e6fb164846647d716406 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v5] popcount
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.30.2"

This is a multi-part message in MIME format.
--2.30.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 68fe6a95b4..066431fd3c 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4030,6 +4030,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   
  
 
+  
+   
+
+ bit_count
+
+bit_count ( bytes bytea )
+bigint
+   
+   
+Counts the number of bits set in a binary string.
+   
+   
+bit_count('\xdeadbeef'::bytea)
+24
+   
+  
+
   

 
@@ -4830,6 +4847,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  
+   
+
+ bit_count
+
+bit_count ( bits bit )
+bigint
+   
+   
+Counts the bits set in a bit string.
+   
+   
+bit_count(B'101010101010101010')
+9
+   
+  
+
   

 
@@ -4869,6 +4903,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index e259531f60..feb00eccf9 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'bit_count', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3876,6 +3879,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'bit_count', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 2235866244..c9c6c73422 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1878,3 +1879,30 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit string.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	/* There's really no chance of an overflow here because
+	 * to get to INT64_MAX set bits, an object would have to be
+	 * an exbibyte long, exceeding what PostgreSQL can currently
+	 * store by a factor of 2^28
+	 */
+	int64		popcount;
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len;
+
+	p = VARBITS(arg1);
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+
+	popcount = pg_popcount((char *)p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 0bc345aa4d..95
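The semantics of the bit_count function the patch adds can be sketched in a few lines; '\xdeadbeef' is the documentation example above, and 24 is its population count:

```python
# Semantics sketch for the bytea bit_count in the patch above: count
# the set bits across the bytes of the value.
def popcount_bytes(data: bytes) -> int:
    return sum(bin(b).count("1") for b in data)

deadbeef = popcount_bytes(bytes.fromhex("deadbeef"))
```

The bit-string variant is the same idea, except only VARBITLEN(arg) bits participate, which is what the rounded-up byte length in bitpopcount computes.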

Re: WIP: document the hook system

2021-03-10 Thread David Fetter
On Wed, Mar 10, 2021 at 09:38:39AM -0500, David Steele wrote:
> On 3/9/21 12:20 PM, Bruce Momjian wrote:
> > On Sat, Mar  6, 2021 at 08:32:43PM -0500, Tom Lane wrote:
> > > I think that the best you should hope for here is that people are
> > > willing to add a short, not-too-detailed para to a markup-free
> > > plain-text README file that lists all the hooks.  As soon as it
> > > gets any more complex than that, either the doco aspect will be
> > > ignored, or there simply won't be any more hooks.
> > > 
> > > (I'm afraid I likewise don't believe in the idea of carrying a test
> > > module for each hook.  Again, requiring that is a good way to
> > > ensure that new hooks just won't happen.)
> > 
> > Agreed.  If you document the hooks too much, it allows them to drift
> > away from matching the code, which makes the hook documentation actually
> > worse than having no hook documentation at all.
> 
> There doesn't seem to be agreement on how to proceed here, so closing.
> 
> David, if you do decide to proceed with a README then it would probably be
> best to create a new thread/entry.

Thanks for the work on this and the helpful feedback!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [PATCH] pg_permissions

2021-03-07 Thread David Fetter
On Sat, Mar 06, 2021 at 08:03:17PM +0100, Joel Jacobson wrote:
> Hi,
> 
> It's easy to answer the question...
> 
>- What permissions are there on this specific object?
> 
> ...but to answer the question...
> 
>- What permissions are there for a specific role in the database?
> 
> you need to manually query all relevant pg_catalog or 
> information_schema.*_privileges views,
> which is an O(n) mental effort, while the first question is mentally O(1).
> 
> I think this can be improved by providing humans a single pg_permissions 
> system view
> which can be queried to answer the second question. This should save a lot of 
> keyboard punches.
> 
> Demo:
> 
> SELECT * FROM pg_permissions WHERE 'joel' IN (grantor,grantee);
>    regclass   |          obj_desc           | grantor | grantee | privilege_type | is_grantable
> --------------+-----------------------------+---------+---------+----------------+--------------
>  pg_namespace | schema foo                  | joel    | joel    | USAGE          | f
>  pg_namespace | schema foo                  | joel    | joel    | CREATE         | f
>  pg_class     | table foo.bar               | joel    | joel    | INSERT         | f
>  pg_class     | table foo.bar               | joel    | joel    | SELECT         | f
>  pg_class     | table foo.bar               | joel    | joel    | UPDATE         | f
>  pg_class     | table foo.bar               | joel    | joel    | DELETE         | f
>  pg_class     | table foo.bar               | joel    | joel    | TRUNCATE       | f
>  pg_class     | table foo.bar               | joel    | joel    | REFERENCES     | f
>  pg_class     | table foo.bar               | joel    | joel    | TRIGGER        | f
>  pg_attribute | column baz of table foo.bar | joel    | joel    | SELECT         | f
>  pg_attribute | column baz of table foo.bar | joel    | joel    | UPDATE         | f
> (11 rows)
> 
> All catalogs with _aclitem columns have been included in the view.
> 
> I think a similar one for ownerships would be nice too.
> But I'll let you digest this one first to see if the concept is fruitful.

+1 for both this and the ownerships view.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: WIP: document the hook system

2021-03-06 Thread David Fetter
On Fri, Feb 12, 2021 at 08:02:51PM +0300, Anastasia Lubennikova wrote:
> On 17.01.2021 16:53, Magnus Hagander wrote:
> > On Fri, Jan 15, 2021 at 8:28 AM Peter Eisentraut
> >  wrote:
> > > On 2020-12-31 04:28, David Fetter wrote:
> > > > This could probably use a lot of filling in, but having it in the
> > > > actual documentation beats needing to know folklore even to know
> > > > that the capability is there.
> > > This patch seems quite short of a state where one could begin to
> > > evaluate it.  Documenting the hooks better seems a worthwhile goal.   I
> > > think the question is whether we can develop documentation that is
> > > genuinely useful by itself without studying the relevant source code.
> > > This submission does not address that question.
> > Even just having a list of available hooks would be a nice improvement 
> > though :)
> > 
> > But maybe it's something that belongs better in a README file instead,
> > since as you say it's unlikely to be properly useful without looking
> > at the source anyway. But just a list of hooks and a *very* high
> > overview of where each of them hooks in would definitely be useful to
> > have somewhere, I think. Having to find with "grep" whether there may
> > or may not exist a hook for approximately what it is you're looking
> > for is definitely a process to improve on.
> > 
> +1 for README.
> Hooks are intended for developers and can be quite dangerous without proper
> understanding of the internal code.
> 
> I also want to remind about a readme gathered by mentees [1]. It was done
> under a PostgreSQL license, so we can use it.
> By the way, is there any agreement on the plain-text format of PostgreSQL
> README files, or can we use md?
> 
> [1] https://github.com/AmatanHead/psql-hooks/blob/master/Detailed.md

This is much more thorough than what I've done so far, and a much
better document in terms of pointing to actual hunks of the source for
context.

I'm -1 on making a README alone. These are public APIs, and as such,
the fact of their existence shouldn't be a mystery discoverable only
by knowing that there's something to look for in the source tree and
then running an appropriate grep command to find the current ones

Would a document simply listing current hooks and pointing to
something along the lines of README_hooks in src work better?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [patch] bit XOR aggregate functions

2021-03-06 Thread David Fetter
On Sat, Mar 06, 2021 at 09:03:25PM +0100, Vik Fearing wrote:
> On 3/6/21 9:00 PM, David Fetter wrote:
> > On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:
> >> On 3/6/21 8:55 PM, David Fetter wrote:
> >>> On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
> >>>> On 10.02.21 06:42, Kyotaro Horiguchi wrote:
> >>>>> We already had CREATE AGGREGATE at the time, so BIT_XOR can be
> >>>>> thought as it falls into the same category with BIT_AND and
> >>>>> BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?
> >>>>
> >>>> I think the use of BIT_XOR is quite separate from BIT_AND and
> >>>> BIT_OR. The latter give you an "all" or "any" result of the bits
> >>>> set.  BIT_XOR will return 1 or true if an odd number of inputs are 1
> >>>> or true, which isn't useful by itself.  But it can be used as a
> >>>> checksum, so it seems pretty reasonable to me to add it.  Perhaps
> >>>> the use case could be pointed out in the documentation.
> >>>
> >>> If this is the only use case, is there some way to refuse to execute
> >>> it if it doesn't contain an unambiguous ORDER BY, as illustrated
> >>> below?
> >>>
> >>> SELECT BIT_XOR(b ORDER BY a, c).../* works */
> >>> SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
> >>> SELECT BIT_XOR(b) FROM... /* errors out */
> >>
> >>
> >> Why would such an error be necessary, or even desirable?
> > 
> > Because there is no way to ensure that the results remain consistent
> > from one execution to the next without such a guarantee.
> 
> I think one of us is forgetting how XOR works.

Oops. You're right.
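
For the archive, the property at issue is that XOR is commutative and
associative, so a BIT_XOR aggregate is deterministic with no ORDER BY at
all. A quick Python sketch (illustrative only, not the patch itself):

```python
from functools import reduce
from operator import xor

vals = [0b1010, 0b0110, 0b1100]

# XOR is commutative and associative, so folding in any order
# yields the same result -- which is why BIT_XOR needs no ORDER BY
# to be deterministic.
forward = reduce(xor, vals)
backward = reduce(xor, reversed(vals))
print(forward == backward)  # True
```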

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [patch] bit XOR aggregate functions

2021-03-06 Thread David Fetter
On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote:
> On 3/6/21 8:55 PM, David Fetter wrote:
> > On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
> >> On 10.02.21 06:42, Kyotaro Horiguchi wrote:
> >>> We already had CREATE AGGREGATE at the time, so BIT_XOR can be
> >>> thought as it falls into the same category with BIT_AND and
> >>> BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?
> >>
> >> I think the use of BIT_XOR is quite separate from BIT_AND and
> >> BIT_OR. The latter give you an "all" or "any" result of the bits
> >> set.  BIT_XOR will return 1 or true if an odd number of inputs are 1
> >> or true, which isn't useful by itself.  But it can be used as a
> >> checksum, so it seems pretty reasonable to me to add it.  Perhaps
> >> the use case could be pointed out in the documentation.
> > 
> > If this is the only use case, is there some way to refuse to execute
> > it if it doesn't contain an unambiguous ORDER BY, as illustrated
> > below?
> > 
> > SELECT BIT_XOR(b ORDER BY a, c).../* works */
> > SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
> > SELECT BIT_XOR(b) FROM... /* errors out */
> 
> 
> Why would such an error be necessary, or even desirable?

Because there is no way to ensure that the results remain consistent
from one execution to the next without such a guarantee.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [patch] bit XOR aggregate functions

2021-03-06 Thread David Fetter
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
> On 10.02.21 06:42, Kyotaro Horiguchi wrote:
> > We already had CREATE AGGREGATE at the time, so BIT_XOR can be
> > thought as it falls into the same category with BIT_AND and
> > BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?
> 
> I think the use of BIT_XOR is quite separate from BIT_AND and
> BIT_OR. The latter give you an "all" or "any" result of the bits
> set.  BIT_XOR will return 1 or true if an odd number of inputs are 1
> or true, which isn't useful by itself.  But it can be used as a
> checksum, so it seems pretty reasonable to me to add it.  Perhaps
> the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c).../* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM...     /* errors out */

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Public APIs

2021-03-06 Thread David Fetter
Hi,

While I was looking at the Citus code base for a project at work, I
noticed a really ugly thing. It was a UDF called
alter_columnar_table_set(). It's clearly there because our current DDL
is a few bricks shy of a load, as others have phrased such things,
when it comes to accommodating the table AM API. A similar problem
exists when it comes to changing any GUC ending in "_libraries". There
is no way provided to change an existing value in place, only to
overwrite it. This might be OK if you'll only ever have zero or one
shared object (.so) loaded, but it becomes a potentially serious problem if, for
example, you have two files in conf.d that set one. Then, we get
surprises caused by questions extension implementers really shouldn't
be saddled with, like "what order do such files get included in, and
what happens when a new one appears?"

The general issue, as I see it, is one that we can address by
providing reference implementations, however tiny, pointless, and
trivial, of each of our extension points
https://wiki.postgresql.org/wiki/PostgresServerExtensionPoints. I
learned about one, rendezvous variables, while writing this email.
Being public APIs, these all really need to be documented in our
official documentation, a thing I started on with the patch I'm
working on for the hooks system.

At a code level, this would be in the spirit of unit tests to ensure
that those extension points keep working by putting them in, say,
`make check-world` so as not to burden casual test runs.

Separately, it would be reasonable to make some efforts to ensure that
interactions among them are either safe or disallowed when attempted,
whichever seems reasonable to do. We can't cover the entire
combinatorial explosion, but adding a cross-check when someone has
reported a problem we can reasonably anticipate could recur would be a
big improvement.

We could start with "black hole" implementations, as Andrew Dunstan,
Michaël Paquier, and possibly others, have done, but actual working
systems would expose more weak points.

Would documenting these APIs be the right place to start?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Feedback on table expansion hook (including patch)

2021-03-06 Thread David Fetter
On Sat, Mar 06, 2021 at 01:09:10PM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 07.05.20 10:11, Erik Nordström wrote:
> >> I am looking for feedback on the possibility of adding a table expansion 
> >> hook to PostgreSQL (see attached patch).
> 
> > Unlike the get_relation_info_hook, your proposed hook would *replace* 
> > expand_inherited_rtentry() rather than just tack on additional actions. 
> > That seems awfully fragile.  Could you do with a hook that does 
> > additional things rather than replace a whole chunk of built-in code?
> 
> I suppose Erik is assuming that he could call expand_inherited_rtentry
> (or better, the previous hook occupant) when his special case doesn't
> apply.  But I'm suspicious that he'd still end up duplicating large
> chunks of optimizer/util/inherit.c in order to carry out the special
> case, since almost all of that is private/static functions.  It
> does seem like a more narrowly-scoped hook might be better.
> 
> Would it be unreasonable of us to ask for a worked-out example making
> use of the proposed hook?  That'd go a long way towards resolving the
> question of whether you can do anything useful without duplicating
> lots of code.
> 
> I've also been wondering, given the table-AM projects that are
> going on, whether we shouldn't refactor things to give partitioned
> tables a special access method, and then shove most of the planner
> and executor's hard-wired partitioning logic into access method
> callbacks.  That would make it a lot more feasible for extensions
> to implement custom partitioning-like behavior ... or so I guess.

That seems pretty reasonable. I suspect that that process will expose
bits of the planning and execution machinery that have gotten less
isolated than they should be.

More generally, and I'll start a separate thread on this, we should be
working up to including a reference implementation, however tiny, of
every extension point we supply in order to ensure that our APIs are
at a minimum reasonably usable and remain so.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: regexp_positions()

2021-02-27 Thread David Fetter
On Sat, Feb 27, 2021 at 08:51:27PM +0100, Joel Jacobson wrote:
> Hi,
> 
> Finding all matches in a string is convenient using regexp_matches() with the 
> 'g' flag.
> 
> But if instead wanting to know the start and end positions of the occurrences,
> one would have to first call regexp_matches(...,'g') to get all matches,
> and then iterate through the results and search using strpos() and length()
> repeatedly to find all start and end positions.
> 
> Assuming regexp_matches() internally already have knowledge of the 
> occurrences,
> maybe we could add a regexp_ranges() function that returns a two-dimensional 
> array,
> with all the [[start,end], ...] positions?

Maybe an int4multirange, which would fit unless I'm misunderstanding
the 'g' flag's meaning with respect to non-overlapping matches, but
that might be a little too cute, and not easy to extend later.

Come to that, would a row structure that looked like

(match, start, end)

be useful?
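
Something along the lines of Python's re.finditer shows the shape such a
row structure would take (purely illustrative; this is not a proposed
API):

```python
import re

# Each match reported as a (match, start, end) row, the structure
# suggested above -- here simulated with Python's re.finditer.
rows = [(m.group(), m.start(), m.end())
        for m in re.finditer(r"\d+", "a12b345c6")]
print(rows)  # [('12', 1, 3), ('345', 4, 7), ('6', 8, 9)]
```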

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread David Fetter
On Mon, Feb 22, 2021 at 07:34:51AM -0500, Dave Cramer wrote:
> On Fri, 19 Feb 2021 at 15:39, Álvaro Hernández  wrote:
> 
> > On 19/2/21 19:30, Jan Wieck wrote:
> > > [...]
> > >
> > > I also am not sure if building a connection pool into a
> > > background worker or postmaster is a good idea to begin with.
> > > One of the important features of a pool is to be able to suspend
> > > traffic and make the server completely idle to for example be
> > > able to restart the postmaster without forcibly disconnecting
> > > all clients. A pool built into a background worker cannot do
> > > that.
> 
> Yes, when did it become a good idea to put a connection pooler in
> the backend???

It became a great idea when we noticed just how large and
resource-intensive backends were, especially in light of applications'
broad tendency to assume that they're free. While I agree that that's
not a good assumption, it's one that's so common everywhere in
computing that we really need to face up to the fact that it's not
going away any time soon.

Decoupling the parts that serve requests from the parts that execute
queries also goes a long way toward things we've wanted for quite
awhile like admission control systems and/or seamless zero-downtime
upgrades.

Separately, as the folks at AWS and elsewhere have mentioned, being
able to pretend at some level to be a different RDBMS can only happen
if we respond to its wire protocol.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: computing dT from an interval

2021-02-22 Thread David Fetter
On Mon, Feb 22, 2021 at 10:52:42AM -0500, Tom Lane wrote:
> "Michael J. Baars"  writes:
> > So how do you compute the number of seconds in 8 years?
> 
> IMO, that's a meaningless computation, because the answer is not fixed.
> Before you claim otherwise, think about the every-four-hundred-years
> leap year exception in the Gregorian rules.  Besides, what if the
> question is "how many seconds in 7 years"?  Then it definitely varies
> depending on the number of leap days included.
> 
> What does make sense is timestamp subtraction, where the actual
> endpoints of the interval are known.

True.

I'm not sure whether this is a bug or an infelicity we document, but
at least in some parts of the world, this calculation doesn't comport
with the calendar in place at the time:

SELECT to_timestamp('1753', 'YYYY') - to_timestamp('1752', 'YYYY');
 ?column? 
══
 366 days
(1 row)

I'd like to imagine nobody will ever go mucking with the calendar to
the extent the British did that year, but one never knows.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Extensions not dumped when --schema is used

2021-02-21 Thread David Fetter
On Thu, Feb 18, 2021 at 11:13:06AM +0100, Guillaume Lelarge wrote:
> Le mar. 26 janv. 2021 à 13:42, Guillaume Lelarge  a
> écrit :
> 
> > Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge 
> > a écrit :
> >
> >> Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud  a
> >> écrit :
> >>
> >>> On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge
> >>>  wrote:
> >>> >
> >>> > "Anytime soon" was a long long time ago, and I eventually completely
> >>> forgot this, sorry. As nobody worked on it yet, I took a shot at it. See
> >>> attached patch.
> >>>
> >>> Great!
> >>>
> >>> I didn't reviewed it thoroughly yet, but after a quick look it sounds
> >>> sensible.  I'd prefer to see some tests added, and it looks like a
> >>> test for plpgsql could be added quite easily.
> >>>
> >>>
> >> I tried that all afternoon yesterday, but failed to do so. My had still
> >> hurts, but I'll try again though it may take some time.
> >>
> >>
> > s/My had/My head/ ..
> >
> >
> I finally managed to get a working TAP test for my patch. I have no idea if
> it's good, and if it's enough. Anyway, new version of the patch attached.
> 
> 
> -- 
> Guillaume.

Thanks for doing this work!

> diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
> index bcbb7a25fb..95d45fabfb 100644
> --- a/doc/src/sgml/ref/pg_dump.sgml
> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -215,6 +215,38 @@ PostgreSQL documentation
>
>   
>  
> + <varlistentry>
> +  <term><option>-e <replaceable class="parameter">pattern</replaceable></option></term>
> +  <term><option>--extension=<replaceable class="parameter">pattern</replaceable></option></term>
> +  <listitem>
> +   <para>
> +    Dump only extensions matching <replaceable
> +    class="parameter">pattern</replaceable>.  When this option is not
> +    specified, all non-system extensions in the target database will be
> +    dumped.  Multiple schemas can be selected by writing multiple

I think this should read "Multiple extensions".

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [Proposal] Page Compression for OLTP

2021-02-18 Thread David Fetter
On Tue, Feb 16, 2021 at 11:15:36PM +0800, chenhj wrote:
> At 2021-02-16 21:51:14, "Daniel Gustafsson"  wrote:
> 
> >> On 16 Feb 2021, at 15:45, chenhj  wrote:
> >
> >> I want to know whether this patch can be accepted by the community, that 
> >> is, whether it is necessary for me to continue working on this patch. 
> >> If you have any suggestions, please feedback to me.
> >
> >It doesn't seem like the patch has been registered in the commitfest app so 
> >it
> >may have been forgotten about, the number of proposed patches often outnumber
> >the code review bandwidth.  Please register it at:
> >
> > https://commitfest.postgresql.org/32/
> >
> >..to make sure it doesn't get lost.
> >
> >--
> 
> >Daniel Gustafsson   https://vmware.com/
> 
> 
> Thanks, I will complete this patch and register it later.
> Chen Huajun

The simplest way forward is to register it now so it doesn't miss the
window for the upcoming commitfest (CF), which closes at the end of
this month. That way, everybody has the entire time between now and
the end of the CF to review the patch, work on it, etc, and the CF bot
will be testing it against the changing code base to ensure people
know if such a change causes it to need a rebase.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Tid scan improvements

2021-02-16 Thread David Fetter
On Tue, Feb 16, 2021 at 10:22:41PM +1300, David Rowley wrote:
> On Thu, 4 Feb 2021 at 23:51, David Rowley  wrote:
> > Updated patch attached.
> 
> I made another pass over this patch and did a bit of renaming work
> around the heap_* functions and the tableam functions. I think the new
> names are a bit more aligned to the existing names.

Thanks! I'm looking forward to making use of this :)

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Fuzz testing COPY FROM parsing

2021-02-05 Thread David Fetter
On Fri, Feb 05, 2021 at 12:45:30PM +0200, Heikki Linnakangas wrote:
> Hi,
> 
> I've been mucking around with COPY FROM lately, and to test it, I wrote some
> tools to generate input files and load them with COPY FROM:
> 
> https://github.com/hlinnaka/pgcopyfuzz

Neat!

The way it's already produced results is impressive.

Looking at honggfuzz, I see it's been used for wire protocols, of
which we have several.  Does testing our wire protocols seem like a
big lift?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Add SQL function for SHA1

2021-01-25 Thread David Fetter
On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote:
> Hi all,
> 
> SHA-1 is now an option available for cryptohashes, and like the
> existing set of functions of SHA-2, I don't really see a reason why we
> should not have a SQL function for SHA1.  Attached is a patch doing
> that.

Thanks for doing this!

While there are applications SHA1 is no longer good for, there are
plenty where it's still in play. One I use frequently is git. While
there are plans for creating an upgrade path to more cryptographically
secure hashes, it will take some years before repositories have
converted over.
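
A minimal sketch of that use case (using Python's hashlib here purely
for illustration): git object IDs are SHA-1 over a short header plus
the content.

```python
import hashlib

# git computes a blob's object ID as SHA-1 of "blob <size>\0<content>".
blob = b"hello\n"
header = b"blob %d\x00" % len(blob)
oid = hashlib.sha1(header + blob).hexdigest()
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a
```

The output matches what `git hash-object` reports for the same content.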

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: popcount

2021-01-19 Thread David Fetter
On Tue, Jan 19, 2021 at 07:58:12AM -0500, Robert Haas wrote:
> On Tue, Jan 19, 2021 at 3:06 AM Peter Eisentraut
>  wrote:
> > On 2021-01-18 16:34, Tom Lane wrote:
> > > Peter Eisentraut  writes:
> > >> [ assorted nits ]
> > >
> > > At the level of bikeshedding ... I quite dislike using the name "popcount"
> > > for these functions.  I'm aware that some C compilers provide primitives
> > > of that name, but I wouldn't expect a SQL programmer to know that;
> > > without that context the name seems pretty random and unintuitive.
> > > Moreover, it invites confusion with SQL's use of "pop" to abbreviate
> > > "population" in the statistical aggregates, such as var_pop().
> >
> > I was thinking about that too, but according to
> > <https://en.wikipedia.org/wiki/Hamming_weight>, popcount is an accepted
> > high-level term, with "pop" also standing for "population".
> 
> Yeah, I am not sure that it's going to be good to invent our own
> name for this, although maybe. But at least I think we should make
> sure there are some good comments in an easily discoverable place.
> Some people seem to think every programmer in the universe should
> know what things like popcount() and fls() and ffs() and stuff like
> that are, but it's far from obvious and I often have to refresh my
> memory.  Let's make it easy for someone to figure out, if they don't
> know already.

I went with count_set_bits() for the next version, and I hope that
helps clarify what it does.

> Like just a comment that says "this returns the number of 1 bits in
> the integer supplied as an argument" or something can save somebody
> a lot of trouble.

You bring up an excellent point, which is that our builtin functions
could use a lot more documentation directly to hand than they now
have.  For example, there's a lot of needless ambiguity created by
function comments which leave it up in the air as to which positional
argument does what in functions like string_to_array, which take
multiple arguments. I'll try to get a patch in for the next CF with a
fix for that, and a separate one that doesn't put it on people to use
\df+ to find the comments we do provide. There have been proposals for
including an optional space for COMMENT ON in DDL, but I suspect that
those won't fly unless and until they make their way into the
standard.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: popcount

2021-01-18 Thread David Fetter
On Mon, Jan 18, 2021 at 10:34:10AM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > [ assorted nits ]

fixed, I think.

> At the level of bikeshedding ... I quite dislike using the name "popcount"
> for these functions.  I'm aware that some C compilers provide primitives
> of that name, but I wouldn't expect a SQL programmer to know that;
> without that context the name seems pretty random and unintuitive.
> Moreover, it invites confusion with SQL's use of "pop" to abbreviate
> "population" in the statistical aggregates, such as var_pop().
> 
> Perhaps something along the lines of count_ones() or count_set_bits()
> would be more apropos.

Done that way.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 92f3d5bf1718b1b52a5a4d792f5c75e0bcc7fb74 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v4] popcount
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.29.2"

This is a multi-part message in MIME format.
--2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index aa99665e2e..7626edeee7 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4030,6 +4030,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   
  
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>count_set_bits</primary>
+        </indexterm>
+        <function>count_set_bits</function> ( <parameter>bytes</parameter> <type>bytea</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Counts the number of bits set in a binary string.
+       </para>
+       <para>
+        <literal>count_set_bits('\xdeadbeef'::bytea)</literal>
+        <returnvalue>24</returnvalue>
+       </para></entry>
+      </row>
+
   

 
@@ -4830,6 +4847,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>count_set_bits</primary>
+        </indexterm>
+        <function>count_set_bits</function> ( <parameter>bits</parameter> <type>bit</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Counts the bits set in a bit string.
+       </para>
+       <para>
+        <literal>count_set_bits(B'101010101010101010')</literal>
+        <returnvalue>9</returnvalue>
+       </para></entry>
+      </row>
+
   

 
@@ -4869,6 +4903,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index b5f52d4e4a..416ee0c2fb 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'count_set_bits', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'count_set_bits', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 2235866244..c9c6c73422 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1878,3 +1879,30 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit string.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	/* There's really no chance of an overflow here because
+	 * to get to INT64_MAX set bits, an object would have to be
+	 * an exbibyte long, exceeding what PostgreSQL can currently
+	 * store by a factor of 2^28
+	 */
+	int64		popcount;
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len;
+
+	p = VARBITS(arg1);
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+
+	popcount = pg_popcount((char *)p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 479ed9ae54..3f1179a0e8 100644
--- src/backend/utils/adt/varlena.c
+++ src/backend/utils/adt/varlena.c
@@ -3439,6 +3439,22 @@ bytea_overlay(bytea 

Re: NOT VALID for Unique Indexes

2021-01-17 Thread David Fetter
On Thu, Jan 14, 2021 at 04:22:17PM +, Simon Riggs wrote:
> As you may be aware the NOT VALID qualifier currently only applies to
> CHECK and FK constraints, but not yet to unique indexes. I have had
> customer requests to change that.

This is a great feature!

Not exactly on point with this, but in a pretty closely related
context, is there some way we could give people the ability to declare
at their peril that a constraint is valid without incurring the full
scan that VALIDATE currently does? This is currently doable by
fiddling directly with the catalog, an operation that is far more
dangerous and ill-advised.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: popcount

2021-01-11 Thread David Fetter
On Mon, Jan 11, 2021 at 03:50:54PM +0100, Peter Eisentraut wrote:
> On 2020-12-30 17:41, David Fetter wrote:
> > > The input may have more than 2 billion bits set to 1. The biggest possible
> > > result should be 8 billion for bytea (1 GB with all bits set to 1).
> > > So shouldn't this function return an int8?
> > It does now, and thanks for looking at this.
> 
> The documentation still reflects the previous int4 return type (in two
> different spellings, too).

Thanks for looking this over!

Please find attached the next version with corrected documentation.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From f0f8e0639a4d08db7d454d5181aa9d98d31264a3 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v3] popcount
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.29.2"

This is a multi-part message in MIME format.
--2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 02a37658ad..1d86d610dd 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  
+   
+
+ popcount
+
+popcount ( bytes bytea )
+bigint
+   
+   
+Counts the number of bits set in a binary string.
+   
+   
+popcount('\xdeadbeef'::bytea)
+24
+   
+  
+
   

 
@@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
+  
+   
+
+ popcount
+
+popcount ( bits bit )
+bigint
+   
+   
+Counts the bits set in a bit string.
+   
+   
+popcount(B'101010101010101010')
+9
+   
+  
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index d7b55f57ea..2d53e0d46d 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 2235866244..a6a44f3f4f 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1878,3 +1879,29 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit string.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len;
+	int64		popcount;
+
+	/*
+	 * ceil(VARBITLEN(arg1) / BITS_PER_BYTE), computed without branches
+	 * or floating point.
+	 */
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+	p = VARBITS(arg1);
+
+	popcount = pg_popcount((const char *)p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 4838bfb4ff..da3ba769c4 100644
--- src/backend/utils/adt/varlena.c
+++ src/backend/utils/adt/varlena.c
@@ -3433,6 +3433,22 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 	return result;
 }
 
+/*
+ * popcount
+ */
+Datum
+byteapopcount(PG_FUNCTION_ARGS)
+{
+	bytea	*t1 = PG_GETARG_BYTEA_PP(0);
+	int		len;
+	int64	result;
+
+	len = VARSIZE_ANY_EXHDR(t1);
+	result = pg_popcount(VARDATA_ANY(t1), len);
+
+	PG_RETURN_INT64(result);
+}
+
 /*
  * byteapos -
  *	  Return the position of the specified substring.
diff --git src/test/regress/expected/bit.out

Re: Implement for window functions

2021-01-03 Thread David Fetter
On Fri, Jan 01, 2021 at 01:21:10PM -0800, Zhihong Yu wrote:
> Krasiyan:
> Happy New Year.
> 
> For WinGetFuncArgInPartition():
> 
> +   if (target > 0)
> +   step = 1;
> +   else if (target < 0)
> +   step = -1;
> +   else
> +   step = 0;
> 
> When would the last else statement execute ? Since the above code is
> for WINDOW_SEEK_CURRENT, I wonder why step should be 0.

If it does actually need step to be one of those three choices, it
might be shorter (well, less branchy) to write as

step = (target > 0) - (target < 0);

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Tid scan improvements

2020-12-31 Thread David Fetter
On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote:
> On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote:
> > So, I think we need to either get some help from someone familiar with
> > heapam.c, or maybe shelve the patch.  It has been work in progress for
> > over a year now.
> 
> Okay, still nothing has happened after two months.  Once this is
> solved a new patch submission could be done.  For now I have marked
> the entry as returned with feedback.  

I dusted off the last version of the patch, without yet implementing
the suggestions made in this thread since then.

I think this is a capability worth having, as I was surprised when it
turned out that it didn't exist when I was looking to make an
improvement to pg_dump. My idea, which I'll get back to when this is
in, was to use special knowledge of heap AM tables to make it possible
to parallelize dumps of large tables by working separately on each
underlying file, of which there could be quite a few for a large one.

Will try to understand the suggestions upthread better and implement
same.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 96046239014de8a7dec62e2f60b5210deb1bd32a Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Thu, 31 Dec 2020 16:42:07 -0800
Subject: [PATCH v10] first cut
To: hackers


 create mode 100644 src/include/executor/nodeTidrangescan.h
 create mode 100644 src/backend/executor/nodeTidrangescan.c
 create mode 100644 src/test/regress/expected/tidrangescan.out
 create mode 100644 src/test/regress/sql/tidrangescan.sql


--2.29.2
Content-Type: text/x-patch; name="v10-0001-first-cut.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v10-0001-first-cut.patch"

diff --git src/include/access/tableam.h src/include/access/tableam.h
index 387eb34a61..5776f8ba6e 100644
--- src/include/access/tableam.h
+++ src/include/access/tableam.h
@@ -218,6 +218,15 @@ typedef struct TableAmRoutine
 bool set_params, bool allow_strat,
 bool allow_sync, bool allow_pagemode);
 
+	/*
+	 * Set the range of a scan.
+	 *
+	 * Optional callback: A table AM can implement this to enable TID range
+	 * scans.
+	 */
+	void		(*scan_setlimits) (TableScanDesc scan,
+   BlockNumber startBlk, BlockNumber numBlks);
+
 	/*
 	 * Return next tuple from `scan`, store in slot.
 	 */
@@ -875,6 +884,16 @@ table_rescan(TableScanDesc scan,
 	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
 }
 
+/*
+ * Set the range of a scan.
+ */
+static inline void
+table_scan_setlimits(TableScanDesc scan,
+	 BlockNumber startBlk, BlockNumber numBlks)
+{
+	scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks);
+}
+
 /*
  * Restart a relation scan after changing params.
  *
diff --git src/include/catalog/pg_operator.dat src/include/catalog/pg_operator.dat
index 9c6bf6c9d1..bb7193b9e7 100644
--- src/include/catalog/pg_operator.dat
+++ src/include/catalog/pg_operator.dat
@@ -237,15 +237,15 @@
   oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt',
   oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' },
-{ oid => '2800', descr => 'greater than',
+{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than',
   oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt',
   oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' },
-{ oid => '2801', descr => 'less than or equal',
+{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal',
   oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle',
   oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' },
-{ oid => '2802', descr => 'greater than or equal',
+{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal',
   oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool',
   oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge',
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
diff --git src/include/executor/nodeTidrangescan.h src/include/executor/nodeTidrangescan.h
new file mode 100644
index 

WIP: document the hook system

2020-12-30 Thread David Fetter
Please find attached :)

This could probably use a lot of filling in, but having it in the
actual documentation beats needing to know folklore even to know
that the capability is there.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 5152b3f5255ec91b6ea34b76ea765a26d392d3ac Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 19:13:57 -0800
Subject: [PATCH v1] WIP: Document the hooks system
To: hackers


Outline of something that should probably have more detail, but it's
probably better than nothing at all.

 create mode 100644 doc/src/sgml/hooks.sgml


--2.29.2
Content-Type: text/x-patch; name="v1-0001-WIP-Document-the-hooks-system.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="v1-0001-WIP-Document-the-hooks-system.patch"

diff --git doc/src/sgml/filelist.sgml doc/src/sgml/filelist.sgml
index 38e8aa0bbf..16f78b371a 100644
--- doc/src/sgml/filelist.sgml
+++ doc/src/sgml/filelist.sgml
@@ -58,6 +58,7 @@
 
 
 
+
 
 
 
diff --git doc/src/sgml/hooks.sgml doc/src/sgml/hooks.sgml
new file mode 100644
index 00..5891f74dc8
--- /dev/null
+++ doc/src/sgml/hooks.sgml
@@ -0,0 +1,321 @@
+
+
+
+ Hooks System
+
+ 
+  Hook
+ 
+
+ 
+  This chapter explains the PostgreSQL
+  hooks system, a way to inject functionality into many different parts
+  of the database via function pointers into shared libraries.  Such
+  libraries simply need to honor the hooks API, which consists roughly of the following:
+
+  
+
+ 
+ Initialize a hook of the correct type for what you are doing, conventionally
+ using a function with signature:
+
+void _PG_init(void)
+
+ which initializes whatever the hook code needs initialized.  Here, you will cache
+ any previous hook(s) with code that looks like:
+
+prev_Foo = Foo_hook;
+prev_Bar = Bar_hook;
+
+ 
+
+
+
+ 
+ Let your imagination run wild, but try very hard not to crash while doing so.
+ 
+
+
+
+ 
+ Restore cached hooks in a function conventionally called
+
+void _PG_fini(void)
+
+ with code that looks like:
+
+Foo_hook = prev_Foo;
+Bar_hook = prev_Bar;
+
+ 
+
+
+  
+ 
+
+ 
+  Available Hooks
+
+  
+   The following backend hooks are available for your use:
+  
+
+  
+   Analyze hook
+
+   
+Take control at the end of parse analysis.
+   
+  
+
+  
+   Authentication hook
+
+   
+Take control in ClientAuthentication.
+   
+  
+
+  
+   Logging hook
+
+   
+Intercept messages before they are sent to the server log.
+   
+  
+
+  
+   Executor hooks
+
+   
+The following hooks control the executor:
+   
+
+   
+Executor Start Hook
+
+
+ Take control in ExecutorStart().
+
+   
+
+   
+Executor Run Hook
+
+
+ Take control in ExecutorRun().
+
+   
+
+   
+Executor Finish Hook
+
+
+ Take control in ExecutorFinish().
+
+   
+
+   
+Executor End Hook
+
+
+ Take control in ExecutorEnd().
+
+   
+
+   
+Executor Check Permissions Hook
+
+
+ Take control in ExecutorCheckRTPerms().
+
+   
+
+  
+
+  
+   Explain hook
+
+   
+Take control of EXPLAIN.
+   
+
+   
+Explain One Query
+
+
+ Take control in ExplainOneQuery().
+
+   
+
+   
+Explain Get Index Name
+
+
+ Take control in explain_get_index_name().
+
+   
+
+  
+
+  
+   Function Call Hooks
+
+   
+Needs Function Manager
+
+
+ Look around and decide whether the function manager hook is needed.
+
+   
+
+   
+Function Manager
+
+
+ Take control at start, end, or abort of a ROUTINE.
+
+   
+  
+
+  
+   Shared Memory Startup
+
+   
+Take control of chunks of shared memory at startup.
+   
+  
+
+  
+   OpenSSL TLS Init
+
+   
+Take control in initialization of OpenSSL.
+   
+  
+
+  
+   Get Attribute Average Width
+
+   
+Take control in get_attavgwidth().
+   
+  
+
+  
+   Object Access Hooks
+
+   
+Take control just before or just after CREATE,
+DROP, ALTER,
+namespace search, function execution, and TRUNCATE.
+   
+  
+
+  
+   Path (Optimizer) Hooks
+
+   
+Take control during query planning.
+   
+
+   
+Set Relation Pathlist
+
+
+ Take control in set_rel_pathlist().
+
+   
+
+   
+Set Join Pathlist
+
+
+ Take control in set_join_pathlist().
+
+   
+
+   
+Join Search
+
+
+ Take control in join_search().
+
+   
+  
+
+  
+   Get Relation Info Hook
+
+   
+Take control in get_relation_info().
+   
+  
+
+  
+   Planner Hooks
+
+   
+Take control in parts of the planner inten

Re: Let people set host(no)ssl settings from initdb

2020-12-30 Thread David Fetter
On Wed, Dec 30, 2020 at 03:00:17PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > On Wed, Dec 30, 2020 at 08:24:06PM +0100, David Fetter wrote:
> >> On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote:
> >>> I have looked at the patch of this thread, and I doubt that it is a
> >>> good idea to put more burden into initdb for that.  I agree that
> >>> being able to reject easily non-SSL connections in pg_hba.conf is a
> >>> bit of a hassle now, but putting more logic into initdb does not seem
> >>> the right course to me.  Perhaps we could consider an idea like
> >>> Peter's to have a sslmode=require on the server side and ease the
> >>> generation of HBA rules..
> 
> >> Please find attached the rebased patch.
> >> 
> >> Peter's suggestion seems a little more subtle to me than requiring TLS
> >> on the server side in that what people generally want to do is
> >> disallow clear text connections entirely.  In those scenarios, people
> >> would also want to set (and be able to change at runtime) some kind of
> >> cryptographic policy, as SSH and TLS do. While I see this as a worthy
> >> goal, it's a much bigger lift than an optional argument or two to
> >> initdb, and requires a lot more discussion than it's had to date.
> 
> FWIW, I still agree with what Michael says above.  I do not think
> that adding more options to initdb is a useful solution here.
> In the first place, it's unlikely that we'll manage to cover many
> people's exact requirements this way.  In the second place, it's
> very unclear where to stop adding options.  In the third place,
> I believe the vast majority of users don't invoke initdb "by hand"
> anymore.  The typical scenario is to go through a packager-provided
> script, which almost certainly won't offer access to these additional
> options.  In the fourth place, many people won't know at initdb time
> exactly what they should do, or they'll change their minds later.

To that last, I suspect that there are folks in regulated industries
who want to be able to show that they've deployed with at least some
minimal level of protection.  If there's not a window during which a
non-conforming pg_hba.conf is in play, that's easier to do.

> The last two points suggest that what'd be more useful is some sort
> of tool to modify an existing pg_hba.conf file.  Or maybe even just
> audit a file to see if it implements $desired-policy, such as
> "no unencrypted network connections" or "no plaintext passwords".
> (I kind of like the auditing-tool approach; it seems less scary
> than something that actually rewrites the file.)

Am I understanding correctly that you're suggesting we write up a
formal specification of pg_hba.conf? That could be handy if we choose
not to export the parser the backend uses for it, for example because
we want it to respond super quickly to HUPs, which might conflict with
making it usable by things other than the backend.

I agree that anything that does a write to pg_hba.conf needs to be
approached with a good deal of caution. Audit tools such as you
propose could spit out a suggestion that doesn't overwrite, although
it could get a little hairy if it's intended to patch something that
has an include directive in it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Implement for window functions

2020-12-30 Thread David Fetter
On Wed, Dec 30, 2020 at 09:32:26PM +0200, Krasiyan Andreev wrote:
> Hi, after latest committed patches about multirange datatypes, I get a
> compilation error,

Oh, right. I'd been meaning to send a patch to fix that. Here it is.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 7c8ae2cac7b2157f48e9e15c863eccde29a539b4 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 22 Dec 2020 19:47:35 -0800
Subject: [PATCH v2] Vik's patch
To: hackers


diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 5021ac1ca9..0e877c48df 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -20373,6 +20373,14 @@ SELECT count(*) FROM sometable;
about frame specifications.
   
 
+  
+   The functions lead, lag,
+   first_value, last_value, and
+   nth_value accept a null treatment option which is
+   RESPECT NULLS or IGNORE NULLS.
+   If this option is not specified, the default is RESPECT NULLS.
+  
+
   
When an aggregate function is used as a window function, it aggregates
over the rows within the current row's window frame.
@@ -20386,14 +20394,9 @@ SELECT count(*) FROM sometable;
 
   

-The SQL standard defines a RESPECT NULLS or
-IGNORE NULLS option for lead, lag,
-first_value, last_value, and
-nth_value.  This is not implemented in
-PostgreSQL: the behavior is always the
-same as the standard's default, namely RESPECT NULLS.
-Likewise, the standard's FROM FIRST or FROM LAST
-option for nth_value is not implemented: only the
+The SQL standard defines a FROM FIRST or FROM LAST
+option for nth_value.  This is not implemented in
+PostgreSQL: only the
 default FROM FIRST behavior is supported.  (You can achieve
 the result of FROM LAST by reversing the ORDER BY
 ordering.)
diff --git doc/src/sgml/ref/create_function.sgml doc/src/sgml/ref/create_function.sgml
index 3c1eaea651..31e08c26b4 100644
--- doc/src/sgml/ref/create_function.sgml
+++ doc/src/sgml/ref/create_function.sgml
@@ -28,6 +28,7 @@ CREATE [ OR REPLACE ] FUNCTION
   { LANGUAGE lang_name
 | TRANSFORM { FOR TYPE type_name } [, ... ]
 | WINDOW
+| TREAT NULLS
 | IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF
 | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
 | [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
@@ -293,6 +294,17 @@ CREATE [ OR REPLACE ] FUNCTION
  
 
 
+
+ TREAT NULLS
+
+ 
+  TREAT NULLS indicates that the function is able
+  to handle the RESPECT NULLS and IGNORE
+  NULLS options.  Only window functions may specify this.
+  
+ 
+
+
 
  IMMUTABLE
  STABLE
diff --git doc/src/sgml/syntax.sgml doc/src/sgml/syntax.sgml
index d66560b587..103595d21b 100644
--- doc/src/sgml/syntax.sgml
+++ doc/src/sgml/syntax.sgml
@@ -1766,6 +1766,8 @@ FROM generate_series(1,10) AS s(i);
 The syntax of a window function call is one of the following:
 
 
+function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER window_name
+function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER ( window_definition )
 function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
 function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
 function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
@@ -1779,6 +1781,18 @@ FROM generate_series(1,10) AS s(i);
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ frame_clause ]
 
+   
+
+   
+
+ The versions with RESPECT NULLS or IGNORE
+ NULLS only apply to true window functions, whereas the versions
+ with FILTER only apply to aggregate functions used as
+ window functions.
+
+   
+
+   
 The optional frame_clause
 can be one of
 
diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index 139f4a08bd..50c7b47be2 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -9770,33 +9770,42 @@
   proargtypes => 'int4', prosrc => 'window_ntile' },
 { oid => '3106', descr => 'fetch the preceding row value',
   proname => 'lag', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement', prosrc => 'window_lag' },
+  proargtypes => 'anyelement', prosrc => 'window_lag',
+  pronulltreatment => 't' },
 { oid => '3107', descr => 'fetch the Nth preceding row value',
   proname => 'lag', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement int4',

Re: Let people set host(no)ssl settings from initdb

2020-12-30 Thread David Fetter
On Wed, Dec 30, 2020 at 08:24:06PM +0100, David Fetter wrote:
> On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote:
> > On Thu, Jul 02, 2020 at 04:02:21PM +0200, Daniel Gustafsson wrote:
> > > The CF Patch Tester considers this patch to be malformed and is
> > > unable to apply and test it.  Can you please submit a rebased
> > > version?
> > 
> > I have looked at the patch of this thread, and I doubt that it is a
> > good idea to put more burden into initdb for that.  I agree that
> > being able to reject easily non-SSL connections in pg_hba.conf is a
> > bit of a hassle now, but putting more logic into initdb does not seem
> > the right course to me.  Perhaps we could consider an idea like
> > Peter's to have a sslmode=require on the server side and ease the
> > generation of HBA rules..
> > 
> > The patch has stalled for two months now without a rebase provided, so
> > I am marking it as returned with feedback.
> 
> Please find attached the rebased patch.
> 
> Peter's suggestion seems a little more subtle to me than requiring TLS
> on the server side in that what people generally want to do is
> disallow clear text connections entirely.  In those scenarios, people
> would also want to set (and be able to change at runtime) some kind of
> cryptographic policy, as SSH and TLS do. While I see this as a worthy
> goal, it's a much bigger lift than an optional argument or two to
> initdb, and requires a lot more discussion than it's had to date.

*sigh*

This time with patch actually attached.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 537e58136890d17d33699e9965d0518d4c8a1822 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 11 Dec 2019 18:55:03 -0800
Subject: [PATCH v5] Enable setting pg_hba.conf permissions from initdb
To: hackers


The new optional arguments are --auth-hostssl, --auth-hostnossl,
--auth-hostgssenc, and --auth-hostnogssenc.

diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml
index 385ac25150..c63e2c316e 100644
--- doc/src/sgml/ref/initdb.sgml
+++ doc/src/sgml/ref/initdb.sgml
@@ -152,6 +152,50 @@ PostgreSQL documentation
   
  
 
+ 
+  --auth-hostssl=authmethod
+  
+   
+This option specifies the authentication method for users via
+TLS remote connections used in pg_hba.conf
+(hostssl lines).
+   
+  
+ 
+
+ 
+  --auth-hostnossl=authmethod
+  
+   
+This option specifies the authentication method for users via
+non-TLS remote connections used in pg_hba.conf
+(hostnossl lines).
+   
+  
+ 
+
+ 
+  --auth-hostgssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+encrypted GSSAPI remote connections used in pg_hba.conf
+(hostgssenc lines).
+   
+  
+ 
+
+ 
+  --auth-hostnogssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+remote connections not using encrypted GSSAPI in pg_hba.conf
+(hostnogssenc lines).
+   
+  
+ 
+
  
   --auth-local=authmethod
   
diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample
index b6de12b298..e132cf4db3 100644
--- src/backend/libpq/pg_hba.conf.sample
+++ src/backend/libpq/pg_hba.conf.sample
@@ -91,3 +91,15 @@ hostall all ::1/128 @authmethodhost@
 @remove-line-for-nolocal@local   replication all @authmethodlocal@
 hostreplication all 127.0.0.1/32@authmethodhost@
 hostreplication all ::1/128 @authmethodhost@
+
+# hostnossl all   all 0.0.0.0/0   @authmethodhostnossl@
+# hostnossl all   all ::/0@authmethodhostnossl@
+
+# hostssl all all 0.0.0.0/0   @authmethodhostssl@
+# hostssl all all ::/0@authmethodhostssl@
+
+# hostnogssenc all   all 0.0.0.0/0   @authmethodhostnogssenc@
+# hostnogssenc all   all ::/0@authmethodhostnogssenc@
+
+# hostgssenc all all 0.0.0.0/0   @authmethodhostgssenc@
+# hostgssenc all all ::/0@authmethodhostgssenc@
diff --git src/bin/init
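For illustration, with these options a cluster that refuses all
unencrypted remote connections could end up with pg_hba.conf entries
along these lines (the cert/reject method choices here are only an
example, not something the patch sets):

```
# TLS connections must present a valid client certificate
hostssl    all  all  0.0.0.0/0  cert
hostssl    all  all  ::/0       cert
# unencrypted remote connections are refused outright
hostnossl  all  all  0.0.0.0/0  reject
hostnossl  all  all  ::/0       reject
```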

Re: Let people set host(no)ssl settings from initdb

2020-12-30 Thread David Fetter
On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote:
> On Thu, Jul 02, 2020 at 04:02:21PM +0200, Daniel Gustafsson wrote:
> > The CF Patch Tester considers this patch to be malformed and is
> > unable to apply and test it.  Can you please submit a rebased
> > version?
> 
> I have looked at the patch of this thread, and I doubt that it is a
> good idea to put more burden into initdb for that.  I agree that
> being able to reject easily non-SSL connections in pg_hba.conf is a
> bit of a hassle now, but putting more logic into initdb does not seem
> the right course to me.  Perhaps we could consider an idea like
> Peter's to have a sslmode=require on the server side and ease the
> generation of HBA rules..
> 
> The patch has stalled for two months now without a rebase provided, so
> I am marking it as returned with feedback.

Please find attached the rebased patch.

Peter's suggestion seems a little more subtle to me than requiring TLS
on the server side in that what people generally want to do is
disallow clear text connections entirely.  In those scenarios, people
would also want to set (and be able to change at runtime) some kind of
cryptographic policy, as SSH and TLS do. While I see this as a worthy
goal, it's a much bigger lift than an optional argument or two to
initdb, and requires a lot more discussion than it's had to date.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: popcount

2020-12-30 Thread David Fetter
On Wed, Dec 30, 2020 at 05:27:04PM +0100, Daniel Verite wrote:
>   David Fetter wrote:
> 
> +Datum
> +byteapopcount(PG_FUNCTION_ARGS)
> +{
> + bytea   *t1 = PG_GETARG_BYTEA_PP(0);
> + int len, result;
> +
> + len = VARSIZE_ANY_EXHDR(t1);
> + result = pg_popcount(VARDATA_ANY(t1), len);
> +
> + PG_RETURN_INT32(result);
> +}
> 
> The input may have more than 2 billion bits set to 1. The biggest possible
> result should be 8 billion for bytea (1 GB with all bits set to 1).
> So shouldn't this function return an int8?

It does now, and thanks for looking at this.

> There is a pre-existing function in core: bit_length(bytea) returning int4,
> but it errors out for large values (in fact it's a SQL function that returns
> octet_length($1)*8).
> 
> For the varbit case, it doesn't seem like it can hold so many bits, although
> I don't see the limit documented in
> https://www.postgresql.org/docs/current/datatype-bit.html

Out of an abundance of caution, I changed that one, too.

I'd noticed earlier that ceil() doesn't need to be quite as wasteful
as the way I had it earlier, and fixed that with help from Andrew
Gierth.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 6d35100b3bc16c1c7d3bf08c62589f3e039cee76 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v2] popcount
To: hackers


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 5021ac1ca9..439a3a4a06 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  
+   
+
+ popcount
+
+popcount ( bytes bytea )
+integer
+   
+   
+Counts the number of bits set in a binary string.
+   
+   
+popcount('\xdeadbeef'::bytea)
+24
+   
+  
+
   

 
@@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
+  
+   
+
+ popcount
+
+popcount ( bits bit )
+int4
+   
+   
+Counts the bits set in a bit string.
+   
+   
+popcount(B'101010101010101010')
+9
+   
+  
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index 139f4a08bd..82e3d50114 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 3c03459f51..a71c2b6bf4 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1872,3 +1873,29 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit array.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len;
+	int64		popcount;
+
+	/*
+	 * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE)
+	 * done to minimize branches and instructions.
+	 */
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+	p = VARBITS(arg1);
+
+	popcount = pg_popcount((const char *)p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backen

popcount

2020-12-30 Thread David Fetter
Hi,

Per request, I'd like to see about surfacing $Subject to SQL.

Best,
David.
>From 28c9b7acad605197a8a242ff929bcce6f3100c91 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v1] popcount
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.29.2"

This is a multi-part message in MIME format.
--2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 5021ac1ca9..439a3a4a06 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  
+   
+
+ popcount
+
+popcount ( bytes bytea )
+integer
+   
+   
+Counts the number of bits set in a binary string.
+   
+   
+popcount('\xdeadbeef'::bytea)
+24
+   
+  
+
   

 
@@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
+  
+   
+
+ popcount
+
+popcount ( bits bit )
+int4
+   
+   
+Counts the bits set in a bit string.
+   
+   
+popcount(B'101010101010101010')
+9
+   
+  
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index 139f4a08bd..0e6d0636fa 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int4', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int4', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 3c03459f51..e92d1a7f8f 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -29,6 +29,8 @@
  *-
  */
 
+#include <math.h>
+
 #include "postgres.h"
 
 #include "access/htup_details.h"
@@ -36,6 +38,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1872,3 +1875,24 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit array.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len, popcount;
+
+	len = ceil(VARBITLEN(arg1) / (float4)BITS_PER_BYTE);
+	p = VARBITS(arg1);
+
+	popcount = (int)pg_popcount((const char *)p, len);
+
+	PG_RETURN_INT32(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 9300d19e0c..9e788babd3 100644
--- src/backend/utils/adt/varlena.c
+++ src/backend/utils/adt/varlena.c
@@ -3415,6 +3415,21 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 	return result;
 }
 
+/*
+ * popcount
+ */
+Datum
+byteapopcount(PG_FUNCTION_ARGS)
+{
+	bytea	*t1 = PG_GETARG_BYTEA_PP(0);
+	int		len, result;
+
+	len = VARSIZE_ANY_EXHDR(t1);
+	result = pg_popcount(VARDATA_ANY(t1), len);
+
+	PG_RETURN_INT32(result);
+}
+
 /*
  * byteapos -
  *	  Return the position of the specified substring.
diff --git src/test/regress/expected/bit.out src/test/regress/expected/bit.out
index a1fab7ebcb..c91ad3d0d5 100644
--- src/test/regress/expected/bit.out
+++ src/test/regress/expected/bit.out
@@ -681,6 +681,19 @@ SELECT overlay(B'0101011100' placing '001' from 20);
 0101011100001
 (1 row)
 
+-- Popcount
+SELECT popcount(B'0101011100'::bit(10));
+ popcount 
+----------
+        5
+(1 row)
+
+SELECT popcount(B'1111111111'::bit(10));
+ popcount 
+----------
+       10
+(1 row)
+
 -- This table is intentionally left aroun

Re: range_agg

2020-12-27 Thread David Fetter
On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> Hi,
> 
> This is not an ideal way to index multiranges, but something we can
> easily have.

What sort of indexing improvements do you have in mind?

Best,
David.




Re: \gsetenv

2020-12-20 Thread David Fetter
On Sun, Dec 20, 2020 at 10:42:40PM +0200, Heikki Linnakangas wrote:
> On 20/12/2020 21:05, David Fetter wrote:
> > We have plenty of ways to spawn shells and cause havoc, and we
> > wouldn't be able to block them all even if we decided to put a bunch
> > of pretty onerous restrictions on psql at this very late date. We have
> > \set, backticks, \!, and bunches of things less obvious that could,
> > even without a compromised server, cause real mischief.
> 
> There is a big difference between having to trust the server or not. Yeah,
> you could cause a lot of mischief if you let a user run arbitrary psql
> scripts on your behalf. But that's no excuse for opening up a whole another
> class of problems.

I'm skittish about putting exploits out in public in advance of
discussions about how to mitigate them, but I have constructed several
that do pretty bad things using only hostile content in a server and
the facilities `psql` already provides.

Best,
David.




Re: \gsetenv

2020-12-20 Thread David Fetter
On Sun, Dec 20, 2020 at 01:07:12PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote:
> >> SELECT 'Calvin' AS foo \gset
> >> \setenv FOO :foo
> >> \! echo $FOO
> >> Calvin
> 
> > You're the second person who's mentioned this workaround, which goes
> > to a couple of points I tried to make earlier:
> 
> > - This is not by any means a new capability, just a convenience, and
> > - In view of the fact that it's a very old capability, the idea that
> >   it has implications for controlling access or other parts of the
> >   space of threat models is pretty silly.
> 
> This is essentially the same workaround as what we recommend for anyone
> who's unhappy with the fix for CVE-2020-25696: do \gset into a non-special
> variable and then copy to the special variable.  The reason it seems okay
> is that now it is clear that client-side logic intends the special
> variable change to happen.  Thus a compromised server cannot hijack your
> client-side session all by itself.  There's nonzero risk in letting the
> server modify your PROMPT1, PATH, or whatever, but you took the risk
> intentionally (and, presumably, it's necessary to your purposes).
> 
> I tend to agree with you that the compromised-server argument is a little
> bit of a stretch.  Still, we did have an actual user complaining about
> the case for \gset, and it's clear that in at least some scenarios this
> sort of attack could be used to parlay a server compromise into additional
> access.  So we're not likely to undo the CVE-2020-25696 fix, and we're
> equally unlikely to provide an unrestricted way to set environment
> variables directly from the server.
> 
> If we could draw a line between "safe" and "unsafe" environment
> variables, I'd be willing to consider a patch that allows directly
> setting only the former.  But I don't see how to draw that line.
> Most of the point of any such feature would have to be to affect
> the behavior of shell commands subsequently invoked with \! ...
> and we can't know what a given variable would do to those.  So on
> the whole I'm inclined to leave things as-is, where people have to
> do the assignment manually.

I suppose now's not a great time for this from an optics point of
view.  Taking on the entire security theater industry is way out of
scope for the PostgreSQL project.

We have plenty of ways to spawn shells and cause havoc, and we
wouldn't be able to block them all even if we decided to put a bunch
of pretty onerous restrictions on psql at this very late date. We have
\set, backticks, \!, and bunches of things less obvious that could,
even without a compromised server, cause real mischief. I believe that
a more effective way to deal with this reality in a way that helps
users is to put clear warnings in the documentation about the fact
that psql programs are at least as capable as shell programs in that
they are innately capable of doing anything that the psql's system
user is authorized to do.

Would a patch along that line help?

Best,
David.




Re: \gsetenv

2020-12-20 Thread David Fetter
On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote:
> Hello David,
> 
> > We have \gset to set some parameters, but not ones in the environment,
> > so I fixed this with a new analogous command, \gsetenv. I considered
> > refactoring SetVariable to include environment variables, but for a
> > first cut, I just made a separate function and an extra if.
> 
> My 0.02€: ISTM that you do not really need that, it can already be achieved
> with gset, so I would not bother to add a gsetenv.
> 
>   sh> psql
> SELECT 'Calvin' AS foo \gset
> \setenv FOO :foo
> \! echo $FOO
> Calvin

Thanks!

You're the second person who's mentioned this workaround, which goes
to a couple of points I tried to make earlier:

- This is not by any means a new capability, just a convenience, and
- In view of the fact that it's a very old capability, the idea that
  it has implications for controlling access or other parts of the
  space of threat models is pretty silly.

Having dispensed with the idea that there's a new attack surface here,
I'd like to request that people at least have a look at it as a
feature psql users might appreciate having. As the author, I obviously
see it that way, but again as the author, it's not for me to make that
call.

Best,
David.




Re: \gsetenv

2020-12-16 Thread David Fetter
On Wed, Dec 16, 2020 at 05:30:13PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > We have \gset to set some parameters, but not ones in the environment,
> > so I fixed this with a new analogous command, \gsetenv.
> 
> In view of the security complaints we just had about \gset
> (CVE-2020-25696), I cannot fathom why we'd consider adding another
> way to cause similar problems.

The RedHat site says, in part:

the attacker can execute arbitrary code as the operating system
account running psql

This is true of literally everything that requires a login shell in
order to use. I remember a "virus" my friend Keith McMillan wrote in
TeX back in 1994. You can download the PostScript file detailing
the procedure, bearing in mind that PostScript also contains ways to
execute arbitrary code if opened:

ftp://ftp.cerias.purdue.edu/pub/doc/viruses/KeithMcMillan-PlatformIndependantVirus.ps

That one got itself a remote code execution by fiddling with a
person's .emacs, and it got Keith a master's degree in CS.  I suspect
that equally nasty things are possible when it comes to \i and \o in
psql. It would be a terrible idea to hobble psql in the attempt to
prevent such attacks.

> We were fortunate enough to be able to close off the main security
> risk of \gset without deleting the feature altogether ... but how
> exactly would we distinguish "safe" from "unsafe" environment
> variables?  It kind of seems like anything that would be worth
> setting at all would tend to fall into the "unsafe" category,
> because the implications of setting it would be unclear.  But *for
> certain* we're not taking a patch that allows remotely setting PATH
> and things like that.

Would you be so kind as to explain what the actual problem is here
that not doing this would mitigate?

If people run code they haven't seen from a server they don't trust,
neither psql nor anything else[1] can protect them from the
consequences. Seeing what they're about to run is dead easy in this
case because \gsetenv, like \gset and what in my view is the much more
dangerous \gexec, is something anyone with the tiniest modicum of
caution would run only after testing it with \g.

> Besides which, you haven't bothered with even one word of positive
> justification.  What's the non-hazardous use case?

Thanks for asking, and my apologies for not including it.

I ran into a situation where we sometimes got a very heavily loaded
and also well-protected PostgreSQL server. At times, just getting a
shell on it could take a few tries. To mitigate situations like that,
I used a method that's a long way from new, abstruse, or secret: have
psql open in a long-lasting tmux or screen session. It could both
access the database at a time when getting a new connection would be
somewhere between difficult and impossible.  The bit that's unlikely
to be new was when I noticed that it could also shell out
and send information off to other places, but only when I put together
a pretty baroque procedure that involved using combinations of \gset,
\o, and \!. All of the same things \gsetenv could do were doable with
those, just less convenient, so I drafted up a patch in the hope that
fewer others would find themselves jumping through the hoops I did to
get that set up.

Separately, I confess to some bafflement at the reasoning behind the
CVE you referenced. By the time an attacker has compromised a database
server, it's already game over. Code running on the compromised
database is capable of doing much nastier things than crashing a
client machine, and very likely has access to other high-value targets
on its own say-so than said client does.

Best,
David.

[1] search for "gods themselves" here:
https://en.wikiquote.org/wiki/Friedrich_Schiller




\gsetenv

2020-12-16 Thread David Fetter
Hi,

We have \gset to set some parameters, but not ones in the environment,
so I fixed this with a new analogous command, \gsetenv. I considered
refactoring SetVariable to include environment variables, but for a
first cut, I just made a separate function and an extra if.

Best,
David.
>From 04059e68ffcd8cf4052ccb6a013f0cf2e0095eb8 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 16 Dec 2020 13:17:32 -0800
Subject: [PATCH v1] Implement \gsetenv, similar to \gset
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.28.0"

This is a multi-part message in MIME format.
--2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


In passing, make \setenv without arguments dump the environment in a way
similar to what bare \set and \pset do.

diff --git doc/src/sgml/ref/psql-ref.sgml doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..97d5a2ed19 100644
--- doc/src/sgml/ref/psql-ref.sgml
+++ doc/src/sgml/ref/psql-ref.sgml
@@ -2297,6 +2297,49 @@ hello 10
 
   
 
+  
+\gsetenv [ prefix ]
+
+
+
+ Sends the current query buffer to the server and stores the
+ query's output into environment variables.
+ The query to be executed must return exactly one row.  Each column of
+ the row is stored into a separate variable, named the same as the
+ column.  For example:
+
+= SELECT 'hello' AS var1, 10 AS var2
+- \gsetenv
+= \! echo $var1 $var2
+hello 10
+
+
+
+ If you specify a prefix,
+ that string is prepended to the query's column names to create the
+ variable names to use:
+
+=> SELECT 'hello' AS var1, 10 AS var2
+-> \gsetenv result_
+=> \! echo $result_var1 $result_var2
+hello 10
+
+
+
+ If a column result is NULL, the corresponding environment variable is unset
+ rather than being set.
+
+
+ If the query fails or does not return one row,
+ no variables are changed.
+
+
+ If the current query buffer is empty, the most recently sent query
+ is re-executed instead.
+
+
+  
+
 
   
 \gx [ (option=value [...]) ] [ filename ]
diff --git src/bin/psql/command.c src/bin/psql/command.c
index 38b52d..8b8749aca6 100644
--- src/bin/psql/command.c
+++ src/bin/psql/command.c
@@ -94,7 +94,9 @@ static backslashResult process_command_g_options(char *first_option,
  const char *cmd);
 static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch);
-static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_gset(PsqlScanState scan_state,
+		 bool active_branch,
+		 GSET_TARGET gset_target);
 static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_html(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_include(PsqlScanState scan_state, bool active_branch,
@@ -346,7 +348,9 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "gexec") == 0)
 		status = exec_command_gexec(scan_state, active_branch);
 	else if (strcmp(cmd, "gset") == 0)
-		status = exec_command_gset(scan_state, active_branch);
+		status = exec_command_gset(scan_state, active_branch, GSET_TARGET_PSET);
+	else if (strcmp(cmd, "gsetenv") == 0)
+		status = exec_command_gset(scan_state, active_branch, GSET_TARGET_ENV);
 	else if (strcmp(cmd, "h") == 0 || strcmp(cmd, "help") == 0)
 		status = exec_command_help(scan_state, active_branch);
 	else if (strcmp(cmd, "H") == 0 || strcmp(cmd, "html") == 0)
@@ -1451,10 +1455,11 @@ exec_command_gexec(PsqlScanState scan_state, bool active_branch)
 }
 
 /*
- * \gset [prefix] -- send query and store result into variables
+ * \gset[env] [prefix] -- send query and store result into variables
  */
 static backslashResult
-exec_command_gset(PsqlScanState scan_state, bool active_branch)
+exec_command_gset(PsqlScanState scan_state, bool active_branch,
+  GSET_TARGET gset_target)
 {
 	backslashResult status = PSQL_CMD_SKIP_LINE;
 
@@ -1463,6 +1468,8 @@ exec_command_gset(PsqlScanState scan_state, bool active_branch)
 		char	   *prefix = psql_scan_slash_option(scan_state,
 	OT_NORMAL, NULL, false);
 
+		pset.gset_target = gset_target;
+
 		if (prefix)
 			pset.gset_prefix = prefix;
 		else
@@ -2275,39 +2282,8 @@ exec_command_setenv(PsqlScanState scan_state, bool active_branch,
 	OT_NORMAL, NULL, false);
 		char	   *envval = psql_scan_slash_option(scan_state,
 		

Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-10 Thread David Fetter
On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
> I've pushed the core patch now.  The jsonb parts now have to be
> rebased onto this design, which I'm assuming Dmitry will tackle
> (I do not intend to).  It's not quite clear to me whether we have
> a meeting of the minds on what the jsonb functionality should be,
> anyway.  Alexander seemed to be thinking about offering an option
> to let the subscript be a jsonpath, but how would we distinguish
> that from a plain-text field name?
> 
> BTW, while reviewing the thread to write the commit message,
> I was reminded of my concerns around the "is it a container"
> business.  As things stand, if type A has a typelem link to
> type B, then the system supposes that A contains B physically;
> this has implications for what's allowed in DDL, for example
> (cf find_composite_type_dependencies() and other places).
> We now have a feature whereby subscripting can yield a type
> that is not contained in the source type in that sense.
> I'd be happier if the "container" terminology were reserved for
> that sort of physical containment, which means that I think a lot
> of the commentary around SubscriptingRef is misleading.  But I do
> not have a better word to suggest offhand.  Thoughts?

Would this be something more along the lines of a "dependent type," or
is that adding too much baggage?

Best,
David.




Re: SELECT INTO deprecation

2020-12-02 Thread David Fetter
On Wed, Dec 02, 2020 at 12:58:36PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote:
> > While reading about deprecating and removing various things in
> > other threads, I was wondering about how deprecated SELECT INTO
> > is.  There are various source code comments about this, but the
> > SELECT INTO reference page only contains soft language like
> > "recommended".  I'm proposing the attached patch to stick a more
> > explicit deprecation notice right at the top.
> 
> I don't see much value in this.  Users already have 5 years to adapt
> their code to new major versions of PG and that strikes me as plenty
> enough time and is why we support multiple major versions of PG for
> so long.  Users who keep pace and update for each major version
> aren't likely to have issue making this change since they're already
> used to regularly updating their code for new major versions, while
> others are going to complain no matter when we remove it and will
> ignore any deprecation notices we put out there, so there isn't much
> point in them.

+1 for removing it entirely and including this prominently in the
release notes.

Best,
David.




Re: POC: postgres_fdw insert batching

2020-11-30 Thread David Fetter
On Wed, Nov 25, 2020 at 05:04:36AM +, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
> > On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> > > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> > GetInsertBatchSize may be better.  That is, this API deals with multiple
> > records in a single INSERT statement.  Your GetModifyBatchSize will be
> > reserved for statement batching when libpq has supported batch/pipelining to
> > execute multiple INSERT/UPDATE/DELETE statements, as in the following
> > JDBC batch updates.  What do you think?
> > >
> > 
> > I don't know. I was really only thinking about batching in the context
> > of a single DML command, not about batching of multiple commands at the
> > protocol level. IMHO it's far more likely we'll add support for batching
> > for DELETE/UPDATE than libpq pipelining, which seems rather different
> > from how the FDW API works. Which is why I was suggesting to use a name
> > that would work for all DML commands, not just for inserts.
> 
> Right, I can't imagine now how the interaction among the client, server core 
> and FDWs would be regarding the statement batching.  So I'll take your 
> suggested name.
> 
> 
> > Not sure, but I'd guess knowing whether batching is used would be
> > useful. We only print the single-row SQL query, which kinda gives the
> > impression that there's no batching.
> 
> Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.
> 
> 
> > > Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a 
> > > value
> > that was already saved in a struct in create_foreign_modify().
> > >
> > 
> > Well, I do worry for two reasons.
> > 
> > Firstly, the fact that in postgres_fdw the call is cheap does not mean
> > it'll be like that in every other FDW. Presumably, the other FDWs might
> > cache it in the struct and do the same thing, of course.
> > 
> > But the fact that we're calling it over and over for each row kinda
> > seems like we allow the value to change during execution, but I very
> > much doubt the code is expecting that. I haven't tried, but assume the
> > function first returns 10 and then 100. ISTM the code will allocate
> > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> > That can't end well. Sure, we can claim it's a bug in the FDW extension,
> > but it's also due to the API design.
> 
> You worried about other FDWs than postgres_fdw.  That's reasonable.  I 
> insisted in other threads that PG developers care only about postgres_fdw, 
> not other FDWs, when designing the FDW interface, but I myself made the same 
> mistake.  I made changes so that the executor calls GetModifyBatchSize() once 
> per relation per statement.

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.




Clarifying the ImportForeignSchema API

2020-08-03 Thread David Fetter
Folks,

I noticed that the API document for IMPORT FOREIGN SCHEMA states in
part:

It should return a list of C strings, each of which must contain a
CREATE FOREIGN TABLE command. These strings will be parsed and
executed by the core server.

A reasonable reading of the above is that it disallows statements
other than CREATE FOREIGN TABLE, which seems overly restrictive for no
reason I can discern.  The list of C strings seems reasonable as a
requirement, but I think it would be better to rephrase this along the
lines of:

It should return a list of C strings, each of which must contain a
DDL command, for example CREATE FOREIGN TABLE. These strings will
be parsed and executed by the core server in order to create the
objects in the schema.

as a foreign schema might need types (the case I ran across) or other
database objects like CREATE EXTERNAL ROUTINE, when we dust off the
implementation of that, to support it.

I was unable to discern from my draft version of the spec whether
statements other than CREATE FOREIGN TABLE are specifically
disallowed, or whether it is intended to (be able to) contain CREATE
ROUTINE MAPPING statements.

Best,
David.




Re: Bump default wal_level to logical

2020-06-08 Thread David Fetter
On Mon, Jun 08, 2020 at 11:10:38AM +0200, Magnus Hagander wrote:
> On Mon, Jun 8, 2020 at 8:46 AM Michael Paquier  wrote:
> 
> > On Mon, Jun 08, 2020 at 11:59:14AM +0530, Amit Kapila wrote:
> > > I think we should first do performance testing to see what is the
> > > overhead of making this default.  I think pgbench read-write at
> > > various scale factors would be a good starting point.  Also, we should
> > > see how much additional WAL it generates as compared to current
> > > default.
> >
> > +1.  I recall that changing wal_level to logical has been discussed in
> > the past and performance was the actual take to debate on.
> >
> 
> That was at least the discussion (long-going and multi-repeated) before we
> upped it from minimal to replica. There were some pretty extensive
> benchmarking done to prove that the difference was very small, and this was
> weighed against the ability to take basic backups of the system (which
> arguably is more important than being able to do logical replication).

I'd argue this a different direction.

Logical replication has been fundamental to how a lot of systems
operate since Slony came out for the very good reason that it was far
and away the simplest way to accomplish a bunch of design goals. There
are now, and have been for some years, both free and proprietary
systems whose sole purpose is change data capture. PostgreSQL can play
nicely with those systems with this switch flipped on by default.

Looking into the future of PostgreSQL itself, there are things we've
been unable to do thus far that logical replication makes tractable.
These include:

- Zero downtime version changes
- Substantive changes to our on-disk representations between versions
  because upgrading in place puts sharp limits on what we can do.

> I agree that we should consider changing it *if* it does not come
> with a substantial overhead, but that has to be shown.

What overhead would be substantial enough to require more work than
changing the default, and under what circumstances?

I ask this because on a heavily loaded system, the kind where
differences could be practical as opposed to merely statistically
significant, statement logging at even the most basic level is a much
bigger burden than the maxed-out WALs are.  Any overhead those WALs
might impose simply disappears in the noise.  The difference is even
more stark in systems subject to audit.

> Of course, what would be even neater would be if it could be changed
> so you don't have to bounce the server to change the wal_level.
> That's a bigger change though, but perhaps it is now possible once
> we have the "global barriers" in 13?

As much as I would love to have this capability, I was hoping to keep
the scope of this contained.  As pointed out down-thread, there's lots
more to doing this dynamically than just turning up the wal_level.
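For anyone trying this out today, the change still means editing
postgresql.conf and restarting. A minimal fragment; the sender settings
shown are the stock defaults, included only for context:

```
# postgresql.conf -- wal_level can only be changed with a server restart
wal_level = logical            # minimal | replica | logical
max_wal_senders = 10           # default; needed for replication clients
max_replication_slots = 10     # default; logical decoding consumes slots
```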

Best,
David.




Bump default wal_level to logical

2020-06-07 Thread David Fetter
Hi,

I'd like to propose $subject, as embodied in the attached patch. This
makes it possible to discover and fulfill a need for logical
replication that can arise at a time when bouncing the server has
become impractical, i.e. when there is already high demand on it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 8179603827107d4b4ec5fe5e893b274e839a0ac1 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 7 Jun 2020 21:31:50 -0700
Subject: [PATCH v1] Bump default wal_level to logical
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.26.2"

This is a multi-part message in MIME format.
--2.26.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index aca8f73a50..08bb1d7fd7 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -2486,14 +2486,13 @@ include_dir 'conf.d'
   

 wal_level determines how much information is written to
-the WAL. The default value is replica, which writes enough
-data to support WAL archiving and replication, including running
-read-only queries on a standby server. minimal removes all
-logging except the information required to recover from a crash or
-immediate shutdown.  Finally,
-logical adds information necessary to support logical
-decoding.  Each level includes the information logged at all lower
-levels.  This parameter can only be set at server start.
+the WAL. The default value, logical, includes all
+information necessary to support logical decoding. replica
+writes enough data to support WAL archiving and replication, including running
+read-only queries on a standby server.  Finally, minimal
+removes all logging except the information required to recover from a crash or
+immediate shutdown.  Each level includes the information logged at all lower levels.
+This parameter can only be set at server start.


 In minimal level, no information is logged for
diff --git src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c
index 2f3e0a70e0..8d27a9c2a4 100644
--- src/backend/utils/misc/guc.c
+++ src/backend/utils/misc/guc.c
@@ -4651,7 +4651,7 @@ static struct config_enum ConfigureNamesEnum[] =
 			NULL
 		},
 		_level,
-		WAL_LEVEL_REPLICA, wal_level_options,
+		WAL_LEVEL_LOGICAL, wal_level_options,
 		NULL, NULL, NULL
 	},
 
diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
index ac02bd0c00..d637133db6 100644
--- src/backend/utils/misc/postgresql.conf.sample
+++ src/backend/utils/misc/postgresql.conf.sample
@@ -192,7 +192,7 @@
 
 # - Settings -
 
-#wal_level = replica			# minimal, replica, or logical
+#wal_level = logical			# minimal, replica, or logical
 	# (change requires restart)
 #fsync = on# flush data to disk for crash safety
 	# (turning this off can cause

--2.26.2--




Re: Default gucs for EXPLAIN

2020-06-02 Thread David Fetter
On Tue, Jun 02, 2020 at 09:28:48PM +0200, Vik Fearing wrote:
> On 6/2/20 7:54 PM, David G. Johnston wrote:
> > At this point, given the original goal of the patch was to try and
> > grease a smoother path to changing the default for BUFFERS, and
> > that people seem OK with doing just that without having this
> > patch, I'd say we should just change the default and forget this
> > patch.  There hasn't been any other demand from our users for this
> > capability and I also doubt that having BUFFERS on by default is
> > going to bother people.
> 
> What about WAL?  Can we turn that one on by default, too?
> 
> I often find having VERBOSE on helps when people don't qualify their
> columns and I don't know the schema.  We should turn that on by
> default, as well.

+1 for all on (except ANALYZE because it would be a foot cannon) by
default. For those few to whom it really matters, there'd be OFF
switches.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Default gucs for EXPLAIN

2020-05-31 Thread David Fetter
On Tue, May 26, 2020 at 02:49:46AM +, Nikolay Samokhvalov wrote:
> On Mon, May 25, 2020 at 6:36 PM, Bruce Momjian < br...@momjian.us > wrote:
> > I am not excited about this new feature. Why do it only for
> > EXPLAIN? That is a log of GUCs. I can see this becoming a feature
> > creep disaster.
> 
> How about changing the default behavior, making BUFFERS enabled by
> default? Those who don't need it, always can say BUFFERS OFF — the
> same as for TIMING.

+1 for changing the default of BUFFERS to ON.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: proposal: possibility to read dumped table's name from file

2020-05-29 Thread David Fetter
On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> Hi
> 
> one of my customers has to specify dumped tables name by name. After
> years of increasing database size and table counts, he has a problem
> with too-short command lines. He needs to read the list of tables from a
> file (or from stdin).
> 
> I wrote simple PoC patch
> 
> Comments, notes, ideas?

This seems like a handy addition. What I've done in cases similar to
this was to use `grep -f` on the output of `pg_restore -l` to create a
file suitable for `pg_restore -L`, or mash them together like this:

pg_restore -L <(pg_restore -l /path/to/dumpfile | grep -f /path/to/listfile) -d new_db /path/to/dumpfile

That's a lot of shell magic and obscure corners of commands to expect
people to use.

Would it make sense to expand this patch to handle other objects?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2020-05-25 Thread David Fetter
On Mon, May 25, 2020 at 09:43:32AM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > One problem (other than perhaps performance, tbd.) is that this would no 
> > longer allow processing infinite timestamps, since numeric does not 
> > support infinity.  It could be argued that running extract() on infinite 
> > timestamps isn't very useful, but it's something to consider explicitly.
> 
> I wonder if it's time to fix that, ie introduce +-Infinity into numeric.c.
> This isn't the first time we've seen issues with numeric not being a
> superset of float, and it won't be the last.
> 
> At first glance there's no free bits in the on-disk format for numeric,
> but we could do something by defining the low-order bits of the header
> word for a NaN to distinguish between real NaN and +/- infinity.
> It looks like those bits should reliably be zero right now.

+1 for adding +/- infinity to NUMERIC.
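For what it's worth, an arbitrary-precision numeric type that models
signed infinities (and keeps them distinct from NaN) is workable in
practice; Python's decimal module is one existing example.  A quick
illustrative sketch, not PostgreSQL code:

```python
# Python's Decimal already behaves as a superset of float here:
# +/-Infinity order correctly against any finite value, and NaN
# stays a separate, unordered special value.
from decimal import Decimal

inf = Decimal('Infinity')
assert inf > Decimal('1e1000000')    # larger than any finite Decimal
assert -inf < Decimal('-1e1000000')  # and symmetrically for -Infinity
assert inf + 1 == inf                # arithmetic absorbs into infinity

nan = Decimal('NaN')
assert nan != nan                    # NaN compares unequal even to itself
```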

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: factorial function/phase out postfix operators?

2020-05-18 Thread David Fetter
On Mon, May 18, 2020 at 10:03:13PM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > What are the thoughts about then marking the postfix operator deprecated 
> > and eventually removing it?
> 
> If we do this it'd require a plan.  We'd have to also warn about the
> feature deprecation in (at least) the CREATE OPERATOR man page, and
> we'd have to decide how many release cycles the deprecation notices
> need to stand for.
> 
> If that's the intention, though, it'd be good to get those deprecation
> notices published in v13 not v14.

+1 for deprecating in v13.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Proposing WITH ITERATIVE

2020-04-27 Thread David Fetter
On Mon, Apr 27, 2020 at 10:44:04PM -0400, Jonah H. Harris wrote:
> On Mon, Apr 27, 2020 at 10:32 PM David Fetter  wrote:
> > On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> > > Hey, everyone.
> >
> > > If there's any interest, I'll clean-up their patch and submit. Thoughts?
> >
> > Where's the current patch?
> 
> 
> It’s private. Hence, why I’m inquiring as to interest.

Have the authors agreed to make it available to the project under a
compatible license?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Proposing WITH ITERATIVE

2020-04-27 Thread David Fetter
On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> Hey, everyone.

> If there's any interest, I'll clean-up their patch and submit. Thoughts?

Where's the current patch?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Commitfest 2020-03 Now in Progress

2020-04-08 Thread David Fetter
On Wed, Apr 08, 2020 at 12:36:37PM -0400, David Steele wrote:
> On 4/1/20 10:09 AM, David Steele wrote:
> > On 3/17/20 8:10 AM, David Steele wrote:
> > > On 3/1/20 4:10 PM, David Steele wrote:
> > > > The last Commitfest for v13 is now in progress!
> > > > 
> > > > Current stats for the Commitfest are:
> > > > 
> > > > Needs review: 192
> > > > Waiting on Author: 19
> > > > Ready for Committer: 4
> > > > Total: 215
> > > 
> > > Halfway through, here's where we stand.  Note that a CF entry was
> > > added after the stats above were recorded so the total has
> > > increased.
> > > 
> > > Needs review: 151 (-41)
> > > Waiting on Author: 20 (+1)
> > > Ready for Committer: 9 (+5)
> > > Committed: 25
> > > Moved to next CF: 1
> > > Withdrawn: 4
> > > Returned with Feedback: 6
> > > Total: 216
> > 
> > After regulation time here's where we stand:
> > 
> > Needs review: 111 (-40)
> > Waiting on Author: 26 (+6)
> > Ready for Committer: 11 (+2)
> > Committed: 51 (+11)
> > Moved to next CF: 2 (+1)
> > Returned with Feedback: 10 (+4)
> > Rejected: 1
> > Withdrawn: 5 (+1)
> > Total: 217
> > 
> > We have one more patch so it appears that one in a completed state
> > (committed, etc.) at the beginning of the CF has been moved to an
> > uncompleted state. Or perhaps my math is just bad.
> > 
> > The RMT has determined that the CF will be extended for one week so I'll
> > hold off on moving and marking patches until April 8.
> 
> The 2020-03 Commitfest is officially closed.
> 
> Final stats are (for entire CF, not just from March 1 this time):
> 
> Committed: 90.
> Moved to next CF: 115.
> Withdrawn: 8.
> Rejected: 1.
> Returned with Feedback: 23.
> Total: 237.
> 
> Good job everyone!

Thanks so much for your hard work managing this one!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Let people set host(no)ssl settings from initdb

2020-04-08 Thread David Fetter
On Mon, Apr 06, 2020 at 10:12:16PM +, Cary Huang wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
> 
> Hi 
> I applied the patch 
> "v3-0001-Enable-setting-pg_hba.conf-permissions-from-initd.patch" and did 
> some verification with it. The intended feature works overall and I think it 
> is quite useful to support default auth methods for ssl and gss host types. I 
> have also found some minor things in the patch and would like to share as 
> below:
> 
> > +"# CAUTION: Configuring the system for \"trust\" authentication\n" \
> > +"# allows any user who can reach the databse on the route specified\n" \
> > +"# to connect as any PostgreSQL user, including the database\n" \
> > +"# superuser.  If you do not trust all the users who could\n" \
> > +"# reach the database on the route specified, use a more restrictive\n" \
> > +"# authentication method.\n"
> 
> Found a typo: should be 'database' instead of 'databse'

Fixed.

> >  * a sort of poor man's grep -v
> >  */
> > -#ifndef HAVE_UNIX_SOCKETS
> >  static char **
> >  filter_lines_with_token(char **lines, const char *token)
> >  {
> > @@ -461,7 +466,6 @@ filter_lines_with_token(char **lines, const char *token)
> >  
> > return result;
> > }
> > -#endif
> 
> I see that you have removed "#ifndef HAVE_UNIX_SOCKETS" around the
> filter_lines_with_token() function definition so it would be always
> available, which is used to remove the @tokens@ in case user does
> not specify a default auth method for the new hostssl, hostgss
> options. I think you should also remove the "#ifndef
> HAVE_UNIX_SOCKETS" around its declaration as well so both function
> definition and declaration would make sense.

Fixed.

Thanks very much for the review!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 3b49afec7f0b708285eadc10a097b4dc23d23927 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 11 Dec 2019 18:55:03 -0800
Subject: [PATCH v4] Enable setting pg_hba.conf permissions from initdb
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.25.2"

This is a multi-part message in MIME format.
--2.25.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


The new optional arguments are --auth-hostssl, --auth-hostnossl,
--auth-hostgssenc, and --auth-hostnogssenc.

diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml
index a04a180165..ee65cf925d 100644
--- doc/src/sgml/ref/initdb.sgml
+++ doc/src/sgml/ref/initdb.sgml
@@ -154,6 +154,50 @@ PostgreSQL documentation
   
  
 
+ 
+  --auth-hostssl=authmethod
+  
+   
+This option specifies the authentication method for users via
+TLS remote connections used in pg_hba.conf
+(hostssl lines).
+   
+  
+ 
+
+ 
+  --auth-hostnossl=authmethod
+  
+   
+This option specifies the authentication method for users via
+non-TLS remote connections used in pg_hba.conf
+(hostnossl lines).
+   
+  
+ 
+
+ 
+  --auth-hostgssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+encrypted GSSAPI remote connections used in pg_hba.conf
+(hostgssenc lines).
+   
+  
+ 
+
+ 
+  --auth-hostnogssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+remote connections not using encrypted GSSAPI in pg_hba.conf
+(hostnogssenc lines).
+   
+  
+ 
+
  
   --auth-local=authmethod
   
diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample
index c853e36232..7d5189dd8f 100644
--- src/backend/libpq/pg_hba.conf.sample
+++ src/backend/libpq/pg_hba.conf.sample
@@ -87,3 +87,15 @@ hostall all ::1/128 @authmethodhost@
 @remove-line-for-nolocal@local   replication all @authmethodlocal@
 hostreplication all 127.0.0.1/32@authmethodhost@
 hostreplication all ::1/128 @authmethodhost@
+
+# hostnossl all   all 0.0.0.0/0   @authmethodhostnossl@
+# hostnossl all   all ::/0@authmethodhostnossl@
+
+# hostssl all all

Re: range_agg

2020-03-10 Thread David Fetter
On Mon, Mar 09, 2020 at 06:34:04PM -0700, Jeff Davis wrote:
> On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote:
> > Actually ... have you given any thought to just deciding that ranges
> > and
> > multiranges are the same type? 
> 
> It has come up in a number of conversations, but I'm not sure if it was
> discussed on this list.
> 
> > I think on the whole the advantages win,
> > and I feel like that might also be the case here.
> 
> Some things to think about:
> 
> 1. Ranges are common -- at least implicitly -- in a lot of
> applications/systems. It's pretty easy to represent extrernal data as
> ranges in postgres, and also to represent postgres ranges in external
> systems. But I can see multiranges causing friction around a lot of
> common tasks, like displaying in a UI. If you only expect ranges, you
> can add a CHECK constraint, so this is annoying but not necessarily a
> deal-breaker.

It could become well and truly burdensome in a UI or an API. The
difference between one (as ranges are now) and many (as multi-ranges
would be if we shoehorned them into the range type) is pretty annoying
to deal with.

> 2. There are existing client libraries[1] that support range types and
> transform them to types within the host language. Obviously, those
> would need to be updated to expect multiple ranges.

The type systems that would support such types might get unhappy with
us if we started messing with some of the properties like
contiguousness.

> 3. It seems like we would want some kind of base "range" type. When you
> try to break a multirange down into constituent ranges, what type would
> those pieces be? (Aside: how do you get the constituent ranges?)
> 
> I'm thinking more about casting to see if there's a possible compromise
> there.

I think the right compromise is to recognize that the closure of a set
(ranges) over an operation (set union) may well be a different set
(multi-ranges). Other operations have already been proposed, complete
with concrete use cases that could really make PostgreSQL stand out.

That we don't have an obvious choice of "most correct" operation over
which to close ranges makes it an even bigger potential foot-gun when
we choose one arbitrarily and declare it to be the canonical one.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use compiler intrinsics for bit ops in hash

2020-03-08 Thread David Fetter
On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> Hi David,
> 
> On Wed, Feb 26, 2020 at 9:56 PM David Fetter  wrote:
> >
> > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter  wrote:
> > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > > > >
> > > > > > > 1. Is ceil_log2_64 dead code?
> > > > > >
> > > > > > Let's call it nascent code. I suspect there are places it could go, 
> > > > > > if
> > > > > > I look for them.  Also, it seemed silly to have one without the 
> > > > > > other.
> > > > > >
> > > > >
> > > > > While not absolutely required, I'd like us to find at least one
> > > > > place and start using it. (Clang also nags at me when we have
> > > > > unused functions).
> > > >
> > > > Done in the expanded patches attached.
> I see that you've found use of it in dynahash, thanks!
> 
> The math in the new (from v4 to v6) patch is wrong: it yields
> ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted
> the restriction of "num greater than one" for ceil_log2() in this patch
> set, but it's now _more_ problematic to base those functions on
> pg_leftmost_one_pos().
> 
> I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
> the restriction on word being non-zero. Specifically
> pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
> current callers (in HEAD) is harmed, this introduces muddy semantics:
> 
> 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
> for a set bit in a zero word won't find it anywhere.
> 
> 2. we can _try_ generalizing it to accommodate ceil_log2 by
> extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In
> that case, the extrapolation yields -1 for pg_leftmost_one_pos(0).
> 
> I'm not convinced that others on the list will be comfortable with the
> generalization suggested in 2 above.
> 
> I've quickly put together a PoC patch on top of yours, which
> re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> Thoughts?

Per discussion on IRC with Andrew (RhodiumToad) Gierth:

The runtime detection means there's always an indirect call overhead
and no way to inline.  This is counter to what using compiler
intrinsics is supposed to do.

It's better to rely on the compiler, because:
(a) The compiler often knows whether the value can or can't be 0 and
can therefore skip a conditional jump.
(b) If you're targeting a recent microarchitecture, the compiler can
just use the right instruction.
(c) Even if the conditional branch is left in, it's not a big overhead.
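For concreteness, the edge case Jesse flagged up-thread is easy to pin
down with a reference sketch (Python, purely illustrative; the names
mirror the patch but none of this is PostgreSQL code):

```python
# Reference semantics for the helpers under discussion: ceil_log2(1)
# should be 0 and next_power_of_2(1) should be 1, whereas basing them
# naively on a leftmost-set-bit scan yields 1 and 2 respectively.

def ceil_log2(num: int) -> int:
    """Smallest k such that 2**k >= num, for num >= 1."""
    assert num >= 1
    return (num - 1).bit_length()

def next_power_of_2(num: int) -> int:
    """Smallest power of two >= num, for num >= 1."""
    assert num >= 1
    return 1 << ceil_log2(num)

assert ceil_log2(1) == 0          # the case the v4-v6 math gets wrong
assert next_power_of_2(1) == 1
assert ceil_log2(8) == 3
assert ceil_log2(9) == 4
assert next_power_of_2(9) == 16
```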

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: range_agg

2020-03-07 Thread David Fetter
On Sat, Mar 07, 2020 at 06:45:44PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > There's another use case not yet covered here that could make this
> > even more complex, we should probably plan for it: multi-ranges
> > with weights.
> 
> I'm inclined to reject that as completely out of scope.  The core
> argument for unifying multiranges with ranges, if you ask me, is to
> make the data type closed under union.  Weights are from some other
> universe.

I don't think they are. SQL databases are super useful because they do
bags in addition to sets, so set union isn't the only, or maybe even
the most important, operation over which ranges ought to be closed.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: range_agg

2020-03-07 Thread David Fetter
On Sat, Mar 07, 2020 at 04:06:32PM -0500, Tom Lane wrote:
> I wrote:
> > However, what I'm on about right at the moment is that I don't think
> > there should be any delta in that test at all.  As far as I can see,
> > the design idea here is that multiranges will be automatically created
> > over range types, and the user doesn't need to do that.  To my mind,
> > that means that they're an implementation detail and should not show up as
> > separately-owned objects, any more than an autogenerated array type does.
> 
> Actually ... have you given any thought to just deciding that ranges and
> multiranges are the same type?  That is, any range can now potentially
> contain multiple segments?  That would eliminate a whole lot of the
> tedious infrastructure hacking involved in this patch, and let you focus
> on the actually-useful functionality.

If we're changing range types rather than constructing a new
multi-range layer atop them, I think it would be helpful to have some
way to figure out quickly whether this new range type was contiguous.
One way to do that would be to include a "range cardinality" in the
data structure, which would be the number of left ends in it.

One of the things I'd pictured doing with multiranges was along the
lines of a "full coverage" constraint like "During a shift, there can
be no interval that's not covered," which would correspond to a "range
cardinality" of 1.

I confess I'm getting a little twitchy about the idea of eliding the
distinction between "one" and "many", though.

> Assuming that that's ok, it seems like we could consider the traditional
> range functions like lower() and upper() to report on the first or last
> range bound in a multirange --- essentially, they ignore any "holes"
> that exist inside the range.  And the new functions for multiranges
> act much like array slicing, in that they give you back pieces of a range
> that aren't actually of a distinct type.

So new functions along the lines of lowers(), uppers(), opennesses(),
etc.? I guess this could be extended as needs emerge.

There's another use case not yet covered here that could make this
even more complex, we should probably plan for it: multi-ranges with
weights.

For example,

SELECT weighted_range_union(r)
FROM (VALUES ('[0,1)'::float8range), ('[0,3)'), ('[2,5)')) AS t(r)

would yield something along the lines of:

(([0,1),2), ([1,2),1), ([2,3),2), ([3,5),1))

and wedging that into the range type seems messy. Each range would
then have a cardinality, and each range within would have a weight,
all of which would be an increasingly heavy burden on the common case
where there's just a single range.

Enhancing a separate multirange type to have weights seems like a
cleaner path forward.

Given that, I'm -1 on mushing multi-ranges into a special case of
ranges, or /vice versa/.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: proposal \gcsv

2020-02-29 Thread David Fetter
On Sat, Feb 29, 2020 at 11:59:22AM +0100, Pavel Stehule wrote:
> so 29. 2. 2020 v 11:34 odesílatel Vik Fearing 
> napsal:
> 
> > On 29/02/2020 06:43, Pavel Stehule wrote:
> > > Hi
> > >
> > > I would to enhance \g command about variant \gcsv
> > >
> > > proposed command has same behave like \g, only the result will be every
> > > time in csv format.
> > >
> > > It can helps with writing psql macros wrapping \g command.
> > >
> > > Options, notes?
> >
> > But then we would need \ghtml and \glatex etc.  If we want a shortcut
> > for setting a one-off format, I would go for \gf or something.
> >
> > \gf csv
> > \gf html
> > \gf latex
> >
> 
> usability of html or latex format in psql is significantly lower than csv
> format. There is only one generic format for data - csv.

Not exactly.  There are a lot of uses for things along the lines of

\gf json
\gf yaml

I'd rather add a new \gf that takes arguments, as it seems more
extensible. For example, there are uses for

\gf csv header

if no header is the default, or 

\gf csv noheader

if header is the default.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use compiler intrinsics for bit ops in hash

2020-02-28 Thread David Fetter
On Thu, Feb 27, 2020 at 02:41:49PM +0800, John Naylor wrote:
> On Thu, Feb 27, 2020 at 1:56 PM David Fetter  wrote:
> > [v6 set]
> 
> Hi David,
> 
> In 0002, the pg_bitutils functions have a test (input > 0), and the
> new callers ceil_log2_* and next_power_of_2_* have asserts. That seems
> backward to me.

To me, too, now that you mention it.  My thinking was a little fuzzed
by trying to accommodate platforms with intrinsics where clz is
defined for 0 inputs.

> I imagine some callers of bitutils will already know the value > 0,
> and it's probably good to keep that branch out of the lowest level
> functions. What do you think?

I don't know quite how smart compilers and CPUs are these days, so
it's unclear to me how often that branch would actually happen.

Anyhow, I'll get a revised patch set out later today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use compiler intrinsics for bit ops in hash

2020-02-26 Thread David Fetter
On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter  wrote:
> > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > >
> > > > > 1. Is ceil_log2_64 dead code?
> > > >
> > > > Let's call it nascent code. I suspect there are places it could go, if
> > > > I look for them.  Also, it seemed silly to have one without the other.
> > > >
> > > 
> > > While not absolutely required, I'd like us to find at least one
> > > place and start using it. (Clang also nags at me when we have
> > > unused functions).
> > 
> > Done in the expanded patches attached.
> 
> These bit-rotted a little, so I've updated them.

Commit 05d8449e73694585b59f8b03aaa087f04cc4679a broke this patch set,
so here's a fix.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 2f8778f6133be23f6b7c375a39e9940ecc9fb063 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 29 Jan 2020 02:09:59 -0800
Subject: [PATCH v6 1/3] de-long-ify
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.24.1"

This is a multi-part message in MIME format.
--2.24.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..482a569814 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple item);
 static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple *item);
-static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
-static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum);
+static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
+static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum);
 
-static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr);
-static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr);
+static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
+static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
 
 
 /*
@@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel)
 	/* Initialize free page management. */
 	gfbb->nFreeBlocks = 0;
 	gfbb->freeBlocksLen = 32;
-	gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long));
+	gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64));
 
 	/*
 	 * Current memory context will be used for all in-memory data structures
@@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer,
 /*
  * Select a currently unused block for writing to.
  */
-static long
+static uint64
 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb)
 {
 	/*
@@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb)
  * Return a block# to the freelist.
  */
 static void
-gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum)
+gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum)
 {
 	int			ndx;
 
@@ -495,9 +495,9 @@ gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum)
 	if (gfbb->nFreeBlocks >= gfbb->freeBlocksLen)
 	{
 		gfbb->freeBlocksLen *= 2;
-		gfbb->freeBlocks = (long *) repalloc(gfbb->freeBlocks,
+		gfbb->freeBlocks = (int64 *) repalloc(gfbb->freeBlocks,
 			 gfbb->freeBlocksLen *
-			 sizeof(long));
+			 sizeof(uint64));
 	}
 
 	/* Add blocknum to array */
@@ -755,7 +755,7 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
  */
 
 static void
-ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
+ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
@@ -764,7 +764,7 @@ ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 }
 
 static void
-WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
+WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index c46764bf42..4fc478640a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,7 @@ stat

Re: truncating timestamps on arbitrary intervals

2020-02-26 Thread David Fetter
On Wed, Feb 26, 2020 at 06:38:57PM +0800, John Naylor wrote:
> On Wed, Feb 26, 2020 at 3:51 PM David Fetter  wrote:
> >
> > I believe the following should error out, but doesn't.
> >
> > # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 
> > 20:38:40');
> >  date_trunc_interval
> > ═
> >  2001-01-01 00:00:00
> > (1 row)
> 
> You're quite right. I forgot to add error checking for
> second-and-below units. I've added your example to the tests. (I
> neglected to mention in my first email that because I chose to convert
> the interval to the pg_tm struct (seemed easiest), it's not
> straightforward how to allow multiple unit types, and I imagine the
> use case is small, so I had it throw an error.)

I suspect that this could be sanely expanded to span some sets of
adjacent types in a future patch, e.g. year + month or hour + minute.

> > Please find attached an update that I believe fixes the bug I found in
> > a principled way.
> 
> Thanks for that! I made a couple adjustments and incorporated your fix
> into v3: While working on v1, I noticed the DTK_FOO macros already had
> an idiom for bitmasking (see utils/datetime.h),

Oops!  Sorry I missed that.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Use compiler intrinsics for bit ops in hash

2020-02-26 Thread David Fetter
On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > On Tue, Jan 14, 2020 at 2:09 PM David Fetter  wrote:
> > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > improvement. My biggest cringe might be in pg_bitutils:
> > > >
> > > > 1. Is ceil_log2_64 dead code?
> > >
> > > Let's call it nascent code. I suspect there are places it could go, if
> > > I look for them.  Also, it seemed silly to have one without the other.
> > >
> > 
> > While not absolutely required, I'd like us to find at least one
> > place and start using it. (Clang also nags at me when we have
> > unused functions).
> 
> Done in the expanded patches attached.

These bit-rotted a little, so I've updated them.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

>From 90d3575b0716839b76953023bb63242b1d0698ef Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 29 Jan 2020 02:09:59 -0800
Subject: [PATCH v5 1/3] de-long-ify
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.24.1"

This is a multi-part message in MIME format.
--2.24.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..482a569814 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple item);
 static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple *item);
-static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
-static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum);
+static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
+static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum);
 
-static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr);
-static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr);
+static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
+static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
 
 
 /*
@@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel)
 	/* Initialize free page management. */
 	gfbb->nFreeBlocks = 0;
 	gfbb->freeBlocksLen = 32;
-	gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long));
+	gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64));
 
 	/*
 	 * Current memory context will be used for all in-memory data structures
@@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer,
 /*
  * Select a currently unused block for writing to.
  */
-static long
+static uint64
 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb)
 {
 	/*
@@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb)
  * Return a block# to the freelist.
  */
 static void
-gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum)
+gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum)
 {
 	int			ndx;
 
@@ -495,9 +495,9 @@ gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum)
 	if (gfbb->nFreeBlocks >= gfbb->freeBlocksLen)
 	{
 		gfbb->freeBlocksLen *= 2;
-		gfbb->freeBlocks = (long *) repalloc(gfbb->freeBlocks,
+		gfbb->freeBlocks = (int64 *) repalloc(gfbb->freeBlocks,
 			 gfbb->freeBlocksLen *
-			 sizeof(long));
+			 sizeof(uint64));
 	}
 
 	/* Add blocknum to array */
@@ -755,7 +755,7 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
  */
 
 static void
-ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
+ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
@@ -764,7 +764,7 @@ ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 }
 
 static void
-WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
+WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index c46764bf42..4fc478640a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,7 @@ static int	_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
 static void _SPI_error_callback(void *arg);
 
 static void _SPI_cursor_operation(Portal portal,
-  FetchDirection direction, long count

Re: truncating timestamps on arbitrary intervals

2020-02-25 Thread David Fetter
On Wed, Feb 26, 2020 at 10:50:19AM +0800, John Naylor wrote:
> Hi,
> 
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit.

Thanks for adding this very handy feature!

> In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
> 
> select date_trunc_interval('15 minutes', timestamp '2020-02-16
> 20:48:40'); date_trunc_interval
> -
>  2020-02-16 20:45:00
> (1 row)

I believe the following should error out, but doesn't.

# SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
 date_trunc_interval 
═
 2001-01-01 00:00:00
(1 row)
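For intervals that do reduce to a fixed number of microseconds, the truncation itself is just modular arithmetic; a hypothetical C sketch (my illustration, not the patch's actual code) of the underlying operation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch (not the patch's code): bin a timestamp,
 * expressed as microseconds since some epoch, down to the start of an
 * arbitrary interval that is likewise reduced to microseconds.
 * For example, 20:48:40 with a 15-minute bin lands on 20:45:00.
 *
 * Caveat: C's % truncates toward zero, so timestamps before the
 * epoch would need floor semantics instead.
 */
static int64_t
trunc_to_interval_usec(int64_t ts_usec, int64_t bin_usec)
{
    assert(bin_usec > 0);
    return ts_usec - (ts_usec % bin_usec);
}
```

The '1 year 1 ms' case above has no single fixed microsecond width, which is why mixed units spanning the month boundary have to be rejected rather than binned this way.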

> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.

I agree that turning it into a wrapper would be separate work.

> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple,
> but I wanted to get feedback on the basic idea first before making
> it complete. I've also written tests and very basic documentation.

Please find attached an update that I believe fixes the bug I found in
a principled way.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

>From 5e36c4c888c65e358d2f87d84b64bc14d52f2b39 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 25 Feb 2020 23:49:35 -0800
Subject: [PATCH v2] Add date_trunc_interval(interval, timestamp)
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.24.1"

This is a multi-part message in MIME format.
--2.24.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


per John Naylor

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ceda48e0fc..3863c222a2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6949,6 +6949,15 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 2 days 03:00:00

 
+   
+date_trunc_interval(interval, timestamp)
+timestamp
+Truncate to specified precision; see 
+
+date_trunc_interval('15 minutes', timestamp '2001-02-16 20:38:40')
+2001-02-16 20:30:00
+   
+

 
  
@@ -7818,7 +7827,7 @@ SELECT date_part('hour', INTERVAL '4 hours 3 minutes');
   
 
   
-   date_trunc
+   date_trunc, date_trunc_interval
 

 date_trunc
@@ -7902,6 +7911,21 @@ SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
 Result: 3 days 02:00:00
 

+
+   
+The function date_trunc_interval is 
+similar to the date_trunc, except that it
+truncates to an arbitrary interval.
+   
+
+   
+Example:
+
+SELECT date_trunc_interval('5 minutes', TIMESTAMP '2001-02-16 20:38:40');
+Result: 2001-02-16 20:35:00
+
+   
+
   
 
   
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 0b6c9d5ea8..ed742592af 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -30,6 +30,7 @@
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "parser/scansup.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
@@ -3804,6 +3805,144 @@ timestamptz_age(PG_FUNCTION_ARGS)
  *-*/
 
 
+/* timestamp_trunc_interval()
+ * Truncate timestamp to specified interval.
+ */
+Datum
+timestamp_trunc_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *interval = PG_GETARG_INTERVAL_P(0);
+	Timestamp	timestamp = PG_GETARG_TIMESTAMP(1);
+	Timestamp	result;
+	fsec_t		ifsec,
+				tfsec;
+	uint32_t	unit = 0,
+				popcount = 0;
+	enum		TimeUnit {
+	us = 1 << 0,
+	ms = 1 << 1,
+	second = 1 << 2,
+	minute = 1 << 3,
+	hour   = 1 << 4,
+	day= 1 << 5,
+	month  = 1 << 6,
+	year   = 1 << 7
+};
+
+	struct pg_tm it;
+	struct pg_tm tt;
+
+	if (TIMESTAMP_NOT_FINITE(timestamp))
+		PG_RETURN_TIMESTAMP(timestamp);
+
+	if (interval2tm(*interval, &it, &ifsec) != 0)
+		elog(ERROR, "could not convert interval to tm");
+
+	if (timestamp2tm(timestamp, NULL, &tt, &tfsec, NULL, NULL) != 0)
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("timestamp out of range")));
+
+	if (it.tm_year != 0)
+	{
+		tt.tm_year = it.tm_year * (tt.tm_year

Re: Error on failed COMMIT

2020-02-24 Thread David Fetter
On Mon, Feb 24, 2020 at 06:40:16PM +0100, Vik Fearing wrote:
> On 24/02/2020 18:37, David Fetter wrote:
> 
> > If we'd done this from a clean sheet of paper, it would have been the
> > right decision. We're not there, and haven't been for decades.
> 
> OTOH, it's never too late to do the right thing.

Some right things take a lot of prep work in order to actually be
right things. This is one of them.  Defaulting to SERIALIZABLE
isolation is another.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Error on failed COMMIT

2020-02-24 Thread David Fetter
On Mon, Feb 24, 2020 at 06:04:28PM +0530, Robert Haas wrote:
> On Mon, Feb 24, 2020 at 1:56 PM Shay Rojansky  wrote:
> > As Dave wrote, the problem here isn't with the driver, but with framework 
> > or user-code which swallows the initial exception and allows code to 
> > continue to the commit. Npgsql (and I'm sure the JDBC driver too) does 
> > surface PostgreSQL errors as exceptions, and internally tracks the 
> > transaction status provided in the CommandComplete message. That means 
> > users have the ability - but not the obligation - to know about failed 
> > transactions, and some frameworks or user coding patterns could lead to a 
> > commit being done on a failed transaction.
> 
> Agreed. All of that can be fixed in the driver, though.
> 
> > If we think the current *user-visible* behavior is problematic (commit on 
> > failed transaction completes without throwing), then the only remaining 
> > question is where this behavior should be fixed - at the server or at the 
> > driver. As I wrote above, from the user's perspective it makes no 
> > difference - the change would be identical (and just as breaking) either 
> > way. So while drivers *could* implement the new behavior, what advantages 
> > would that have over doing it at the server? Some disadvantages do seem 
> > clear (repetition of the logic across each driver - leading to 
> > inconsistency across drivers, changing semantics at the driver by turning a 
> > non-error into an exception...).
> 
> The advantage is that it doesn't cause a compatibility break.
> 
> > > Well, it seems quite possible that there are drivers and applications 
> > > that don't have this issue; I've never had a problem with this behavior, 
> > > and I've been using PostgreSQL for something like two decades [...]
> >
> > If we are assuming that most user code is already written to avoid 
> > committing on failed transactions (by tracking transaction state etc.), 
> > then making this change at the server wouldn't affect those applications; 
> > the only applications affected would be those that do commit on failed 
> > transactions today, and it could be argued that those are likely to be 
> > broken today (since drivers today don't really expose the rollback in an 
> > accessible/discoverable way).
> 
> libpq exposes it just fine, so I think you're overgeneralizing here.
> 
> As I said upthread, I think one of the things that would be pretty
> badly broken by this is psql -f something.sql, where something.sql
> contains a series of blocks of the form "begin; something; something;
> something; commit;". Right now whichever transactions succeed get
> committed. With the proposed change, if one transaction block fails,
> it'll merge with all of the following blocks. You may think that
> nobody is doing this sort of thing, but I think people are, and that
> they will come after us with pitchforks if we break it.

I'm doing it, and I don't know about pitchforks, but I do know about
suddenly needing to rewrite (and re-test, and re-integrate, and
re-test some more) load-bearing code, and I'm not a fan of it.

If we'd done this from a clean sheet of paper, it would have been the
right decision. We're not there, and haven't been for decades.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Parallel copy

2020-02-20 Thread David Fetter
On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
> On Thu, Feb 20, 2020 at 04:11:39PM +0530, Amit Kapila wrote:
> > On Thu, Feb 20, 2020 at 5:12 AM David Fetter  wrote:
> > > 
> > > On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
> > > > This work is to parallelize the copy command and in particular "Copy
> > > >  from 'filename' Where ;" command.
> > > 
> > > Apropos of the initial parsing issue generally, there's an interesting
> > > approach taken here: https://github.com/robertdavidgraham/wc2
> > > 
> > 
> > Thanks for sharing.  I might be missing something, but I can't figure
> > out how this can help here.  Does this in some way help to allow
> > multiple workers to read and tokenize the chunks?
> 
> I think the wc2 is showing that maybe instead of parallelizing the
> parsing, we might instead try using a different tokenizer/parser and
> make the implementation more efficient instead of just throwing more
> CPUs on it.

That was what I had in mind.

> I don't know if our code is similar to what wc does, maytbe parsing
> csv is more complicated than what wc does.

CSV parsing differs from wc in that there are more states in the state
machine, but I don't see anything fundamentally different.
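To make that concrete, here is a toy quote-aware record counter (my sketch, nothing from the actual COPY code): wc toggles between two states, while CSV needs an extra pair of states to track quoted fields and doubled quotes:

```c
#include <stddef.h>

/*
 * Toy state machine (illustrative sketch, not COPY's parser): count
 * records in a CSV buffer, honoring quoted fields that may contain
 * newlines.  wc's machine has two states (in/out of word); this one
 * adds a "quoted" state and a "quote seen inside quoted" state so
 * that doubled quotes ("") are treated as escaped quote characters.
 */
static int
csv_count_records(const char *buf, size_t len)
{
    enum { UNQUOTED, QUOTED, QUOTE_IN_QUOTED } state = UNQUOTED;
    int records = 0;

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        switch (state)
        {
            case UNQUOTED:
                if (c == '"')
                    state = QUOTED;
                else if (c == '\n')
                    records++;      /* newline outside quotes ends a record */
                break;
            case QUOTED:
                if (c == '"')
                    state = QUOTE_IN_QUOTED;
                break;
            case QUOTE_IN_QUOTED:   /* "" = escaped quote, else quote closed */
                state = (c == '"') ? QUOTED : UNQUOTED;
                if (state == UNQUOTED && c == '\n')
                    records++;
                break;
        }
    }
    return records;
}
```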

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Parallel copy

2020-02-19 Thread David Fetter
On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
> This work is to parallelize the copy command and in particular "Copy
>  from 'filename' Where ;" command.

Apropos of the initial parsing issue generally, there's an interesting
approach taken here: https://github.com/robertdavidgraham/wc2

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Parallel copy

2020-02-18 Thread David Fetter
On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote:
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > > This is something similar to what I had also in mind for this idea.  I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide).  The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory.  And, the tokenization (finding
> > > line boundaries) would be serial.  I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> 
> I am talking about access to shared memory instead of the process
> local memory.  I understand that an extra copy won't be required.

Isn't accessing shared memory from different pieces of execution what
threads were designed to do?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Parallel copy

2020-02-15 Thread David Fetter
On Sat, Feb 15, 2020 at 06:02:06PM +0530, Amit Kapila wrote:
> On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner  wrote:
> >
> > On Sat, 15 Feb 2020 at 04:55, Amit Kapila  wrote:
> > >
> > > On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner  
> > > wrote:
> > > >
> > ...
> > > >
> > > > Parsing rows from the raw input (the work done by CopyReadLine()) in a 
> > > > single process would accommodate line returns in quoted fields. I don't 
> > > > think there's a way of getting parallel workers to manage the 
> > > > in-quote/out-of-quote state required.
> > > >
> > >
> > > AFAIU, the whole of this in-quote/out-of-quote state is manged inside
> > > CopyReadLineText which will be done by each of the parallel workers,
> > > something on the lines of what Thomas did in his patch [1].
> > > Basically, we need to invent a mechanism to allocate chunks to
> > > individual workers and then the whole processing will be done as we
> > > are doing now except for special handling for partial tuples which I
> > > have explained in my previous email.  Am, I missing something here?
> > >
> > The problem case that I see is the chunk boundary falling in the
> > middle of a quoted field where
> >  - The quote opens in chunk 1
> >  - The quote closes in chunk 2
> >  - There is an EoL character between the start of chunk 2 and the closing 
> > quote
> >
> > When the worker processing chunk 2 starts, it believes itself to be in
> > out-of-quote state, so only data between the start of the chunk and
> > the EoL is regarded as belonging to the partial line. From that point
> > on the parsing of the rest of the chunk goes off track.
> >
> > Some of the resulting errors can be avoided by, for instance,
> > requiring a quote to be preceded by a delimiter or EoL. That answer
> > fails when fields end with EoL characters, which happens often enough
> > in the wild.
> >
> > Recovering from an incorrect in-quote/out-of-quote state assumption at
> > the start of parsing a chunk just seems like a hole with no bottom. So
> > it looks to me like it's best done in a single process which can keep
> > track of that state reliably.
> >
> 
> Good point and I agree with you that having a single process would
> avoid any such stuff.   However, I will think some more on it and if
> you/anyone else gets some idea on how to deal with this in a
> multi-worker system (where we can allow each worker to read and
> process the chunk) then feel free to share your thoughts.

I see two pieces of this puzzle: an input format we control, and the
ones we don't.

In the former case, we could encode all fields with base85 (or
something similar that reduces the input alphabet efficiently), then
reserve bytes that denote delimiters of various types. ASCII has
separators for file, group, record, and unit that we could use as
inspiration.
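A minimal sketch of that framing idea (purely illustrative; only the separator values are real ASCII control codes, the rest is my assumption): if encoded field bytes can never collide with the separators, a worker dropped at an arbitrary offset can resynchronize on the next record separator without tracking any quote state:

```c
#include <stddef.h>

#define ASCII_RS 0x1E   /* ASCII record separator */
#define ASCII_US 0x1F   /* ASCII unit (field) separator */

/*
 * Illustrative only: with field contents encoded (e.g. base85) so
 * they can never contain RS/US, a parallel worker handed an
 * arbitrary chunk can skip forward to the first record boundary
 * with no in-quote/out-of-quote ambiguity.
 */
static size_t
next_record_start(const char *buf, size_t len, size_t from)
{
    for (size_t i = from; i < len; i++)
        if (buf[i] == ASCII_RS)
            return i + 1;       /* record starts just after the separator */
    return len;                 /* no boundary within this chunk */
}
```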

I don't have anything to offer for free-form input other than to agree
that it looks like a hole with no bottom, and maybe we should just
keep that process serial, at least until someone finds a bottom.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: allow frontend use of the backend's core hashing functions

2020-02-14 Thread David Fetter
On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote:
> > On Feb 14, 2020, at 8:15 AM, David Fetter  wrote:
> > 
> > On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> >> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> >>  wrote:
> >>> I have made these changes and rebased Robert’s patches but
> >>> otherwise changed nothing.  Here they are:
> >> 
> >> Thanks. Anyone else have comments? I think this is pretty
> >> straightforward and unobjectionable work so I'm inclined to press
> >> forward with committing it fairly soon, but if someone feels
> >> otherwise, please speak up.
> > 
> > One question. It might be possible to make these functions faster
> > using compiler intrinsics. Would those still be available to front-end
> > code?
> 
> Do you have a specific proposal that would preserve on-disk compatibility?

I hadn't planned on changing the representation, just cutting
instructions out of the calculation of same.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: allow frontend use of the backend's core hashing functions

2020-02-14 Thread David Fetter
On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
>  wrote:
> > I have made these changes and rebased Robert’s patches but
> > otherwise changed nothing.  Here they are:
> 
> Thanks. Anyone else have comments? I think this is pretty
> straightforward and unobjectionable work so I'm inclined to press
> forward with committing it fairly soon, but if someone feels
> otherwise, please speak up.

One question. It might be possible to make these functions faster
using compiler intrinsics. Would those still be available to front-end
code?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-02-13 Thread David Fetter
On Thu, Feb 13, 2020 at 11:13:32AM +, Alexey Bashtanov wrote:
> Hello,
> 
> Currently the documentation says that one can put "a list of table
> expressions"
> after FROM in UPDATE or after USING in DELETE.
> However, "table expression" is defined as a complex of
> FROM, WHERE, GROUP BY and HAVING clauses [1].
> The thing one can list in the FROM clause in a comma-separated manner
> is called a table reference [2].
> SELECT reference does not use this term but explains what they could be [3].
> 
> Please could someone have a look at the patch attached?
> It's not just pedantry but rather based on a real-life example of someone
> reading and being not sure
> whether e.g. joins can be used in there.

Thanks for doing this!

Speaking of examples, there should be more of them illustrating some
of the cases you name.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Just for fun: Postgres 20?

2020-02-12 Thread David Fetter
On Wed, Feb 12, 2020 at 05:25:15PM +0100, Juan José Santamaría Flecha wrote:
> On Wed, Feb 12, 2020 at 3:47 PM Tom Lane  wrote:
> 
> >
> > Yeah; I don't think it's *that* unlikely for it to happen again.  But
> > my own principal concern about this mirrors what somebody else already
> > pointed out: the one-major-release-per-year schedule is not engraved on
> > any stone tablets.  So I don't want to go to a release numbering system
> > that depends on us doing it that way for the rest of time.
> >
> >
> We could use  as version identifier, so people will not expect
> correlative numbering. SQL Server is being released every couple of years
> and they are using this naming schema. The problem would be releasing twice
> the same year, but how likely would that be?

We've released more than one major version in a year before, so we
have a track record of that actually happening.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Internal key management system

2020-02-11 Thread David Fetter
On Mon, Feb 10, 2020 at 05:57:47PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2020-02-08 12:07:57 -0500, Tom Lane wrote:
> > For the same reason, I don't think that an "internal key management"
> > feature in the core code is ever going to be acceptable.  It has to
> > be an extension.  (But, as long as it's an extension, whether it's
> > bringing its own crypto or relying on some other extension for that
> > doesn't matter from the legal standpoint.)
> 
> I'm not convinced by that. We have optional in-core functionality that
> requires external libraries, and we add more cases, if necessary.

Take for example libreadline, without which our CLI is at best
dysfunctional.

> > > Sure, I know the code is currently calling ooenssl functions. I
> > > was responding to Masahiko-san's message that we might
> > > eventually merge this openssl code into our tree.
> > 
> > No.  This absolutely, positively, will not happen.  There will
> > never be crypto functions in our core tree, because then there'd
> > be no recourse for people who want to use Postgres in countries
> > with restrictions on crypto software.  It's hard enough for them
> > that we have such code in contrib --- but at least they can remove
> > pgcrypto and be legal.  If it's in src/common then they're stuck
> 
> Isn't that basically a problem of the past by now?
> 
> Partially due to changed laws (e.g. France, which used to be a
> problematic case),

It's less of a problem than it once was, but it's not exactly gone yet.
https://en.wikipedia.org/wiki/Restrictions_on_the_import_of_cryptography

> but also because it's basically futile to have
> import restrictions on cryptography by now. Just about every larger
> project contains significant amounts of cryptographic code and it's
> entirely impractical to operate anything interfacing with network
> without some form of transport encryption.  And just about all open
> source distribution mechanism have stopped separating out crypto
> code a long time ago.

That's true.  We have access to legal counsel. Maybe it's worth asking
them how best to include cryptographic functionality, "how" being the
question one asks when one wants to get a positive response.

> I however do agree that we should strive to not introduce
> cryptographic code into the pg source tree - nobody here seems to
> have even close to enough experience to maintaining / writing that.

+1 for not turning ourselves into implementers of cryptographic
primitives.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778





Re: Use compiler intrinsics for bit ops in hash

2020-01-31 Thread David Fetter
On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> On Tue, Jan 14, 2020 at 2:09 PM David Fetter  wrote:
> > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > improvement. My biggest cringe might be in pg_bitutils:
> > >
> > > 1. Is ceil_log2_64 dead code?
> >
> > Let's call it nascent code. I suspect there are places it could go, if
> > I look for them.  Also, it seemed silly to have one without the other.
> >
> 
> While not absolutely required, I'd like us to find at least one
> place and start using it. (Clang also nags at me when we have
> unused functions).

Done in the expanded patches attached.

> > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote:
> > > 4. It seems like you *really* would like an operation like LZCNT in x86
> > > (first appearing in Haswell) that is well defined on zero input. ISTM
> > > the alternatives are:
> > >
> > >a) Special case 1. That seems straightforward, but the branching cost
> > >on a seemingly unlikely condition seems to be a lot of performance
> > >loss
> > >
> > >b) Use architecture specific intrinsic (and possibly with CPUID
> > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
> > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
> > >instructions that are well defined on zero in most ISA's other than
> > >x86, so maybe we can get away with special-casing x86?
> 
> i. We can detect LZCNT instruction by checking one of the
> "extended feature" (EAX=8001) bits using CPUID. Unlike the
> "basic features" (EAX=1), extended feature flags have been more
> vendor-specific, but fortunately it seems that the feature bit
> for LZCNT is the same [1][2].
> 
> ii. We'll most likely still need to provide a fallback
> implementation for processors that don't have LZCNT (either
> because they are from a different vendor, or an older Intel/AMD
> processor). I wonder if simply checking for 1 is "good enough".
> Maybe a micro benchmark is in order?

I'm not sure how I'd run one on the architectures we support.  What
I've done here is generalize our implementation to be basically like
LZCNT and TZCNT at the cost of a brief branch that might go away at
runtime.
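That generalization can be sketched as follows (my rendering of the idea, not the patch text): GCC's __builtin_clzll/__builtin_ctzll are undefined for zero input, so a brief branch pins the zero case to the operand width, matching hardware LZCNT/TZCNT semantics:

```c
#include <stdint.h>

/*
 * Sketch of LZCNT/TZCNT-style wrappers (assumes a GCC/Clang-like
 * compiler where unsigned long long is 64 bits).  The builtins are
 * undefined for a zero argument, so a brief branch -- which the
 * compiler can often fold away when the argument is provably
 * nonzero -- supplies the hardware instructions' defined result of
 * 64 for zero input.
 */
static inline int
lzcnt64(uint64_t x)
{
    return x == 0 ? 64 : __builtin_clzll(x);
}

static inline int
tzcnt64(uint64_t x)
{
    return x == 0 ? 64 : __builtin_ctzll(x);
}
```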

> 2. (My least favorite) use inline asm (a la our popcount
> implementation).

Yeah, I'd like to fix that, but I kept the scope of this one
relatively narrow.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

>From 5fcaa74146206e4de05ca8cbd863aca20bba94bf Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 29 Jan 2020 02:09:59 -0800
Subject: [PATCH v4 1/2] de-long-ify
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.24.1"

This is a multi-part message in MIME format.
--2.24.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..482a569814 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple item);
 static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer,
 IndexTuple *item);
-static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
-static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum);
+static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb);
+static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum);
 
-static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr);
-static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr);
+static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
+static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr);
 
 
 /*
@@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel)
 	/* Initialize free page management. */
 	gfbb->nFreeBlocks = 0;
 	gfbb->freeBlocksLen = 32;
-	gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long));
+	gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64));
 
 	/*
 	 * Current memory context will be used for all in-memory data structures
@@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer,
 /*
  * Select a currently unused block for writing to.
  */
-static long
+static uint64
 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb)
 {
 	/*
@@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBui

Re: Increase psql's password buffer size

2020-01-21 Thread David Fetter
On Tue, Jan 21, 2020 at 07:05:47PM +0100, David Fetter wrote:
> On Tue, Jan 21, 2020 at 10:23:59AM -0500, Bruce Momjian wrote:
> > On Tue, Jan 21, 2020 at 04:19:13PM +0100, David Fetter wrote:
> > > On Tue, Jan 21, 2020 at 10:12:52AM -0500, Bruce Momjian wrote:
> > > > I think we should be using a macro to define the maximum length, rather
> > > > than have 100 used in various places.
> > > 
> > > It's not just 100 in some places. It's different in different places,
> > > which goes to your point.
> > > 
> > > How about using a system that doesn't meaningfully impose a maximum
> > > length? The shell variable is a const char *, so why not just
> > > re(p)alloc as needed?
> > 
> > Uh, how do you know how big to make the buffer that receives the read?
> 
> You can start at any size, possibly even 100, and then increase the
> size in a loop along the lines of (untested)

[and unworkable]

I should have tested the code, but my point about using rep?alloc()
remains.

Best,
David.

Working code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    size_t my_size = 2,
           curr_size = 0;
    char *buf;
    int c;

    buf = (char *) malloc(my_size);

    printf("Enter a nice, long string.\n");

    /* Read until end of line or EOF, doubling the buffer as needed. */
    while ((c = getchar()) != EOF && c != '\n')
    {
        buf[curr_size++] = (char) c;
        if (curr_size == my_size)
        {
            my_size *= 2;
            buf = (char *) realloc(buf, my_size);
        }
    }
    buf[curr_size] = '\0';  /* the doubling check above guarantees room */
    printf("The string %s is %zu bytes long.\n", buf, curr_size);
}




Re: Minor issues in .pgpass

2020-01-21 Thread David Fetter
On Tue, Jan 21, 2020 at 03:27:50PM +0900, Fujii Masao wrote:
> Hi,
> 
> When I was researching the maximum length of password in PostgreSQL
> to answer the question from my customer, I found that there are two
> minor issues in .pgpass file.
> 
> (1) If the length of a line in .pgpass file is larger than 319B,
>libpq silently treats each 319B in the line as a separate
>setting line.

This seems like a potentially serious bug. For example, a truncated
password could get retried enough times to raise intruder alarms, and
it wouldn't be easy to track down.

> (2) The document explains that a line beginning with # is treated
>as a comment in .pgpass. But as far as I read the code,
>there is no code doing such special handling.

This is a flat-out bug, as it violates a promise the documentation has
made.

>Also if the length of that "comment" line is larger than 319B,
>the latter part of the line can be treated as valid setting.

> You may think that these unexpected behaviors are not so harmful
> in practice because "usually" the length of password setting line is
> less than 319B and the hostname beginning with # is less likely to be
> used. But the problem exists. And there are people who want to use
> large password or to write a long comment (e.g., with multibyte
> characters like Japanese) in .pgpass, so these may be more harmful
> in the near future.
> 
> For (1), I think that we should make libpq warn if the length of a line
> is larger than 319B, and throw away the remaining part beginning from
> 320B position. Whether to enlarge the length of a line should be
> a separate discussion, I think.

Agreed.

> For (2), libpq should treat any lines beginning with # as comments.

Would it make sense for lines starting with whitespace and then # to
be treated as comments, too, e.g.:

 # Please don't treat this as a parameter

?
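Something like this hypothetical helper (my own sketch, not proposed libpq code) is all it would take:

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Treat a line as a comment if its first non-whitespace character
 * is '#'.  Skipping leading whitespace is the extension suggested
 * above, not current libpq behavior.
 */
static bool
is_comment_line(const char *line)
{
	while (isspace((unsigned char) *line))
		line++;
	return *line == '#';
}
```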

Best,
David.




Re: Increase psql's password buffer size

2020-01-21 Thread David Fetter
On Tue, Jan 21, 2020 at 10:23:59AM -0500, Bruce Momjian wrote:
> On Tue, Jan 21, 2020 at 04:19:13PM +0100, David Fetter wrote:
> > On Tue, Jan 21, 2020 at 10:12:52AM -0500, Bruce Momjian wrote:
> > > I think we should be using a macro to define the maximum length, rather
> > > than have 100 used in various places.
> > 
> > It's not just 100 in some places. It's different in different places,
> > which goes to your point.
> > 
> > How about using a system that doesn't meaningfully impose a maximum
> > length? The shell variable is a const char *, so why not just
> > re(p)alloc as needed?
> 
> Uh, how do you know how big to make the buffer that receives the read?

You can start at any size, possibly even 100, and then increase the
size in a loop along the lines of (untested)

my_size = 100;
my_buf = char[my_size];
curr_size = 0;
while (c = getchar() != '\0')
{
my_buf[curr_size++] = c;
if (curr_size == my_size) /* If we want an absolute maximum,
 this'd be the place to test for it.
 */
{
my_size *= 2;
repalloc(my_buf, my_size);
}
}

Best,
David.



