Re: Finer grain log timestamps
On Mon, Jun 13, 2022 at 04:22:42PM -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Sun, May 8, 2022 at 4:45 PM David Fetter wrote:
> > > Please find attached a patch to change the sub-second granularity of
> > > log timestamps from milliseconds to microseconds.
>
> > Why is this a good idea?
>
> I can imagine that some people would have a use for microsecond
> resolution in log files, and I can also imagine that as machines
> get faster more people will want that.

Your imagination matches situations I've seen in production where there
was some ambiguity as to what happened when inside a millisecond
boundary, and I'm sure I'm not alone in this. I've gotten this request
from at least three people who to my knowledge knew nothing about each
other, and as I recall, the first time someone brought it up to me was
over five years back.

> As against that, this will bloat log files by a non-microscopic
> amount, and it's pretty likely to break some log-scanning tools too.

Three bytes per line, and log-scanning parsers that finicky are already
breaking all the time, respectively.

> It's unclear to me that that's a tradeoff we should force on
> everyone.

The tradeoff we're forcing on people at the moment is a loss of
precision they didn't ask for, implemented by some extra instructions
they didn't ask us to execute in a part of the code that's a hot path
at exactly the times when the machine is busiest.

> I think a proposal less likely to have push-back would be to invent
> a different log_line_prefix %-escape to produce microseconds.
> Sadly, "%u" is already taken, but perhaps we could use "%U"?
>
> A different line of thought is to extend %t to provide a precision
> field a la sprintf, so that for example "%.3t" is equivalent to
> "%m" and "%.6t" does what David wants, and we won't have to
> search for a new escape letter when the day arrives that
> somebody wants nanosecond resolution. The same could be done
> with %n, avoiding the need to find a different escape letter
> for that.

I'll build this more sprintf-like thing if not doing so prevents the
change from happening, but frankly, I don't really see a point in it
because the next "log timestamps at some random negative power of 10
second granularity" requirement I see will be the first.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Parse CE and BCE in dates and times
On Mon, Jun 13, 2022 at 09:11:56AM +0200, Erik Rijkers wrote:
> Op 13-06-2022 om 07:51 schreef David Fetter:
> > Folks,
> >
> > Please find attached a patch to do $Subject. As dates in a fair number
> > of fields of endeavor are expressed this way, it seems reasonable to
> > ensure that we can parse them on input. Making it possible to use them
> > in output is a more invasive patch, and would involve changes to
> > to_date and similar that would require careful consideration.
>
> Hi David,
>
> I find some unexpected results:
>
> # select '112-04-30 BC'::date;
>      date
> ---------------
>  0112-04-30 BC
> (1 row)
>
> but the same with the 'BCE' suffix seems broken:
>
> # select '112-04-30 BCE'::date;
> ERROR:  invalid input syntax for type date: "112-04-30 BCE"
> LINE 1: select '112-04-30 BCE'::date;
>
> The same goes for '112-04-30 AD' (works) and its CE version (errors out).
>
> Or is this as expected?

It's not, and thanks for looking at this. Will check to see what's
going on here.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Finer grain log timestamps
On Mon, May 09, 2022 at 11:21:26AM +0100, Dagfinn Ilmari Mannsåker wrote: > David Fetter writes: > > > diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c > > index 55ee5423af..4698e32ab7 100644 > > --- src/backend/utils/error/elog.c > > +++ src/backend/utils/error/elog.c > > @@ -2295,7 +2295,7 @@ char * > > get_formatted_log_time(void) > > { > > pg_time_t stamp_time; > > - charmsbuf[13]; > > + charmsbuf[16]; > > Now that it holds microseconds (µs), not milliseconds (ms), should it > not be renamed to `usbuf`? Good point. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 9283d9ae4d8d10876aee1da0753e7db4257e7a11 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Mon, 13 Jun 2022 00:01:04 -0700 Subject: [PATCH v3] Change log timestamps from milli- to microseconds --- doc/src/sgml/config.sgml | 12 ++-- doc/src/sgml/file-fdw.sgml | 2 +- src/backend/utils/error/elog.c | 14 +++--- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml index 5b7ce6531d..adc4fc090a 100644 --- doc/src/sgml/config.sgml +++ doc/src/sgml/config.sgml @@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql %t - Time stamp without milliseconds + Time stamp at second resolution no %m - Time stamp with milliseconds + Time stamp with microseconds no %n - Time stamp with milliseconds (as a Unix epoch) + Time stamp with microseconds (as a Unix epoch) no @@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' This option emits log lines in comma-separated-values (CSV) format, with these columns: -time stamp with milliseconds, +time stamp with microseconds, user name, database name, process ID, @@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' CREATE TABLE postgres_log ( - log_time timestamp(3) with time zone, + log_time timestamp(6) with time zone, user_name text, database_name text, process_id 
integer, @@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; timestamp string - Time stamp with milliseconds + Time stamp with microseconds user diff --git doc/src/sgml/file-fdw.sgml doc/src/sgml/file-fdw.sgml index 5b98782064..c3efcbd679 100644 --- doc/src/sgml/file-fdw.sgml +++ doc/src/sgml/file-fdw.sgml @@ -242,7 +242,7 @@ CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw; CREATE FOREIGN TABLE pglog ( - log_time timestamp(3) with time zone, + log_time timestamp(6) with time zone, user_name text, database_name text, process_id integer, diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c index 55ee5423af..1ee49ce2be 100644 --- src/backend/utils/error/elog.c +++ src/backend/utils/error/elog.c @@ -2295,7 +2295,7 @@ char * get_formatted_log_time(void) { pg_time_t stamp_time; - char msbuf[13]; + char usbuf[16]; /* leave if already computed */ if (formatted_log_time[0] != '\0') @@ -2315,13 +2315,13 @@ get_formatted_log_time(void) * nonempty or CSV mode can be selected. */ pg_strftime(formatted_log_time, FORMATTED_TS_LEN, - /* leave room for milliseconds... */ -"%Y-%m-%d %H:%M:%S %Z", + /* leave room for microseconds... */ +"%Y-%m-%d %H:%M:%S%Z", pg_localtime(_time, log_timezone)); /* 'paste' milliseconds into place... */ - sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000)); - memcpy(formatted_log_time + 19, msbuf, 4); + sprintf(usbuf, ".%06d", saved_timeval.tv_usec ); + memcpy(formatted_log_time + 19, usbuf, 7); return formatted_log_time; } @@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata) saved_timeval_set = true; } - snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d", + snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d", (long) saved_timeval.tv_sec, - (int) (saved_timeval.tv_usec / 1000)); + saved_timeval.tv_usec); if (padding != 0) appendStringInfo(buf, "%*s", padding, strfbuf); -- 2.36.1
Parse CE and BCE in dates and times
Folks, Please find attached a patch to do $Subject. As dates in a fair number of fields of endeavor are expressed this way, it seems reasonable to ensure tha we can parse them on input. Making it possible to use them in output is a more invasive patch, and would involve changes to to_date and similar that would require careful consideration. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 00654ad8707fef0c9bcb7a8b6a8073d1f143368a Mon Sep 17 00:00:00 2001 From: David Fetter Date: Tue, 8 Jun 2021 23:44:19 -0700 Subject: [PATCH v1] Common Era for dates --- doc/src/sgml/datatype.sgml | 25 doc/src/sgml/datetime.sgml | 4 +- doc/src/sgml/func.sgml | 16 +++--- src/backend/utils/adt/formatting.c | 80 +++--- src/test/regress/expected/date.out | 72 +++ src/test/regress/expected/horology.out | 4 +- src/test/regress/sql/date.sql | 4 +- 7 files changed, 134 insertions(+), 71 deletions(-) diff --git doc/src/sgml/datatype.sgml doc/src/sgml/datatype.sgml index 8e30b82273..d0ca9e500f 100644 --- doc/src/sgml/datatype.sgml +++ doc/src/sgml/datatype.sgml @@ -1740,24 +1740,24 @@ SELECT 'abc \153\154\155 \052\251\124'::bytea; timestamp [ (p) ] [ without time zone ] 8 bytes both date and time (no time zone) -4713 BC -294276 AD +4713 BC[E] +294276 AD/CE 1 microsecond timestamp [ (p) ] with time zone 8 bytes both date and time, with time zone -4713 BC -294276 AD +4713 BC[E] +294276 AD/CE 1 microsecond date 4 bytes date (no time of day) -4713 BC -5874897 AD +4713 BC[E] +5874897 AD/CE 1 day @@ -2168,23 +2168,24 @@ MINUTE TO SECOND Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, - followed by an optional AD or BC. - (Alternatively, AD/BC can appear - before the time zone, but this is not the preferred ordering.) 
+ followed by an optional AD, CE, + BC, or BCE + (Alternatively, AD/CE/BC/BCE + can appear before the time zone, but this is not the preferred ordering.) Thus: -1999-01-08 04:05:06 +2021-01-08 04:05:06 and: -1999-01-08 04:05:06 -8:00 +2021-01-08 04:05:06 -8:00 are valid values, which follow the ISO 8601 standard. In addition, the common format: -January 8 04:05:06 1999 PST +January 8 04:05:06 2021 PST is supported. diff --git doc/src/sgml/datetime.sgml doc/src/sgml/datetime.sgml index ecc3245a46..6266328107 100644 --- doc/src/sgml/datetime.sgml +++ doc/src/sgml/datetime.sgml @@ -167,8 +167,8 @@ -Gregorian years AD 199 can be entered by using 4 digits with leading -zeros (e.g., 0099 is AD 99). +Gregorian years AD/CE 199 can be entered by using 4 digits with leading +zeros (e.g., 0099 is AD/CE 99). diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 7b652460a1..3fcb9793e7 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -7864,13 +7864,13 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); last digit of ISO 8601 week-numbering year -BC, bc, -AD or ad +BC, bc, BCE, bce +AD, ad, CE, ce era indicator (without periods) -B.C., b.c., -A.D. or a.d. +B.C., b.c., B.C.E., b.c.e. +A.D., a.d., C.E., or c.e. era indicator (with periods) @@ -9849,13 +9849,15 @@ SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40'); -The first century starts at 0001-01-01 00:00:00 AD, although -they did not know it at the time. This definition applies to all +The first century starts at 0001-01-01 00:00:00 AD/CE, although +the calendar did not exist at the time. This definition applies to all Gregorian calendar countries. There is no century number 0, you go from -1 century to 1 century. If you disagree with this, please write your complaint to: -Pope, Cathedral Saint-Peter of Roma, Vatican. 
+His Holiness, Pope Francis +Apostolic Palace +00120 Vatican City diff --git src/backend/utils/adt/formatting.c src/backend/utils/adt/formatting.c index e909c1a200..11876dbd19 100644 --- src/backend/utils/adt/formatting.c +++ src/backend/utils/adt/formatting.c @@ -238,11 +238,19 @@ static const char *const days_short[] = { #define a_d_STR "a.d." #define AD_STR "AD" #define ad_STR
Re: Finer grain log timestamps
On Sun, May 08, 2022 at 04:12:27PM -0500, Justin Pryzby wrote: > On Sun, May 08, 2022 at 08:44:51PM +0000, David Fetter wrote: > > CREATE TABLE postgres_log > > ( > > - log_time timestamp(3) with time zone, > > + log_time timestamp(6) with time zone, > > Please also update the corresponding thing in doc/src/sgml/file-fdw.sgml Thanks for looking this over, and please find attached the next version. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From f6772fc0378879fd5c7fbf8cb320ee68f66341d2 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 8 May 2022 13:24:33 -0700 Subject: [PATCH v2] Change log timestamps from milli- to microseconds --- doc/src/sgml/config.sgml | 12 ++-- doc/src/sgml/file-fdw.sgml | 2 +- src/backend/utils/error/elog.c | 14 +++--- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml index 03986946a8..f00db09547 100644 --- doc/src/sgml/config.sgml +++ doc/src/sgml/config.sgml @@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql %t - Time stamp without milliseconds + Time stamp at second resolution no %m - Time stamp with milliseconds + Time stamp with microseconds no %n - Time stamp with milliseconds (as a Unix epoch) + Time stamp with microseconds (as a Unix epoch) no @@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' This option emits log lines in comma-separated-values (CSV) format, with these columns: -time stamp with milliseconds, +time stamp with microseconds, user name, database name, process ID, @@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' CREATE TABLE postgres_log ( - log_time timestamp(3) with time zone, + log_time timestamp(6) with time zone, user_name text, database_name text, process_id integer, @@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; timestamp string - Time stamp with milliseconds + Time stamp 
with microseconds user diff --git doc/src/sgml/file-fdw.sgml doc/src/sgml/file-fdw.sgml index 5b98782064..c3efcbd679 100644 --- doc/src/sgml/file-fdw.sgml +++ doc/src/sgml/file-fdw.sgml @@ -242,7 +242,7 @@ CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw; CREATE FOREIGN TABLE pglog ( - log_time timestamp(3) with time zone, + log_time timestamp(6) with time zone, user_name text, database_name text, process_id integer, diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c index 55ee5423af..4698e32ab7 100644 --- src/backend/utils/error/elog.c +++ src/backend/utils/error/elog.c @@ -2295,7 +2295,7 @@ char * get_formatted_log_time(void) { pg_time_t stamp_time; - char msbuf[13]; + char msbuf[16]; /* leave if already computed */ if (formatted_log_time[0] != '\0') @@ -2315,13 +2315,13 @@ get_formatted_log_time(void) * nonempty or CSV mode can be selected. */ pg_strftime(formatted_log_time, FORMATTED_TS_LEN, - /* leave room for milliseconds... */ -"%Y-%m-%d %H:%M:%S %Z", + /* leave room for microseconds... */ +"%Y-%m-%d %H:%M:%S%Z", pg_localtime(_time, log_timezone)); /* 'paste' milliseconds into place... */ - sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000)); - memcpy(formatted_log_time + 19, msbuf, 4); + sprintf(msbuf, ".%06d", saved_timeval.tv_usec ); + memcpy(formatted_log_time + 19, msbuf, 7); return formatted_log_time; } @@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata) saved_timeval_set = true; } - snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d", + snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d", (long) saved_timeval.tv_sec, - (int) (saved_timeval.tv_usec / 1000)); + saved_timeval.tv_usec); if (padding != 0) appendStringInfo(buf, "%*s", padding, strfbuf); -- 2.35.1
Finer grain log timestamps
Folks,

Please find attached a patch to change the sub-second granularity of
log timestamps from milliseconds to microseconds.

I started out working on a longer patch that will give people more
choices than whole seconds and microseconds, but there were a lot of
odd corner cases, including what I believe might have been a
requirement for C11, should we wish to get sub-microsecond granularity.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

>From f4881d1669526597bdf608c7c59858f88314f8d1 Mon Sep 17 00:00:00 2001
From: David Fetter
Date: Sun, 8 May 2022 13:24:33 -0700
Subject: [PATCH v1] Change log timestamps from milli- to microseconds

---
 doc/src/sgml/config.sgml       | 12 ++--
 src/backend/utils/error/elog.c | 14 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index 03986946a8..f00db09547 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -7174,17 +7174,17 @@ local0.*/var/log/postgresql
 %t
-Time stamp without milliseconds
+Time stamp at second resolution
 no
 %m
-Time stamp with milliseconds
+Time stamp with microseconds
 no
 %n
-Time stamp with milliseconds (as a Unix epoch)
+Time stamp with microseconds (as a Unix epoch)
 no
@@ -7526,7 +7526,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 This option emits log lines in comma-separated-values
 (CSV) format, with these columns:
-time stamp with milliseconds,
+time stamp with microseconds,
 user name,
 database name,
 process ID,
@@ -7557,7 +7557,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 CREATE TABLE postgres_log
 (
-  log_time timestamp(3) with time zone,
+  log_time timestamp(6) with time zone,
   user_name text,
   database_name text,
   process_id integer,
@@ -7682,7 +7682,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 timestamp string
-Time stamp with milliseconds
+Time stamp with microseconds
 user

diff --git src/backend/utils/error/elog.c src/backend/utils/error/elog.c
index 55ee5423af..4698e32ab7 100644
--- src/backend/utils/error/elog.c
+++ src/backend/utils/error/elog.c
@@ -2295,7 +2295,7 @@ char *
 get_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
-	char		msbuf[13];
+	char		msbuf[16];

 	/* leave if already computed */
 	if (formatted_log_time[0] != '\0')
@@ -2315,13 +2315,13 @@ get_formatted_log_time(void)
 	 * nonempty or CSV mode can be selected.
 	 */
 	pg_strftime(formatted_log_time, FORMATTED_TS_LEN,
-				/* leave room for milliseconds... */
-				"%Y-%m-%d %H:%M:%S %Z",
+				/* leave room for microseconds... */
+				"%Y-%m-%d %H:%M:%S%Z",
 				pg_localtime(&stamp_time, log_timezone));

 	/* 'paste' milliseconds into place... */
-	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
-	memcpy(formatted_log_time + 19, msbuf, 4);
+	sprintf(msbuf, ".%06d", saved_timeval.tv_usec );
+	memcpy(formatted_log_time + 19, msbuf, 7);

 	return formatted_log_time;
 }
@@ -2652,9 +2652,9 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 			saved_timeval_set = true;
 		}

-		snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d",
+		snprintf(strfbuf, sizeof(strfbuf), "%ld.%06d",
 				 (long) saved_timeval.tv_sec,
-				 (int) (saved_timeval.tv_usec / 1000));
+				 saved_timeval.tv_usec);

 		if (padding != 0)
 			appendStringInfo(buf, "%*s", padding, strfbuf);
--
2.35.1
Re: Should we support new definition for Identity column : GENERATED BY DEFAULT ON NULL?
On Wed, Nov 03, 2021 at 11:07:58AM +0100, Vik Fearing wrote:
> On 11/2/21 12:19 PM, Himanshu Upadhyaya wrote:
> > Hi,
> >
> > Trying to insert NULL value to the Identity column defined by "GENERATED BY
> > DEFAULT" is disallowed, but there can be use cases where the user would
> > like to have an identity column where manual NULL insertion is required(and
> > it should not error-out by Postgres).
>
> What could possibly be the use case for this?

Unfortunately, the PREPARE/EXECUTE infrastructure, and not just for
PostgreSQL, has no way of passing along DEFAULT explicitly, i.e. it can
only explicitly pass literals and NULL, so people come up with
workarounds like this. I keep saying "explicitly" because one of the
workarounds is to pass DEFAULT by omission in the target list, which is
not a great way to handle anything, being a communication by silence
and all. I'm thinking that lack is the problem we should actually
address.

> > Thoughts?
>
> I don't like it.

Neither do I, but I do get some of the motivation behind it. I don't
suppose the standard actually says much about PREPARE, or whether we
should care about anything it happens to say that gets in the way of
doing something more helpful.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: PATCH: psql tab completion for SELECT
On Sat, Sep 22, 2018 at 06:53:55PM +1200, Edmund Horner wrote: > Hi all, > > Here are some rebased versions of the last two patches. No changes in > functionality, except a minor case sensitivity fix in the "completion > after commas" patch. > > Edmund I've rebased and updated these to be more principled about what functions could be tab completed. Still missing: tests. What precisely is this supposed to do? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 87fb63d70f5e08f853a2b8aa94e03c0d665f5000 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Thu, 7 Oct 2021 15:59:06 -0700 Subject: [PATCH v9 1/2] Infrastructure for tab completion in the SELECT list --- src/bin/psql/tab-complete.c | 65 ++--- 1 file changed, 54 insertions(+), 11 deletions(-) diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c index ecae9df8ed..5db143523b 100644 --- src/bin/psql/tab-complete.c +++ src/bin/psql/tab-complete.c @@ -1017,6 +1017,45 @@ static const VersionedQuery Query_for_list_of_subscriptions[] = { {0, NULL} }; +static const SchemaQuery Query_for_list_of_selectable_functions[] = { + { + /* min_server_version */ + 70400, + /* catname */ + "pg_catalog.pg_proc p", + /* Disallow functions which take or return pseudotypes which are other than all* or *record */ + "NOT((ARRAY[p.prorettype] || coalesce(p.proallargtypes, p.proargtypes)) " + "&& ARRAY(SELECT oid FROM pg_catalog.pg_type WHERE typtype='p' AND typname !~ '(^all|record$)'))" + /* Disallow stored procedures XXX this may change later. */ + " AND p.prokind <> 'p'" + /* viscondition */ + "pg_catalog.pg_function_is_visible(p.oid)", + /* namespace */ + "p.pronamespace", + /* result */ + "pg_catalog.quote_ident(p.proname)||'('", + /* qualresult */ + NULL + }, + {0, NULL} +}; + +/* + * This addon is used to find (unqualified) column names to include + * alongside the function names from the query above. 
+ */ +static const VersionedQuery Query_addon_for_list_of_selectable_attributes[] = { + {70400, + "UNION ALL " + " SELECT DISTINCT pg_catalog.quote_ident(attname) " + " FROM pg_catalog.pg_attribute " + "WHERE attnum > 0 " + " AND NOT attisdropped " + " AND substring(pg_catalog.quote_ident(attname),1,%d)='%s'" + }, + {0, NULL} +}; + /* * This is a list of all "things" in Pgsql, which can show up after CREATE or * DROP; and there is also a query to get a list of them. @@ -3768,7 +3807,9 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("IS"); /* SELECT */ - /* naah . . . */ + else if (TailMatches("SELECT") || TailMatches("SELECT", "ALL|DISTINCT")) + COMPLETE_WITH_VERSIONED_SCHEMA_QUERY(Query_for_list_of_selectable_functions, + Query_addon_for_list_of_selectable_attributes); /* SET, RESET, SHOW */ /* Complete with a variable name */ @@ -4432,7 +4473,8 @@ complete_from_versioned_schema_query(const char *text, int state) * where %d is the string length of the text and %s the text itself. * * If both simple_query and schema_query are non-NULL, then we construct - * a schema query and append the (uninterpreted) string simple_query to it. + * a schema query and append the simple_query to it, replacing the %d and %s + * as described above. * * It is assumed that strings should be escaped to become SQL literals * (that is, what is in the query is actually ... '%s' ...) 
@@ -4579,21 +4621,22 @@ _complete_from_query(const char *simple_query, " WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,%d) =" " substring('%s',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) = 1", char_length, e_text); - - /* If an addon query was provided, use it */ - if (simple_query) -appendPQExpBuffer(_buffer, "\n%s", simple_query); } else { Assert(simple_query); - /* simple_query is an sprintf-style format string */ - appendPQExpBuffer(_buffer, simple_query, - char_length, e_text, - e_info_charp, e_info_charp, - e_info_charp2, e_info_charp2); } + /* + * simple_query is an sprintf-style format string (or it could be NULL, but + * only if this is a schema query with no addon). + */ + if (simple_query) + appendPQExpBuffer(_buffer, simple_query, + char_length, e_text, + e_info_charp, e_info_charp, + e_info_charp2, e_info_charp2); + /* Limit the number of records in the result */ appendPQExpBuffer(_buffer, "\nLIMIT %d", completion_max_records); -- 2.31.1 >From 3764b9fe211db1fa322fa83103d39356928942ae Mon Sep 17 00:00:
Re: When is int32 not an int32?
On Sun, Sep 26, 2021 at 05:32:11PM -0400, David E. Wheeler wrote:
> Hell Hackers, long time no email!
>
> I got a bug report for the semver extension:
>
> https://github.com/theory/pg-semver/issues/58
>
> It claims that a test unexpectedly passes. That is, Test #31 is
> expected to fail, because it intentionally tests a version in which its
> parts overflow the int32[3] they're stored in, with the expectation
> that one day we can refactor the type to handle larger version parts.
>
> I can't imagine there would be any circumstance under which int32 would
> somehow be larger than a signed 32-bit integer, but perhaps there is?
>
> Scroll to the bottom of these pages to see the unexpected passes on
> i386 and armhf:
>
> https://ci.debian.net/data/autopkgtest/unstable/i386/p/postgresql-semver/15208658/log.gz
> https://ci.debian.net/data/autopkgtest/unstable/armhf/p/postgresql-semver/15208657/log.gz
>
> Here's the Postgres build output for those two platforms, as well,
> though nothing jumps out at me:
>
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=i386=13.4-3=1630408269=0
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=armhf=13.4-3=1630412028=0

I noticed that in i386, configure finds none of (int8, uint8, int64,
uint64), and I wonder whether we're actually testing whatever
alternative we provide when we don't have them.

I also noticed that the first of the long sequences of 9s doesn't even
fit inside a uint64. The other two fit inside an int64, so if promotion
were somehow happening, that wouldn't be a great test.

999.99.9 over 2^72 over 2^59 over 2^56

These two observations taken together get me to my first guess, which
is that the machinery we provide when we see non-working 64-bit
integers is totally broken. If that's right, we should at least discuss
reversing our claim that we support such systems, seeing as it doesn't
appear that people will be deploying new versions of PostgreSQL on
them.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: psql: tab completion differs on semicolon placement
On Mon, Sep 20, 2021 at 08:26:51PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Daniel Gustafsson writes:
>
> > While testing a patch I fat-fingered a CREATE DATABASE statement by tab
> > completing *after* the semicolon, with no space between the objname and
> > semicolon. The below options were presented, which at this point aren't
> > really applicable:
> >
> > db=# create database foo;
> > ALLOW_CONNECTIONS  ENCODING     LC_COLLATE  LOCALE  TABLESPACE
> > CONNECTION LIMIT   IS_TEMPLATE  LC_CTYPE    OWNER   TEMPLATE
> >
> > DROP DATABASE has a similar tab completion which makes about as much
> > sense:
> >
> > db=# drop database foo;WITH (
> >
> > Checking prev_wd for not ending with ';' as per the attached makes
> > "objname;" behave like "objname ;". Is there a reason for not doing
> > that which I'm missing? I didn't check for others, but if this seems
> > reasonable I'll go through to find any other similar cases.
>
> The same applies to any completion after a MatchAny that ends in any
> of the WORD_BREAKS characters (except whitespace and () which are
> handled specially).
>
> #define WORD_BREAKS "\t\n@$><=;|&{() "
>
> IMO a fix should be more principled than just special-casing semicolon
> and CREATE TABLE. Maybe get_previous_words() should stop when it sees
> an unquoted semicolon?

Is there some reason get_previous_words() shouldn't stop for everything
that's WORD_BREAKS? If not, making that the test might make the general
rule a little simpler to write, and if WORD_BREAKS ever changed, for
example to include all space, or all breaking space, or similar, the
consequences would at least not propagate through seemingly unrelated
code.

At the moment, get_previous_words() does look for everything in
WORD_BREAKS, and then accounts for double quotes (") and then does
something clever to account for double quotes and the quoting behavior
that doubling them ("") accomplishes. Anyhow, that looks like it should
work in this case, but clearly it's not.

Would it be less error prone to do these checks and maybe push or pop
one or more stacks holding state as each character came in? I suspect
the overhead would be unnoticeable even on the slowest* client.

Best,
David.

* One possible exception would be a gigantic paste, a case where psql
  can be prevented from attempting tab completion, although the
  prevention measures involve a pretty obscure terminal setting:
  https://cirw.in/blog/bracketed-paste

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Hook for extensible parsing.
On Wed, Sep 15, 2021 at 02:25:17PM +0100, Simon Riggs wrote:
> On Sat, 1 May 2021 at 08:24, Julien Rouhaud wrote:
>
> > Being able to extend core parser has been requested multiple times,
> > and AFAICT all previous attempts were rejected not because this isn't
> > wanted but because the proposed implementations required plugins to
> > reimplement all of the core grammar with their own changes, as bison
> > generated parsers aren't extensible.
> >
> > I'd like to propose an alternative approach, which is to allow
> > multiple parsers to coexist, and let third-party parsers optionally
> > fall back on the core parsers.
>
> Yes, that approach has been discussed by many people, most recently
> around the idea to create a fast-path grammar to make the most
> frequently used SQL parse faster.
>
> > 0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with
> > only the ability to produce "select [col, ] col FROM table" parsetree,
> > for testing purposes. I chose it to ensure that everything works
> > properly even with a totally different grammar that has different
> > keywords, which doesn't even end statements with a semicolon but a
> > plain keyword.
>
> The general rule has always been that we don't just put hooks in, we
> always require an in-core use for those hooks. I was reminded of that
> myself recently.
>
> What we need is something in core that actually makes use of this. The
> reason for that is not politics, but a simple test of whether the
> feature makes sense AND includes all required bells and whistles to be
> useful in the real world.
>
> Core doesn't need a LOL parser and I don't think we should commit that.

It doesn't, but it very likely needs something people can use when they
create a new table AM, and that we should use the hook in core to
implement the heap* table AM to make sure the thing is working at DDL
time.

> If we do this, I think it should have CREATE LANGUAGE support, so
> that each plugin can be seen as an in-core object and allow security
> around which users can execute which language types, allow us to
> switch between languages and have default languages for specific
> users or databases.

That's a great idea, but I must be missing something important as it
relates to parser hooks. Could you connect those a little more
explicitly?

Best,
David.

* It's not actually a heap in the sense that the term is normally used
  in computing. I'd love to find out how it got to have this name and
  document same so others aren't also left wondering.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
On Fri, Sep 03, 2021 at 08:27:55PM +0200, Daniel Gustafsson wrote: > > On 19 May 2021, at 09:53, Michael Paquier wrote: > > > > On Tue, Apr 27, 2021 at 12:58:52PM +0300, Aleksander Alekseev wrote: > >> I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here > >> is an alternative version of the patch that fixes this as well. Not > >> sure if this should be in the same commit though. > > > > - /* If we have ALTER TABLE DROP, provide COLUMN or CONSTRAINT */ > > - else if (Matches("ALTER", "TABLE", MatchAny, "DROP")) > > + /* If we have ALTER TABLE ADD|DROP, provide COLUMN or CONSTRAINT > > */ > > + else if (Matches("ALTER", "TABLE", MatchAny, "ADD|DROP")) > > Seems to me that the behavior to not complete with COLUMN or > > CONSTRAINT for ADD is intentional, as it is possible to specify a > > constraint or column name without the object type first. This > > introduces a inconsistent behavior with what we do for columns with > > ADD, for one. So a more consistent approach would be to list columns, > > constraints, COLUMN and CONSTRAINT in the list of options available > > after ADD. > > > > + else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT")) > > + { > > + completion_info_charp = prev3_wd; > > + COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table); > > + } > > Specifying valid constraints is an authorized grammar, so it does not > > seem that bad to keep things as they are, either. I would leave that > > alone. > > This has stalled being marked Waiting on Author since May, and reading the > above it sounds like marking it Returned with Feedback is the logical next > step > (patch also no longer applies). Please find attached the next revision :) Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate >From 716e9dde96ef9973fc6b0fc9ca9934df153af9da Mon Sep 17 00:00:00 2001 From: David Fetter Date: Tue, 14 Sep 2021 15:28:34 -0700 Subject: [PATCH v4] psql: Fix tab completion for ALTER TABLE... ... VALIDATE CONSTRAINT. Completions now include only constraints which have not yet been validated. --- src/bin/psql/tab-complete.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c index 5cd5838668..b55166aa64 100644 --- src/bin/psql/tab-complete.c +++ src/bin/psql/tab-complete.c @@ -788,6 +788,15 @@ Query_for_index_of_table \ " and pg_catalog.quote_ident(c1.relname)='%s'"\ " and pg_catalog.pg_table_is_visible(c1.oid)" +/* the silly-looking length condition is just to eat up the current word */ +#define Query_for_constraint_of_table_not_validated \ +"SELECT pg_catalog.quote_ident(conname) "\ +" FROM pg_catalog.pg_class c1, pg_catalog.pg_constraint con "\ +" WHERE c1.oid=conrelid and (%d = pg_catalog.length('%s'))"\ +" and pg_catalog.quote_ident(c1.relname)='%s'"\ +" and pg_catalog.pg_table_is_visible(c1.oid)" \ +" and NOT con.convalidated" + #define Query_for_all_table_constraints \ "SELECT pg_catalog.quote_ident(conname) "\ " FROM pg_catalog.pg_constraint c "\ @@ -2145,14 +2154,22 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH_ATTR(prev3_wd, ""); /* - * If we have ALTER TABLE ALTER|DROP|RENAME|VALIDATE CONSTRAINT, + * If we have ALTER TABLE ALTER|DROP|RENAME CONSTRAINT, * provide list of constraints */ - else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME|VALIDATE", "CONSTRAINT")) + else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME", "CONSTRAINT")) { completion_info_charp = prev3_wd; COMPLETE_WITH_QUERY(Query_for_constraint_of_table); } + /* + * ALTER TABLE VALIDATE CONSTRAINT + */ + else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT")) + { + 
completion_info_charp = prev3_wd; + COMPLETE_WITH_QUERY(Query_for_constraint_of_table_not_validated); + } /* ALTER TABLE ALTER [COLUMN] */ else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny) || Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny)) -- 2.31.1
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:56:52AM -0400, Robert Haas wrote: > On Mon, Aug 23, 2021 at 10:15 AM Tom Lane wrote: > > And yes, I absolutely would prohibit extensions from accessing many > > of them, if there were a reasonable way to do it. It would be a good > > start towards establishing a defined API for extensions. > > Mostly, it would make extension development more difficult for no > discernable benefit to the project. > > You've made this argument many times over the years ... but we're no > closer to having an extension API than we ever were, and we continue > to get complaints about breaking stuff on Windows on a pretty regular > basis. > > Honestly, it seems unimaginable that an API is ever really going to be > possible. It would be a ton of work to maintain, and we'd just end up > breaking it every time we discover that there's a new feature we want > to implement which doesn't fit into the defined API now. That's what > we do *now* with functions that third-party extensions actually call, > and with variables that they access, and it's not something that, in > my experience, is any great problem in maintaining an extension. > You're running code that is running inside somebody else's executable > and sometimes you have to adjust it for upstream changes. That's life, > and I don't think that complaints about that topic are nearly as > frequent as complaints about extensions breaking on Windows because of > missing PGDLLIMPORT markings. > > And more than that, I'm pretty sure that you've previously taken the > view that we shouldn't document all the hook functions that only exist > in the backend for the purpose of extension use. As the person on the receiving end of that one, I was nonplussed, so I took a step back to think it over. 
I recognized at the time that I didn't have a great answer to a legitimate concern, namely that any change, however trivial, that goes into core without going directly to core database functionality represents a long-term maintenance burden. What I keep coming back to is that the key architectural feature PostgreSQL has that other RDBMSs don't is its extensibility. Because that has been a stable feature over time, I'm pretty sure we need to treat documenting it as something that does go to core database functionality. Yes, there are resources involved in doing something like this, but I don't think they require constant or even frequent attention from committers or from seasoned DB hackers. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Default to TIMESTAMP WITH TIME ZONE?
On Sat, Aug 14, 2021 at 10:03:01AM +0200, Peter Eisentraut wrote: > On 13.08.21 19:07, Tom Lane wrote: > > "David G. Johnston" writes: > > > On Fri, Aug 13, 2021 at 9:28 AM Simon Riggs > > > wrote: > > > > The only hope is to eventually change the default, so probably > > > > the best thing is to apply pressure via the SQL Std process. > > > > > Then there is no hope because this makes the situation worse. > > > > Agreed; the points I made upthread are just as valid if the change > > is made in the standard. But I'd be astonished if the SQL > > committee would consider such a change anyway. > > AFAIU, our timestamp with time zone type doesn't really do what the > SQL standard specifies anyway, as it doesn't actually record the > time zone, but it's more of a "timestamp with time zone aware > formatting". For SQL, it might make sense to add a (third) time > stamp type that behaves more like that. The way the standard defines time zones, namely as fixed offsets from UTC, is just ludicrous and has been contrary to fact for much longer than SQL has existed. If we're going to capture input time zones, we should come up with a way to capture the time zone as it existed when the write occurred, i.e. both its name and the UTC offset it represented at the time of the write. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
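To illustrate the point about named zones versus fixed offsets: the same zone name denotes different UTC offsets at different times of year, so recording only an offset (as the standard's model implies) loses information that a name-plus-offset-at-write-time scheme would keep. A small Python sketch using the stdlib zoneinfo module; the specific zone and dates are arbitrary examples, and the outputs assume an IANA tz database is present:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
winter = datetime(2022, 1, 15, 12, 0, tzinfo=berlin)
summer = datetime(2022, 7, 15, 12, 0, tzinfo=berlin)

# One zone name, two different UTC offsets over the course of a year,
# so a bare offset cannot round-trip back to the zone.
print(winter.tzname(), winter.utcoffset())  # CET 1:00:00
print(summer.tzname(), summer.utcoffset())  # CEST 2:00:00

# What the write-time proposal amounts to capturing: the zone's name
# plus the offset that name represented at the moment of the write.
def captured_zone(ts: datetime) -> tuple:
    return (ts.tzname(), ts.utcoffset().total_seconds())

print(captured_zone(winter))  # ('CET', 3600.0)
```

Storing both fields makes the stored value unambiguous even across later tz-database rule changes, which is the crux of David's suggestion.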
Re: Slim down integer formatting
On Wed, Jul 28, 2021 at 01:17:43PM +1200, David Rowley wrote: > On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera wrote: > > So how much faster is it than the original? > > I only did some very quick tests. They're a bit noisey. The results > indicate an average speedup of 1.7%, but the noise level is above > that, so unsure. > > create table a (a int); > insert into a select a from generate_series(1,1000000)a; > vacuum freeze a; > > bench.sql: copy a to '/dev/null'; > > master @ 93a0bf239 > drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres > latency average = 153.815 ms > latency average = 152.955 ms > latency average = 147.491 ms > > master + v2 patch > drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres > latency average = 144.749 ms > latency average = 151.525 ms > latency average = 150.392 ms

Thanks for testing this! I got a few promising results early on with -O0, and the technique seemed like a neat way to do things. I generated a million int4s intended to be uniformly distributed across the range of int4, and similarly across int8.

int4:             patch       6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  362.149 ms  359.933 ms
latency stddev:   3.44 ms     3.40 ms

int8:             patch       6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  434.944 ms  422.270 ms
latency stddev:   3.23 ms     4.02 ms

when compiled with -O2:

int4:             patch       6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  167.262 ms  148.673 ms
latency stddev:   6.26 ms     1.28 ms

i.e. it was actually slower, at least over the 10 runs I did. I assume that "uniform distribution across the range" is a bad case scenario for ints, but I was a little surprised to measure worse performance. Interestingly, what I got for int8s generated to be uniform across their range was

int8:             patch       6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  171.737 ms  174.013 ms
latency stddev:   1.94 ms     6.84 ms

which doesn't look like a difference to me.
Intuitively, I'd expect us to get things in the neighborhood of 1 a lot more often than things in the neighborhood of 1 << (30 or 60). Do we have some idea of the distribution, or at least of the distribution family, that we should expect for ints? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
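The flip side of that intuition is easy to quantify: under a uniform draw over the whole range, long numbers dominate, which is why this benchmark stresses the slow end of any digit-count fast path. A quick back-of-the-envelope sketch in Python, treating the values as unsigned 32-bit for simplicity:

```python
# Fraction of uniformly drawn values in [1, 2**32 - 1] having d digits.
N = 2**32 - 1

fractions = {}
for d in range(1, 11):
    lo = 10 ** (d - 1)
    hi = min(10 ** d - 1, N)
    fractions[d] = (hi - lo + 1) / N

# 9- and 10-digit values together cover about 97.7% of the range, while
# values with at most 5 digits come up only about once in 43,000 draws.
print(f"10 digits: {fractions[10]:.1%}")
print(f" 9 digits: {fractions[9]:.1%}")
print(f"<=5 digits: {sum(fractions[d] for d in range(1, 6)):.2e}")
```

So if real workloads skew toward small integers, as David suspects, this benchmark measures almost exclusively the long-number path.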
Slim down integer formatting
Folks, Please find attached a patch to do $subject. It's down to one table lookup and 3 instructions. In covering the int64 versions, I swiped a lightweight division from the Ryu stuff. I'm pretty sure that what I did is not how to do #includes, but it's a PoC. What would be a better way to do this? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

>From 1cf7202facd9fee161865d90304e5ede1e3c65cf Mon Sep 17 00:00:00 2001
From: David Fetter
Date: Mon, 26 Jul 2021 16:43:43 -0700
Subject: [PATCH v1] PoC: Slim down integer printing some more

---
 src/backend/utils/adt/numutils.c | 62 +++-
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git src/backend/utils/adt/numutils.c src/backend/utils/adt/numutils.c
index b93096f288..c4f2d5cfb1 100644
--- src/backend/utils/adt/numutils.c
+++ src/backend/utils/adt/numutils.c
@@ -18,6 +18,7 @@
 #include 
 #include 
+#include "../common/d2s_intrinsics.h"
 #include "common/int.h"
 #include "utils/builtins.h"
 #include "port/pg_bitutils.h"
@@ -39,50 +40,45 @@ static const char DIGIT_TABLE[200] =
 	"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
 
 /*
- * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ * Adapted from
+ * https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/
  */
 static inline int
 decimalLength32(const uint32 v)
 {
-	int		t;
-	static const uint32 PowersOfTen[] = {
-		1, 10, 100,
-		1000, 10000, 100000,
-		1000000, 10000000, 100000000,
-		1000000000
-	};
+	/*
+	 * Each element in the array, when added to a number with MSB that is the
+	 * array index, will produce a number whose top 32 bits are its decimal
+	 * length, so that can now be had using:
+	 *
+	 * 1 CLZ instruction,
+	 * 1 table lookup,
+	 * 1 add, and
+	 * 1 shift
+	 *
+	 */
+	static const uint64 digit_transitions[32] = {
+		4294967295, 8589934582, 8589934582, 8589934582, 12884901788,
+		12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
+		21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
+		25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
+		34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
+		38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
+		42949672960, 42949672960
+	};
 
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
+	return (v + digit_transitions[pg_leftmost_one_pos32(v)]) >> 32;
 }
 
 static inline int
 decimalLength64(const uint64 v)
 {
-	int		t;
-	static const uint64 PowersOfTen[] = {
-		UINT64CONST(1), UINT64CONST(10),
-		UINT64CONST(100), UINT64CONST(1000),
-		UINT64CONST(10000), UINT64CONST(100000),
-		UINT64CONST(1000000), UINT64CONST(10000000),
-		UINT64CONST(100000000), UINT64CONST(1000000000),
-		UINT64CONST(10000000000), UINT64CONST(100000000000),
-		UINT64CONST(1000000000000), UINT64CONST(10000000000000),
-		UINT64CONST(100000000000000), UINT64CONST(1000000000000000),
-		UINT64CONST(10000000000000000), UINT64CONST(100000000000000000),
-		UINT64CONST(1000000000000000000), UINT64CONST(10000000000000000000)
-	};
-
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos64(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
+	if (v >> 32 == 0)
+		return decimalLength32(v);
+	else if (div1e8(v) >> 32 == 0)
+		return 8 + decimalLength32(div1e8(v));
+	else
+		return 16 + decimalLength32(div1e8(div1e8(v)));
 }
 
 /*
-- 
2.31.1
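For readers who want to try the table trick without reading the diff, here is a hedged Python sketch of the same idea (Lemire's digit-count method). The table below is derived from scratch, so individual entries can legitimately differ from the patch's constants (more than one valid table exists); what matters is that a lookup, an add, and a shift yield the decimal length:

```python
def build_table():
    """table[i] covers values whose highest set bit is bit i."""
    table = []
    for i in range(32):
        lo = 1 << i
        hi = min((1 << (i + 1)) - 1, (1 << 32) - 1)
        d = len(str(lo))           # digit count at the low end of the range
        if len(str(hi)) > d:       # a power of 10 falls inside this range
            t = ((d + 1) << 32) - 10 ** d
        else:                      # the whole range has d digits
            t = d << 32
        table.append(t)
    return table

TABLE = build_table()

def decimal_length32(v):
    """Digit count of v (1 <= v < 2**32): one lookup, one add, one shift.

    v.bit_length() - 1 plays the role of the CLZ-based
    pg_leftmost_one_pos32() in the patch.
    """
    return (v + TABLE[v.bit_length() - 1]) >> 32
```

Checking the result against len(str(v)) over the boundary values (powers of 2 and powers of 10) exercises exactly the cases where an off-by-one in the table would show up.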
Re: [PATCH] Allow multiple recursive self-references
On Thu, Jul 15, 2021 at 09:18:00AM +0200, Denis Hirn wrote: > > The patch does not apply on Head anymore, could you rebase and post a patch. > > Sure thing. Here's the new patch. > > Best wishes, Thanks for updating this. In the next version of the patch, would you be so kind as to include updated documentation of the feature and at least one regression test of same? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Use singular number when appropriate
On Tue, Jun 15, 2021 at 09:37:11AM +0200, Laurenz Albe wrote: > On Tue, 2021-06-15 at 04:59 +0000, David Fetter wrote: > > I thought about using the dual, but wasn't sure how many languages > > support it. > > I think none of the languages for which we cater uses the dual. But > I guess you were joking, since the tests are not translated ... I was. > > if (fail_count == 0 && fail_ignore_count == 0) > > snprintf(buf, sizeof(buf), > > -_(" All %d tests passed. "), > > -success_count); > > +_(" %s %d test%s passed. "), > > +success_count == 1 ? "The" : "All", > > +success_count, > > + success_count == 1 ? "" : "s"); > > ... and that wouldn't be translatable. Thanks, will rearrange. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Use singular number when appropriate
Hi, I thought about using the dual, but wasn't sure how many languages support it. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 1fd595576c5972d8604adf77d1959008daebacb0 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Mon, 14 Jun 2021 21:51:20 -0700 Subject: [PATCH v1] Singular number when appropriate --- src/test/regress/pg_regress.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git src/test/regress/pg_regress.c src/test/regress/pg_regress.c index 05296f7ee1..d6f9e829e8 100644 --- src/test/regress/pg_regress.c +++ src/test/regress/pg_regress.c @@ -2641,25 +2641,31 @@ regression_main(int argc, char *argv[], */ if (fail_count == 0 && fail_ignore_count == 0) snprintf(buf, sizeof(buf), - _(" All %d tests passed. "), - success_count); + _(" %s %d test%s passed. "), + success_count == 1 ? "The" : "All", + success_count, + success_count == 1 ? "" : "s"); else if (fail_count == 0) /* fail_count=0, fail_ignore_count>0 */ snprintf(buf, sizeof(buf), - _(" %d of %d tests passed, %d failed test(s) ignored. "), + _(" %d of %d test%s passed, %d failed test%s ignored. "), success_count, success_count + fail_ignore_count, - fail_ignore_count); + success_count == 1 ? "" : "s", + fail_ignore_count, + fail_ignore_count == 1 ? "" : "s"); else if (fail_ignore_count == 0) /* fail_count>0 && fail_ignore_count=0 */ snprintf(buf, sizeof(buf), - _(" %d of %d tests failed. "), + _(" %d of %d test%s failed. "), fail_count, - success_count + fail_count); + success_count + fail_count, + fail_count == 1 ? "" : "s"); else /* fail_count>0 && fail_ignore_count>0 */ snprintf(buf, sizeof(buf), - _(" %d of %d tests failed, %d of these failures ignored. "), + _(" %d of %d test%s failed, %d of these failures ignored. "), fail_count + fail_ignore_count, success_count + fail_count + fail_ignore_count, + fail_count + fail_ignore_count == 1 ? 
"" : "s", fail_ignore_count); putchar('\n'); -- 2.31.1
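Laurenz's objection upthread is that a conditional "s" spliced into the format string defeats translation, since many languages don't pluralize by appending a letter. The conventional fix is ngettext-style plural messages, where each whole sentence exists in singular and plural forms and the catalog picks one by count. A minimal Python sketch of the idea (Python's stdlib gettext mirrors the GNU gettext API that the PostgreSQL tree's translation machinery is built on; the message strings here are just illustrations, not pg_regress code):

```python
from gettext import ngettext

def summary(success_count):
    # With no translation catalog installed, ngettext() falls back to
    # returning the first argument when n == 1 and the second otherwise;
    # a real catalog can supply as many plural forms as a language needs,
    # so translators are never forced into the English "append s" rule.
    msg = ngettext(" The %d test passed. ",
                   " All %d tests passed. ",
                   success_count)
    return msg % success_count

print(summary(1))   # " The 1 test passed. "
print(summary(73))  # " All 73 tests passed. "
```

This keeps each message translatable as a unit, which is what the _() wrappers in pg_regress.c require.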
Re: pg_receivewal makes a bad daemon
t anything with an incompatible license will upset people who are accustomed to ensuring that we have what legally amounts to an MIT license clean distribution, but I'm thinking that option is at least worth discussing, even if the immediate consensus is, "libreadline is bad enough. We went to a lot of trouble to purge that other stuff back in the bad old days. Let's not make that mistake again." Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Make some column descriptions easier to distinguish visually
Folks, I was writing up a query on pg_constraint, and the columns whose descriptions I've changed here were pretty hard to puzzle out, as they were only distinct up to the difference between F and P, which isn't always easy to see. Please find attached a patch to disambiguate them. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 4202229a9319fa2f307b2761696775a2c5905fcd Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 5 May 2021 15:48:53 -0700 Subject: [PATCH v1] Clarify some column descriptions in pg_constraint To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.31.1" This is a multi-part message in MIME format. --2.31.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git doc/src/sgml/catalogs.sgml doc/src/sgml/catalogs.sgml index 492ed348b3..2c501f21f7 100644 --- doc/src/sgml/catalogs.sgml +++ doc/src/sgml/catalogs.sgml @@ -2664,7 +2664,7 @@ SCRAM-SHA-256$iteration count: (references pg_operator.oid) - If a foreign key, list of the equality operators for PK = FK comparisons + If a foreign key, list of the equality operators for establishing whether primary key columns are equal to the corresponding foreign key columns (PK = FK) @@ -2674,7 +2674,7 @@ SCRAM-SHA-256$iteration count: (references pg_operator.oid) - If a foreign key, list of the equality operators for PK = PK comparisons + If a foreign key, list of the equality operators for establishing whether primary key columns are equal (PK = PK) @@ -2684,7 +2684,7 @@ SCRAM-SHA-256$iteration count: (references pg_operator.oid) - If a foreign key, list of the equality operators for FK = FK comparisons + If a foreign key, list of the equality operators for establishing whether foreign key columns are equal (FK = FK) --2.31.1--
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
On Tue, Apr 27, 2021 at 12:33:25PM +0300, Aleksander Alekseev wrote: > Hi David, > > > I noticed that $subject completes with already valid constraints, > > please find attached a patch that fixes it. I noticed that there are > > other places constraints can be validated, but didn't check whether > > similar bugs exist there yet. > > There was a typo in the patch ("... and and not convalidated"). I've fixed > it. Otherwise the patch passes all the tests and works as expected: > > eax=# create table foo (x int); > CREATE TABLE > eax=# alter table foo add constraint bar check (x < 3) not valid; > ALTER TABLE > eax=# alter table foo add constraint baz check (x <> 5) not valid; > ALTER TABLE > eax=# alter table foo validate constraint ba > bar baz > eax=# alter table foo validate constraint bar; > ALTER TABLE Sorry about that typo, and thanks for poking at this! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
Folks, I noticed that $subject completes with already valid constraints, please find attached a patch that fixes it. I noticed that there are other places constraints can be validated, but didn't check whether similar bugs exist there yet. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From e191928111a7500c3a3bc84558797fe86451c24b Mon Sep 17 00:00:00 2001 From: David Fetter Date: Mon, 26 Apr 2021 17:17:15 -0700 Subject: [PATCH v1] tab-completion of ALTER TABLE...VALIDATE CONSTRAINT To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.30.2" This is a multi-part message in MIME format. --2.30.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit ...now chooses only among constraints that aren't already valid. diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c index 7c493b..2252b7d3ac 100644 --- src/bin/psql/tab-complete.c +++ src/bin/psql/tab-complete.c @@ -785,6 +785,15 @@ static const SchemaQuery Query_for_list_of_collations = { " and pg_catalog.quote_ident(c1.relname)='%s'"\ " and pg_catalog.pg_table_is_visible(c1.oid)" +/* the silly-looking length condition is just to eat up the current word */ +#define Query_for_nonvalid_constraint_of_table \ +"SELECT pg_catalog.quote_ident(conname) "\ +" FROM pg_catalog.pg_class c1, pg_catalog.pg_constraint con "\ +" WHERE c1.oid=conrelid and and not convalidated"\ +" and (%d = pg_catalog.length('%s'))"\ +" and pg_catalog.quote_ident(c1.relname)='%s'"\ +" and pg_catalog.pg_table_is_visible(c1.oid)" + #define Query_for_all_table_constraints \ "SELECT pg_catalog.quote_ident(conname) "\ " FROM pg_catalog.pg_constraint c "\ @@ -2118,11 +2127,16 @@ psql_completion(const char *text, int start, int end) * If we have ALTER TABLE ALTER|DROP|RENAME|VALIDATE CONSTRAINT, * provide list of constraints */ - else if (Matches("ALTER", "TABLE", MatchAny, 
"ALTER|DROP|RENAME|VALIDATE", "CONSTRAINT")) + else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME", "CONSTRAINT")) { completion_info_charp = prev3_wd; COMPLETE_WITH_QUERY(Query_for_constraint_of_table); } + else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT")) + { + completion_info_charp = prev3_wd; + COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table); + } /* ALTER TABLE ALTER [COLUMN] */ else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny) || Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny)) --2.30.2--
Re: popcount
On Tue, Mar 23, 2021 at 10:51:08AM +0100, Peter Eisentraut wrote: > On 21.03.21 02:31, David Fetter wrote: > > > I have now read the entire internet on what a suitable name for this > > > function could be. I think the emerging winner is BIT_COUNT(), which > > > already exists in MySQL, and also in Python (int.bit_count()) and Java > > > (Integer.bitCount()). > > > > Thanks for doing this tedious work. Please find attached the next > > version of the patch. > > committing, with some polishing Thanks! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: popcount
On Sat, Mar 20, 2021 at 09:01:25PM +0100, Peter Eisentraut wrote: > On 18.03.21 13:51, John Naylor wrote: > > Hi David, > > > > Just a nitpick: > > > > +SET bytea_output TO hex; > > > > Since we don't see the string in the output, I don't immediately see the > > reason to change the output format here? That's how I got it to work. If there's a way to make it go without that, I'd be delighted to learn what it is :) > > Aside from that, this patch works as expected, and is ready for committer. > > I have now read the entire internet on what a suitable name for this > function could be. I think the emerging winner is BIT_COUNT(), which > already exists in MySQL, and also in Python (int.bit_count()) and Java > (Integer.bitCount()). Thanks for doing this tedious work. Please find attached the next version of the patch. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 9483d41941b5daa44e46e6fb164846647d716406 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v5] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.30.2" This is a multi-part message in MIME format. --2.30.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 68fe6a95b4..066431fd3c 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4030,6 +4030,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + bit_count + +bit_count ( bytes bytea ) +bigint + + +Counts the number of bits set in a binary string. + + +bit_count('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + bit_count + +bit_count ( bits bit ) +bigint + + +Counts the bits set in a bit string. 
+ + +bit_count(B'101010101010101010') +9 + + + @@ -4869,6 +4903,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index e259531f60..feb00eccf9 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -1446,6 +1446,9 @@ { oid => '752', descr => 'substitute portion of string', proname => 'overlay', prorettype => 'bytea', proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' }, +{ oid => '8436', descr => 'count set bits', + proname => 'bit_count', prorettype => 'int8', proargtypes => 'bytea', + prosrc => 'byteapopcount'}, { oid => '725', proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line', @@ -3876,6 +3879,9 @@ { oid => '3033', descr => 'set bit', proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4', prosrc => 'bitsetbit' }, +{ oid => '8435', descr => 'count set bits', + proname => 'bit_count', prorettype => 'int8', proargtypes => 'bit', + prosrc => 'bitpopcount'}, # for macaddr type support { oid => '436', descr => 'I/O', diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c index 2235866244..c9c6c73422 100644 --- src/backend/utils/adt/varbit.c +++ src/backend/utils/adt/varbit.c @@ -36,6 +36,7 @@ #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/varbit.h" @@ -1878,3 +1879,30 @@ bitgetbit(PG_FUNCTION_ARGS) else PG_RETURN_INT32(0); } + +/* + * bitpopcount + * + * Returns the number of bits set in a bit string. 
+ * + */ +Datum +bitpopcount(PG_FUNCTION_ARGS) +{ + /* There's really no chance of an overflow here because + * to get to INT64_MAX set bits, an object would have to be + * an exbibyte long, exceeding what PostgreSQL can currently + * store by a factor of 2^28 + */ + int64 popcount; + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + bits8 *p; + int len; + + p = VARBITS(arg1); + len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE; + + popcount = pg_popcount((char *)p, len); + + PG_RETURN_INT64(popcount); +} diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c index 0bc345aa4d..95
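The documentation examples in the patch are easy to sanity-check from Python, whose int.bit_count() is cited upthread as naming precedent. A small sketch mirroring the semantics of the two new SQL functions (the Python function names are made up for illustration):

```python
def bytea_bit_count(data: bytes) -> int:
    """Analogue of bit_count(bytea): total set bits across all bytes."""
    return sum(bin(b).count("1") for b in data)

def bitstring_bit_count(bits: str) -> int:
    """Analogue of bit_count(bit): set bits in a '0'/'1' string."""
    return bits.count("1")

# The two examples from the patch's func.sgml additions:
print(bytea_bit_count(bytes.fromhex("deadbeef")))   # 24
print(bitstring_bit_count("101010101010101010"))    # 9
```

The C implementation delegates the per-byte counting to pg_popcount(), which uses the hardware POPCNT instruction where available rather than a per-byte loop like this one.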
Re: WIP: document the hook system
On Wed, Mar 10, 2021 at 09:38:39AM -0500, David Steele wrote: > On 3/9/21 12:20 PM, Bruce Momjian wrote: > > On Sat, Mar 6, 2021 at 08:32:43PM -0500, Tom Lane wrote: > > > I think that the best you should hope for here is that people are > > > willing to add a short, not-too-detailed para to a markup-free > > > plain-text README file that lists all the hooks. As soon as it > > > gets any more complex than that, either the doco aspect will be > > > ignored, or there simply won't be any more hooks. > > > > > > (I'm afraid I likewise don't believe in the idea of carrying a test > > > module for each hook. Again, requiring that is a good way to > > > ensure that new hooks just won't happen.) > > > > Agreed. If you document the hooks too much, it allows them to drift > > away from matching the code, which makes the hook documentation actually > > worse than having no hook documentation at all. > > There's doesn't seem to be agreement on how to proceed here, so closing. > > David, if you do decide to proceed with a README then it would probably be > best to create a new thread/entry. Thanks for the work on this and the helpful feedback! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [PATCH] pg_permissions
On Sat, Mar 06, 2021 at 08:03:17PM +0100, Joel Jacobson wrote: > Hi, > > It's easy to answer the question... > >- What permissions are there on this specific object? > > ...but to answer the question... > >- What permissions are there for a specific role in the database? > > you need to manually query all relevant pg_catalog or > information_schema.*_privileges views, > which is a O(n) mental effort, while the first question is mentally O(1). > > I think this can be improved by providing humans a single pg_permissions > system view > which can be queried to answer the second question. This should save a lot of > keyboard punches. > > Demo: > > SELECT * FROM pg_permissions WHERE 'joel' IN (grantor,grantee); >regclass | obj_desc | grantor | grantee | > privilege_type | is_grantable > --+-+-+-++-- > pg_namespace | schema foo | joel| joel| USAGE > | f > pg_namespace | schema foo | joel| joel| CREATE > | f > pg_class | table foo.bar | joel| joel| INSERT > | f > pg_class | table foo.bar | joel| joel| SELECT > | f > pg_class | table foo.bar | joel| joel| UPDATE > | f > pg_class | table foo.bar | joel| joel| DELETE > | f > pg_class | table foo.bar | joel| joel| TRUNCATE > | f > pg_class | table foo.bar | joel| joel| REFERENCES > | f > pg_class | table foo.bar | joel| joel| TRIGGER > | f > pg_attribute | column baz of table foo.bar | joel| joel| SELECT > | f > pg_attribute | column baz of table foo.bar | joel| joel| UPDATE > | f > (11 rows) > > All catalogs with _aclitem columns have been included in the view. > > I think a similar one for ownerships would be nice too. > But I'll let you digest this one first to see if the concept is fruitful. +1 for both this and the ownerships view. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: document the hook system
On Fri, Feb 12, 2021 at 08:02:51PM +0300, Anastasia Lubennikova wrote:
> On 17.01.2021 16:53, Magnus Hagander wrote:
> > On Fri, Jan 15, 2021 at 8:28 AM Peter Eisentraut wrote:
> > > On 2020-12-31 04:28, David Fetter wrote:
> > > > This could probably use a lot of filling in, but having it in the
> > > > actual documentation beats needing to know folklore even to know
> > > > that the capability is there.
> > >
> > > This patch seems quite short of a state where one could begin to
> > > evaluate it. Documenting the hooks better seems a worthwhile goal. I
> > > think the question is whether we can develop documentation that is
> > > genuinely useful by itself without studying the relevant source code.
> > > This submission does not address that question.
> >
> > Even just having a list of available hooks would be a nice improvement
> > though :)
> >
> > But maybe it's something that belongs better in a README file instead,
> > since as you say it's unlikely to be properly useful without looking
> > at the source anyway. But just a list of hooks and a *very* high
> > overview of where each of them hooks in would definitely be useful to
> > have somewhere, I think. Having to find with "grep" whether there may
> > or may not exist a hook for approximately what it is you're looking
> > for is definitely a process to improve on.
>
> +1 for README.
> Hooks are intended for developers and can be quite dangerous without
> proper understanding of the internal code.
>
> I also want to remind about a README gathered by mentees [1]. It was
> done under a PostgreSQL license, so we can use it.
> By the way, is there any agreement on the plain-text format of
> PostgreSQL README files, or can we use md?
>
> [1] https://github.com/AmatanHead/psql-hooks/blob/master/Detailed.md

This is much more thorough than what I've done so far, and a much
better document in terms of pointing to actual hunks of the source for
context.

I'm -1 on making a README alone.
These are public APIs, and as such, the fact of their existence
shouldn't be a mystery discoverable only by knowing that there's
something to look for in the source tree and then running an
appropriate grep command to find the current ones.

Would a document simply listing current hooks and pointing to something
along the lines of README_hooks in src work better?

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [patch] bit XOR aggregate functions
On Sat, Mar 06, 2021 at 09:03:25PM +0100, Vik Fearing wrote: > On 3/6/21 9:00 PM, David Fetter wrote: > > On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote: > >> On 3/6/21 8:55 PM, David Fetter wrote: > >>> On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote: > >>>> On 10.02.21 06:42, Kyotaro Horiguchi wrote: > >>>>> We already had CREATE AGGREATE at the time, so BIT_XOR can be > >>>>> thought as it falls into the same category with BIT_AND and > >>>>> BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation? > >>>> > >>>> I think the use of BIT_XOR is quite separate from BIT_AND and > >>>> BIT_OR. The latter give you an "all" or "any" result of the bits > >>>> set. BIT_XOR will return 1 or true if an odd number of inputs are 1 > >>>> or true, which isn't useful by itself. But it can be used as a > >>>> checksum, so it seems pretty reasonable to me to add it. Perhaps > >>>> the use case could be pointed out in the documentation. > >>> > >>> If this is the only use case, is there some way to refuse to execute > >>> it if it doesn't contain an unambiguous ORDER BY, as illustrated > >>> below? > >>> > >>> SELECT BIT_XOR(b ORDER BY a, c).../* works */ > >>> SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */ > >>> SELECT BIT_XOR(b) FROM... /* errors out */ > >> > >> > >> Why would such an error be necessary, or even desirable? > > > > Because there is no way to ensure that the results remain consistent > > from one execution to the next without such a guarantee. > > I think one of us is forgetting how XOR works. Oops. You're right. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
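XOR's order-independence is the property Vik is pointing at: XOR is
commutative and associative, so a BIT_XOR-style fold over a set of rows
produces the same answer no matter how the rows are ordered. A minimal
C sketch of that algebra (illustrative only, not code from the patch;
the names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Fold an array with XOR, the way a BIT_XOR aggregate consumes rows. */
static uint32_t xor_fold(const uint32_t *vals, int n)
{
    uint32_t acc = 0;

    for (int i = 0; i < n; i++)
        acc ^= vals[i];
    return acc;
}

/* The same three values in two different "row orders". */
static const uint32_t asc_order[] = {0x1, 0x2, 0x4};
static const uint32_t desc_order[] = {0x4, 0x2, 0x1};
```

Both orderings fold to 0x7 here, which is why an ORDER BY adds nothing
to the determinism of the result.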
Re: [patch] bit XOR aggregate functions
On Sat, Mar 06, 2021 at 08:57:46PM +0100, Vik Fearing wrote: > On 3/6/21 8:55 PM, David Fetter wrote: > > On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote: > >> On 10.02.21 06:42, Kyotaro Horiguchi wrote: > >>> We already had CREATE AGGREATE at the time, so BIT_XOR can be > >>> thought as it falls into the same category with BIT_AND and > >>> BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation? > >> > >> I think the use of BIT_XOR is quite separate from BIT_AND and > >> BIT_OR. The latter give you an "all" or "any" result of the bits > >> set. BIT_XOR will return 1 or true if an odd number of inputs are 1 > >> or true, which isn't useful by itself. But it can be used as a > >> checksum, so it seems pretty reasonable to me to add it. Perhaps > >> the use case could be pointed out in the documentation. > > > > If this is the only use case, is there some way to refuse to execute > > it if it doesn't contain an unambiguous ORDER BY, as illustrated > > below? > > > > SELECT BIT_XOR(b ORDER BY a, c).../* works */ > > SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */ > > SELECT BIT_XOR(b) FROM... /* errors out */ > > > Why would such an error be necessary, or even desirable? Because there is no way to ensure that the results remain consistent from one execution to the next without such a guarantee. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [patch] bit XOR aggregate functions
On Wed, Mar 03, 2021 at 03:30:15PM +0100, Peter Eisentraut wrote:
> On 10.02.21 06:42, Kyotaro Horiguchi wrote:
> > We already had CREATE AGGREGATE at the time, so BIT_XOR can be
> > thought as it falls into the same category with BIT_AND and
> > BIT_OR, that is, we may have BIT_XOR as an intrinsic aggregation?
>
> I think the use of BIT_XOR is quite separate from BIT_AND and
> BIT_OR. The latter give you an "all" or "any" result of the bits
> set. BIT_XOR will return 1 or true if an odd number of inputs are 1
> or true, which isn't useful by itself. But it can be used as a
> checksum, so it seems pretty reasonable to me to add it. Perhaps
> the use case could be pointed out in the documentation.

If this is the only use case, is there some way to refuse to execute
it if it doesn't contain an unambiguous ORDER BY, as illustrated
below?

SELECT BIT_XOR(b ORDER BY a, c)...        /* works */
SELECT BIT_XOR(b) OVER (ORDER BY a, c)... /* works */
SELECT BIT_XOR(b) FROM...                 /* errors out */

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
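Peter's checksum use case can be made concrete with a small sketch. The
C below is illustrative only (the names are mine, and nothing here is
PostgreSQL code): XORing the checksums of two copies of a data set
cancels every value they share and leaves exactly the value in which
they differ, which is how a BIT_XOR over two snapshots can finger a
missing or changed row.

```c
#include <assert.h>
#include <stdint.h>

/* XOR all values together, as a BIT_XOR aggregate would across rows. */
static uint64_t xor_checksum(const uint64_t *vals, int n)
{
    uint64_t sum = 0;

    for (int i = 0; i < n; i++)
        sum ^= vals[i];
    return sum;
}

/* A copy of the same set with one row missing: XORing the two
 * checksums cancels every common value and leaves the missing one. */
static const uint64_t full_set[] = {10, 20, 30};
static const uint64_t damaged_set[] = {10, 20};
```

Here xor_checksum(full_set, 3) ^ xor_checksum(damaged_set, 2) comes out
to 30, the dropped value, without either side needing to know which row
went missing.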
Public APIs
Hi,

While I was looking at the Citus code base for a project at work, I
noticed a really ugly thing: a UDF called alter_columnar_table_set().
It's clearly there because our current DDL is, as others have phrased
such things, a few bricks shy of a load when it comes to accommodating
the table AM API.

A similar problem exists when it comes to changing any GUC ending in
"_libraries". There is no way provided to change an existing value in
place, only to overwrite it. This may be OK if you'll only ever have
zero or one shared object (.so) loaded, but it gets to be a potentially
serious problem if, for example, you have two files in conf.d that each
set one. Then we get surprises caused by questions extension
implementers really shouldn't be saddled with, like "what order do such
files get included in, and what happens when a new one appears?"

The general issue, as I see it, is one that we can address by providing
reference implementations, however tiny, pointless, and trivial, of
each of our extension points
(https://wiki.postgresql.org/wiki/PostgresServerExtensionPoints). I
learned about one, rendezvous variables, while writing this email.

Being public APIs, these all really need to be documented in our
official documentation, a thing I started on with the patch I'm working
on for the hooks system. At a code level, it would be in the spirit of
unit tests to ensure that those extension points keep working by
putting them in, say, `make check-world`, so as not to burden casual
test runs.

Separately, it would be reasonable to make some effort to ensure that
interactions among them are either safe or disallowed when attempted,
whichever seems reasonable to do. We can't cover the entire
combinatorial explosion, but adding a cross-check when someone has
reported a problem we can reasonably anticipate could recur would be a
big improvement.
We could start with "black hole" implementations, as Andrew Dunstan, Michaël Paquier, and possibly others, have done, but actual working systems would expose more weak points. Would documenting these APIs be the right place to start? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Feedback on table expansion hook (including patch)
On Sat, Mar 06, 2021 at 01:09:10PM -0500, Tom Lane wrote: > Peter Eisentraut writes: > > On 07.05.20 10:11, Erik Nordström wrote: > >> I am looking for feedback on the possibility of adding a table expansion > >> hook to PostgreSQL (see attached patch). > > > Unlike the get_relation_info_hook, your proposed hook would *replace* > > expand_inherited_rtentry() rather than just tack on additional actions. > > That seems awfully fragile. Could you do with a hook that does > > additional things rather than replace a whole chunk of built-in code? > > I suppose Erik is assuming that he could call expand_inherited_rtentry > (or better, the previous hook occupant) when his special case doesn't > apply. But I'm suspicious that he'd still end up duplicating large > chunks of optimizer/util/inherit.c in order to carry out the special > case, since almost all of that is private/static functions. It > does seem like a more narrowly-scoped hook might be better. > > Would it be unreasonable of us to ask for a worked-out example making > use of the proposed hook? That'd go a long way towards resolving the > question of whether you can do anything useful without duplicating > lots of code. > > I've also been wondering, given the table-AM projects that are > going on, whether we shouldn't refactor things to give partitioned > tables a special access method, and then shove most of the planner > and executor's hard-wired partitioning logic into access method > callbacks. That would make it a lot more feasible for extensions > to implement custom partitioning-like behavior ... or so I guess. That seems pretty reasonable. I suspect that that process will expose bits of the planning and execution machinery that have gotten less isolated than they should be. 
More generally, and I'll start a separate thread on this, we should be working up to including a reference implementation, however tiny, of every extension point we supply in order to ensure that our APIs are at a minimum reasonably usable and remain so. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: regexp_positions()
On Sat, Feb 27, 2021 at 08:51:27PM +0100, Joel Jacobson wrote:
> Hi,
>
> Finding all matches in a string is convenient using regexp_matches()
> with the 'g' flag.
>
> But if instead wanting to know the start and end positions of the
> occurrences, one would have to first call regexp_matches(...,'g') to
> get all matches, and then iterate through the results and search using
> strpos() and length() repeatedly to find all start and end positions.
>
> Assuming regexp_matches() internally already has knowledge of the
> occurrences, maybe we could add a regexp_ranges() function that
> returns a two-dimensional array, with all the [[start,end], ...]
> positions?

Maybe an int4multirange, which would fit unless I'm misunderstanding
g's meaning with respect to non-overlapping patterns, but that might be
a little too cute and not easy ever to extend.

Come to that, would a row structure that looked like (match, start,
end) be useful?

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
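As a sketch of what a (match, start, end) row structure could carry,
here is a POSIX-regex loop in C that collects the offsets of successive
non-overlapping matches. Everything here is an assumption for
illustration: the function name find_ranges is made up, and the offsets
are 0-based and half-open, where a SQL-level function would presumably
report 1-based positions.

```c
#include <assert.h>
#include <regex.h>

/* Collect the [start, end) byte offsets of successive non-overlapping
 * matches of pat in s, up to max of them. Returns how many were found,
 * or -1 if the pattern fails to compile. */
static int find_ranges(const char *pat, const char *s,
                       int starts[], int ends[], int max)
{
    regex_t     re;
    regmatch_t  m;
    int         n = 0;
    int         off = 0;

    if (regcomp(&re, pat, REG_EXTENDED) != 0)
        return -1;

    while (n < max && regexec(&re, s + off, 1, &m, 0) == 0)
    {
        if (m.rm_eo == m.rm_so)     /* don't loop forever on empty matches */
            break;
        starts[n] = off + m.rm_so;  /* offsets are relative to s + off */
        ends[n] = off + m.rm_eo;
        off += m.rm_eo;             /* resume after this match */
        n++;
    }
    regfree(&re);
    return n;
}
```

For the pattern "ab" against "xabyab" this reports the ranges [1, 3)
and [4, 6), exactly the information the strpos()/length() dance has to
reconstruct today.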
Re: Extensibility of the PostgreSQL wire protocol
On Mon, Feb 22, 2021 at 07:34:51AM -0500, Dave Cramer wrote: > On Fri, 19 Feb 2021 at 15:39, Álvaro Hernández wrote: > > > On 19/2/21 19:30, Jan Wieck wrote: > > > [...] > > > > > > I also am not sure if building a connection pool into a > > > background worker or postmaster is a good idea to begin with. > > > One of the important features of a pool is to be able to suspend > > > traffic and make the server completely idle to for example be > > > able to restart the postmaster without forcibly disconnecting > > > all clients. A pool built into a background worker cannot do > > > that. > > Yes, when did it become a good idea to put a connection pooler in > the backend??? It became a great idea when we noticed just how large and resource-intensive backends were, especially in light of applications' broad tendency to assume that they're free. While I agree that that's not a good assumption, it's one that's so common everywhere in computing that we really need to face up to the fact that it's not going away any time soon. Decoupling the parts that serve requests from the parts that execute queries also goes a long way toward things we've wanted for quite awhile like admission control systems and/or seamless zero-downtime upgrades. Separately, as the folks at AWS and elsewhere have mentioned, being able to pretend at some level to be a different RDBMS can only happen if we respond to its wire protocol. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: computing dT from an interval
On Mon, Feb 22, 2021 at 10:52:42AM -0500, Tom Lane wrote:
> "Michael J. Baars" writes:
> > So how do you compute the number of seconds in 8 years?
>
> IMO, that's a meaningless computation, because the answer is not fixed.
> Before you claim otherwise, think about the every-four-hundred-years
> leap year exception in the Gregorian rules. Besides, what if the
> question is "how many seconds in 7 years"? Then it definitely varies
> depending on the number of leap days included.
>
> What does make sense is timestamp subtraction, where the actual
> endpoints of the interval are known.

True. I'm not sure whether this is a bug or an infelicity we document,
but at least in some parts of the world, this calculation doesn't
comport with the calendar in place at the time:

SELECT to_timestamp('1753', 'YYYY') - to_timestamp('1752', 'YYYY');
 ?column?
══════════
 366 days
(1 row)

I'd like to imagine nobody will ever go mucking with the calendar to
the extent the British did that year, but one never knows.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
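Tom's point that "N years" has no fixed length in seconds is easy to
demonstrate with the Gregorian rules alone. The helpers below are an
illustrative sketch, not anything from the PostgreSQL sources:

```c
#include <assert.h>

/* Gregorian leap-year rule, including the every-400-years exception. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days covered by nyears calendar years starting at start_year. */
static long days_in_span(int start_year, int nyears)
{
    long days = 0;

    for (int y = start_year; y < start_year + nyears; y++)
        days += is_leap(y) ? 366 : 365;
    return days;
}
```

Seven years starting in 2001 span one leap day, while seven starting in
2004 span two, so the two "7 year" intervals differ by a full day of
seconds; the question only has an answer once the endpoints are pinned
down.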
Re: Extensions not dumped when --schema is used
On Thu, Feb 18, 2021 at 11:13:06AM +0100, Guillaume Lelarge wrote:
> Le mar. 26 janv. 2021 à 13:42, Guillaume Lelarge a écrit :
> > Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge a écrit :
> > > Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud a écrit :
> > > > On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge wrote:
> > > > >
> > > > > "Anytime soon" was a long long time ago, and I eventually
> > > > > completely forgot this, sorry. As nobody worked on it yet, I
> > > > > took a shot at it. See attached patch.
> > > >
> > > > Great!
> > > >
> > > > I didn't review it thoroughly yet, but after a quick look it
> > > > sounds sensible. I'd prefer to see some tests added, and it looks
> > > > like a test for plpgsql could be added quite easily.
> > >
> > > I tried that all afternoon yesterday, but failed to do so. My had
> > > still hurts, but I'll try again though it may take some time.
> >
> > s/My had/My head/ ..
>
> I finally managed to get a working TAP test for my patch. I have no
> idea if it's good, and if it's enough. Anyway, new version of the
> patch attached.
>
> --
> Guillaume.

Thanks for doing this work!

> diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
> index bcbb7a25fb..95d45fabfb 100644
> --- a/doc/src/sgml/ref/pg_dump.sgml
> +++ b/doc/src/sgml/ref/pg_dump.sgml
> @@ -215,6 +215,38 @@ PostgreSQL documentation
> +     <varlistentry>
> +      <term><option>-e <replaceable class="parameter">pattern</replaceable></option></term>
> +      <term><option>--extension=<replaceable class="parameter">pattern</replaceable></option></term>
> +      <listitem>
> +       <para>
> +        Dump only extensions matching <replaceable
> +        class="parameter">pattern</replaceable>. When this option is not
> +        specified, all non-system extensions in the target database will be
> +        dumped. Multiple schemas can be selected by writing multiple

I think this should read "Multiple extensions".

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [Proposal] Page Compression for OLTP
On Tue, Feb 16, 2021 at 11:15:36PM +0800, chenhj wrote:
> At 2021-02-16 21:51:14, "Daniel Gustafsson" wrote:
>
> > > On 16 Feb 2021, at 15:45, chenhj wrote:
> > >
> > > I want to know whether this patch can be accepted by the community,
> > > that is, whether it is necessary for me to continue working on this
> > > patch. If you have any suggestions, please give me feedback.
> >
> > It doesn't seem like the patch has been registered in the commitfest
> > app, so it may have been forgotten about; the number of proposed
> > patches often outnumbers the code review bandwidth. Please register
> > it at:
> >
> >     https://commitfest.postgresql.org/32/
> >
> > ...to make sure it doesn't get lost.
> >
> > --
> > Daniel Gustafsson    https://vmware.com/
>
> Thanks, I will complete this patch and register it later.
> Chen Huajun

The simplest way forward is to register it now so it doesn't miss the
window for the upcoming commitfest (CF), which closes at the end of
this month. That way, everybody has the entire time between now and the
end of the CF to review the patch, work on it, etc., and the CF bot
will be testing it against the changing code base to ensure people know
if such a change causes it to need a rebase.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Tid scan improvements
On Tue, Feb 16, 2021 at 10:22:41PM +1300, David Rowley wrote: > On Thu, 4 Feb 2021 at 23:51, David Rowley wrote: > > Updated patch attached. > > I made another pass over this patch and did a bit of renaming work > around the heap_* functions and the tableam functions. I think the new > names are a bit more aligned to the existing names. Thanks! I'm looking forward to making use of this :) Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Fuzz testing COPY FROM parsing
On Fri, Feb 05, 2021 at 12:45:30PM +0200, Heikki Linnakangas wrote: > Hi, > > I've been mucking around with COPY FROM lately, and to test it, I wrote some > tools to generate input files and load them with COPY FROM: > > https://github.com/hlinnaka/pgcopyfuzz Neat! The way it's already produced results is impressive. Looking at honggfuzz, I see it's been used for wire protocols, of which we have several. Does testing our wire protocols seem like a big lift? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Add SQL function for SHA1
On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote: > Hi all, > > SHA-1 is now an option available for cryptohashes, and like the > existing set of functions of SHA-2, I don't really see a reason why we > should not have a SQL function for SHA1. Attached is a patch doing > that. Thanks for doing this! While there are applications SHA1 is no longer good for, there are plenty where it's still in play. One I use frequently is git. While there are plans for creating an upgrade path to more cryptographically secure hashes, it will take some years before repositories have converted over. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: popcount
On Tue, Jan 19, 2021 at 07:58:12AM -0500, Robert Haas wrote: > On Tue, Jan 19, 2021 at 3:06 AM Peter Eisentraut > wrote: > > On 2021-01-18 16:34, Tom Lane wrote: > > > Peter Eisentraut writes: > > >> [ assorted nits ] > > > > > > At the level of bikeshedding ... I quite dislike using the name "popcount" > > > for these functions. I'm aware that some C compilers provide primitives > > > of that name, but I wouldn't expect a SQL programmer to know that; > > > without that context the name seems pretty random and unintuitive. > > > Moreover, it invites confusion with SQL's use of "pop" to abbreviate > > > "population" in the statistical aggregates, such as var_pop(). > > > > I was thinking about that too, but according to > > <https://en.wikipedia.org/wiki/Hamming_weight>, popcount is an accepted > > high-level term, with "pop" also standing for "population". > > Yeah, I am not sure that it's going to be good to invent our own > name for this, although maybe. But at least I think we should make > sure there are some good comments in an easily discoverable place. > Some people seem to think every programmer in the universe should > know what things like popcount() and fls() and ffs() and stuff like > that are, but it's far from obvious and I often have to refresh my > memory. Let's make it easy for someone to figure out, if they don't > know already. I went with count_set_bits() for the next version, and I hope that helps clarify what it does. > Like just a comment that says "this returns the number of 1 bits in > the integer supplied as an argument" or something can save somebody > a lot of trouble. You bring up an excellent point, which is that our builtin functions could use a lot more documentation directly to hand than they now have. For example, there's a lot of needless ambiguity created by function comments which leave it up in the air as to which positional argument does what in functions like string_to_array, which take multiple arguments. 
I'll try to get a patch in for the next CF with a fix for that, and a separate one that doesn't put it on people to use \df+ to find the comments we do provide. There have been proposals for including an optional space for COMMENT ON in DDL, but I suspect that those won't fly unless and until they make their way into the standard. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
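For anyone arriving without the context Robert describes: "the number
of 1 bits in the argument" is the whole story. PostgreSQL's
pg_popcount() uses lookup tables or a hardware popcount instruction
where available, so the loop below is only a minimal illustration of
the semantics (Kernighan's method), not the production implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Count the 1 bits in x: each x &= x - 1 clears the lowest set bit,
 * so the loop body runs once per set bit (Kernighan's method). */
static int count_set_bits(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}
```

On the bytes 0xdeadbeef this reports 24, matching the documentation
example in the patch downthread.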
Re: popcount
On Mon, Jan 18, 2021 at 10:34:10AM -0500, Tom Lane wrote: > Peter Eisentraut writes: > > [ assorted nits ] fixed, I think. > At the level of bikeshedding ... I quite dislike using the name "popcount" > for these functions. I'm aware that some C compilers provide primitives > of that name, but I wouldn't expect a SQL programmer to know that; > without that context the name seems pretty random and unintuitive. > Moreover, it invites confusion with SQL's use of "pop" to abbreviate > "population" in the statistical aggregates, such as var_pop(). > > Perhaps something along the lines of count_ones() or count_set_bits() > would be more apropos. Done that way. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 92f3d5bf1718b1b52a5a4d792f5c75e0bcc7fb74 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v4] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index aa99665e2e..7626edeee7 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4030,6 +4030,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + count_set_bits + +count_set_bits ( bytes bytea ) +bigint + + +Counts the number of bits set in a binary string. + + +count_set_bits('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + count_set_bits + +count_set_bits ( bits bit ) +bigint + + +Counts the bits set in a bit string. 
+ + +count_set_bits(B'101010101010101010') +9 + + + @@ -4869,6 +4903,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index b5f52d4e4a..416ee0c2fb 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -1446,6 +1446,9 @@ { oid => '752', descr => 'substitute portion of string', proname => 'overlay', prorettype => 'bytea', proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' }, +{ oid => '8436', descr => 'count set bits', + proname => 'count_set_bits', prorettype => 'int8', proargtypes => 'bytea', + prosrc => 'byteapopcount'}, { oid => '725', proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line', @@ -3865,6 +3868,9 @@ { oid => '3033', descr => 'set bit', proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4', prosrc => 'bitsetbit' }, +{ oid => '8435', descr => 'count set bits', + proname => 'count_set_bits', prorettype => 'int8', proargtypes => 'bit', + prosrc => 'bitpopcount'}, # for macaddr type support { oid => '436', descr => 'I/O', diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c index 2235866244..c9c6c73422 100644 --- src/backend/utils/adt/varbit.c +++ src/backend/utils/adt/varbit.c @@ -36,6 +36,7 @@ #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/varbit.h" @@ -1878,3 +1879,30 @@ bitgetbit(PG_FUNCTION_ARGS) else PG_RETURN_INT32(0); } + +/* + * bitpopcount + * + * Returns the number of bits set in a bit string. 
+ * + */ +Datum +bitpopcount(PG_FUNCTION_ARGS) +{ + /* There's really no chance of an overflow here because + * to get to INT64_MAX set bits, an object would have to be + * an exbibyte long, exceeding what PostgreSQL can currently + * store by a factor of 2^28 + */ + int64 popcount; + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + bits8 *p; + int len; + + p = VARBITS(arg1); + len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE; + + popcount = pg_popcount((char *)p, len); + + PG_RETURN_INT64(popcount); +} diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c index 479ed9ae54..3f1179a0e8 100644 --- src/backend/utils/adt/varlena.c +++ src/backend/utils/adt/varlena.c @@ -3439,6 +3439,22 @@ bytea_overlay(bytea
Re: NOT VALID for Unique Indexes
On Thu, Jan 14, 2021 at 04:22:17PM +, Simon Riggs wrote: > As you may be aware the NOT VALID qualifier currently only applies to > CHECK and FK constraints, but not yet to unique indexes. I have had > customer requests to change that. This is a great feature! Not exactly on point with this, but in a pretty closely related context, is there some way we could give people the ability to declare at their peril that a constraint is valid without incurring the full scan that VALIDATE currently does? This is currently doable by fiddling directly with the catalog, which operation is broadly more dangerous and ill-advised. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: popcount
On Mon, Jan 11, 2021 at 03:50:54PM +0100, Peter Eisentraut wrote: > On 2020-12-30 17:41, David Fetter wrote: > > > The input may have more than 2 billion bits set to 1. The biggest possible > > > result should be 8 billion for bytea (1 GB with all bits set to 1). > > > So shouldn't this function return an int8? > > It does now, and thanks for looking at this. > > The documentation still reflects the previous int4 return type (in two > different spellings, too). Thanks for looking this over! Please find attached the next version with corrected documentation. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From f0f8e0639a4d08db7d454d5181aa9d98d31264a3 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v3] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 02a37658ad..1d86d610dd 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + popcount + +popcount ( bytes bytea ) +bigint + + +Counts the number of bits set in a binary string. + + +popcount('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + + + + + popcount + +popcount ( bits bit ) +bigint + + +Counts the bits set in a bit string. 
+ + +popcount(B'101010101010101010') +9 + + + diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index d7b55f57ea..2d53e0d46d 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -1446,6 +1446,9 @@ { oid => '752', descr => 'substitute portion of string', proname => 'overlay', prorettype => 'bytea', proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' }, +{ oid => '8436', descr => 'count set bits', + proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea', + prosrc => 'byteapopcount'}, { oid => '725', proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line', @@ -3865,6 +3868,9 @@ { oid => '3033', descr => 'set bit', proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4', prosrc => 'bitsetbit' }, +{ oid => '8435', descr => 'count set bits', + proname => 'popcount', prorettype => 'int8', proargtypes => 'bit', + prosrc => 'bitpopcount'}, # for macaddr type support { oid => '436', descr => 'I/O', diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c index 2235866244..a6a44f3f4f 100644 --- src/backend/utils/adt/varbit.c +++ src/backend/utils/adt/varbit.c @@ -36,6 +36,7 @@ #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/varbit.h" @@ -1878,3 +1879,29 @@ bitgetbit(PG_FUNCTION_ARGS) else PG_RETURN_INT32(0); } + +/* + * bitpopcount + * + * Returns the number of bits set in a bit array. + * + */ +Datum +bitpopcount(PG_FUNCTION_ARGS) +{ + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + bits8 *p; + int len; + int64 popcount; + + /* + * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE) + * done to minimize branches and instructions.
+ */ + len = (VARBITLEN(arg1) + BITS_PER_BYTE + 1) / BITS_PER_BYTE; + p = VARBITS(arg1); + + popcount = pg_popcount((const char *)p, len); + + PG_RETURN_INT64(popcount); +} diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c index 4838bfb4ff..da3ba769c4 100644 --- src/backend/utils/adt/varlena.c +++ src/backend/utils/adt/varlena.c @@ -3433,6 +3433,22 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl) return result; } +/* + * popcount + */ +Datum +byteapopcount(PG_FUNCTION_ARGS) +{ + bytea *t1 = PG_GETARG_BYTEA_PP(0); + int len; + int8 result; + + len = VARSIZE_ANY_EXHDR(t1); + result = pg_popcount(VARDATA_ANY(t1), len); + + PG_RETURN_INT64(result); +} + /* * byteapos - * Return the position of the specified substring. diff --git src/test/regress/expected/bit.out
Re: Implement for window functions
On Fri, Jan 01, 2021 at 01:21:10PM -0800, Zhihong Yu wrote: > Krasiyan: > Happy New Year. > > For WinGetFuncArgInPartition(): > > + if (target > 0) > + step = 1; > + else if (target < 0) > + step = -1; > + else > + step = 0; > > When would the last else statement execute ? Since the above code is > for WINDOW_SEEK_CURRENT, I wonder why step should be 0. If it does actually need step to be one of those three choices, it might be shorter (well, less branchy) to write as step = (target > 0) - (target < 0); Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Tid scan improvements
On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote: > On Thu, Sep 05, 2019 at 01:06:56PM +1200, Edmund Horner wrote: > > So, I think we need to either get some help from someone familiar with > > heapam.c, or maybe shelve the patch. It has been work in progress for > > over a year now. > > Okay, still nothing has happened after two months. Once this is > solved a new patch submission could be done. For now I have marked > the entry as returned with feedback. I dusted off the last version of the patch, though without yet implementing the suggestions made in this thread in the meantime. I think this is a capability worth having, as I was surprised to find it didn't exist when I was looking to make an improvement to pg_dump. My idea, which I'll get back to once this is in, was to use special knowledge of heap AM tables to parallelize dumps of large tables by working separately on each underlying file, of which there can be quite a few for a large table. I will try to understand the suggestions upthread better and implement them. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 96046239014de8a7dec62e2f60b5210deb1bd32a Mon Sep 17 00:00:00 2001 From: David Fetter Date: Thu, 31 Dec 2020 16:42:07 -0800 Subject: [PATCH v10] first cut To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. 
--2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit create mode 100644 src/include/executor/nodeTidrangescan.h create mode 100644 src/backend/executor/nodeTidrangescan.c create mode 100644 src/test/regress/expected/tidrangescan.out create mode 100644 src/test/regress/sql/tidrangescan.sql --2.29.2 Content-Type: text/x-patch; name="v10-0001-first-cut.patch" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="v10-0001-first-cut.patch" diff --git src/include/access/tableam.h src/include/access/tableam.h index 387eb34a61..5776f8ba6e 100644 --- src/include/access/tableam.h +++ src/include/access/tableam.h @@ -218,6 +218,15 @@ typedef struct TableAmRoutine bool set_params, bool allow_strat, bool allow_sync, bool allow_pagemode); + /* + * Set the range of a scan. + * + * Optional callback: A table AM can implement this to enable TID range + * scans. + */ + void (*scan_setlimits) (TableScanDesc scan, + BlockNumber startBlk, BlockNumber numBlks); + /* * Return next tuple from `scan`, store in slot. */ @@ -875,6 +884,16 @@ table_rescan(TableScanDesc scan, scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false); } +/* + * Set the range of a scan. + */ +static inline void +table_scan_setlimits(TableScanDesc scan, + BlockNumber startBlk, BlockNumber numBlks) +{ + scan->rs_rd->rd_tableam->scan_setlimits(scan, startBlk, numBlks); +} + /* * Restart a relation scan after changing params. 
* diff --git src/include/catalog/pg_operator.dat src/include/catalog/pg_operator.dat index 9c6bf6c9d1..bb7193b9e7 100644 --- src/include/catalog/pg_operator.dat +++ src/include/catalog/pg_operator.dat @@ -237,15 +237,15 @@ oprname => '<', oprleft => 'tid', oprright => 'tid', oprresult => 'bool', oprcom => '>(tid,tid)', oprnegate => '>=(tid,tid)', oprcode => 'tidlt', oprrest => 'scalarltsel', oprjoin => 'scalarltjoinsel' }, -{ oid => '2800', descr => 'greater than', +{ oid => '2800', oid_symbol => 'TIDGreaterOperator', descr => 'greater than', oprname => '>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool', oprcom => '<(tid,tid)', oprnegate => '<=(tid,tid)', oprcode => 'tidgt', oprrest => 'scalargtsel', oprjoin => 'scalargtjoinsel' }, -{ oid => '2801', descr => 'less than or equal', +{ oid => '2801', oid_symbol => 'TIDLessEqOperator', descr => 'less than or equal', oprname => '<=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool', oprcom => '>=(tid,tid)', oprnegate => '>(tid,tid)', oprcode => 'tidle', oprrest => 'scalarlesel', oprjoin => 'scalarlejoinsel' }, -{ oid => '2802', descr => 'greater than or equal', +{ oid => '2802', oid_symbol => 'TIDGreaterEqOperator', descr => 'greater than or equal', oprname => '>=', oprleft => 'tid', oprright => 'tid', oprresult => 'bool', oprcom => '<=(tid,tid)', oprnegate => '<(tid,tid)', oprcode => 'tidge', oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' }, diff --git src/include/executor/nodeTidrangescan.h src/include/executor/nodeTidrangescan.h new file mode 100644 index
WIP: document the hook system
Please find attached :) This could probably use a lot of filling in, but having it in the actual documentation beats needing to know folklore even to know that the capability is there. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 5152b3f5255ec91b6ea34b76ea765a26d392d3ac Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 19:13:57 -0800 Subject: [PATCH v1] WIP: Document the hooks system To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Outline of something that should probably have more detail, but it's probably better than nothing at all. create mode 100644 doc/src/sgml/hooks.sgml --2.29.2 Content-Type: text/x-patch; name="v1-0001-WIP-Document-the-hooks-system.patch" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="v1-0001-WIP-Document-the-hooks-system.patch" diff --git doc/src/sgml/filelist.sgml doc/src/sgml/filelist.sgml index 38e8aa0bbf..16f78b371a 100644 --- doc/src/sgml/filelist.sgml +++ doc/src/sgml/filelist.sgml @@ -58,6 +58,7 @@ + diff --git doc/src/sgml/hooks.sgml doc/src/sgml/hooks.sgml new file mode 100644 index 00..5891f74dc8 --- /dev/null +++ doc/src/sgml/hooks.sgml @@ -0,0 +1,321 @@ + + + + Hooks System + + + Hook + + + + This chapter explains the PostgreSQL + hooks system, a way to inject functionality in many different parts + of the database via function pointers to shared libraries. They simply + need to honor the hooks API, which consists roughly of the following: + + + + + Initialize a hook of the correct type for what you are doing, conventionally + using a function with signature: + +void _PG_init(void) + + which initializes whatever the hook code needs initialized. 
Here, you will cache + any previous hook(s) with code that looks like: + +prev_Foo = Foo_hook; +prev_Bar = Bar_hook; + + + + + + + Let your imagination run wild, but try very hard not to crash while doing so. + + + + + + Restore cached hooks in a function conventionally called + +void _PG_fini(void) + + with code that looks like: + +Foo_hook = prev_Foo; +Bar_hook = prev_Bar; + + + + + + + + + Available Hooks + + + The following backend hooks are available for your use: + + + + Analyze hook + + +Take control at the end of parse analysis. + + + + + Authentication hook + + +Take control in ClientAuthentication. + + + + + Logging hook + + +Intercept messages before they are sent to the server log. + + + + + Executor hooks + + +The following hooks control the executor: + + + +Executor Start Hook + + + Take control in ExecutorStart(). + + + + +Executor Run Hook + + + Take control in ExecutorRun(). + + + + +Executor Finish Hook + + + Take control in ExecutorFinish(). + + + + +Executor End Hook + + + Take control in ExecutorEnd(). + + + + +Executor Check Permissions Hook + + + Take control in ExecutorCheckRTPerms(). + + + + + + + Explain hook + + +Take control of EXPLAIN. + + + +Explain One Query + + + Take control in ExplainOneQuery(). + + + + +Explain Get Index Name + + + Take control in explain_get_index_name(). + + + + + + + Function Call Hooks + + +Needs Function Manager + + + Look around and decide whether the function manager hook is needed. + + + + +Function Manager + + + Take control at start, end, or abort of a ROUTINE. + + + + + + Shared Memory Startup + + +Take control of chunks of shared memory at startup. + + + + + OpenSSL TLS Init + + +Take control in initialization of OpenSSL. + + + + + Get Attribute Average Width + + +Take control in get_attavgwidth(). + + + + + Object Access Hooks + + +Take control just before or just after CREATE, +DROP, ALTER, +namespace search, function execution, and TRUNCATE. 
+ + + + + Path (Optimizer) Hooks + + +Take control during query planning. + + + +Set Relation Pathlist + + + Take control in set_rel_pathlist(). + + + + +Set Join Pathlist + + + Take control in set_join_pathlist(). + + + + +Join Search + + + Take control in join_search(). + + + + + + Get Relation Info Hook + + +Take control in get_relation_info(). + + + + + Planner Hooks + + +Take control in parts of the planner inten
Re: Let people set host(no)ssl settings from initdb
On Wed, Dec 30, 2020 at 03:00:17PM -0500, Tom Lane wrote: > David Fetter writes: > > On Wed, Dec 30, 2020 at 08:24:06PM +0100, David Fetter wrote: > >> On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote: > >>> I have looked at the patch of this thread, and I doubt that it is a > >>> good idea to put more burden into initdb for that. I agree that > >>> being able to reject easily non-SSL connections in pg_hba.conf is a > >>> bit of a hassle now, but putting more logic into initdb does not seem > >>> the right course to me. Perhaps we could consider an idea like > >>> Peter's to have a sslmode=require on the server side and ease the > >>> generation of HBA rules.. > > >> Please find attached the rebased patch. > >> > >> Peter's suggestion seems a little more subtle to me than requiring TLS > >> on the server side in that what people generally want to do is > >> disallow clear text connections entirely. In those scenarios, people > >> would also want to set (and be able to change at runtime) some kind of > >> cryptographic policy, as SSH and TLS do. While I see this as a worthy > >> goal, it's a much bigger lift than an optional argument or two to > >> initdb, and requires a lot more discussion than it's had to date. > > FWIW, I still agree with what Michael says above. I do not think > that adding more options to initdb is a useful solution here. > In the first place, it's unlikely that we'll manage to cover many > people's exact requirements this way. In the second place, it's > very unclear where to stop adding options. In the third place, > I believe the vast majority of users don't invoke initdb "by hand" > anymore. The typical scenario is to go through a packager-provided > script, which almost certainly won't offer access to these additional > options. In the fourth place, many people won't know at initdb time > exactly what they should do, or they'll change their minds later. 
To that last, I suspect that there are folks in regulated industries who want to be able to show that they've deployed at some kind of minimal level of protection. If there's not a window during which a non-conforming pg_hba.conf is in play, that's easier to do. > The last two points suggest that what'd be more useful is some sort > of tool to modify an existing pg_hba.conf file. Or maybe even just > audit a file to see if it implements $desired-policy, such as > "no unencrypted network connections" or "no plaintext passwords". > (I kind of like the auditing-tool approach; it seems less scary > than something that actually rewrites the file.) Am I understanding correctly that you're suggesting we write up a formal specification of pg_hba.conf? That could be handy if we don't choose to export the parser the backend uses for it, for example because we want it to respond super quickly to HUPs, which might conflict with making it usable by things that weren't the backend. I agree that anything that does a write to pg_hba.conf needs to be approached with a good deal of caution. Audit tools such as you propose could spit out a suggestion that doesn't overwrite, although it could get a little hairy if it's intended to patch something that has an include directive in it. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Implement for window functions
On Wed, Dec 30, 2020 at 09:32:26PM +0200, Krasiyan Andreev wrote: > Hi, after latest committed patches about multirange datatypes, I get a > compilation error, Oh, right. I'd been meaning to send a patch to fix that. Here it is. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 7c8ae2cac7b2157f48e9e15c863eccde29a539b4 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Tue, 22 Dec 2020 19:47:35 -0800 Subject: [PATCH v2] Vik's patch To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 5021ac1ca9..0e877c48df 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -20373,6 +20373,14 @@ SELECT count(*) FROM sometable; about frame specifications. + + The functions lead, lag, + first_value, last_value, and + nth_value accept a null treatment option which is + RESPECT NULLS or IGNORE NULLS. + If this option is not specified, the default is RESPECT NULLS. + + When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. @@ -20386,14 +20394,9 @@ SELECT count(*) FROM sometable; -The SQL standard defines a RESPECT NULLS or -IGNORE NULLS option for lead, lag, -first_value, last_value, and -nth_value. This is not implemented in -PostgreSQL: the behavior is always the -same as the standard's default, namely RESPECT NULLS. -Likewise, the standard's FROM FIRST or FROM LAST -option for nth_value is not implemented: only the +The SQL standard defines a FROM FIRST or FROM LAST +option for nth_value. This is not implemented in +PostgreSQL: only the default FROM FIRST behavior is supported. (You can achieve the result of FROM LAST by reversing the ORDER BY ordering.) 
diff --git doc/src/sgml/ref/create_function.sgml doc/src/sgml/ref/create_function.sgml index 3c1eaea651..31e08c26b4 100644 --- doc/src/sgml/ref/create_function.sgml +++ doc/src/sgml/ref/create_function.sgml @@ -28,6 +28,7 @@ CREATE [ OR REPLACE ] FUNCTION { LANGUAGE lang_name | TRANSFORM { FOR TYPE type_name } [, ... ] | WINDOW +| TREAT NULLS | IMMUTABLE | STABLE | VOLATILE | [ NOT ] LEAKPROOF | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT | [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER @@ -293,6 +294,17 @@ CREATE [ OR REPLACE ] FUNCTION + + TREAT NULLS + + + TREAT NULLS indicates that the function is able + to handle the RESPECT NULLS and IGNORE + NULLS options. Only window functions may specify this. + + + + IMMUTABLE STABLE diff --git doc/src/sgml/syntax.sgml doc/src/sgml/syntax.sgml index d66560b587..103595d21b 100644 --- doc/src/sgml/syntax.sgml +++ doc/src/sgml/syntax.sgml @@ -1766,6 +1766,8 @@ FROM generate_series(1,10) AS s(i); The syntax of a window function call is one of the following: +function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER window_name +function_name (expression , expression ... ) [ RESPECT NULLS | IGNORE NULLS ] OVER ( window_definition ) function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER window_name function_name (expression , expression ... ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition ) function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name @@ -1779,6 +1781,18 @@ FROM generate_series(1,10) AS s(i); [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ frame_clause ] + + + + + The versions with RESPECT NULLS or IGNORE + NULLS only apply to true window functions, whereas the versions + with FILTER only apply to aggregate functions used as + window functions. 
+ + + + The optional frame_clause can be one of diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index 139f4a08bd..50c7b47be2 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -9770,33 +9770,42 @@ proargtypes => 'int4', prosrc => 'window_ntile' }, { oid => '3106', descr => 'fetch the preceding row value', proname => 'lag', prokind => 'w', prorettype => 'anyelement', - proargtypes => 'anyelement', prosrc => 'window_lag' }, + proargtypes => 'anyelement', prosrc => 'window_lag', + pronulltreatment => 't' }, { oid => '3107', descr => 'fetch the Nth preceding row value', proname => 'lag', prokind => 'w', prorettype => 'anyelement', - proargtypes => 'anyelement int4',
Re: Let people set host(no)ssl settings from initdb
On Wed, Dec 30, 2020 at 08:24:06PM +0100, David Fetter wrote: > On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote: > > On Thu, Jul 02, 2020 at 04:02:21PM +0200, Daniel Gustafsson wrote: > > > The CF Patch Tester consider this patch to be malformed and is unable to > > > apply > > > and test it. Can you please submit a rebased version? > > > > I have looked at the patch of this thread, and I doubt that it is a > > good idea to put more burden into initdb for that. I agree that > > being able to reject easily non-SSL connections in pg_hba.conf is a > > bit of a hassle now, but putting more logic into initdb does not seem > > the right course to me. Perhaps we could consider an idea like > > Peter's to have a sslmode=require on the server side and ease the > > generation of HBA rules.. > > > > The patch has stalled for two months now without a rebase provided, so > > I am marking it as returned with feedback. > > Please find attached the rebased patch. > > Peter's suggestion seems a little more subtle to me than requiring TLS > on the server side in that what people generally want to do is > disallow clear text connections entirely. In those scenarios, people > would also want to set (and be able to change at runtime) some kind of > cryptographic policy, as SSH and TLS do. While I see this as a worthy > goal, it's a much bigger lift than an optional argument or two to > initdb, and requires a lot more discussion than it's had to date. *sigh* This time with patch actually attached. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 537e58136890d17d33699e9965d0518d4c8a1822 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 11 Dec 2019 18:55:03 -0800 Subject: [PATCH v5] Enable setting pg_hba.conf permissions from initdb To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. 
--2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit The new optional arguments are --auth-hostssl, --auth-hostnossl, --auth-hostgssenc, and --auth-hostnogssenc. diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml index 385ac25150..c63e2c316e 100644 --- doc/src/sgml/ref/initdb.sgml +++ doc/src/sgml/ref/initdb.sgml @@ -152,6 +152,50 @@ PostgreSQL documentation + + --auth-hostssl=authmethod + + +This option specifies the authentication method for users via +TLS remote connections used in pg_hba.conf +(hostssl lines). + + + + + + --auth-hostnossl=authmethod + + +This option specifies the authentication method for users via +non-TLS remote connections used in pg_hba.conf +(hostnossl lines). + + + + + + --auth-hostgssenc=authmethod + + +This option specifies the authentication method for users via +encrypted GSSAPI remote connections used in pg_hba.conf +(hostgssenc lines). + + + + + + --auth-hostnogssenc=authmethod + + +This option specifies the authentication method for users via +remote connections not using encrypted GSSAPI in pg_hba.conf +(hostnogssenc lines). 
+ + + + --auth-local=authmethod diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample index b6de12b298..e132cf4db3 100644 --- src/backend/libpq/pg_hba.conf.sample +++ src/backend/libpq/pg_hba.conf.sample @@ -91,3 +91,15 @@ hostall all ::1/128 @authmethodhost@ @remove-line-for-nolocal@local replication all @authmethodlocal@ hostreplication all 127.0.0.1/32@authmethodhost@ hostreplication all ::1/128 @authmethodhost@ + +# hostnossl all all 0.0.0.0/0 @authmethodhostnossl@ +# hostnossl all all ::/0@authmethodhostnossl@ + +# hostssl all all 0.0.0.0/0 @authmethodhostssl@ +# hostssl all all ::/0@authmethodhostssl@ + +# hostnogssenc all all 0.0.0.0/0 @authmethodhostnogssenc@ +# hostnogssenc all all ::/0@authmethodhostnogssenc@ + +# hostgssenc all all 0.0.0.0/0 @authmethodhostgssenc@ +# hostgssenc all all ::/0@authmethodhostgssenc@ diff --git src/bin/init
Re: Let people set host(no)ssl settings from initdb
On Mon, Sep 07, 2020 at 11:57:58AM +0900, Michael Paquier wrote: > On Thu, Jul 02, 2020 at 04:02:21PM +0200, Daniel Gustafsson wrote: > > The CF Patch Tester consider this patch to be malformed and is unable to > > apply > > and test it. Can you please submit a rebased version? > > I have looked at the patch of this thread, and I doubt that it is a > good idea to put more burden into initdb for that. I agree that > being able to reject easily non-SSL connections in pg_hba.conf is a > bit of a hassle now, but putting more logic into initdb does not seem > the right course to me. Perhaps we could consider an idea like > Peter's to have a sslmode=require on the server side and ease the > generation of HBA rules.. > > The patch has stalled for two months now without a rebase provided, so > I am marking it as returned with feedback. Please find attached the rebased patch. Peter's suggestion seems a little more subtle to me than requiring TLS on the server side in that what people generally want to do is disallow clear text connections entirely. In those scenarios, people would also want to set (and be able to change at runtime) some kind of cryptographic policy, as SSH and TLS do. While I see this as a worthy goal, it's a much bigger lift than an optional argument or two to initdb, and requires a lot more discussion than it's had to date. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: popcount
On Wed, Dec 30, 2020 at 05:27:04PM +0100, Daniel Verite wrote: > David Fetter wrote: > > +Datum > +byteapopcount(PG_FUNCTION_ARGS) > +{ > + bytea *t1 = PG_GETARG_BYTEA_PP(0); > + int len, result; > + > + len = VARSIZE_ANY_EXHDR(t1); > + result = pg_popcount(VARDATA_ANY(t1), len); > + > + PG_RETURN_INT32(result); > +} > > The input may have more than 2 billion bits set to 1. The biggest possible > result should be 8 billion for bytea (1 GB with all bits set to 1). > So shouldn't this function return an int8? It does now, and thanks for looking at this. > There is a pre-existing function in core: bit_length(bytea) returning int4, > but it errors out for large values (in fact it's a SQL function that returns > octet_length($1)*8). > > For the varbit case, it doesn't seem like it can hold so many bits, although > I don't see the limit documented in > https://www.postgresql.org/docs/current/datatype-bit.html Out of an abundance of caution, I changed that one, too. I'd noticed earlier that ceil() doesn't need to be quite as wasteful as the way I had it earlier, and fixed that with help from Andrew Gierth. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 6d35100b3bc16c1c7d3bf08c62589f3e039cee76 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v2] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. 
diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 5021ac1ca9..439a3a4a06 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + popcount + +popcount ( bytes bytea ) +integer + + +Counts the number of bits set in a binary string. + + +popcount('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + + + + + popcount + +popcount ( bits bit ) +int4 + + +Counts the bits set in a bit string. + + +popcount(B'101010101010101010') +9 + + + diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index 139f4a08bd..82e3d50114 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -1446,6 +1446,9 @@ { oid => '752', descr => 'substitute portion of string', proname => 'overlay', prorettype => 'bytea', proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' }, +{ oid => '8436', descr => 'count set bits', + proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea', + prosrc => 'byteapopcount'}, { oid => '725', proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line', @@ -3865,6 +3868,9 @@ { oid => '3033', descr => 'set bit', proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4', prosrc => 'bitsetbit' }, +{ oid => '8435', descr => 'count set bits', + proname => 'popcount', prorettype => 'int8', proargtypes => 'bit', + prosrc => 'bitpopcount'}, # for macaddr type support { oid => '436', descr => 'I/O', diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c index 3c03459f51..a71c2b6bf4 100644 --- src/backend/utils/adt/varbit.c +++ src/backend/utils/adt/varbit.c @@ -36,6 +36,7 @@ #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include 
"utils/varbit.h" @@ -1872,3 +1873,29 @@ bitgetbit(PG_FUNCTION_ARGS) else PG_RETURN_INT32(0); } + +/* + * bitpopcount + * + * Returns the number of bits set in a bit array. + * + */ +Datum +bitpopcount(PG_FUNCTION_ARGS) +{ + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + bits8 *p; + int len; + int8 popcount; + + /* + * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE) + * done to minimize branches and instructions. + */ + len = (VARBITLEN(arg1) + BITS_PER_BYTE + 1) / BITS_PER_BYTE; + p = VARBITS(arg1); + + popcount = pg_popcount((const char *)p, len); + + PG_RETURN_INT64(popcount); +} diff --git src/backen
popcount
Hi, Per request, I'd like to see about surfacing $Subject to SQL. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 28c9b7acad605197a8a242ff929bcce6f3100c91 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 30 Dec 2020 02:51:46 -0800 Subject: [PATCH v1] popcount To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.29.2" This is a multi-part message in MIME format. --2.29.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Now it's accessible to SQL for the BIT VARYING and BYTEA types. diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml index 5021ac1ca9..439a3a4a06 100644 --- doc/src/sgml/func.sgml +++ doc/src/sgml/func.sgml @@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); + + + + popcount + +popcount ( bytes bytea ) +integer + + +Counts the number of bits set in a binary string. + + +popcount('\xdeadbeef'::bytea) +24 + + + @@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); 101010001010101010 + + + + + popcount + +popcount ( bits bit ) +int4 + + +Counts the bits set in a bit string. 
+ + +popcount(B'101010101010101010') +9 + + + diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat index 139f4a08bd..0e6d0636fa 100644 --- src/include/catalog/pg_proc.dat +++ src/include/catalog/pg_proc.dat @@ -1446,6 +1446,9 @@ { oid => '752', descr => 'substitute portion of string', proname => 'overlay', prorettype => 'bytea', proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' }, +{ oid => '8436', descr => 'count set bits', + proname => 'popcount', prorettype => 'int4', proargtypes => 'bytea', + prosrc => 'byteapopcount'}, { oid => '725', proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line', @@ -3865,6 +3868,9 @@ { oid => '3033', descr => 'set bit', proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4', prosrc => 'bitsetbit' }, +{ oid => '8435', descr => 'count set bits', + proname => 'popcount', prorettype => 'int4', proargtypes => 'bit', + prosrc => 'bitpopcount'}, # for macaddr type support { oid => '436', descr => 'I/O', diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c index 3c03459f51..e92d1a7f8f 100644 --- src/backend/utils/adt/varbit.c +++ src/backend/utils/adt/varbit.c @@ -29,6 +29,8 @@ *- */ +#include + #include "postgres.h" #include "access/htup_details.h" @@ -36,6 +38,7 @@ #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/varbit.h" @@ -1872,3 +1875,24 @@ bitgetbit(PG_FUNCTION_ARGS) else PG_RETURN_INT32(0); } + +/* + * bitpopcount + * + * Returns the number of bits set in a bit array. 
+ * + */ +Datum +bitpopcount(PG_FUNCTION_ARGS) +{ + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + bits8 *p; + int len, popcount; + + len = ceil(VARBITLEN(arg1) / (float4)BITS_PER_BYTE); + p = VARBITS(arg1); + + popcount = (int)pg_popcount((const char *)p, len); + + PG_RETURN_INT32(popcount); +} diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c index 9300d19e0c..9e788babd3 100644 --- src/backend/utils/adt/varlena.c +++ src/backend/utils/adt/varlena.c @@ -3415,6 +3415,21 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl) return result; } +/* + * popcount + */ +Datum +byteapopcount(PG_FUNCTION_ARGS) +{ + bytea *t1 = PG_GETARG_BYTEA_PP(0); + int len, result; + + len = VARSIZE_ANY_EXHDR(t1); + result = pg_popcount(VARDATA_ANY(t1), len); + + PG_RETURN_INT32(result); +} + /* * byteapos - * Return the position of the specified substring. diff --git src/test/regress/expected/bit.out src/test/regress/expected/bit.out index a1fab7ebcb..c91ad3d0d5 100644 --- src/test/regress/expected/bit.out +++ src/test/regress/expected/bit.out @@ -681,6 +681,19 @@ SELECT overlay(B'0101011100' placing '001' from 20); 010101111 (1 row) +-- Popcount +SELECT popcount(B'0101011100'::bit(10)); + popcount +-- +5 +(1 row) + +SELECT popcount(B'11'::bit(10)); + popcount +-- + 10 +(1 row) + -- This table is intentionally left aroun
Re: range_agg
On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> Hi,
>
> This is not an ideal way to index multiranges, but something we can
> easily have.

What sort of indexing improvements do you have in mind?

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: \gsetenv
On Sun, Dec 20, 2020 at 10:42:40PM +0200, Heikki Linnakangas wrote: > On 20/12/2020 21:05, David Fetter wrote: > > We have plenty of ways to spawn shells and cause havoc, and we > > wouldn't be able to block them all even if we decided to put a bunch > > of pretty onerous restrictions on psql at this very late date. We have > > \set, backticks, \!, and bunches of things less obvious that could, > > even without a compromised server, cause real mischief. > > There is a big difference between having to trust the server or not. Yeah, > you could cause a lot of mischief if you let a user run arbitrary psql > scripts on your behalf. But that's no excuse for opening up a whole another > class of problems. I'm skittish about putting exploits out in public in advance of discussions about how to mitigate them, but I have constructed several that do pretty bad things using only hostile content in a server and the facilities `psql` already provides. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: \gsetenv
On Sun, Dec 20, 2020 at 01:07:12PM -0500, Tom Lane wrote: > David Fetter writes: > > On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote: > >> SELECT 'Calvin' AS foo \gset > >> \setenv FOO :foo > >> \! echo $FOO > >> Calvin > > > You're the second person who's mentioned this workaround, which goes > > to a couple of points I tried to make earlier: > > > - This is not by any means a new capability, just a convenience, and > > - In view of the fact that it's a very old capability, the idea that > > it has implications for controlling access or other parts of the > > space of threat models is pretty silly. > > This is essentially the same workaround as what we recommend for anyone > who's unhappy with the fix for CVE-2020-25696: do \gset into a non-special > variable and then copy to the special variable. The reason it seems okay > is that now it is clear that client-side logic intends the special > variable change to happen. Thus a compromised server cannot hijack your > client-side session all by itself. There's nonzero risk in letting the > server modify your PROMPT1, PATH, or whatever, but you took the risk > intentionally (and, presumably, it's necessary to your purposes). > > I tend to agree with you that the compromised-server argument is a little > bit of a stretch. Still, we did have an actual user complaining about > the case for \gset, and it's clear that in at least some scenarios this > sort of attack could be used to parlay a server compromise into additional > access. So we're not likely to undo the CVE-2020-25696 fix, and we're > equally unlikely to provide an unrestricted way to set environment > variables directly from the server. > > If we could draw a line between "safe" and "unsafe" environment > variables, I'd be willing to consider a patch that allows directly > setting only the former. But I don't see how to draw that line. 
> Most of the point of any such feature would have to be to affect > the behavior of shell commands subsequently invoked with \! ... > and we can't know what a given variable would do to those. So on > the whole I'm inclined to leave things as-is, where people have to > do the assignment manually. I suppose now's not a great time for this from an optics point of view. Taking on the entire security theater industry is way out of scope for the PostgreSQL project. We have plenty of ways to spawn shells and cause havoc, and we wouldn't be able to block them all even if we decided to put a bunch of pretty onerous restrictions on psql at this very late date. We have \set, backticks, \!, and bunches of things less obvious that could, even without a compromised server, cause real mischief. I believe that a more effective way to deal with this reality in a way that helps users is to put clear warnings in the documentation about the fact that psql programs are at least as capable as shell programs in that they are innately capable of doing anything that the psql's system user is authorized to do. Would a patch along that line help? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: \gsetenv
On Sun, Dec 20, 2020 at 02:26:14PM +0100, Fabien COELHO wrote: > Hello David, > > > We have \gset to set some parameters, but not ones in the environment, > > so I fixed this with a new analogous command, \gsetenv. I considered > > refactoring SetVariable to include environment variables, but for a > > first cut, I just made a separate function and an extra if. > > My 0.02€: ISTM that you do not really need that, it can already be achieved > with gset, so I would not bother to add a gsetenv. > > sh> psql > SELECT 'Calvin' AS foo \gset > \setenv FOO :foo > \! echo $FOO > Calvin Thanks! You're the second person who's mentioned this workaround, which goes to a couple of points I tried to make earlier: - This is not by any means a new capability, just a convenience, and - In view of the fact that it's a very old capability, the idea that it has implications for controlling access or other parts of the space of threat models is pretty silly. Having dispensed with the idea that there's a new attack surface here, I'd like to request that people at least have a look at it as a feature psql users might appreciate having. As the author, I obviously see it that way, but again as the author, it's not for me to make that call. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: \gsetenv
On Wed, Dec 16, 2020 at 05:30:13PM -0500, Tom Lane wrote:
> David Fetter writes:
> > We have \gset to set some parameters, but not ones in the environment,
> > so I fixed this with a new analogous command, \gsetenv.
>
> In view of the security complaints we just had about \gset
> (CVE-2020-25696), I cannot fathom why we'd consider adding another
> way to cause similar problems.

The RedHat site says, in part:

    the attacker can execute arbitrary code as the operating system
    account running psql

This is true of literally everything that requires a login shell in
order to use. I remember a "virus" my friend Keith McMillan wrote in
TeX back in 1994. You can download the PostScript file detailing the
procedure, bearing in mind that PostScript also contains ways to
execute arbitrary code if opened:

ftp://ftp.cerias.purdue.edu/pub/doc/viruses/KeithMcMillan-PlatformIndependantVirus.ps

That one got itself remote code execution by fiddling with a person's
.emacs, and it got Keith a master's degree in CS. I suspect that
equally nasty things are possible when it comes to \i and \o in psql.
It would be a terrible idea to hobble psql in the attempt to prevent
such attacks.

> We were fortunate enough to be able to close off the main security
> risk of \gset without deleting the feature altogether ... but how
> exactly would we distinguish "safe" from "unsafe" environment
> variables? It kind of seems like anything that would be worth
> setting at all would tend to fall into the "unsafe" category,
> because the implications of setting it would be unclear. But *for
> certain* we're not taking a patch that allows remotely setting PATH
> and things like that.

Would you be so kind as to explain what the actual problem is here
that not doing this would mitigate? If people run code they haven't
seen from a server they don't trust, neither psql nor anything else[1]
can protect them from the consequences.
Seeing what they're about to run is dead easy in this case because
\gsetenv, like \gset and what in my view is the much more dangerous
\gexec, is something anyone with the tiniest modicum of caution would
run only after testing it with \g.

> Besides which, you haven't bothered with even one word of positive
> justification. What's the non-hazardous use case?

Thanks for asking, and my apologies for not including it.

I ran into a situation where we sometimes got a very heavily loaded
and also well-protected PostgreSQL server. At times, just getting a
shell on it could take a few tries. To mitigate situations like that,
I used a method that's a long way from new, abstruse, or secret: have
psql open in a long-lasting tmux or screen session. Such a session
could access the database at times when getting a new connection would
be somewhere between difficult and impossible.

The bit that's unlikely to be new is that I noticed it could also
shell out and send information off to other places, though only after
I put together a pretty baroque procedure involving combinations of
\gset, \o, and \!. All the same things \gsetenv could do were doable
with those, just less conveniently, so I drafted up a patch in the
hope that fewer others would find themselves jumping through the hoops
I did to get that set up.

Separately, I confess to some bafflement at the reasoning behind the
CVE you referenced. By the time an attacker has compromised a database
server, it's already game over. Code running on the compromised
database is capable of doing much nastier things than crashing a
client machine, and very likely has more access to other high-value
targets on its own say-so than said client does.

Best,
David.

[1] search for "gods themselves" here:
https://en.wikiquote.org/wiki/Friedrich_Schiller
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
\gsetenv
Hi, We have \gset to set some parameters, but not ones in the environment, so I fixed this with a new analogous command, \gsetenv. I considered refactoring SetVariable to include environment variables, but for a first cut, I just made a separate function and an extra if. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 04059e68ffcd8cf4052ccb6a013f0cf2e0095eb8 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 16 Dec 2020 13:17:32 -0800 Subject: [PATCH v1] Implement \gsetenv, similar to \gset To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.28.0" This is a multi-part message in MIME format. --2.28.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit In passing, make \setenv without arguments dump the environment in a way similar to what bare \set and \pset do. diff --git doc/src/sgml/ref/psql-ref.sgml doc/src/sgml/ref/psql-ref.sgml index 221a967bfe..97d5a2ed19 100644 --- doc/src/sgml/ref/psql-ref.sgml +++ doc/src/sgml/ref/psql-ref.sgml @@ -2297,6 +2297,49 @@ hello 10 + +\gsetenv [ prefix ] + + + + Sends the current query buffer to the server and stores the + query's output into environment variables. + The query to be executed must return exactly one row. Each column of + the row is stored into a separate variable, named the same as the + column. For example: + += SELECT 'hello' AS var1, 10 AS var2 +- \gsetenv += \! echo $var1 $var2 +hello 10 + + + + If you specify a prefix, + that string is prepended to the query's column names to create the + variable names to use: + += SELECT 'hello' AS var1, 10 AS var2 +- \gsetenv result_ += \! echo $result_var1 $result_var2 +hello 10 + + + + If a column result is NULL, the corresponding environment variable is unset + rather than being set. + + + If the query fails or does not return one row, + no variables are changed. 
+ + + If the current query buffer is empty, the most recently sent query + is re-executed instead. + + + + \gx [ (option=value [...]) ] [ filename ] diff --git src/bin/psql/command.c src/bin/psql/command.c index 38b52d..8b8749aca6 100644 --- src/bin/psql/command.c +++ src/bin/psql/command.c @@ -94,7 +94,9 @@ static backslashResult process_command_g_options(char *first_option, const char *cmd); static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch); -static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch); +static backslashResult exec_command_gset(PsqlScanState scan_state, + bool active_branch, + GSET_TARGET gset_target); static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_html(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_include(PsqlScanState scan_state, bool active_branch, @@ -346,7 +348,9 @@ exec_command(const char *cmd, else if (strcmp(cmd, "gexec") == 0) status = exec_command_gexec(scan_state, active_branch); else if (strcmp(cmd, "gset") == 0) - status = exec_command_gset(scan_state, active_branch); + status = exec_command_gset(scan_state, active_branch, GSET_TARGET_PSET); + else if (strcmp(cmd, "gsetenv") == 0) + status = exec_command_gset(scan_state, active_branch, GSET_TARGET_ENV); else if (strcmp(cmd, "h") == 0 || strcmp(cmd, "help") == 0) status = exec_command_help(scan_state, active_branch); else if (strcmp(cmd, "H") == 0 || strcmp(cmd, "html") == 0) @@ -1451,10 +1455,11 @@ exec_command_gexec(PsqlScanState scan_state, bool active_branch) } /* - * \gset [prefix] -- send query and store result into variables + * \gset[env] [prefix] -- send query and store result into variables */ static backslashResult -exec_command_gset(PsqlScanState scan_state, bool active_branch) +exec_command_gset(PsqlScanState 
scan_state, bool active_branch, + GSET_TARGET gset_target) { backslashResult status = PSQL_CMD_SKIP_LINE; @@ -1463,6 +1468,8 @@ exec_command_gset(PsqlScanState scan_state, bool active_branch) char *prefix = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, false); + pset.gset_target = gset_target; + if (prefix) pset.gset_prefix = prefix; else @@ -2275,39 +2282,8 @@ exec_command_setenv(PsqlScanState scan_state, bool active_branch, OT_NORMAL, NULL, false); char *envval = psql_scan_slash_option(scan_state,
Re: [HACKERS] [PATCH] Generic type subscripting
On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote: > I've pushed the core patch now. The jsonb parts now have to be > rebased onto this design, which I'm assuming Dmitry will tackle > (I do not intend to). It's not quite clear to me whether we have > a meeting of the minds on what the jsonb functionality should be, > anyway. Alexander seemed to be thinking about offering an option > to let the subscript be a jsonpath, but how would we distinguish > that from a plain-text field name? > > BTW, while reviewing the thread to write the commit message, > I was reminded of my concerns around the "is it a container" > business. As things stand, if type A has a typelem link to > type B, then the system supposes that A contains B physically; > this has implications for what's allowed in DDL, for example > (cf find_composite_type_dependencies() and other places). > We now have a feature whereby subscripting can yield a type > that is not contained in the source type in that sense. > I'd be happier if the "container" terminology were reserved for > that sort of physical containment, which means that I think a lot > of the commentary around SubscriptingRef is misleading. But I do > not have a better word to suggest offhand. Thoughts? Would this be something more along the lines of a "dependent type," or is that adding too much baggage? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: SELECT INTO deprecation
On Wed, Dec 02, 2020 at 12:58:36PM -0500, Stephen Frost wrote: > Greetings, > > * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: > > While reading about deprecating and removing various things in > > other threads, I was wondering about how deprecated SELECT INTO > > is. There are various source code comments about this, but the > > SELECT INTO reference page only contains soft language like > > "recommended". I'm proposing the attached patch to stick a more > > explicit deprecation notice right at the top. > > I don't see much value in this. Users already have 5 years to adapt > their code to new major versions of PG and that strikes me as plenty > enough time and is why we support multiple major versions of PG for > so long. Users who keep pace and update for each major version > aren't likely to have issue making this change since they're already > used to regularly updating their code for new major versions, while > others are going to complain no matter when we remove it and will > ignore any deprecation notices we put out there, so there isn't much > point in them. +1 for removing it entirely and including this prominently in the release notes. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: POC: postgres_fdw insert batching
On Wed, Nov 25, 2020 at 05:04:36AM +, tsunakawa.ta...@fujitsu.com wrote: > From: Tomas Vondra > > On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote: > > > OTOH, as for the name GetModifyBatchSize() you suggest, I think > > GetInsertBatchSize may be better. That is, this API deals with multiple > > records in a single INSERT statement. Your GetModifyBatchSize will be > > reserved for statement batching when libpq has supported batch/pipelining to > > execute multiple INSERT/UPDATE/DELETE statements, as in the following > > JDBC batch updates. What do you think? > > > > > > > I don't know. I was really only thinking about batching in the context > > of a single DML command, not about batching of multiple commands at the > > protocol level. IMHO it's far more likely we'll add support for batching > > for DELETE/UPDATE than libpq pipelining, which seems rather different > > from how the FDW API works. Which is why I was suggesting to use a name > > that would work for all DML commands, not just for inserts. > > Right, I can't imagine now how the interaction among the client, server core > and FDWs would be regarding the statement batching. So I'll take your > suggested name. > > > > Not sure, but I'd guess knowing whether batching is used would be > > useful. We only print the single-row SQL query, which kinda gives the > > impression that there's no batching. > > Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run. > > > > > Don't worry about this, too. GetMaxBulkInsertTuples() just returns a > > > value > > that was already saved in a struct in create_foreign_modify(). > > > > > > > Well, I do worry for two reasons. > > > > Firstly, the fact that in postgres_fdw the call is cheap does not mean > > it'll be like that in every other FDW. Presumably, the other FDWs might > > cache it in the struct and do the same thing, of course. 
> > > > But the fact that we're calling it over and over for each row kinda > > seems like we allow the value to change during execution, but I very > > much doubt the code is expecting that. I haven't tried, but assume the > > function first returns 10 and then 100. ISTM the code will allocate > > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there. > > That can't end well. Sure, we can claim it's a bug in the FDW extension, > > but it's also due to the API design. > > You worried about other FDWs than postgres_fdw. That's reasonable. I > insisted in other threads that PG developers care only about postgres_fdw, > not other FDWs, when designing the FDW interface, but I myself made the same > mistake. I made changes so that the executor calls GetModifyBatchSize() once > per relation per statement. Please pardon me for barging in late in this discussion, but if we're going to be using a bulk API here, wouldn't it make more sense to use COPY, except where RETURNING is specified, in place of INSERT? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Clarifying the ImportForeignSchema API
Folks,

I noticed that the API document for IMPORT FOREIGN SCHEMA states in
part:

    It should return a list of C strings, each of which must contain a
    CREATE FOREIGN TABLE command. These strings will be parsed and
    executed by the core server.

A reasonable reading of the above is that it disallows statements
other than CREATE FOREIGN TABLE, which seems overly restrictive for no
reason I can discern. The list of C strings seems reasonable as a
requirement, but I think it would be better to rephrase this along the
lines of:

    It should return a list of C strings, each of which must contain a
    DDL command, for example CREATE FOREIGN TABLE. These strings will
    be parsed and executed by the core server in order to create the
    objects in the schema.

since a foreign schema might need types (the case I ran across), or
other database objects such as those CREATE EXTERNAL ROUTINE would
create once we dust off the implementation of that, in order to
support it.

I was unable to discern from my draft version of the spec whether
statements other than CREATE FOREIGN TABLE are specifically
disallowed, or whether it is intended to (be able to) contain CREATE
ROUTINE MAPPING statements.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Bump default wal_level to logical
On Mon, Jun 08, 2020 at 11:10:38AM +0200, Magnus Hagander wrote:
> On Mon, Jun 8, 2020 at 8:46 AM Michael Paquier wrote:
> > On Mon, Jun 08, 2020 at 11:59:14AM +0530, Amit Kapila wrote:
> > > I think we should first do performance testing to see what is the
> > > overhead of making this default. I think pgbench read-write at
> > > various scale factors would be a good starting point. Also, we should
> > > see how much additional WAL it generates as compared to current
> > > default.
> >
> > +1. I recall that changing wal_level to logical has been discussed in
> > the past and performance was the actual take to debate on.
>
> That was at least the discussion (long-going and multi-repeated) before we
> upped it from minimal to replica. There was some pretty extensive
> benchmarking done to prove that the difference was very small, and this was
> weighed against the ability to take basic backups of the system (which
> arguably is more important than being able to do logical replication).

I'd argue this from a different direction. Logical replication has
been fundamental to how a lot of systems operate ever since Slony came
out, for the very good reason that it was far and away the simplest
way to accomplish a bunch of design goals. There are now, and have
been for some years, both free and proprietary systems whose sole
purpose is change data capture. PostgreSQL can play nicely with those
systems with this switch flipped on by default.

Looking into the future of PostgreSQL itself, there are things we've
been unable to do thus far that logical replication makes tractable.
These include:

- Zero-downtime version changes
- Substantive changes to our on-disk representations between versions,
  something upgrading in place sharply limits.

> I agree that we should consider changing it *if* it does not come
> with a substantial overhead, but that has to be shown.
What overhead would be substantial enough to require more work than
changing the default, and under what circumstances? I ask this because
on a heavily loaded system, the kind where differences could be
practical as opposed to merely statistically significant, statement
logging at even the most basic level is a much bigger burden than the
maxed-out WALs are. Any overhead those WALs might impose simply
disappears in the noise. The difference is even more stark in systems
subject to audit.

> Of course, what would be even neater would be if it could be changed
> so you don't have to bounce the server to change the wal_level.
> That's a bigger change though, but perhaps it is now possible once
> we have the "global barriers" in 13?

As much as I would love to have this capability, I was hoping to keep
the scope of this contained. As pointed out down-thread, there's lots
more to doing this dynamically than just turning up the wal_level.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Bump default wal_level to logical
Hi, I'd like to propose $subject, as embodied in the attached patch. This makes it possible to discover and fulfill a need for logical replication that can arise at a time when bouncing the server has become impractical, i.e. when there is already high demand on it. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 8179603827107d4b4ec5fe5e893b274e839a0ac1 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Sun, 7 Jun 2020 21:31:50 -0700 Subject: [PATCH v1] Bump default wal_level to logical To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.26.2" This is a multi-part message in MIME format. --2.26.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml index aca8f73a50..08bb1d7fd7 100644 --- doc/src/sgml/config.sgml +++ doc/src/sgml/config.sgml @@ -2486,14 +2486,13 @@ include_dir 'conf.d' wal_level determines how much information is written to -the WAL. The default value is replica, which writes enough -data to support WAL archiving and replication, including running -read-only queries on a standby server. minimal removes all -logging except the information required to recover from a crash or -immediate shutdown. Finally, -logical adds information necessary to support logical -decoding. Each level includes the information logged at all lower -levels. This parameter can only be set at server start. +the WAL. The default value, logical, includes all +information necessary to support logical decoding. replica +writes enough data to support WAL archiving and replication, including running +read-only queries on a standby server. Finally, minimal +removes all logging except the information required to recover from a crash or +immediate shutdown. Each level includes the information logged at all lower levels. 
+This parameter can only be set at server start.
 
 
 In minimal level, no information is logged for
diff --git src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c
index 2f3e0a70e0..8d27a9c2a4 100644
--- src/backend/utils/misc/guc.c
+++ src/backend/utils/misc/guc.c
@@ -4651,7 +4651,7 @@ static struct config_enum ConfigureNamesEnum[] =
 			NULL
 		},
 		&wal_level,
-		WAL_LEVEL_REPLICA, wal_level_options,
+		WAL_LEVEL_LOGICAL, wal_level_options,
 		NULL, NULL, NULL
 	},
 
diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
index ac02bd0c00..d637133db6 100644
--- src/backend/utils/misc/postgresql.conf.sample
+++ src/backend/utils/misc/postgresql.conf.sample
@@ -192,7 +192,7 @@
 
 # - Settings -
 
-#wal_level = replica			# minimal, replica, or logical
+#wal_level = logical			# minimal, replica, or logical
 					# (change requires restart)
 #fsync = on				# flush data to disk for crash safety
 					# (turning this off can cause
--2.26.2--
Re: Default gucs for EXPLAIN
On Tue, Jun 02, 2020 at 09:28:48PM +0200, Vik Fearing wrote:
> On 6/2/20 7:54 PM, David G. Johnston wrote:
> > At this point, given the original goal of the patch was to try and
> > grease a smoother path to changing the default for BUFFERS, and
> > that people seem OK with doing just that without having this
> > patch, I'd say we should just change the default and forget this
> > patch. There hasn't been any other demand from our users for this
> > capability and I also doubt that having BUFFERS on by default is
> > going to bother people.
>
> What about WAL? Can we turn that one on by default, too?
>
> I often find having VERBOSE on helps when people don't qualify their
> columns and I don't know the schema. We should turn that on by
> default, as well.

+1 for all on by default (except ANALYZE, because it would be a foot
cannon). For those few to whom it really matters, there'd be OFF
switches.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Default gucs for EXPLAIN
On Tue, May 26, 2020 at 02:49:46AM +0000, Nikolay Samokhvalov wrote:
> On Mon, May 25, 2020 at 6:36 PM, Bruce Momjian <br...@momjian.us> wrote:
> > I am not excited about this new feature. Why do it only for
> > EXPLAIN? That is a lot of GUCs. I can see this becoming a feature
> > creep disaster.
>
> How about changing the default behavior, making BUFFERS enabled by
> default? Those who don't need it always can say BUFFERS OFF — the
> same as for TIMING.

+1 for changing the default of BUFFERS to ON.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: proposal: possibility to read dumped table's name from file
On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:
> Hi
>
> one of my customers has to specify dumped tables name by name. After
> years of increasing database size and table counts, he has a problem
> with a too-short command line. He needs to read the list of tables
> from a file (or from stdin).
>
> I wrote a simple PoC patch
>
> Comments, notes, ideas?

This seems like a handy addition. What I've done in cases similar to
this was to use `grep -f` on the output of `pg_restore -l` to create a
file suitable for `pg_restore -L`, or mash them together like this:

    pg_restore -L <(pg_restore -l /path/to/dumpfile | grep -f /path/to/listfile) -d new_db /path/to/dumpfile

That's a lot of shell magic and obscure corners of commands to expect
people to use. Would it make sense to expand this patch to handle
other objects?

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch
On Mon, May 25, 2020 at 09:43:32AM -0400, Tom Lane wrote: > Peter Eisentraut writes: > > One problem (other than perhaps performance, tbd.) is that this would no > > longer allow processing infinite timestamps, since numeric does not > > support infinity. It could be argued that running extract() on infinite > > timestamps isn't very useful, but it's something to consider explicitly. > > I wonder if it's time to fix that, ie introduce +-Infinity into numeric.c. > This isn't the first time we've seen issues with numeric not being a > superset of float, and it won't be the last. > > At first glance there's no free bits in the on-disk format for numeric, > but we could do something by defining the low-order bits of the header > word for a NaN to distinguish between real NaN and +/- infinity. > It looks like those bits should reliably be zero right now. +1 for adding +/- infinity to NUMERIC. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: factorial function/phase out postfix operators?
On Mon, May 18, 2020 at 10:03:13PM -0400, Tom Lane wrote: > Peter Eisentraut writes: > > What are the thoughts about then marking the postfix operator deprecated > > and eventually removing it? > > If we do this it'd require a plan. We'd have to also warn about the > feature deprecation in (at least) the CREATE OPERATOR man page, and > we'd have to decide how many release cycles the deprecation notices > need to stand for. > > If that's the intention, though, it'd be good to get those deprecation > notices published in v13 not v14. +1 for deprecating in v13. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Proposing WITH ITERATIVE
On Mon, Apr 27, 2020 at 10:44:04PM -0400, Jonah H. Harris wrote: > On Mon, Apr 27, 2020 at 10:32 PM David Fetter wrote: > > On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote: > > > Hey, everyone. > > > > > If there's any interest, I'll clean-up their patch and submit. Thoughts? > > > > Where's the current patch? > > > It’s private. Hence, why I’m inquiring as to interest. Have the authors agreed to make it available to the project under a compatible license? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Proposing WITH ITERATIVE
On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote: > Hey, everyone. > If there's any interest, I'll clean-up their patch and submit. Thoughts? Where's the current patch? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Commitfest 2020-03 Now in Progress
On Wed, Apr 08, 2020 at 12:36:37PM -0400, David Steele wrote: > On 4/1/20 10:09 AM, David Steele wrote: > > On 3/17/20 8:10 AM, David Steele wrote: > > > On 3/1/20 4:10 PM, David Steele wrote: > > > > The last Commitfest for v13 is now in progress! > > > > > > > > Current stats for the Commitfest are: > > > > > > > > Needs review: 192 > > > > Waiting on Author: 19 > > > > Ready for Committer: 4 > > > > Total: 215 > > > > > > Halfway through, here's where we stand. Note that a CF entry was > > > added after the stats above were recorded so the total has > > > increased. > > > > > > Needs review: 151 (-41) > > > Waiting on Author: 20 (+1) > > > Ready for Committer: 9 (+5) > > > Committed: 25 > > > Moved to next CF: 1 > > > Withdrawn: 4 > > > Returned with Feedback: 6 > > > Total: 216 > > > > After regulation time here's where we stand: > > > > Needs review: 111 (-40) > > Waiting on Author: 26 (+6) > > Ready for Committer: 11 (+2) > > Committed: 51 (+11) > > Moved to next CF: 2 (+1) > > Returned with Feedback: 10 (+4) > > Rejected: 1 > > Withdrawn: 5 (+1) > > Total: 217 > > > > We have one more patch so it appears that one in a completed state > > (committed, etc.) at the beginning of the CF has been moved to an > > uncompleted state. Or perhaps my math is just bad. > > > > The RMT has determined that the CF will be extended for one week so I'll > > hold off on moving and marking patches until April 8. > > The 2020-03 Commitfest is officially closed. > > Final stats are (for entire CF, not just from March 1 this time): > > Committed: 90. > Moved to next CF: 115. > Withdrawn: 8. > Rejected: 1. > Returned with Feedback: 23. > Total: 237. > > Good job everyone! Thanks so much for your hard work managing this one! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Let people set host(no)ssl settings from initdb
On Mon, Apr 06, 2020 at 10:12:16PM +, Cary Huang wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation:tested, passed > > Hi > I applied the patch > "v3-0001-Enable-setting-pg_hba.conf-permissions-from-initd.patch" and did > some verification with it. The intended feature works overall and I think it > is quite useful to support default auth methods for ssl and gss host types. I > have also found some minor things in the patch and would like to share as > below: > > > +"# CAUTION: Configuring the system for \"trust\" authentication\n" \ > > +"# allows any user who can reach the databse on the route specified\n" \ > > +"# to connect as any PostgreSQL user, including the database\n" \ > > +"# superuser. If you do not trust all the users who could\n" \ > > +"# reach the database on the route specified, use a more restrictive\n" \ > > +"# authentication method.\n" > > Found a typo: should be 'database' instead of 'databse' Fixed. > > * a sort of poor man's grep -v > > */ > > -#ifndef HAVE_UNIX_SOCKETS > > static char ** > > filter_lines_with_token(char **lines, const char *token) > > { > > @@ -461,7 +466,6 @@ filter_lines_with_token(char **lines, const char *token) > > > > return result; > > } > > -#endif > > I see that you have removed "#ifndef HAVE_UNIX_SOCKETS" around the > filter_lines_with_token() function definition so it would be always > available, which is used to remove the @tokens@ in case user does > not specify a default auth method for the new hostssl, hostgss > options. I think you should also remove the "#ifndef > HAVE_UNIX_SOCKETS" around its declaration as well so both function > definition and declaration would make sense. Fixed. Thanks very much for the review! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate >From 3b49afec7f0b708285eadc10a097b4dc23d23927 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 11 Dec 2019 18:55:03 -0800 Subject: [PATCH v4] Enable setting pg_hba.conf permissions from initdb To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.25.2" This is a multi-part message in MIME format. --2.25.2 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit The new optional arguments are --auth-hostssl, --auth-hostnossl, --auth-hostgssenc, and --auth-hostnogssenc. diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml index a04a180165..ee65cf925d 100644 --- doc/src/sgml/ref/initdb.sgml +++ doc/src/sgml/ref/initdb.sgml @@ -154,6 +154,50 @@ PostgreSQL documentation + + --auth-hostssl=authmethod + + +This option specifies the authentication method for users via +TLS remote connections used in pg_hba.conf +(hostssl lines). + + + + + + --auth-hostnossl=authmethod + + +This option specifies the authentication method for users via +non-TLS remote connections used in pg_hba.conf +(hostnossl lines). + + + + + + --auth-hostgssenc=authmethod + + +This option specifies the authentication method for users via +encrypted GSSAPI remote connections used in pg_hba.conf +(hostgssenc lines). + + + + + + --auth-hostnogssenc=authmethod + + +This option specifies the authentication method for users via +remote connections not using encrypted GSSAPI in pg_hba.conf +(hostnogssenc lines). 
+ + + + --auth-local=authmethod diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample index c853e36232..7d5189dd8f 100644 --- src/backend/libpq/pg_hba.conf.sample +++ src/backend/libpq/pg_hba.conf.sample @@ -87,3 +87,15 @@ hostall all ::1/128 @authmethodhost@ @remove-line-for-nolocal@local replication all @authmethodlocal@ hostreplication all 127.0.0.1/32@authmethodhost@ hostreplication all ::1/128 @authmethodhost@ + +# hostnossl all all 0.0.0.0/0 @authmethodhostnossl@ +# hostnossl all all ::/0@authmethodhostnossl@ + +# hostssl all all
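For illustration, an initdb run using the proposed options, and the sort of pg_hba.conf lines it would emit, might look like the following (a sketch based on the option names above; the exact generated comments, whitespace, and address ranges are the patch's business):

```
initdb -D /path/to/data \
       --auth-local=peer \
       --auth-host=scram-sha-256 \
       --auth-hostssl=cert \
       --auth-hostnossl=reject

# resulting pg_hba.conf entries (illustrative):
hostssl    all   all   0.0.0.0/0   cert
hostnossl  all   all   0.0.0.0/0   reject
```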
Re: range_agg
On Mon, Mar 09, 2020 at 06:34:04PM -0700, Jeff Davis wrote: > On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote: > > Actually ... have you given any thought to just deciding that ranges > > and > > multiranges are the same type? > > It has come up in a number of conversations, but I'm not sure if it was > discussed on this list. > > > I think on the whole the advantages win, > > and I feel like that might also be the case here. > > Some things to think about: > > 1. Ranges are common -- at least implicitly -- in a lot of > applications/systems. It's pretty easy to represent extrernal data as > ranges in postgres, and also to represent postgres ranges in external > systems. But I can see multiranges causing friction around a lot of > common tasks, like displaying in a UI. If you only expect ranges, you > can add a CHECK constraint, so this is annoying but not necessarily a > deal-breaker. It could become well and truly burdensome in a UI or an API. The difference between one, as ranges are now, and many, as multi-ranges would be if we shoehorn them into the range type, are pretty annoying to deal with. > 2. There are existing client libraries[1] that support range types and > transform them to types within the host language. Obviously, those > would need to be updated to expect multiple ranges. The type systems that would support such types might get unhappy with us if we started messing with some of the properties like contiguousness. > 3. It seems like we would want some kind of base "range" type. When you > try to break a multirange down into constituent ranges, what type would > those pieces be? (Aside: how do you get the constituent ranges?) > > I'm thinking more about casting to see if there's a possible compromise > there. I think the right compromise is to recognize that the closure of a set (ranges) over an operation (set union) may well be a different set (multi-ranges). 
Other operations have already been proposed, complete with concrete use cases that could really make PostgreSQL stand out. That we don't have an obvious choice of "most correct" operation over which to close ranges makes it an even bigger potential foot-gun when we choose one arbitrarily and declare it to be the canonical one. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
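The closure argument can be made concrete: the union of two ranges is not, in general, a single range, which is exactly the gap a multirange type fills. A sketch in Python (illustrative only, not PostgreSQL code), using half-open [lo, hi) pairs:

```python
def range_union(ranges):
    """Union of half-open [lo, hi) ranges.  The result is a sorted list of
    disjoint ranges, i.e. in general a multirange rather than a range."""
    out = []
    for lo, hi in sorted(ranges):
        if out and lo <= out[-1][1]:
            # Overlapping or abutting: extend the previous segment.
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

print(range_union([(0, 3), (2, 5)]))  # [(0, 5)]          -- still one range
print(range_union([(0, 1), (2, 5)]))  # [(0, 1), (2, 5)]  -- needs a multirange
```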
Re: Use compiler intrinsics for bit ops in hash
On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote: > Hi David, > > On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote: > > > > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote: > > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote: > > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > > > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > > > > > > > > > 1. Is ceil_log2_64 dead code? > > > > > > > > > > > > Let's call it nascent code. I suspect there are places it could go, > > > > > > if > > > > > > I look for them. Also, it seemed silly to have one without the > > > > > > other. > > > > > > > > > > > > > > > > While not absolutely required, I'd like us to find at least one > > > > > place and start using it. (Clang also nags at me when we have > > > > > unused functions). > > > > > > > > Done in the expanded patches attached. > I see that you've found use of it in dynahash, thanks! > > The math in the new (from v4 to v6) patch is wrong: it yields > ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted > the restriction of "num greater than one" for ceil_log2() in this patch > set, but it's now _more_ problematic to base those functions on > pg_leftmost_one_pos(). > > I'm not comfortable with your changes to pg_leftmost_one_pos() to remove > the restriction on word being non-zero. Specifically > pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its > current callers (in HEAD) is harmed, this introduces muddy semantics: > > 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning > for a set bit in a zero word won't find it anywhere. > > 2. we can _try_ generalizing it to accommodate ceil_log2 by > extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). 
In > that case, the extrapolation yields -1 for pg_leftmost_one_pos(0). > > I'm not convinced that others on the list will be comfortable with the > generalization suggested in 2 above. > > I've quickly put together a PoC patch on top of yours, which > re-implements ceil_log2 using LZCNT coupled with a CPUID check. > Thoughts? Per discussion on IRC with Andrew (RhodiumToad) Gierth: The runtime detection means there's always an indirect call overhead and no way to inline. This is counter to what using compiler intrinsics is supposed to do. It's better to rely on the compiler, because: (a) The compiler often knows whether the value can or can't be 0 and can therefore skip a conditional jump. (b) If you're targeting a recent microarchitecture, the compiler can just use the right instruction. (c) Even if the conditional branch is left in, it's not a big overhead. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
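The semantics under debate are easy to pin down in C. The sketch below assumes GCC/Clang builtins and borrows the function names from the thread, but the bodies are illustrative rather than the patch's code: the non-zero precondition stays on the bit-scan primitive (because __builtin_clzll(0) is undefined), ceil_log2(1) comes out as 0 because the scan runs on num - 1, and next_power_of_2(1) is therefore 1, not 2:

```c
#include <assert.h>
#include <stdint.h>

/* Position of the leftmost set bit.  __builtin_clzll(0) is undefined
 * behavior, so the word != 0 precondition lives here, at the lowest level. */
static inline int
pg_leftmost_one_pos64(uint64_t word)
{
	assert(word != 0);
	return 63 - __builtin_clzll(word);
}

/* ceil(log2(num)) for num >= 1.  Scanning (num - 1) instead of num is what
 * makes ceil_log2(1) == 0 and exact powers of two come out right. */
static inline uint64_t
ceil_log2_64(uint64_t num)
{
	assert(num > 0);
	return num == 1 ? 0 : (uint64_t) pg_leftmost_one_pos64(num - 1) + 1;
}

/* Smallest power of two >= num. */
static inline uint64_t
next_power_of_2_64(uint64_t num)
{
	return UINT64_C(1) << ceil_log2_64(num);
}
```

Keeping the precondition as an assert rather than a runtime branch is the point Gierth makes above: the compiler can often prove the input non-zero and emit the bare instruction.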
Re: range_agg
On Sat, Mar 07, 2020 at 06:45:44PM -0500, Tom Lane wrote: > David Fetter writes: > > There's another use case not yet covered here that could make this > > even more complex, we should probably plan for it: multi-ranges > > with weights. > > I'm inclined to reject that as completely out of scope. The core > argument for unifying multiranges with ranges, if you ask me, is to > make the data type closed under union. Weights are from some other > universe. I don't think they are. SQL databases are super useful because they do bags in addition to sets, so set union isn't the only, or maybe even the most important, operation over which ranges ought to be closed. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
On Sat, Mar 07, 2020 at 04:06:32PM -0500, Tom Lane wrote: > I wrote: > > However, what I'm on about right at the moment is that I don't think > > there should be any delta in that test at all. As far as I can see, > > the design idea here is that multiranges will be automatically created > > over range types, and the user doesn't need to do that. To my mind, > > that means that they're an implementation detail and should not show up as > > separately-owned objects, any more than an autogenerated array type does. > > Actually ... have you given any thought to just deciding that ranges and > multiranges are the same type? That is, any range can now potentially > contain multiple segments? That would eliminate a whole lot of the > tedious infrastructure hacking involved in this patch, and let you focus > on the actually-useful functionality. If we're changing range types rather than constructing a new multi-range layer atop them, I think it would be helpful to have some way to figure out quickly whether this new range type was contiguous. One way to do that would be to include a "range cardinality" in the data structure, which would be the number of left ends in it. One of the things I'd pictured doing with multiranges was along the lines of a "full coverage" constraint like "During a shift, there can be no interval that's not covered," which would correspond to a "range cardinality" of 1. I confess I'm getting a little twitchy about the idea of eliding the cases of "one" and "many", though. > Assuming that that's ok, it seems like we could consider the traditional > range functions like lower() and upper() to report on the first or last > range bound in a multirange --- essentially, they ignore any "holes" > that exist inside the range. And the new functions for multiranges > act much like array slicing, in that they give you back pieces of a range > that aren't actually of a distinct type. So new functions along the lines of lowers(), uppers(), opennesses(), etc.? 
I guess this could be extended as needs emerge. There's another use case not yet covered here that could make this even more complex, so we should probably plan for it: multi-ranges with weights. For example, SELECT weighted_range_union(r) FROM (VALUES('[0,1)'::float8range), ('[0,3)'), ('[2,5)')) AS t(r) would yield something along the lines of: (([0,1),1), ([1,3),2), ([3,5),1)) and wedging that into the range type seems messy. Each range would then have a cardinality, and each range within would have a weight, all of which would be an increasingly heavy burden on the common case where there's just a single range. Enhancing a separate multirange type to have weights seems like a cleaner path forward. Given that, I'm -1 on mushing multi-ranges into a special case of ranges, or /vice versa/. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
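A sweep-line sketch of one plausible weighted-union semantics (weights as coverage counts; the exact weights and segment boundaries depend on the definition chosen, and this is illustrative Python, not a proposed implementation):

```python
def weighted_range_union(ranges):
    """Union of half-open [lo, hi) ranges, split into segments tagged with
    how many input ranges cover them (the "weight")."""
    events = []
    for lo, hi in ranges:
        events.append((lo, +1))
        events.append((hi, -1))
    events.sort()  # at equal positions, -1 sorts first, which suits [lo, hi)
    out, depth, prev = [], 0, None
    for x, delta in events:
        if depth > 0 and x > prev:
            if out and out[-1][1] == prev and out[-1][2] == depth:
                out[-1] = (out[-1][0], x, depth)  # merge equal-weight neighbors
            else:
                out.append((prev, x, depth))
        depth += delta
        prev = x
    return out

print(weighted_range_union([(0, 1), (0, 3), (2, 5)]))
# [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 5, 1)]
```

Note how much state a weighted result carries compared to a plain range, which is the burden-on-the-common-case argument above.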
Re: proposal \gcsv
On Sat, Feb 29, 2020 at 11:59:22AM +0100, Pavel Stehule wrote: > On Sat, Feb 29, 2020 at 11:34 AM, Vik Fearing > wrote: > > > On 29/02/2020 06:43, Pavel Stehule wrote: > > > Hi > > > > > > I would to enhance \g command about variant \gcsv > > > > > > proposed command has same behave like \g, only the result will be every > > > time in csv format. > > > > > > It can helps with writing psql macros wrapping \g command. > > > > > > Options, notes? > > > > But then we would need \ghtml and \glatex etc. If we want a shortcut > > for setting a one-off format, I would go for \gf or something. > > > > \gf csv > > \gf html > > \gf latex > > > > usability of html or latex format in psql is significantly lower than csv > format. There is only one generic format for data - csv. Not exactly. There are a lot of uses for things along the lines of \gf json and \gf yaml. I'd rather add a new \gf that takes arguments, as it seems more extensible. For example, there are uses for \gf csv header if no header is the default, or \gf csv noheader if header is the default. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Use compiler intrinsics for bit ops in hash
On Thu, Feb 27, 2020 at 02:41:49PM +0800, John Naylor wrote: > On Thu, Feb 27, 2020 at 1:56 PM David Fetter wrote: > > [v6 set] > > Hi David, > > In 0002, the pg_bitutils functions have a test (input > 0), and the > new callers ceil_log2_* and next_power_of_2_* have asserts. That seems > backward to me. To me, too, now that you mention it. My thinking was a little fuzzed by trying to accommodate platforms with intrinsics where clz is defined for 0 inputs. > I imagine some callers of bitutils will already know the value > 0, > and it's probably good to keep that branch out of the lowest level > functions. What do you think? I don't know quite how smart compilers and CPUs are these days, so it's unclear to me how often that branch would actually happen. Anyhow, I'll get a revised patch set out later today. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Use compiler intrinsics for bit ops in hash
On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote: > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote: > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > > > > > 1. Is ceil_log2_64 dead code? > > > > > > > > Let's call it nascent code. I suspect there are places it could go, if > > > > I look for them. Also, it seemed silly to have one without the other. > > > > > > > > > > While not absolutely required, I'd like us to find at least one > > > place and start using it. (Clang also nags at me when we have > > > unused functions). > > > > Done in the expanded patches attached. > > These bit-rotted a little, so I've updated them. 05d8449e73694585b59f8b03aaa087f04cc4679a broke this patch set, so fix. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 2f8778f6133be23f6b7c375a39e9940ecc9fb063 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 29 Jan 2020 02:09:59 -0800 Subject: [PATCH v6 1/3] de-long-ify To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.24.1" This is a multi-part message in MIME format. 
--2.24.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c index 4b562d8d3f..482a569814 100644 --- a/src/backend/access/gist/gistbuildbuffers.c +++ b/src/backend/access/gist/gistbuildbuffers.c @@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer, IndexTuple item); static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer, IndexTuple *item); -static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); -static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum); +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); +static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum); -static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr); -static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr); +static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr); +static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr); /* @@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel) /* Initialize free page management. */ gfbb->nFreeBlocks = 0; gfbb->freeBlocksLen = 32; - gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long)); + gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64)); /* * Current memory context will be used for all in-memory data structures @@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer, /* * Select a currently unused block for writing to. */ -static long +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) { /* @@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) * Return a block# to the freelist. 
*/ static void -gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum) +gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum) { int ndx; @@ -495,9 +495,9 @@ gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum) if (gfbb->nFreeBlocks >= gfbb->freeBlocksLen) { gfbb->freeBlocksLen *= 2; - gfbb->freeBlocks = (long *) repalloc(gfbb->freeBlocks, + gfbb->freeBlocks = (int64 *) repalloc(gfbb->freeBlocks, gfbb->freeBlocksLen * - sizeof(long)); + sizeof(uint64)); } /* Add blocknum to array */ @@ -755,7 +755,7 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate, */ static void -ReadTempFileBlock(BufFile *file, long blknum, void *ptr) +ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr) { if (BufFileSeekBlock(file, blknum) != 0) elog(ERROR, "could not seek temporary file: %m"); @@ -764,7 +764,7 @@ ReadTempFileBlock(BufFile *file, long blknum, void *ptr) } static void -WriteTempFileBlock(BufFile *file, long blknum, void *ptr) +WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr) { if (BufFileSeekBlock(file, blknum) != 0) elog(ERROR, "could not seek temporary file: %m"); diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index c46764bf42..4fc478640a 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -70,7 +70,7 @@ stat
Re: truncating timestamps on arbitrary intervals
On Wed, Feb 26, 2020 at 06:38:57PM +0800, John Naylor wrote: > On Wed, Feb 26, 2020 at 3:51 PM David Fetter wrote: > > > > I believe the following should error out, but doesn't. > > > > # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 > > 20:38:40'); > > date_trunc_interval > > ═ > > 2001-01-01 00:00:00 > > (1 row) > > You're quite right. I forgot to add error checking for > second-and-below units. I've added your example to the tests. (I > neglected to mention in my first email that because I chose to convert > the interval to the pg_tm struct (seemed easiest), it's not > straightforward how to allow multiple unit types, and I imagine the > use case is small, so I had it throw an error.) I suspect that this could be sanely expanded to span some sets of adjacent types in a future patch, e.g. year + month or hour + minute. > > Please find attached an update that I believe fixes the bug I found in > > a principled way. > > Thanks for that! I made a couple adjustments and incorporated your fix > into v3: While working on v1, I noticed the DTK_FOO macros already had > an idiom for bitmasking (see utils/datetime.h), Oops! Sorry I missed that. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
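When every unit in the interval has a fixed length, the underlying binning is nearly a one-liner. The sketch below (illustrative Python, not the patch's pg_tm-based C) measures from a fixed origin, and it deliberately cannot express variable-length units like months, which is one reason mixed inputs such as '1 year 1 ms' have to be rejected:

```python
from datetime import datetime, timedelta

def date_trunc_interval(interval: timedelta, ts: datetime) -> datetime:
    """Truncate ts down to the nearest multiple of interval, counted from
    a fixed origin (2000-01-01 here, an assumption of this sketch)."""
    origin = datetime(2000, 1, 1)
    # timedelta // timedelta floors to an int; multiplying back gives the
    # start of the bin containing ts.
    return origin + (ts - origin) // interval * interval

print(date_trunc_interval(timedelta(minutes=15), datetime(2020, 2, 16, 20, 48, 40)))
# 2020-02-16 20:45:00
```

The choice of origin only matters for intervals that don't evenly divide a day; the patch's pg_tm decomposition sidesteps that by truncating field by field.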
Re: Use compiler intrinsics for bit ops in hash
On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote: > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > > > 1. Is ceil_log2_64 dead code? > > > > > > Let's call it nascent code. I suspect there are places it could go, if > > > I look for them. Also, it seemed silly to have one without the other. > > > > > > > While not absolutely required, I'd like us to find at least one > > place and start using it. (Clang also nags at me when we have > > unused functions). > > Done in the expanded patches attached. These bit-rotted a little, so I've updated them. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 90d3575b0716839b76953023bb63242b1d0698ef Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 29 Jan 2020 02:09:59 -0800 Subject: [PATCH v5 1/3] de-long-ify To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.24.1" This is a multi-part message in MIME format. 
--2.24.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c index 4b562d8d3f..482a569814 100644 --- a/src/backend/access/gist/gistbuildbuffers.c +++ b/src/backend/access/gist/gistbuildbuffers.c @@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer, IndexTuple item); static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer, IndexTuple *item); -static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); -static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum); +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); +static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum); -static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr); -static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr); +static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr); +static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr); /* @@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel) /* Initialize free page management. */ gfbb->nFreeBlocks = 0; gfbb->freeBlocksLen = 32; - gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long)); + gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64)); /* * Current memory context will be used for all in-memory data structures @@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer, /* * Select a currently unused block for writing to. */ -static long +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) { /* @@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) * Return a block# to the freelist. 
*/ static void -gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum) +gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum) { int ndx; @@ -495,9 +495,9 @@ gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum) if (gfbb->nFreeBlocks >= gfbb->freeBlocksLen) { gfbb->freeBlocksLen *= 2; - gfbb->freeBlocks = (long *) repalloc(gfbb->freeBlocks, + gfbb->freeBlocks = (int64 *) repalloc(gfbb->freeBlocks, gfbb->freeBlocksLen * - sizeof(long)); + sizeof(uint64)); } /* Add blocknum to array */ @@ -755,7 +755,7 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate, */ static void -ReadTempFileBlock(BufFile *file, long blknum, void *ptr) +ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr) { if (BufFileSeekBlock(file, blknum) != 0) elog(ERROR, "could not seek temporary file: %m"); @@ -764,7 +764,7 @@ ReadTempFileBlock(BufFile *file, long blknum, void *ptr) } static void -WriteTempFileBlock(BufFile *file, long blknum, void *ptr) +WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr) { if (BufFileSeekBlock(file, blknum) != 0) elog(ERROR, "could not seek temporary file: %m"); diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index c46764bf42..4fc478640a 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -70,7 +70,7 @@ static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount); static void _SPI_error_callback(void *arg); static void _SPI_cursor_operation(Portal portal, - FetchDirection direction, long count
Re: truncating timestamps on arbitrary intervals
On Wed, Feb 26, 2020 at 10:50:19AM +0800, John Naylor wrote: > Hi, > > When analyzing time-series data, it's useful to be able to bin > timestamps into equally spaced ranges. date_trunc() is only able to > bin on a specified whole unit. Thanks for adding this very handy feature! > In the attached patch for the March > commitfest, I propose a new function date_trunc_interval(), which can > truncate to arbitrary intervals, e.g.: > > select date_trunc_interval('15 minutes', timestamp '2020-02-16 > 20:48:40'); date_trunc_interval > - > 2020-02-16 20:45:00 > (1 row) I believe the following should error out, but doesn't. # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40'); date_trunc_interval ═ 2001-01-01 00:00:00 (1 row) > With this addition, it might be possible to turn the existing > date_trunc() functions into wrappers. I haven't done that here because > it didn't seem practical at this point. For one, the existing > functions have special treatment for weeks, centuries, and millennia. I agree that turning it into a wrapper would be separate work. > Note: I've only written the implementation for the type timestamp > without timezone. Adding timezone support would be pretty simple, > but I wanted to get feedback on the basic idea first before making > it complete. I've also written tests and very basic documentation. Please find attached an update that I believe fixes the bug I found in a principled way. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 5e36c4c888c65e358d2f87d84b64bc14d52f2b39 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Tue, 25 Feb 2020 23:49:35 -0800 Subject: [PATCH v2] Add date_trunc_interval(interval, timestamp) To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.24.1" This is a multi-part message in MIME format. 
--2.24.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit per John Naylor diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ceda48e0fc..3863c222a2 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -6949,6 +6949,15 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); 2 days 03:00:00 + +date_trunc_interval(interval, timestamp) +timestamp +Truncate to specified precision; see + +date_trunc_interval('15 minutes', timestamp '2001-02-16 20:38:40') +2001-02-16 20:30:00 + + @@ -7818,7 +7827,7 @@ SELECT date_part('hour', INTERVAL '4 hours 3 minutes'); - date_trunc + date_trunc, date_trunc_interval date_trunc @@ -7902,6 +7911,21 @@ SELECT date_trunc('hour', INTERVAL '3 days 02:47:33'); Result: 3 days 02:00:00 + + +The function date_trunc_interval is +similar to the date_trunc, except that it +truncates to an arbitrary interval. + + + +Example: + +SELECT date_trunc_interval('5 minutes', TIMESTAMP '2001-02-16 20:38:40'); +Result: 2001-02-16 20:35:00 + + + diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c index 0b6c9d5ea8..ed742592af 100644 --- a/src/backend/utils/adt/timestamp.c +++ b/src/backend/utils/adt/timestamp.c @@ -30,6 +30,7 @@ #include "nodes/nodeFuncs.h" #include "nodes/supportnodes.h" #include "parser/scansup.h" +#include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/datetime.h" @@ -3804,6 +3805,144 @@ timestamptz_age(PG_FUNCTION_ARGS) *-*/ +/* timestamp_trunc_interval() + * Truncate timestamp to specified interval. 
+ */ +Datum +timestamp_trunc_interval(PG_FUNCTION_ARGS) +{ + Interval *interval = PG_GETARG_INTERVAL_P(0); + Timestamp timestamp = PG_GETARG_TIMESTAMP(1); + Timestamp result; + fsec_t ifsec, +tfsec; + uint32_t unit = 0, +popcount = 0; + enum TimeUnit { + us = 1 << 0, + ms = 1 << 1, + second = 1 << 2, + minute = 1 << 3, + hour = 1 << 4, + day= 1 << 5, + month = 1 << 6, + year = 1 << 7 +}; + + struct pg_tm it; + struct pg_tm tt; + + if (TIMESTAMP_NOT_FINITE(timestamp)) + PG_RETURN_TIMESTAMP(timestamp); + + if (interval2tm(*interval, &it, &ifsec) != 0) + elog(ERROR, "could not convert interval to tm"); + + if (timestamp2tm(timestamp, NULL, &tt, &tfsec, NULL, NULL) != 0) + ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("timestamp out of range"))); + + if (it.tm_year != 0) + { + tt.tm_year = it.tm_year * (tt.tm_year
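Independently of the patch, the core binning arithmetic for the simple case — intervals that are a fixed number of microseconds, with no month or year components — can be sketched in a few lines. This is not the patch's code; the function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Truncate a timestamp (microseconds since some epoch) down to the start
 * of its interval-sized bin.  Covers only intervals expressible as a fixed
 * number of microseconds; month/year intervals need calendar arithmetic
 * like the patch's pg_tm handling. */
static int64_t
trunc_to_interval_usec(int64_t ts_usec, int64_t interval_usec)
{
    int64_t r = ts_usec % interval_usec;

    if (r < 0)                  /* C's % truncates toward zero; bin downward */
        r += interval_usec;
    return ts_usec - r;
}
```

With a 15-minute interval, a time of 20:48:40 (expressed as microseconds of day) bins to 20:45:00, matching the example in the quoted mail.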
Re: Error on failed COMMIT
On Mon, Feb 24, 2020 at 06:40:16PM +0100, Vik Fearing wrote: > On 24/02/2020 18:37, David Fetter wrote: > > > If we'd done this from a clean sheet of paper, it would have been the > > right decision. We're not there, and haven't been for decades. > > OTOH, it's never too late to do the right thing. Some right things take a lot of prep work in order to actually be right things. This is one of them. Defaulting to SERIALIZABLE isolation is another. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Error on failed COMMIT
On Mon, Feb 24, 2020 at 06:04:28PM +0530, Robert Haas wrote: > On Mon, Feb 24, 2020 at 1:56 PM Shay Rojansky wrote: > > As Dave wrote, the problem here isn't with the driver, but with framework > > or user-code which swallows the initial exception and allows code to > > continue to the commit. Npgsql (and I'm sure the JDBC driver too) does > > surface PostgreSQL errors as exceptions, and internally tracks the > > transaction status provided in the CommandComplete message. That means > > users have the ability - but not the obligation - to know about failed > > transactions, and some frameworks or user coding patterns could lead to a > > commit being done on a failed transaction. > > Agreed. All of that can be fixed in the driver, though. > > > If we think the current *user-visible* behavior is problematic (commit on > > failed transaction completes without throwing), then the only remaining > > question is where this behavior should be fixed - at the server or at the > > driver. As I wrote above, from the user's perspective it makes no > > difference - the change would be identical (and just as breaking) either > > way. So while drivers *could* implement the new behavior, what advantages > > would that have over doing it at the server? Some disadvantages do seem > > clear (repetition of the logic across each driver - leading to > > inconsistency across drivers, changing semantics at the driver by turning a > > non-error into an exception...). > > The advantage is that it doesn't cause a compatibility break. > > > > Well, it seems quite possible that there are drivers and applications > > > that don't have this issue; I've never had a problem with this behavior, > > > and I've been using PostgreSQL for something like two decades [...] 
> > > > If we are assuming that most user code is already written to avoid > > committing on failed transactions (by tracking transaction state etc.), > > then making this change at the server wouldn't affect those applications; > > the only applications affected would be those that do commit on failed > > transactions today, and it could be argued that those are likely to be > > broken today (since drivers today don't really expose the rollback in an > > accessible/discoverable way). > > libpq exposes it just fine, so I think you're overgeneralizing here. > > As I said upthread, I think one of the things that would be pretty > badly broken by this is psql -f something.sql, where something.sql > contains a series of blocks of the form "begin; something; something; > something; commit;". Right now whichever transactions succeed get > committed. With the proposed change, if one transaction block fails, > it'll merge with all of the following blocks. You may think that > nobody is doing this sort of thing, but I think people are, and that > they will come after us with pitchforks if we break it. I'm doing it, and I don't know about pitchforks, but I do know about suddenly needing to rewrite (and re-test, and re-integrate, and re-test some more) load-bearing code, and I'm not a fan of it. If we'd done this from a clean sheet of paper, it would have been the right decision. We're not there, and haven't been for decades. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Parallel copy
On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote: > On Thu, Feb 20, 2020 at 04:11:39PM +0530, Amit Kapila wrote: > > On Thu, Feb 20, 2020 at 5:12 AM David Fetter wrote: > > > > > > On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote: > > > > This work is to parallelize the copy command and in particular "Copy > > > > from 'filename' Where ;" command. > > > > > > Apropos of the initial parsing issue generally, there's an interesting > > > approach taken here: https://github.com/robertdavidgraham/wc2 > > > > > > > Thanks for sharing. I might be missing something, but I can't figure > > out how this can help here. Does this in some way help to allow > > multiple workers to read and tokenize the chunks? > > I think the wc2 is showing that maybe instead of parallelizing the > parsing, we might instead try using a different tokenizer/parser and > make the implementation more efficient instead of just throwing more > CPUs on it. That was what I had in mind. > I don't know if our code is similar to what wc does, maybe parsing > csv is more complicated than what wc does. CSV parsing differs from wc in that there are more states in the state machine, but I don't see anything fundamentally different. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
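The "more states" point above can be made concrete with a minimal sketch: unlike wc, a CSV record counter must carry an in-quote/out-of-quote bit, because newlines inside quoted fields do not end a record. This is illustrative code, not PostgreSQL's COPY parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count CSV records in a buffer with a two-state (in-quote / out-of-quote)
 * machine.  A doubled "" inside a quoted field simply toggles the state
 * twice, which is harmless for record counting. */
static size_t
count_csv_records(const char *buf, size_t len)
{
    bool    in_quote = false;
    size_t  records = 0;

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '"')
            in_quote = !in_quote;
        else if (c == '\n' && !in_quote)
            records++;          /* newline ends a record only outside quotes */
    }
    return records;
}
```

A plain byte counter like wc needs no such state; this single bit is the essential extra complication of CSV tokenization.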
Re: Parallel copy
On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote: > This work is to parallelize the copy command and in particular "Copy > from 'filename' Where ;" command. Apropos of the initial parsing issue generally, there's an interesting approach taken here: https://github.com/robertdavidgraham/wc2 Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Parallel copy
On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote: > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma wrote: > > > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila wrote: > > > This is something similar to what I had also in mind for this idea. I > > > had thought of handing over complete chunk (64K or whatever we > > > decide). The one thing that slightly bothers me is that we will add > > > some additional overhead of copying to and from shared memory which > > > was earlier from local process memory. And, the tokenization (finding > > > line boundaries) would be serial. I think that tokenization should be > > > a small part of the overall work we do during the copy operation, but > > > will do some measurements to ascertain the same. > > > > I don't think any extra copying is needed. > > I am talking about access to shared memory instead of the process > local memory. I understand that an extra copy won't be required. Isn't accessing shared memory from different pieces of execution what threads were designed to do? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Parallel copy
On Sat, Feb 15, 2020 at 06:02:06PM +0530, Amit Kapila wrote: > On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner wrote: > > > > On Sat, 15 Feb 2020 at 04:55, Amit Kapila wrote: > > > > > > On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner > > > wrote: > > > > > > ... > > > > > > > > Parsing rows from the raw input (the work done by CopyReadLine()) in a > > > > single process would accommodate line returns in quoted fields. I don't > > > > think there's a way of getting parallel workers to manage the > > > > in-quote/out-of-quote state required. > > > > > > > > > > AFAIU, the whole of this in-quote/out-of-quote state is managed inside > > > CopyReadLineText which will be done by each of the parallel workers, > > > something on the lines of what Thomas did in his patch [1]. > > > Basically, we need to invent a mechanism to allocate chunks to > > > individual workers and then the whole processing will be done as we > > > are doing now except for special handling for partial tuples which I > > > have explained in my previous email. Am I missing something here? > > > > > The problem case that I see is the chunk boundary falling in the > > middle of a quoted field where > > - The quote opens in chunk 1 > > - The quote closes in chunk 2 > > - There is an EoL character between the start of chunk 2 and the closing > > quote > > > > When the worker processing chunk 2 starts, it believes itself to be in > > out-of-quote state, so only data between the start of the chunk and > > the EoL is regarded as belonging to the partial line. From that point > > on the parsing of the rest of the chunk goes off track. > > > > Some of the resulting errors can be avoided by, for instance, > > requiring a quote to be preceded by a delimiter or EoL. That answer > > fails when fields end with EoL characters, which happens often enough > > in the wild.
> > > > Recovering from an incorrect in-quote/out-of-quote state assumption at > > the start of parsing a chunk just seems like a hole with no bottom. So > > it looks to me like it's best done in a single process which can keep > > track of that state reliably. > > > > Good point and I agree with you that having a single process would > avoid any such stuff. However, I will think some more on it and if > you/anyone else gets some idea on how to deal with this in a > multi-worker system (where we can allow each worker to read and > process the chunk) then feel free to share your thoughts. I see two pieces of this puzzle: an input format we control, and the ones we don't. In the former case, we could encode all fields with base85 (or something similar that reduces the input alphabet efficiently), then reserve bytes that denote delimiters of various types. ASCII has separators for file, group, record, and unit that we could use as inspiration. I don't have anything to offer for free-form input other than to agree that it looks like a hole with no bottom, and maybe we should just keep that process serial, at least until someone finds a bottom. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
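The chunk-boundary hazard described in the quoted mail can be demonstrated directly: the very same bytes parse to different record counts depending on the quote state assumed at the start of the chunk, and nothing inside the chunk can disambiguate. The helper below is illustrative, not PostgreSQL code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count newline-terminated records in a chunk, given an assumed initial
 * in-quote/out-of-quote state. */
static size_t
count_records_from(const char *chunk, size_t len, bool in_quote)
{
    size_t records = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (chunk[i] == '"')
            in_quote = !in_quote;
        else if (chunk[i] == '\n' && !in_quote)
            records++;
    }
    return records;
}
```

For a chunk like `y",c` + newline + `d,e` + newline — the tail of a record whose opening quote fell in the previous chunk — assuming out-of-quote yields 0 records, while the correct in-quote start yields 2. That divergence is exactly why restarting a parse mid-stream is "a hole with no bottom".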
Re: allow frontend use of the backend's core hashing functions
On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote: > > On Feb 14, 2020, at 8:15 AM, David Fetter wrote: > > > > On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote: > >> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger > >> wrote: > >>> I have made these changes and rebased Robert’s patches but > >>> otherwise changed nothing. Here they are: > >> > >> Thanks. Anyone else have comments? I think this is pretty > >> straightforward and unobjectionable work so I'm inclined to press > >> forward with committing it fairly soon, but if someone feels > >> otherwise, please speak up. > > > > One question. It might be possible to make these functions faster > > using compiler intrinsics. Would those still be available to front-end > > code? > > Do you have a specific proposal that would preserve on-disk compatibility? I hadn't planned on changing the representation, just cutting instructions out of the calculation of same. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: allow frontend use of the backend's core hashing functions
On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote: > On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger > wrote: > > I have made these changes and rebased Robert’s patches but > > otherwise changed nothing. Here they are: > > Thanks. Anyone else have comments? I think this is pretty > straightforward and unobjectionable work so I'm inclined to press > forward with committing it fairly soon, but if someone feels > otherwise, please speak up. One question. It might be possible to make these functions faster using compiler intrinsics. Would those still be available to front-end code? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING
On Thu, Feb 13, 2020 at 11:13:32AM +, Alexey Bashtanov wrote: > Hello, > > Currently the documentation says that one can put "a list of table > expressions" > after FROM in UPDATE or after USING in DELETE. > However, "table expression" is defined as a complex of > FROM, WHERE, GROUP BY and HAVING clauses [1]. > The thing one can list in the FROM clause in a comma-separated manner > is called a table reference [2]. > SELECT reference does not use this term but explains what they could be [3]. > > Please could someone have a look at the patch attached? > It's not just pedantry but rather based on a real-life example of someone > reading and being not sure > whether e.g. joins can be used in there. Thanks for doing this! Speaking of examples, there should be more of them illustrating some of the cases you name. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Just for fun: Postgres 20?
On Wed, Feb 12, 2020 at 05:25:15PM +0100, Juan José Santamaría Flecha wrote: > On Wed, Feb 12, 2020 at 3:47 PM Tom Lane wrote: > > > > > Yeah; I don't think it's *that* unlikely for it to happen again. But > > my own principal concern about this mirrors what somebody else already > > pointed out: the one-major-release-per-year schedule is not engraved on > > any stone tablets. So I don't want to go to a release numbering system > > that depends on us doing it that way for the rest of time. > > > > > We could use the year as version identifier, so people will not expect > correlative numbering. SQL Server is being released every couple of years > and they are using this naming schema. The problem would be releasing twice > the same year, but how likely would that be? We've released more than one major version in a year before, so we have a track record of that actually happening. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Internal key management system
On Mon, Feb 10, 2020 at 05:57:47PM -0800, Andres Freund wrote: > Hi, > > On 2020-02-08 12:07:57 -0500, Tom Lane wrote: > > For the same reason, I don't think that an "internal key management" > > feature in the core code is ever going to be acceptable. It has to > > be an extension. (But, as long as it's an extension, whether it's > > bringing its own crypto or relying on some other extension for that > > doesn't matter from the legal standpoint.) > > I'm not convinced by that. We have optional in-core functionality that > requires external libraries, and we add more cases, if necessary. Take for example libreadline, without which our CLI is at best dysfunctional. > > > Sure, I know the code is currently calling openssl functions. I > > > was responding to Masahiko-san's message that we might > > > eventually merge this openssl code into our tree. > > > > No. This absolutely, positively, will not happen. There will > > never be crypto functions in our core tree, because then there'd > > be no recourse for people who want to use Postgres in countries > > with restrictions on crypto software. It's hard enough for them > > that we have such code in contrib --- but at least they can remove > > pgcrypto and be legal. If it's in src/common then they're stuck > > Isn't that basically a problem of the past by now? > > Partially due to changed laws (e.g. France, which used to be a > problematic case), It's less of a problem than it once was, but it's not exactly gone yet. https://en.wikipedia.org/wiki/Restrictions_on_the_import_of_cryptography > but also because it's basically futile to have > import restrictions on cryptography by now. Just about every larger > project contains significant amounts of cryptographic code and it's > entirely impractical to operate anything interfacing with network > without some form of transport encryption. And just about all open > source distribution mechanism have stopped separating out crypto > code a long time ago. That's true.
We have access to legal counsel. Maybe it's worth asking them how best to include cryptographic functionality, "how" being the question one asks when one wants to get a positive response. > I however do agree that we should strive to not introduce > cryptographic code into the pg source tree - nobody here seems to > have even close to enough experience to maintaining / writing that. +1 for not turning ourselves into implementers of cryptographic primitives. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Use compiler intrinsics for bit ops in hash
On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > 1. Is ceil_log2_64 dead code? > > > > Let's call it nascent code. I suspect there are places it could go, if > > I look for them. Also, it seemed silly to have one without the other. > > > > While not absolutely required, I'd like us to find at least one > place and start using it. (Clang also nags at me when we have > unused functions). Done in the expanded patches attached. > > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > > (first appearing in Haswell) that is well defined on zero input. ISTM > > > the alternatives are: > > > > > >a) Special case 1. That seems straightforward, but the branching cost > > >on a seemingly unlikely condition seems to be a lot of performance > > >loss > > > > > >b) Use architecture specific intrinsic (and possibly with CPUID > > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > > >instructions that are well defined on zero in most ISA's other than > > >x86, so maybe we can get away with special-casing x86? > > i. We can detect LZCNT instruction by checking one of the > "extended feature" (EAX=8001) bits using CPUID. Unlike the > "basic features" (EAX=1), extended feature flags have been more > vendor-specific, but fortunately it seems that the feature bit > for LZCNT is the same [1][2]. > > ii. We'll most likely still need to provide a fallback > implementation for processors that don't have LZCNT (either > because they are from a different vendor, or an older Intel/AMD > processor). I wonder if simply checking for 1 is "good enough". > Maybe a micro benchmark is in order? 
I'm not sure how I'd run one on the architectures we support. What I've done here is generalize our implementation to be basically like LZCNT and TZCNT at the cost of a brief branch that might go away at runtime. > 2. (My least favorite) use inline asm (a la our popcount > implementation). Yeah, I'd like to fix that, but I kept the scope of this one relatively narrow. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 5fcaa74146206e4de05ca8cbd863aca20bba94bf Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 29 Jan 2020 02:09:59 -0800 Subject: [PATCH v4 1/2] de-long-ify To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.24.1" This is a multi-part message in MIME format. --2.24.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c index 4b562d8d3f..482a569814 100644 --- a/src/backend/access/gist/gistbuildbuffers.c +++ b/src/backend/access/gist/gistbuildbuffers.c @@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer, IndexTuple item); static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer, IndexTuple *item); -static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); -static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum); +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); +static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum); -static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr); -static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr); +static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr); +static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr); /* @@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel) /* 
Initialize free page management. */ gfbb->nFreeBlocks = 0; gfbb->freeBlocksLen = 32; - gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long)); + gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64)); /* * Current memory context will be used for all in-memory data structures @@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer, /* * Select a currently unused block for writing to. */ -static long +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) { /* @@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBui
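The "brief branch" mentioned in the mail above can be sketched as a zero-safe wrapper over the GCC/Clang builtin: `__builtin_clzll` is undefined for a zero argument, whereas x86's LZCNT is defined to return the operand width, so one comparison bridges the gap. `ceil_log2_64` then falls out naturally. Names here are illustrative, not the patch's:

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros, defined for all inputs: behaves like LZCNT,
 * returning 64 on zero, at the cost of one test that the builtin alone
 * (undefined on zero) cannot provide. */
static inline int
clz64_zero_safe(uint64_t x)
{
    return x == 0 ? 64 : __builtin_clzll(x);
}

/* ceil(log2(x)) for x >= 1, built on the zero-safe wrapper. */
static inline int
ceil_log2_64(uint64_t x)
{
    return x <= 1 ? 0 : 64 - clz64_zero_safe(x - 1);
}
```

On hardware with LZCNT the compiler can often fold the branch away, which is the "might go away at runtime" point made above.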
Re: Increase psql's password buffer size
On Tue, Jan 21, 2020 at 07:05:47PM +0100, David Fetter wrote: > On Tue, Jan 21, 2020 at 10:23:59AM -0500, Bruce Momjian wrote: > > On Tue, Jan 21, 2020 at 04:19:13PM +0100, David Fetter wrote: > > > On Tue, Jan 21, 2020 at 10:12:52AM -0500, Bruce Momjian wrote: > > > > I think we should be using a macro to define the maximum length, rather > > > > than have 100 used in various places. > > > > > > It's not just 100 in some places. It's different in different places, > > > which goes to your point. > > > > > > How about using a system that doesn't meaningfully impose a maximum > > > length? The shell variable is a const char *, so why not just > > > re(p)alloc as needed? > > > > Uh, how do you know how big to make the buffer that receives the read? > > You can start at any size, possibly even 100, and then increase the > size in a loop along the lines of (untested) [and unworkable] I should have tested the code, but my point about using rep?alloc() remains. Best, David. Working code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    size_t my_size = 2, curr_size = 0;
    char *buf;
    int c;

    buf = (char *) malloc(my_size);
    printf("Enter a nice, long string.\n");
    /* Read until end of line or EOF; a '\0' never arrives from a terminal. */
    while ((c = getchar()) != EOF && c != '\n')
    {
        buf[curr_size++] = c;
        if (curr_size == my_size)
        {
            my_size *= 2;
            buf = (char *) realloc(buf, my_size);
        }
    }
    buf[curr_size] = '\0';      /* always room: curr_size < my_size here */
    printf("The string %s is %zu bytes long.\n", buf, curr_size);
}

-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Minor issues in .pgpass
On Tue, Jan 21, 2020 at 03:27:50PM +0900, Fujii Masao wrote: > Hi, > > When I was researching the maximum length of password in PostgreSQL > to answer the question from my customer, I found that there are two > minor issues in .pgpass file. > > (1) If the length of a line in .pgpass file is larger than 319B, >libpq silently treats each 319B in the line as a separate >setting line. This seems like a potentially serious bug. For example, a truncated password could get retried enough times to raise intruder alarms, and it wouldn't be easy to track down. > (2) The document explains that a line beginning with # is treated >as a comment in .pgpass. But as far as I read the code, >there is no code doing such special handling. This is a flat-out bug, as it violates a promise the documentation has made. >Also if the length of that "comment" line is larger than 319B, >the latter part of the line can be treated as valid setting. > You may think that these unexpected behaviors are not so harmful > in practice because "usually" the length of password setting line is > less than 319B and the hostname beginning with # is less likely to be > used. But the problem exists. And there are people who want to use > large password or to write a long comment (e.g., with multibyte > characters like Japanese) in .pgpass, so these may be more harmful > in the near future. > > For (1), I think that we should make libpq warn if the length of a line > is larger than 319B, and throw away the remaining part beginning from > 320B position. Whether to enlarge the length of a line should be > a separate discussion, I think. Agreed. > For (2), libpq should treat any lines beginning with # as comments. Would it make sense for lines starting with whitespace and then # to be treated as comments, too, e.g.: # Please don't treat this as a parameter ? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
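The whitespace-then-# rule proposed above is easy to state precisely. This is an illustrative sketch, not libpq's code:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* A .pgpass line is a comment if its first non-whitespace character is '#'.
 * An empty or all-whitespace line is not a comment (callers would likely
 * skip it separately). */
static bool
is_pgpass_comment(const char *line)
{
    while (*line && isspace((unsigned char) *line))
        line++;
    return *line == '#';
}
```

Under this rule, both `# comment` and an indented `   # comment` are skipped, while a host field that merely contains `#` elsewhere in the line is still parsed as a setting.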
Re: Increase psql's password buffer size
On Tue, Jan 21, 2020 at 10:23:59AM -0500, Bruce Momjian wrote: > On Tue, Jan 21, 2020 at 04:19:13PM +0100, David Fetter wrote: > > On Tue, Jan 21, 2020 at 10:12:52AM -0500, Bruce Momjian wrote: > > > I think we should be using a macro to define the maximum length, rather > > > than have 100 used in various places. > > > > It's not just 100 in some places. It's different in different places, > > which goes to your point. > > > > How about using a system that doesn't meaningfully impose a maximum > > length? The shell variable is a const char *, so why not just > > re(p)alloc as needed? > > Uh, how do you know how big to make the buffer that receives the read? You can start at any size, possibly even 100, and then increase the size in a loop along the lines of (untested) my_size = 100; my_buf = char[my_size]; curr_size = 0; while (c = getchar() != '\0') { my_buf[curr_size++] = c; if (curr_size == my_size) /* If we want an absolute maximum, this'd be the place to test for it. */ { my_size *= 2; repalloc(my_buf, my_size); } } Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate