Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-13 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 8:47 AM James Coleman  wrote:
> It seems like the consensus over at another discussion on this topic
> [1] is that we ought to go ahead and print the zeros [for machine
> readable output formats], even though that creates some interesting
> scenarios like the fact that disk sorts will print 0 for memory even
> though that's not true.

What about having it print -1 for memory in this case instead? That's
still not ideal, but machine readable EXPLAIN output ought to
consistently show the same information per node, even when the answer
is in some sense indeterminate. That seems to be the standard that
we've settled on.

It might be worth teaching the JSON format to show a JSON null or
something instead. Not sure about that, though.

-- 
Peter Geoghegan




Re: Reducing WaitEventSet syscall churn

2020-07-13 Thread Thomas Munro
On Sun, Jul 12, 2020 at 7:22 AM Tom Lane  wrote:
> [...complaints about 0005 and 0006...] We already have
> PGEVT_CONNRESET and PGEVT_CONNDESTROY as application-visible connection
> state change events, and I don't see why those aren't sufficient.

I think Horiguchi-san's general idea of using event callbacks for this
sounds much more promising than my earlier idea of exposing a change
counter (that really was terrible).  If it can be done with existing
events then that's even better.  Perhaps he and/or I can look into
that for the next CF.

In the meantime, here's a rebase of the more straightforward patches
in the stack.  These are the ones that deal only with fixed sets of
file descriptors, and they survive check-world on Linux,
Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
check on macOS and Windows (my CI recipes need more work to get
check-world working on those two).  There's one user-visible change
that I'd appreciate feedback on: I propose to drop the FATAL error
when the postmaster goes away, to make things more consistent.  See
below for more on that.

Responding to earlier review from Horiguchi-san:

On Tue, Mar 10, 2020 at 12:20 PM Kyotaro Horiguchi
 wrote:
> > 0002: "Use a long lived WaitEventSet for WaitLatch()."
> >
> > In the last version, I had a new function WaitMyLatch(), but that
> > didn't help with recoveryWakeupLatch.  Switching between latches
> > doesn't require a syscall, so I didn't want to have a separate WES and
> > function just for that.  So I went back to using plain old
> > WaitLatch(), and made it "modify" the latch every time before waiting
> > on CommonWaitSet.  An alternative would be to get rid of the concept
> > of latches other than MyLatch, and change the function to
> > WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> > for which I didn't want to have a separate WES, so I just set that
> > flag every time.  Thoughts?
>
> It is surely an improvement over creating a full-fledged WES every
> time. The name CommonWaitSet gives the impression that it is used for
> a variety of wait event sets, but it is used only by WaitLatch. So I
> would name it LatchWaitSet. With that name I won't be surprised that
> the function points at WL_LATCH_SET by index 0 without any
> explanation when calling ModifyWaitSet.

Ok, I changed it to LatchWaitSet.  I also replaced the index 0 with a
symbolic name LatchWaitSetLatchPos, to make that clearer.
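
For anyone following along, the overall shape is roughly the following
(a simplified sketch, not the exact patch code; timeout handling and the
per-call exit_on_postmaster_death flag are omitted):

static WaitEventSet *LatchWaitSet = NULL;

#define LatchWaitSetLatchPos	0	/* position of WL_LATCH_SET in the set */

static void
InitializeLatchWaitSet(void)
{
	/* Created once per backend and reused by every WaitLatch() call. */
	LatchWaitSet = CreateWaitEventSet(TopMemoryContext, 2);
	AddWaitEventToSet(LatchWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
					  MyLatch, NULL);
	AddWaitEventToSet(LatchWaitSet, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
					  NULL, NULL);
}

/* In WaitLatch(): re-point the set at the caller's latch; no syscall needed. */
ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, latch);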

> @@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
> ReleaseExternalFD();
>  #elif defined(WAIT_USE_KQUEUE)
> close(set->kqueue_fd);
> -   ReleaseExternalFD();
> +   if (set->kqueue_fd >= 0)
> +   {
> +   close(set->kqueue_fd);
> +   ReleaseExternalFD();
> +   }
>
> Did you forget to remove the close() outside the if block?
> Don't we need the same amendment for epoll_fd with kqueue_fd?

Hmm, maybe I screwed that up when resolving a conflict with the
ReleaseExternalFD() stuff.  Fixed.
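
(Presumably the fixed version keeps only the guarded block, something like:)

#elif defined(WAIT_USE_KQUEUE)
	if (set->kqueue_fd >= 0)
	{
		close(set->kqueue_fd);
		ReleaseExternalFD();
	}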

> WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
> is given), the function returns immediately.". But now the function
> reacts to the latch even if WL_LATCH_SET is not given. I think it is
> actually always given, so we need to modify the Assert and the
> function comment to follow the change.

It seems a bit silly to call WaitLatch() if you don't want to wait for
a latch, but I think we can keep that comment and logic by assigning
set->latch = NULL when you wait without WL_LATCH_SET.  I tried that in
the attached.
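
(Roughly, inside WaitLatch() -- a sketch of the idea rather than the exact
patch code:)

/* Clear the latch pointer when the caller didn't ask for WL_LATCH_SET. */
ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET,
				(wakeEvents & WL_LATCH_SET) ? latch : NULL);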

> > 0004: "Introduce RemoveWaitEvent()."
> >
> > We'll need that to be able to handle connections that are closed and
> > reopened under the covers by libpq (replication, postgres_fdw).  We
> > also wanted this on a couple of other threads for multiplexing FDWs,
> > to be able to adjust the wait set dynamically for a proposed async
> > Append feature.
> >
> > The implementation is a little naive, and leaves holes in the
> > "pollfds" and "handles" arrays (for poll() and win32 implementations).
> > That could be improved with a bit more footwork in a later patch.
> >
> > XXX The Windows version is based on reading documentation.  I'd be
> > very interested to know if check-world passes (especially
> > contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> > appveyor.yml fu is not yet strong enough.
>
> I didn't find the documentation about INVALID_HANDLE_VALUE in
> lpHandles. Could you give me the URL for that?

I was looking for how you do the equivalent of Unix file descriptor -1
in a call to poll(), and somewhere I read that INVALID_HANDLE_VALUE
has the right effect.  I can't find that reference now.  Apparently it
works because that's the pseudo-handle value -1 that is returned by
GetCurrentProcess(), and waiting for your own process waits forever so
it's a suitable value for holes in an array of event handles.  We
should probably call GetCurrentProcess() instead, but really that is
just stand-in code: we sh

Re: Improvements in Copy From

2020-07-13 Thread vignesh C
On Tue, Jul 14, 2020 at 11:13 AM David Rowley  wrote:
>
> On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
> >
> > On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > > that is not being used, it can be removed.
> >
> > This was raised in [1]. We decided not to remove it.
>
> I just added a comment to the function to mention why we want to keep
> the parameter. I hope that will save any wasted time proposing its
> removal in the future.
>
> FWIW, the function is inlined.  Removing it will gain us nothing
> performance-wise anyway.
>
> David
>
> > [1] 
> > https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef

Thanks, David, for pointing it out; since this has already been discussed
and concluded, there is no point in discussing the same thing again. This
patch has a couple of other improvements which can still be taken forward.
I will remove this change and post a new patch that retains the other fixes.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: replication_origin and replication_origin_lsn usage on subscriber

2020-07-13 Thread Dilip Kumar
On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila  wrote:
>
> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar  wrote:
> >
> > On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila  wrote:
> > >
> > > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek  wrote:
> > > >
> > > >
> > > > If we were to support the origin forwarding, then strictly speaking we
> > > > need everything only at commit time from correctness perspective,
> > > >
> > >
> > > Okay.  Anyway streaming mode is optional, so in such cases, we can keep 
> > > it 'off'
> > >
> > > > but
> > > > ideally origin_id would be best sent with first message as it can be
> > > > used to filter out changes at decoding stage rather than while we
> > > > process the commit so having it set early improves performance of 
> > > > decoding.
> > > >
> > >
> > > Yeah, makes sense.  So, we will just send origin_id (with first
> > > streaming start message) and leave others.
> >
> > So IIUC, currently we are sending the latest origin_id which is set
> > during the commit time.  So in our case, while we start streaming we
> > will send the origin_id of the latest change in the current stream
> > right?
> >
>
> It has to be sent only once with the first start message not with
> consecutive start messages.

Okay, so do you mean that with the first start message we send
the origin_id of the latest change? Because during the transaction's
lifetime, the origin id can change. Currently, we send the
origin_id of the latest WAL record, i.e. the origin id of the commit, so
I think it would be along similar lines if, with every stream_start, we
sent the origin_id of the latest change in that stream.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Pavel Stehule
po 13. 7. 2020 v 15:33 odesílatel vignesh C  napsal:

> On Mon, Jul 13, 2020 at 1:51 PM Pavel Stehule 
> wrote:
> >> Can this be changed to dump objects and data based on the filter
> >> expressions from the filter file.
> >
> >
> > I am sorry, I don't understand. This should work for data from the
> > objects specified by the filter without any modification.
> >
> I meant can this:
> printf(_("  --filter=FILENAME      read object name filter expressions from file\n"));
> be changed to:
> printf(_("  --filter=FILENAME      dump objects and data based on the filter expressions from the filter file\n"));
>

done in today's patch

Pavel


> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>


Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Pavel Stehule
po 13. 7. 2020 v 20:00 odesílatel Justin Pryzby 
napsal:

> On Mon, Jul 13, 2020 at 08:15:42AM +0200, Pavel Stehule wrote:
> > > Do you want to add any more documentation ?
> > >
> >
> > done
>
> Thanks - I think the documentation was maybe excessive.  See attached.
>

I merged your patch - thank you.

A new patch with the doc changes and the help text change requested by Vignesh
is attached.

Regards

Pavel



> --
> Justin
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7a37fd8045..c9502dca13 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -755,6 +755,50 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Read objects filters from the specified file.
+If you use "-" as a filename, the filters are read from stdin.
+The lines of this file must have the following format:
+
+(+|-)[tnfd] objectname
+
+   
+
+   
+The first character specifies whether the object is to be included
+(+) or excluded (-), and the
+second character specifies the type of object to be filtered:
+t (table),
+n (schema),
+f (foreign table),
+d (table data).
+   
+
+   
+With the following filter file, the dump would include table
+mytable1 and data from foreign table
+some_foreign_table, but exclude data
+from table mytable2.
+
++t mytable1
++f some_foreign_table
+-d mytable2
+
+   
+
+   
+The --filter option works just like the other
+options to include or exclude tables, schemas, table data, or foreign
+tables, and both forms may be combined.  Note that there are no options
+to exclude a specific foreign table or to include a specific table's
+data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e758b5c50a..778ae2d0fc 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -290,6 +290,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(TableInfo *tbinfo);
+static void read_patterns_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -364,6 +365,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -603,6 +605,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_patterns_from_file(optarg, &dopt);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1022,6 +1028,8 @@ help(const char *progname)
 			 "   access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAMEdump objects and data based on the filter expressions\n"
+			 "   from the filter file\n"));
 	printf(_("  --if-exists  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "   include data of foreign tables on foreign\n"
@@ -18597,3 +18605,211 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse reloptions array");
 }
+
+#define		FILTER_INITIAL_LINE_SIZE		1024
+#define		PG_GETLINE_EXTEND_LINE_SIZE		1024
+
+/*
+ * getline is originally GNU function, and should not be everywhere still.
+ * Use own reduced implementation.
+ */
+static size_t
+pg_getline(char **lineptr, size_t *n, FILE *fp)
+{
+	size_t		total_chars = 0;
+
+	while (!feof(fp) && !ferror(fp))
+	{
+		char	   *str;
+		size_t		chars;
+
+		str = fgets(*lineptr + total_chars,
+	*n - total_chars,
+	fp);
+
+		if (ferror(fp))
+			return -1;
+
+		if (str)
+		{
+			chars = strlen(str);
+			total_chars += chars;
+
+			if (chars > 0 && str[chars - 1] == '\n')
+return total_chars;
+
+			/* check, if there is good enough space for next content */
+			if (*n - total_chars < 2)
+			{
+*n += PG_GETLINE_EXTEND_LINE_SIZE;
+*lineptr = pg_realloc(*lineptr, *n);
+			}
+		}
+		else
+			break;
+	}
+
+	if (ferror(fp))
+		return -1;
+
+	return total_chars > 0 ? total_chars : -1;
+}
+
+/*
+ * Print error message and exit.
+ */
+static void
+exit_invalid_filter_format(FILE *fp, char *filename, char *me

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Andrey M. Borodin
Hi!

> 14 июля 2020 г., в 02:12, Robert Haas  написал(а):
> 
> So I have these questions:
> 
> - Do people think it would be smart/good/useful to include something
> like this in PostgreSQL?
> 
> - If so, how? I would propose a new contrib module that we back-patch
> all the way


My 0.05₽.

At Yandex we used to fix similar corruption things with our pg_dirty_hands 
extension [0].
But then we developed our internal pg_heapcheck module (unfortunately we did 
not publish it) and incorporated aggressive recovery into heapcheck.

Now that the community has an official heapcheck, I think it is worth keeping
the detection and fixing tools together.

Best regards, Andrey Borodin.

[0] https://github.com/dsarafan/pg_dirty_hands/blob/master/src/pg_dirty_hands.c





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-13 Thread Peter Geoghegan
On Thu, Jul 9, 2020 at 12:06 PM Jonathan S. Katz  wrote:
> On 7/2/20 11:47 AM, James Coleman wrote:
> > It seems like the consensus over at another discussion on this topic
> > [1] is that we ought to go ahead and print the zeros [for machine
> > readable output formats], even though that creates some interesting
> > scenarios like the fact that disk sorts will print 0 for memory even
> > though that's not true.
> >
> > The change has already been made and pushed for hash disk spilling, so
> > I think we ought to use Justin's patch here.
>
> Do people agree with James' analysis? From the RMT perspective, we would
> like to get this open item wrapped up for the next beta, given [1] is now
> resolved.

Tomas, Justin: Ping? Can we get an update on this?

Just for the record, David Rowley fixed the similar hashagg issue in
commit 40efbf8706cdd96e06bc4d1754272e46d9857875. I don't see any
reason for the delay here.

Thanks
-- 
Peter Geoghegan




RE: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM

2020-07-13 Thread k.jami...@fujitsu.com
On Tuesday, July 14, 2020 3:01 AM (GMT+9), Bossart, Nathan wrote:

Hi Nathan,

>On 7/13/20, 11:02 AM, "Justin Pryzby"  wrote:
>> Should bin/vacuumdb support this?
>
>Yes, it should.  I've added it in v5 of the patch.

Thank you for the updated patch. I've joined as a reviewer.
I've also noticed that you have incorporated Justin's suggested vacuumdb support
in the recent patch, but in my opinion it'd be better to split that out into a
separate patch for better readability.
According to the cfbot, the patch applies cleanly and passes all the tests.

[Use Case]
>The main use case I'm targeting is when the level of bloat or
>transaction ages of a relation and its TOAST table have significantly
>diverged.  In these scenarios, it could be beneficial to be able to
>vacuum just one or the other, especially if the tables are large.
>...
>I reused the existing VACOPT_SKIPTOAST option to implement
>SECONDARY_RELATION_CLEANUP.  This option is currently only used for
>autovacuum.

Perhaps this has not gathered much attention yet because the problem is not
experienced by many, but I don't see any problem with the additional options
on manual VACUUM on top of the existing autovacuum cleanups. And I think this
is useful for the special use case mentioned, especially since direct access
to TOAST tables is restricted by role permissions.

[Docs]
I also agree with "TOAST_TABLE_CLEANUP", and with naming future options after
the respective relation types.

+MAIN_RELATION_CLEANUP
+
+ 
+  Specifies that VACUUM should attempt to process the
+  main relation.  This is normally the desired behavior and is the default.
+  Setting this option to false may be useful when it is necessary to only
+  vacuum a relation's corresponding TOAST table.

Perhaps it's just my own opinion, but I think the word "process" is vague for
a beginner reading the Postgres documentation. OTOH, I know it's also used
in the source code, so I guess it's just the convention, and "process" is
intuitive in the sense of "processing tables". Anyway, just my 2 cents; it
isn't a strong opinion.

Also, there's an extra space between the 1st and 2nd sentences.


+TOAST_TABLE_CLEANUP
+
+ 
+  Specifies that VACUUM should attempt to process the
+  corresponding TOAST table for each relation, if one
+  exists.  This is normally the desired behavior and is the default.
+  Setting this option to false may be useful when it is necessary to only
+  vacuum the main relation.  This option cannot be disabled when the
+  FULL option is used.

Same comments as above, and there are extra spaces between the sentences.

@@ -1841,9 +1865,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams 
*params)
/*
 * Remember the relation's TOAST relation for later, if the caller asked
 * us to process it.  In VACUUM FULL, though, the toast table is
-* automatically rebuilt by cluster_rel so we shouldn't recurse to it.
+* automatically rebuilt by cluster_rel, so we shouldn't recurse to it
+* unless MAIN_RELATION_CLEANUP is disabled.

The additional last line is a bit confusing (and may be unnecessary/unrelated).
To clarify this part about VACUUM FULL and my understanding of the revised
vacuum_rel below: we allow the MAIN_RELATION_CLEANUP option to be disabled
(skipping the main relation), and TOAST_TABLE_CLEANUP should be disabled
because cluster_rel() will process the toast table anyway.
Is my understanding correct? If yes, then maybe "unless" should be "even if"
instead, or we can just remove the line.

 static bool
-vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
+vacuum_rel(Oid relid,
+  RangeVar *relation,
+  VacuumParams *params,
+  bool processing_toast_table)
{
...
+   boolprocess_toast;
...

-   if (!(params->options & VACOPT_SKIPTOAST) && !(params->options & 
VACOPT_FULL))
+   process_toast = (params->options & VACOPT_TOAST_CLEANUP) != 0;
+
+   if ((params->options & VACOPT_FULL) != 0 &&
+   (params->options & VACOPT_MAIN_REL_CLEANUP) != 0)
+   process_toast = false;
+
+   if (process_toast)
toast_relid = onerel->rd_rel->reltoastrelid;
else
toast_relid = InvalidOid;
...

 * Do the actual work --- either FULL or "lazy" vacuum
+*
+* We skip this part if we're processing the main relation and
+* MAIN_RELATION_CLEANUP has been disabled.
 */
-   if (params->options & VACOPT_FULL)
+   if ((params->options & VACOPT_MAIN_REL_CLEANUP) != 0 ||
+   processing_toast_table)
...
if (toast_relid != InvalidOid)
-   vacuum_rel(toast_relid, NULL, params);
+   vacuum_rel(toast_relid, NULL, params, true);



>I've attached v3 of the patch, which removes the restriction on
>ANALYZE with MAIN_RELATION_CLEANUP disabled.

I've also confirmed those through regression + tap test in my own env
and they'

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Amit Langote
On Tue, Jul 14, 2020 at 2:02 PM vignesh C  wrote:
> On Tue, Jul 14, 2020 at 7:26 AM Amit Langote  wrote:
> > In CopyLoadRawBuf(), we could also change the condition if
> > (cstate->raw_buf_index < cstate->raw_buf_len) to if (BUF_BYTES > 0),
> > which looks clearer.
> >
> > Also, if we are going to use the macro more generally, let's make it
> > look less localized.  For example, rename it to RAW_BUF_BYTES similar
> > to RAW_BUF_SIZE and place their definitions close by.  It also seems
> > like a good idea to make 'cstate' a parameter for clarity.
> >
> > Attached v6.
> >
>
> Thanks for making the changes.
>
> -   if (cstate->raw_buf_index < cstate->raw_buf_len)
> +   if (RAW_BUF_BYTES(cstate) > 0)
> {
> /* Copy down the unprocessed data */
> -   nbytes = cstate->raw_buf_len - cstate->raw_buf_index;
> +   nbytes = RAW_BUF_BYTES(cstate);
> memmove(cstate->raw_buf, cstate->raw_buf +
> cstate->raw_buf_index,
> nbytes);
> }
>
> One small improvement could be to change it like below to reduce few
> more instructions:
> static bool
> CopyLoadRawBuf(CopyState cstate)
> {
> int nbytes = RAW_BUF_BYTES(cstate);
> int inbytes;
>
> /* Copy down the unprocessed data */
> if (nbytes > 0)
> memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
> nbytes);
>
> inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
>   1, RAW_BUF_SIZE - nbytes);
> nbytes += inbytes;
> cstate->raw_buf[nbytes] = '\0';
> cstate->raw_buf_index = 0;
> cstate->raw_buf_len = nbytes;
> return (inbytes > 0);
> }

Sounds fine to me.  Although CopyLoadRawBuf() does not seem to be a
candidate for rigorous code optimization, as it does not get called
that often.
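
(For reference, the macro being discussed presumably boils down to a
one-liner placed next to RAW_BUF_SIZE, along the lines of:)

/* number of bytes in raw_buf that have not yet been processed */
#define RAW_BUF_BYTES(cstate)	((cstate)->raw_buf_len - (cstate)->raw_buf_index)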

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: replication_origin and replication_origin_lsn usage on subscriber

2020-07-13 Thread Amit Kapila
On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar  wrote:
>
> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila  wrote:
> >
> > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek  wrote:
> > >
> > >
> > > If we were to support the origin forwarding, then strictly speaking we
> > > need everything only at commit time from correctness perspective,
> > >
> >
> > Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 
> > 'off'
> >
> > > but
> > > ideally origin_id would be best sent with first message as it can be
> > > used to filter out changes at decoding stage rather than while we
> > > process the commit so having it set early improves performance of 
> > > decoding.
> > >
> >
> > Yeah, makes sense.  So, we will just send origin_id (with first
> > streaming start message) and leave others.
>
> So IIUC, currently we are sending the latest origin_id which is set
> during the commit time.  So in our case, while we start streaming we
> will send the origin_id of the latest change in the current stream
> right?
>

It has to be sent only once with the first start message not with
consecutive start messages.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements in Copy From

2020-07-13 Thread David Rowley
On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
>
> On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > that is not being used, it can be removed.
>
> This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined.  Removing it will gain us nothing
performance-wise anyway.

David

> [1] 
> https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef




Re: TAP tests and symlinks on Windows

2020-07-13 Thread Michael Paquier
On Fri, Jul 10, 2020 at 07:58:02AM -0400, Andrew Dunstan wrote:
> After much frustration and gnashing of teeth here's a patch that allows
> almost all the TAP tests involving symlinks to work as expected on all
> Windows build environments, without requiring an additional Perl module.
> I have tested this on a system that is very similar to that running
> drongo and fairywren, with both msys2 and MSVC builds.

Thanks Andrew for looking at the MSYS part.  The tests pass for
me with MSVC.  The trick with mklink is cool.  I had not considered
that, and the test code gets simpler.

+   my $cmd = qq{mklink /j "$newname" "$oldname"};
+   if ($Config{osname} eq 'msys')
+   {
+   # need some indirection on msys
+   $cmd = qq{echo '$cmd' | \$COMSPEC /Q};
+   }
+   note("dir_symlink cmd: $cmd");
+   system($cmd);
From the quoting perspective, wouldn't it be simpler to build an array
with all those arguments and call system() with @cmd?

+# Create files that look like temporary relations to ensure they are ignored
+# in a tablespace.
+my @tempRelationFiles = qw(t888_888 t88_88_vm.1);
This variable conflicts with a previous declaration, creating a
warning.

+   skip "symlink check not implemented on Windows", 1
+ if ($windows_os);
opendir(my $dh, "$pgdata/pg_tblspc") or die;
I think that this would be cleaner with a SKIP block.

+Portably create a symlink for a director. On Windows this creates a junction.
+Elsewhere it just calls perl's builtin symlink.
s/director/directory/
s/junction/junction point/

   
 The TAP tests require the Perl module IPC::Run.
 This module is available from CPAN or an operating system package.
+On Windows, Win32API::File is also required .
   
This part should be backpatched IMO.

Some of the indentation is weird, this needs a cleanup with perltidy.
--
Michael




Re: replication_origin and replication_origin_lsn usage on subscriber

2020-07-13 Thread Dilip Kumar
On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila  wrote:
>
> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek  wrote:
> >
> > Hi,
> >
> > On 09/07/2020 14:34, Amit Kapila wrote:
> > > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek  wrote:
> > >>
> > >> On 09/07/2020 13:10, Amit Kapila wrote:
> > >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila  
> > >>> wrote:
> > 
> >  During logical decoding, we send replication_origin and
> >  replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> >  we send values for these two but never used on the subscriber side.
> >  Though we have provided a function (logicalrep_read_origin) to read
> >  these two values but that is not used in code anywhere.
> > 
> > >>
> > >> We don't use the origin message anywhere really because we don't support
> > >> origin forwarding in the built-in replication yet. That part I left out
> > >> intentionally in the original PG10 patchset as it's mostly useful for
> > >> circular replication detection when you want to replicate both ways.
> > >> However that's relatively useless without also having some kind of
> > >> conflict detection which would be another huge pile of code and I
> > >> expected we would end up not getting logical replication in PG10 at all
> > >> if I tried to push conflict detection as well :)
> > >>
> > >
> > > Fair enough.  However, without tests and more documentation about this
> > > concept, it is likely that future development might break it.  It is
> > > good that you and others who know this part well are there to respond
> > > but still, the more documentation and tests would be preferred.
> > >
> >
> > Honestly that part didn't even need to be committed given it's unused.
> > Protocol supports versioning so it could have been added at later time.
> >
> > >>>
> > >>> For the purpose of decoding in-progress transactions, I think we can
> > >>> send replication_origin in the first 'start' message as it is present
> > >>> with each WAL record, however replication_origin_lsn is only logged at
> > >>> commit time, so can't send it before commit.  The
> > >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> > >>> but it is not clear how and when that function can be used.  Do we
> > >>> really need replication_origin_lsn before we decode the commit record?
> > >>>
> > >>
> > >> That's the SQL interface, C interface does not require that and I don't
> > >> think we need to do that.
> > >>
> > >
> > > I think when you are saying SQL interface, you referred to
> > > pg_replication_origin_xact_setup() but I am not sure which C interface
> > > you are referring to in the above sentence?
> > >
> >
> > All the stuff pg_replication_origin_xact_setup does internally.
> >
> > >> The existing apply code sets the
> > >> replorigin_session_origin_lsn only when processing commit message IIRC.
> > >>
> > >
> > > That's correct.  However, we do send it via 'begin' callback which
> > > won't be possible with the streaming of in-progress transactions.  Do
> > > we need to send this origin related information (origin, origin_lsn)
> > > while streaming of in-progress transactions?  If so, when?  As far as
> > > I can see, the origin_id can be sent with the first 'start' message.
> > > The origin_lsn and origin_commit can be sent with the last 'start' of
> > > streaming commit if we want but not sure if that is of use.  If we
> > > need to send origin_lsn earlier than that then we need to record it
> > > with other WAL records (other than Commit WAL record).
> > >
> >
> > If we were to support the origin forwarding, then strictly speaking we
> > need everything only at commit time from correctness perspective,
> >
>
> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 
> 'off'
>
> > but
> > ideally origin_id would be best sent with first message as it can be
> > used to filter out changes at decoding stage rather than while we
> > process the commit so having it set early improves performance of decoding.
> >
>
> Yeah, makes sense.  So, we will just send origin_id (with first
> streaming start message) and leave others.

So IIUC, currently we are sending the latest origin_id, which is set
at commit time. So in our case, when we start streaming we
will send the origin_id of the latest change in the current stream,
right? I think we will always have to remember the latest origin id
in the top-level ReorderBufferTXN as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2020-07-13 Thread Pavel Stehule
út 14. 7. 2020 v 0:37 odesílatel David G. Johnston <
david.g.johns...@gmail.com> napsal:

> On Mon, Jul 13, 2020 at 2:12 AM Pavel Stehule 
> wrote:
>
>> I am reading this patch. I don't think the text for domains and types is
>> correct (or at a minimum it is a little bit messy).
>> This case is a little bit more complex - domains are not a subset of
>> relations. But relations (in Postgres) extend types.
>>
>
> Yeah, though in further working on this I dislike the saying "A composite
> type is a relation" (see Glossary and probably other spots).  That a table
> auto-creates a separate composite type, and depends on it, manifests a
> certain link between the two but the type that represents the table is not
> a relation as it doesn't hold data, it is just a definition.  If a
> composite type were a relation then whatever argument you use to justify
> that would seem to apply to non-composite types as well.
>
> I'm attaching version 2 as a plain diff (complete) instead of a patch.
>
> New with this version is the addition of tests for drop domain and drop
> type, and related documentation changes.  Notably pointing out the fact
> that DROP TYPE drops all types, including domains.
>
> To recap, the interesting relation related behaviors these tests
> demonstrate are:
>
> A non-failure while performing a DROP "relation" IF EXISTS command means
> that a subsequent CREATE "relation" command will not fail due to the name
> already existing (other failures are of course possible).
>
> In the presence of multiple schemas a failure of a DROP "relation" IF
> EXISTS command does not necessarily mean that a corresponding CREATE
> "relation" command would fail - the found entry could belong to a non-first
> schema on the search_path, while creation always places the newly created
> object in the first schema.
>
> The plain meaning of the opposite of "DROP IF EXISTS" (i.e., it's not an
> error if the specified object doesn't exist, just move on) is not what
> actually happens but rather we provide an additional test related to
> namespace occupation that is now documented.
>
> The latter two items are explicitly documented while the first is implicit
> and self-evident.
>

I think all changes are now correct and valuable. I'll mark this patch
as ready for committer.

Thank you for the patch

Regards

Pavel

>
> David J.
>
>


Re: Improvements in Copy From

2020-07-13 Thread David Rowley
On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

David

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef




Re: ALTER TABLE validate foreign key dependency problem

2020-07-13 Thread David Rowley
On Mon, 13 Jul 2020 at 08:13, Simon Riggs  wrote:
>
> On Sun, 12 Jul 2020 at 05:51, David Rowley  wrote:
>
>>
>> > I also considered that we could just delay all foreign key validations
>> > until phase 3, but I ended up just doing so only when a rewrite is
>> > pending.
>>
>> I still wonder if it's best to delay the validation of the foreign key
>> regardless of if there's a pending table rewrite, but the patch as it
>> is now only delays if there's a pending rewrite.
>
>
> Consistency seems the better choice, so I agree we should validate later in 
> all cases. Does changing that have any other effects?

Thanks for having a look here.

I looked at this again and noticed it wasn't just FOREIGN KEY
constraints. CHECK constraints were being validated at the wrong time
too.

I did end up going with unconditionally moving the validation to
phase 3. I've pushed the fix back to 9.5.

David




Re: Improvements in Copy From

2020-07-13 Thread vignesh C
On Wed, Jul 1, 2020 at 6:16 PM vignesh C  wrote:
> Attached patch has the changes for the same.
> Thoughts?
>

Added a commitfest entry for this:
https://commitfest.postgresql.org/29/2642/

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Stale external URL in doc?

2020-07-13 Thread Kyotaro Horiguchi
At Tue, 14 Jul 2020 15:40:41 +1200, Thomas Munro  wrote 
in 
> On Tue, Jul 14, 2020 at 3:27 PM Kyotaro Horiguchi
>  wrote:
> > A. I didn't find the files gb-18030-2000.xml and windows-949-2000.xml
> >   in the ICU site.  We have our own copy in our repository so it's not
> >   a serious problem but I'm not sure what we should do for this.
> 
> The patch I posted earlier fixes that problem (their source repository moved).

- $(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
+ $(DOWNLOAD) https://raw.githubusercontent.com/unicode-org/icu-data/master/charset/data/xml/$(@F)

Wow. The URL works and makes no difference in related map files.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread vignesh C
On Tue, Jul 14, 2020 at 7:26 AM Amit Langote  wrote:
>
> Good idea, thanks.
>
> In CopyLoadRawBuf(), we could also change the condition if
> (cstate->raw_buf_index < cstate->raw_buf_len) to if (BUF_BYTES > 0),
> which looks clearer.
>
> Also, if we are going to use the macro more generally, let's make it
> look less localized.  For example, rename it to RAW_BUF_BYTES similar
> to RAW_BUF_SIZE and place their definitions close by.  It also seems
> like a good idea to make 'cstate' a parameter for clarity.
>
> Attached v6.
>

Thanks for making the changes.

-   if (cstate->raw_buf_index < cstate->raw_buf_len)
+   if (RAW_BUF_BYTES(cstate) > 0)
{
/* Copy down the unprocessed data */
-   nbytes = cstate->raw_buf_len - cstate->raw_buf_index;
+   nbytes = RAW_BUF_BYTES(cstate);
memmove(cstate->raw_buf, cstate->raw_buf +
cstate->raw_buf_index,
nbytes);
}

One small improvement could be to change it like below to reduce few
more instructions:
static bool
CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;

/* Copy down the unprocessed data */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);

inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
  1, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
cstate->raw_buf_len = nbytes;
return (inbytes > 0);
}

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Fix header identification

2020-07-13 Thread Michael Paquier
On Mon, Jul 13, 2020 at 09:31:58AM -0700, Jesse Zhang wrote:
> PFA a patch that fixes up the identification for 4 header files.

Thanks, Jesse.  Applied.
--
Michael




Re: min_safe_lsn column in pg_replication_slots view

2020-07-13 Thread Kyotaro Horiguchi
At Mon, 13 Jul 2020 13:52:12 -0400, Alvaro Herrera  
wrote in 
> On 2020-Jul-13, Alvaro Herrera wrote:
> 
> > A much more sensible answer is to initialize the segno to the segment
> > currently being written, as in the attached.
> 
> Ran the valgrind test locally and it passes.  Pushed it now.

-   if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+   if (segno < *logSegNo)

Oops! Thank you for fixing it!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: GSSENC'ed connection stalls while reconnection attempts.

2020-07-13 Thread Kyotaro Horiguchi
At Mon, 13 Jul 2020 11:08:09 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Fri, 10 Jul 2020 12:01:10 -0400, Tom Lane  wrote in
> >> I'm disinclined to mess with that, because (a) I don't think it's the
> >> actual source of the problem, and (b) it would affect way more than
> >> just GSS mode.
> 
> > If I did that in pqFlush your objection would be right, but
> > pqPacketSend is defined as "RETURNS: STATUS_ERROR if the write fails",
> > so not doing that is just wrong. (pqSendSome reported a write failure in
> > this case.) For the other parts of the authentication code, I don't think
> > it affects them badly, because authentication should proceed without any
> > read/write overlapping.
> 
> As the comment for pqSendSome says, we report a write failure immediately
> only if we also cannot read.  I don't really see a reason why the behavior
> described there isn't fine during initial connection as well.  If you feel
> that the comment for pqPacketSend needs adjustment, we can do that.

I'm fine with that.

> In any case, I'm quite against changing pqPacketSend's behavior because
> "it's only used during initial connection"; there is nothing about the
> function that restricts it to that case.

That sounds fair enough.

> Bottom line here is that I'm suspicious of changing the behavior of
> the read/write code on the strength of a bug in the GSS state management
> logic.  If there's a reason to change the read/write code, we should be
> able to demonstrate it without the GSS bug.

Agreed on separating the change from this issue.  I also don't think
that change in behavior would dramatically improve the situation, since we
would already have had a bunch of trouble if a write actually failed in the
normal case.

I'm going to post a patch to change the comment of pqPacketSend.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Editing errors in the comments of tableam.h and heapam.c

2020-07-13 Thread Michael Paquier
On Mon, Jul 13, 2020 at 09:52:12PM +0900, Michael Paquier wrote:
> Yeah.  We also recommend to look at tableam.h in the docs, so while
> usually I just bother fixing comments on HEAD, it would be better to
> back-patch this one.

Committed and back-patched down to 12.  Thanks, Suzuki-san.
--
Michael




Re: max_slot_wal_keep_size and wal_keep_segments

2020-07-13 Thread Fujii Masao



On 2020/07/13 16:01, Kyotaro Horiguchi wrote:

At Mon, 13 Jul 2020 14:14:30 +0900, Fujii Masao  
wrote in



On 2020/07/09 13:47, Kyotaro Horiguchi wrote:

At Thu, 9 Jul 2020 00:37:57 +0900, Fujii Masao
 wrote in



On 2020/07/02 2:18, David Steele wrote:

On 7/1/20 10:54 AM, Alvaro Herrera wrote:

On 2020-Jul-01, Fujii Masao wrote:


On 2020/07/01 12:26, Alvaro Herrera wrote:

On 2020-Jun-30, Fujii Masao wrote:


When I talked about max_slot_wal_keep_size as new feature in v13
at the conference, I received the question like "Why are the units of
setting values in max_slot_wal_keep_size and wal_keep_segments
different?"
from audience. That difference looks confusing for users and
IMO it's better to use the same unit for them. Thought?


Do we still need wal_keep_segments for anything?


Yeah, personally I like wal_keep_segments because its setting is very
simple and no extra operations on replication slots are necessary.


Okay.  In that case I +1 the idea of renaming to wal_keep_size.

+1 for renaming to wal_keep_size.


I attached the patch that renames wal_keep_segments to wal_keep_size.

It fails on 019_replslot_limit.pl for a reason that is unclear to me...


I could not reproduce this...


Sorry for the ambiguity.  The patch didn't apply to the file, and I
noticed that the reason is the wording change from master to
primary. So there is no problem in the latest patch.


@@ -11323,7 +11329,7 @@ do_pg_stop_backup(char *labelfile, bool
waitforarchive, TimeLineID *stoptli_p)
 * If archiving is enabled, wait for all the required WAL files to be
 * archived before returning. If archiving isn't enabled, the required
 * WAL
 * needs to be transported via streaming replication (hopefully with
-* wal_keep_segments set high enough), or some more exotic mechanism 
like
+ * wal_keep_size set high enough), or some more exotic mechanism like
 * polling and copying files from pg_wal with script. We have no
 * knowledge
Isn't this time a good chance to mention replication slots?


+1 to do that. But I found there are other places where replication
slots
need to be mentioned. So I think it's better to do this as separate
patch.


Agreed.


-   "ALTER SYSTEM SET wal_keep_segments to 8; SELECT pg_reload_conf();");
+ "ALTER SYSTEM SET wal_keep_size to '128MB'; SELECT
pg_reload_conf();");
wal_segment_size is set to 1MB here, so that conversion is not correct.
(However, the test works as long as it is more than
max_slot_wal_keep_size, so it's practically no problem.)


So I changed 128MB to 8MB. Is this OK?
I attached the updated version of the patch upthread.


That change looks fine to me.


Thanks for the review!

The patch no longer applied cleanly because of a recent commit,
so I updated the patch. Attached.

Barring any objection, I will commit this patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9cdff4864..3d973a5c8e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11249,7 +11249,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts 
ppx
  extended means
   that max_wal_size is exceeded but the files are
   still retained, either by the replication slot or
-  by wal_keep_segments.
+  by wal_keep_size.
  
 
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b353c61683..3384a5197d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3124,7 +3124,7 @@ include_dir 'conf.d'
 checkpoints. This is a soft limit; WAL size can exceed
 max_wal_size under special circumstances, such as
 heavy load, a failing archive_command, or a high
-wal_keep_segments setting.
+wal_keep_size setting.
 If this value is specified without units, it is taken as megabytes.
 The default is 1 GB.
 Increasing this parameter can increase the amount of time needed for
@@ -3751,21 +3751,21 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows

   
 
-  
-   wal_keep_segments (integer)
+  
+   wal_keep_size (integer)

-wal_keep_segments configuration 
parameter
+wal_keep_size configuration 
parameter




-Specifies the minimum number of past log file segments kept in the
+Specifies the minimum size of past log file segments kept in the
 pg_wal
 directory, in case a standby server needs to fetch them for streaming
-replication. Each segment is normally 16 megabytes. If a standby
+replication. If a standby
 server connected to the sending server falls behind by more than
-wal_keep_segments segments, the sending server 
might remove
-a WAL segment still needed by the standby, in which case 

Re: Stale external URL in doc?

2020-07-13 Thread Thomas Munro
On Tue, Jul 14, 2020 at 3:27 PM Kyotaro Horiguchi
 wrote:
> A. I didn't find the files gb-18030-2000.xml and windows-949-2000.xml
>   in the ICU site.  We have our own copy in our repository so it's not
>   a serious problem but I'm not sure what we should do for this.

The patch I posted earlier fixes that problem (their source repository moved).




Re: Stale external URL in doc?

2020-07-13 Thread Kyotaro Horiguchi
It turned out to be a time capsule full of worms...

At Tue, 14 Jul 2020 09:00:11 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > Use of uninitialized value $b1_lower in printf at convutils.pm line 560.
> 
> Mmm. I see the same, too. I'm looking into that.

There are three easy-to-fix issues:

1. The scripts relied on undef being treated as zero, so most of the
  warnings are fixed by using zero for undefs.

2. Some Japanese-related converter scripts seem to be affected by a
  change of regexp greediness and are easily fixed.

3. I got a certificate error for ssl.icu-project.org and found that
  the name has been changed to icu-project.org.

And one issue that I'm not sure how we should treat:

A. I didn't find the files gb-18030-2000.xml and windows-949-2000.xml
  on the ICU site.  We have our own copy in our repository so it's not
  a serious problem, but I'm not sure what we should do about this.

  I found CP949.TXT for windows-949-2000.xml, but the former is missing
  mappings for certain code ranges (c9xx and fexx).


The attached is the fix for 1 to 3 above. It doesn't contain changes
in .map files.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1544d47d847607cf2b9a449d586ebf53f1bc241a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 14 Jul 2020 12:02:57 +0900
Subject: [PATCH v1] Fix conversion-table generator scripts

convutils.pm relied on implicit conversion of undefined values into an
integer zero. Some of the conversion scripts are susceptible to regexp
greediness. Fix all of them.  Follow the changes of the ICU site's
configuration.

This change yields one significant difference in the resulting map files
for UHC. The maps no longer contain mappings for characters in the
ranges c9xx and fexx.
---
 src/backend/utils/mb/Unicode/Makefile |  2 +-
 .../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl   |  4 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl |  3 +-
 .../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl |  4 +-
 src/backend/utils/mb/Unicode/convutils.pm | 56 ++-
 5 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 9084f03009..4645441b64 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -122,7 +122,7 @@ euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
 
 gb-18030-2000.xml windows-949-2000.xml:
-	$(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
+	echo "The source for (@F) no longer exists"
 
 GB2312.TXT:
 	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt'
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index 092a5b44f5..62500efc6d 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -24,7 +24,7 @@ my @all;
 
 while (my $line = <$in>)
 {
-	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
+	if ($line =~ /^0x(\w+)[ \t]*U\+(\w+)\+(\w+)[ \t]*#(.*)$/)
 	{
 
 		# combined characters
@@ -45,7 +45,7 @@ while (my $line = <$in>)
 			l  => $.
 		  };
 	}
-	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
+	elsif ($line =~ /^0x(\w+)[ \t]*U\+(\w+)[ \t]*#(.*)$/)
 	{
 
 		# non-combined characters
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
index 1d88c0296e..d8bed27e1b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
@@ -80,7 +80,8 @@ foreach my $i (@$ct932)
 	}
 }
 
-foreach my $i (@mapping)
+# extract only SJIS characers
+foreach my $i (grep defined $_->{sjis}, @mapping)
 {
 	my $sjis = $i->{sjis};
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index b516e91306..025b0d2798 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -24,7 +24,7 @@ my @mapping;
 
 while (my $line = <$in>)
 {
-	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
+	if ($line =~ /^0x(\w+)[ \t]*U\+(\w+)\+(\w+)[ \t]*#(.*)$/)
 	{
 
 		# combined characters
@@ -45,7 +45,7 @@ while (my $line = <$in>)
 			l  => $.
 		  };
 	}
-	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
+	elsif ($line =~ /^0x(\w+)[ \t]*U\+(\w+)[ \t]*#(.*)$/)
 	{
 
 		# non-combined characters
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 2f64a12ea1..9d97061c6f 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -380,7 +380,8 @@ sub print_radix_table
 	  {
 		header  => "Dummy map, for invalid values",
 		min_idx => 0,
-		max_idx => $widest_range
+		max_idx => $widest_range,
+		label => "dummy map"
 	  };
 
 	###
@@

Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread Fujii Masao




On 2020/07/14 11:02, movead...@highgo.ca wrote:



When checking each tuple's visibility, we always have to get the CSN
corresponding to XMIN or XMAX from the CSN SLRU. In the past discussion,
there was a suggestion that the CSN should be stored in the tuple header
or somewhere (like a hint bit) to avoid the overhead of very frequent
lookups of the CSN SLRU. I'm not sure of the conclusion of this discussion.
But this patch doesn't seem to adopt that idea. So did you confirm that
the performance overhead of the CSN SLRU lookups is negligible?

This patch came from postgrespro's patch, which showed good performance.
I ran a simple test on the current patch and saw no performance decline.


This is good news! When I read the past discussions about CSN, my impression
was that the performance overhead of the CSN SLRU lookup might become one of
the show-stoppers for CSN. So I was worrying about this issue...



And not every tuple visibility check needs a lookup of the CSN SLRU; only XIDs
larger than 'TransactionXmin' need that. Maybe we have not yet touched the cases
which cause bad performance, so it may only show good performance temporarily.


Yes, we would need more tests in several cases.



Of course I know that idea has a big issue, i.e., there is not enough space
to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is 32 bits,
we may be able to replace XMIN or XMAX with the CSN corresponding to them. But
it means that we have to struggle with one more wraparound issue
(CSN wraparound). So it's not easy to adopt that idea...



Sorry if this was already discussed and concluded...

I think your point about storing the CSN in the tuple header is an exciting
approach, but I have not seen the discussion; can you show me the discussion address?


Probably you can find the discussion by searching with the keywords
"CSN" and "hint bit". For example,

https://www.postgresql.org/message-id/CAPpHfdv7BMwGv=ofug3s-jgvfkqhi79pr_zk1wsk-13oz+c...@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Amit Kapila
On Mon, Jul 13, 2020 at 9:50 PM Peter Geoghegan  wrote:
>
> On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost  wrote:
> > Yes, increasing work_mem isn't unusual, at all.
>
> It's unusual as a way of avoiding OOMs!
>
> > Eh?  That's not at all what it looks like- they were getting OOM's
> > because they set work_mem to be higher than the actual amount of memory
> > they had and the Sort before the GroupAgg was actually trying to use all
> > that memory.  The HashAgg ended up not needing that much memory because
> > the aggregated set wasn't actually that large.  If anything, this shows
> > exactly what Jeff's fine work here is (hopefully) going to give us- the
> > option to plan a HashAgg in such cases, since we can accept spilling to
> > disk if we end up underestimate, or take advantage of that HashAgg
> > being entirely in memory if we overestimate.
>
> I very specifically said that it wasn't a case where something like
> hash_mem would be expected to make all the difference.
>
> > Having looked back, I'm not sure that I'm really in the minority
> > regarding the proposal to add this at this time either- there's been a
> > few different comments that it's too late for v13 and/or that we should
> > see if we actually end up with users seriously complaining about the
> > lack of a separate way to specify the memory for a given node type,
> > and/or that if we're going to do this then we should have a broader set
> > of options covering other nodes types too, all of which are positions
> > that I agree with.
>
> By proposing to do nothing at all, you are very clearly in a small
> minority. While (for example) I might have debated the details with
> David Rowley a lot recently, and you couldn't exactly say that we're
> in agreement, our two positions are nevertheless relatively close
> together.
>
> AFAICT, the only other person that has argued that we should do
> nothing (have no new GUC) is Bruce, which was a while ago now. (Amit
> said something similar, but has since softened his opinion [1]).
>

To be clear, my vote for PG13 is not to do anything till we have clear
evidence of regressions.  In the email you quoted, I was trying to say
that due to parallelism we might not have the problem for which we are
planning to provide an escape-hatch or hash_mem GUC.  I think the
reason for the delay in getting to the agreement is that there is no
clear evidence for the problem (user-reported cases or results of some
benchmarks like TPC-H) unless I have missed something.

Having said that, I understand that we have to reach some conclusion
to close this open item and if the majority of people are in-favor of
escape-hatch or hash_mem solution then we have to do one of those.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-13 Thread Fujii Masao




On 2020/07/14 9:08, Masahiro Ikeda wrote:

I've attached the latest version of the patches. I've incorporated the review
comments I got so far and improved the locking strategy.


Thanks for updating the patch!


+1
I'm interested in these patches and am now studying them. While checking
the behavior of the patched PostgreSQL, I came up with three comments.

1. We can access a foreign table even during recovery in HEAD.
But in the patched version, when I did that, I got the following error.
Is this intentional?

ERROR:  cannot assign TransactionIds during recovery

2. With the patch, when INSERT/UPDATE/DELETE are executed on both the
local and remote servers, 2PC is executed at the commit phase. But
when a write command other than INSERT/UPDATE/DELETE (e.g., TRUNCATE)
is executed locally and INSERT/UPDATE/DELETE are executed remotely,
2PC is NOT executed. Is this safe?

3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE
are executed. But it's not reset even when those queries are canceled by
ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.
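
For illustration, a minimal sequence that seems to produce this (table names
are made up; local_tbl is a plain local table and remote_tbl is a postgres_fdw
foreign table):

```
BEGIN;
SAVEPOINT s1;
INSERT INTO local_tbl VALUES (1);   -- sets XACT_FLAGS_WROTENONTEMPREL
ROLLBACK TO SAVEPOINT s1;           -- the flag is apparently not reset here
INSERT INTO remote_tbl VALUES (1);  -- only the foreign server is really written to
COMMIT;                             -- 2PC may still be chosen because of the stale flag
```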

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread movead...@highgo.ca

>When checking each tuple's visibility, we always have to get the CSN
>corresponding to XMIN or XMAX from the CSN SLRU. In the past discussion,
>there was a suggestion that the CSN should be stored in the tuple header
>or somewhere else (like a hint bit) to avoid the overhead of very frequent
>lookups of the CSN SLRU. I'm not sure what the conclusion of that discussion
>was. But this patch doesn't seem to adopt that idea. So did you confirm that
>the performance overhead of CSN SLRU lookups is negligible?
This patch came from postgrespro's patch, which showed good performance.
I ran a simple test on the current patch and saw no performance decline.
Also, not every tuple visibility check needs a CSN SLRU lookup; only XIDs
larger than 'TransactionXmin' need it. Maybe we just haven't touched the
cases that cause bad performance, so it only shows good performance for now.

>Of course I know that idea has a big issue, i.e., there is not enough space
>to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is 32 bits,
>we may be able to replace XMIN or XMAX with the CSN corresponding to them. But
>it means that we have to struggle with one more wraparound issue
>(CSN wraparound). So it's not easy to adopt that idea...

>Sorry if this was already discussed and concluded...
I think your point about a CSN in the tuple header is an exciting approach,
but I have not seen that discussion; could you point me to it?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-07-13 Thread Bharath Rupireddy
>
> You could get a backend's PID using PQbackendPID and then kill it by
> calling pg_terminate_backend() to kill the remote backend to automate
> scenario of remote backend being killed.
>

I already added a test case in the v2 patch itself (and added one more test
case in the v3 patch), using a similar approach.
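
For reference, the manual version of that scenario looks roughly like this
(ft is any postgres_fdw foreign table; the application_name filter is an
assumption about how the remote session shows up):

```
-- On the local server: establish and cache a connection to the foreign server.
SELECT count(*) FROM ft;

-- On the foreign server: terminate the backend serving that cached connection.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'postgres_fdw';

-- Back on the local server: without the retry logic this fails because the
-- cached connection has gone bad; with it, GetConnection() re-establishes
-- the connection and the query succeeds.
SELECT count(*) FROM ft;
```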

>
> > For, one of the Ashutosh's review comments from [3] "the fact that the
> > same connection may be used by multiple plan nodes", I tried to have
> > few use cases where there exist joins on two foreign tables on the
> > same remote server, in a single query, so essentially, the same
> > connection was used for multiple plan nodes. In this case we avoid
> > retrying for the second GetConnection() request for the second foreign
> > table, with the check entry->xact_depth <= 0 , xact_depth after the
> > first GetConnection() and the first remote query will become 1 and we
> > don't hit the retry logic and seems like we are safe here. Please add
> > If I'm missing something here.
> >
> > Request the community to consider the patch for further review if the
> > overall idea seems beneficial.
>
> I think this idea will be generally useful if your work on dropping
> stale connection uses idle_connection_timeout or something like that
> on the remote server.

Assuming we use idle_connection_timeout or some other means (it is not
yet finalized; I will try to respond in that mail thread) to drop
stale/idle connections from the local backend, I think we have two
options: 1) delete that cached entry from the connection cache entirely,
using disconnect_pg_server() plus a hash table remove; this frees up
some space, and we don't have to deal with connection invalidations or
bother resetting the cached entry's other parameters such as xact_depth,
have_prep_stmt, etc.; or 2) just drop the connection using
disconnect_pg_server(), retain the hash entry, reset the other
parameters, and deal with the invalidations. With option 2 we keep
unnecessary info in a cached entry that no longer has a connection at
all, and keep holding space for it.

IMO, option 1 is the better choice.

Anyway, this connection retry feature does not depend on
idle_connection_timeout or the stale-connection-dropping feature,
because there are many reasons a remote backend can go away or get
killed.

If I'm not sidetracking - if we use something like
idle_session_timeout [1] on the remote server, this retry feature will
be very useful.

>
> About the patch. It seems we could just catch the error from
> begin_remote_xact() in GetConnection() and retry connection if the
> error is "bad connection". Retrying using PQreset() might be better
> than calling PQConnect* always.
>

Thanks for the comment; it made life easier. I've attached a patch with
the changes. Please take a look at the v3 patch and let me know if
anything else still needs to be done.

[1] -
https://www.postgresql.org/message-id/763A0689-F189-459E-946F-F0EC4458980B%40hotmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v3-Retry-Cached-Remote-Connections-For-postgres_fdw.patch
Description: Binary data


Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Amit Langote
On Mon, Jul 13, 2020 at 11:58 PM Bharath Rupireddy
 wrote:
> > I had one small comment:
> > +{
> > +   int copied_bytes = 0;
> > +
> > +#define BUF_BYTES  (cstate->raw_buf_len - cstate->raw_buf_index)
> > +#define DRAIN_COPY_RAW_BUF(cstate, dest, nbytes)\
> > +   do {\
> > +   memcpy((dest), (cstate)->raw_buf +
> > (cstate)->raw_buf_index, (nbytes));\
> > +   (cstate)->raw_buf_index += (nbytes);\
> > +   } while(0)
> >
> > BUF_BYTES could be used in CopyLoadRawBuf function also.
> >
>
> Thanks Vignesh for the find out. I changed and attached the v5 patch.
> The regression tests(make check and make check-world) ran
> successfully.

Good idea, thanks.

In CopyLoadRawBuf(), we could also change the condition if
(cstate->raw_buf_index < cstate->raw_buf_len) to if (BUF_BYTES > 0),
which looks clearer.

Also, if we are going to use the macro more generally, let's make it
look less localized.  For example, rename it to RAW_BUF_BYTES similar
to RAW_BUF_SIZE and place their definitions close by.  It also seems
like a good idea to make 'cstate' a parameter for clarity.

Attached v6.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


CopyFrom-binary-buffering_v6.patch
Description: Binary data


Re: Ideas about a better API for postgres_fdw remote estimates

2020-07-13 Thread Bruce Momjian
On Mon, Jul  6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:
> > Yeah, thinking about it as a function that inspects partial planner
> > results, it might be useful for other purposes besides postgres_fdw.
> > As I said before, I don't think this necessarily has to be bundled as
> > part of postgres_fdw.  That still doesn't make it part of EXPLAIN.
> 
> Providing it as a function rather than through EXPLAIN does make a bit
> more sense if we're going to skip things like the lookups you mention
> above.  I'm still inclined to have it be a part of core rather than
> having it as postgres_fdw though.  I'm not completely against it being
> part of postgres_fdw... but I would think that would really be
> appropriate if it's actually using something in postgres_fdw, but if
> everything that it's doing is part of core and nothing related
> specifically to the postgres FDW, then having it as part of core makes
> more sense to me.  Also, having it as part of core would make it more
> appropriate for other tools to look at and adding that kind of
> inspection capability for partial planner results could be very
> interesting for tools like pgAdmin and such.

I agree the statistics extraction should probably be part of core.
There is the goal of FDWs returning data, and returning the data
quickly.  I think we can require all-new FDW servers to get improved
performance.  I am not even clear whether we have a full understanding of the
performance characteristics of FDWs yet.  I know Tomas did some research
on their DML behavior, but other than that, I haven't seen much.

On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well.  Getting multiple costs for a query goes in that direction.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Fujii Masao




On 2020/07/14 9:41, Robert Haas wrote:

On Mon, Jul 13, 2020 at 6:15 PM Tom Lane  wrote:

Yeah, I don't care for that either.  That's a pretty huge violation of our
normal back-patching rules, and I'm not convinced that it's justified.


I think that our normal back-patching rules are based primarily on the
risk of breaking things, and a new contrib module carries a pretty
negligible risk of breaking anything that works today. I wouldn't
propose to back-patch something on those grounds just as a way of
delivering a new feature more quickly, but that's not the intention
here. At least in my experience, un-VACUUM-able tables have gotten
several orders of magnitude more common since Andres put those changes
in. As far as I can recall, EDB has not had this many instances of
different customers reporting the same problem since the 9.3-era
multixact issues. So far, this does not rise to that level, but it is
by no means a negligible issue, either. I believe it deserves to be
taken quite seriously, especially because the existing options for
helping customers with this kind of problem are so limited.

Now, if this goes into v14, we can certainly stick it up on github, or
put it out there in some other way for users to download,
self-compile, and install, but that seems noticeably less convenient
for people who need it, and I'm not clear what the benefit to the
project is.


But can updates to this tool fit the release schedule and
policy of PostgreSQL?

While investigating a problem by using this tool, we may want to
add a new feature to the tool because it's necessary for the investigation.
But users would need to wait for the next minor version release to use this
new feature.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 13, 2020 at 8:58 PM Tom Lane  wrote:
>> The more that I think about it, the more I think that the proposed
>> functions are tools for wizards only, and so I'm getting hesitant
>> about having them in contrib at all.  We lack a better place to
>> put them, but that doesn't mean they should be there.

> I understand that it's not too great when we give people access to
> sharp tools and they hurt themselves with said tools. But this is open
> source. That's how it goes.

I think you're attacking a straw man.  I'm well aware of how open source
works, thanks.  What I'm saying is that contrib is mostly seen to be
reasonably harmless stuff.  Sure, you can overwrite data you didn't want
to with adminpack's pg_file_write.  But that's the price of having such a
capability at all, and in general it's not hard for users to understand
both the uses and risks of that function.  That statement does not apply
to the functions being proposed here.  It doesn't seem like they could
possibly be safe to use without very specific expert advice --- and even
then, we're talking rather small values of "safe".  So I wish we had some
other way to distribute them than via contrib.

regards, tom lane




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
On Mon, Jul 13, 2020 at 9:10 PM Andres Freund  wrote:
> > What if clog has been truncated so that the xmin can't be looked up?
>
> That's possible, but probably only in cases where xmin actually
> committed.

Isn't that the normal case? I'm imagining something like:

- Tuple gets inserted. Transaction commits.
- VACUUM processes table.
- Mischievous fairies mark page all-visible in the visibility map.
- VACUUM runs lots more times, relfrozenxid advances, but without ever
looking at the page in question, because it's all-visible.
- clog is truncated, rendering xmin no longer accessible.
- User runs VACUUM disabling page skipping, gets ERROR.
- User deletes offending tuple.
- At this point, I think the tuple is both invisible and unprunable?
- Fairies happy, user sad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
On Mon, Jul 13, 2020 at 8:58 PM Tom Lane  wrote:
> The more that I think about it, the more I think that the proposed
> functions are tools for wizards only, and so I'm getting hesitant
> about having them in contrib at all.  We lack a better place to
> put them, but that doesn't mean they should be there.

Also, I want to clarify that in a typical situation in which a
customer is facing this problem, I don't have any access to their
system. I basically never touch customer systems directly. Typically,
the customer sends us log files and a description of the problem and
their goals, and we send them back advice or instructions. So it's
impractical to imagine that this can be something where you have to
know the secret magic wizard password to get access to it. We'd just
have to give the customers who need to use this tool said password,
and then the jig is up - they can redistribute that password to all
the non-wizards on the Internet, if they so choose.

I understand that it's not too great when we give people access to
sharp tools and they hurt themselves with said tools. But this is open
source. That's how it goes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Andres Freund
Hi,

On 2020-07-13 20:47:10 -0400, Robert Haas wrote:
> On Mon, Jul 13, 2020 at 6:38 PM Andres Freund  wrote:
> > Not fully, I'm afraid. Afaict it doesn't currently tell you the item
> > pointer offset, just the block number, right? We probably should extend
> > it to also include the offset...
>
> Oh, I hadn't realized that limitation. That would be good to fix.

Yea. And it'd even be good if we were to end up implementing your
suggestion below about continuing vacuuming other tuples.


> It would be even better, I think, if we could have VACUUM proceed with
> the rest of vacuuming the table, emitting warnings about each
> instance, instead of blowing up when it hits the first bad tuple, but
> I think you may have told me sometime that doing so would be, uh, less
> than straightforward.

Yea, it's not that simple to implement. Not impossible either.


> We probably should refuse to update relfrozenxid/relminmxid when this
> is happening, but I *think* it would be better to still proceed with
> dead tuple cleanup as far as we can, or at least have an option to
> enable that behavior. I'm not positive about that, but not being able
> to complete VACUUM at all is a FAR more urgent problem than not being
> able to freeze, even though in the long run the latter is more severe.

I'm hesitant to default to removing tuples once we've figured out that
something is seriously wrong. That could easily enough make us plow ahead and
delete valuable data in other tuples, even if we'd already detected
there's a problem. But I also see the problem you raise. That's not
academic: IIRC, a number of the multixact corruption issues the checks
detected weren't guaranteed to be caught.


> > > 2. In some other, similar situations, e.g. where the tuple data is
> > > garbled, it's often possible to get out from under the problem by
> > > deleting the tuple at issue. But I think that doesn't necessarily fix
> > > anything in this case.
> >
> > Huh, why not? That worked in the cases I saw.
>
> I'm not sure I've seen a case where that didn't work, but I don't see
> a reason why it couldn't happen. Do you think the code is structured
> in such a way that a deleted tuple is guaranteed to be pruned even if
> the XID is old?

I think so, leaving aside some temporary situations perhaps.


> What if clog has been truncated so that the xmin can't be looked up?

That's possible, but probably only in cases where xmin actually
committed.

Greetings,

Andres Freund




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
On Mon, Jul 13, 2020 at 8:58 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Oh, I hadn't realized that limitation. That would be good to fix. It
> > would be even better, I think, if we could have VACUUM proceed with
> > the rest of vacuuming the table, emitting warnings about each
> > instance, instead of blowing up when it hits the first bad tuple, but
> > I think you may have told me sometime that doing so would be, uh, less
> > than straightforward. We probably should refuse to update
> > relfrozenxid/relminmxid when this is happening, but I *think* it would
> > be better to still proceed with dead tuple cleanup as far as we can,
> > or at least have an option to enable that behavior. I'm not positive
> > about that, but not being able to complete VACUUM at all is a FAR more
> > urgent problem than not being able to freeze, even though in the long
> > run the latter is more severe.
>
> +1 for proceeding in this direction, rather than handing users tools
> that they *will* hurt themselves with.
>
> The more that I think about it, the more I think that the proposed
> functions are tools for wizards only, and so I'm getting hesitant
> about having them in contrib at all.  We lack a better place to
> put them, but that doesn't mean they should be there.

It's not an either/or; it's a both/and. To recover from this problem,
you need to:

1. Be able to tell which tuples are affected.
2. Do something about it.

I think there are a number of strategies that we could pursue around
either of those things, and there are better and worse ways of
accomplishing them, but having one without the other isn't too great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Tom Lane
Robert Haas  writes:
> Oh, I hadn't realized that limitation. That would be good to fix. It
> would be even better, I think, if we could have VACUUM proceed with
> the rest of vacuuming the table, emitting warnings about each
> instance, instead of blowing up when it hits the first bad tuple, but
> I think you may have told me sometime that doing so would be, uh, less
> than straightforward. We probably should refuse to update
> relfrozenxid/relminmxid when this is happening, but I *think* it would
> be better to still proceed with dead tuple cleanup as far as we can,
> or at least have an option to enable that behavior. I'm not positive
> about that, but not being able to complete VACUUM at all is a FAR more
> urgent problem than not being able to freeze, even though in the long
> run the latter is more severe.

+1 for proceeding in this direction, rather than handing users tools
that they *will* hurt themselves with.

The more that I think about it, the more I think that the proposed
functions are tools for wizards only, and so I'm getting hesitant
about having them in contrib at all.  We lack a better place to
put them, but that doesn't mean they should be there.

regards, tom lane




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
On Mon, Jul 13, 2020 at 6:38 PM Andres Freund  wrote:
> Not fully, I'm afraid. Afaict it doesn't currently tell you the item
> pointer offset, just the block number, right? We probably should extend
> it to also include the offset...

Oh, I hadn't realized that limitation. That would be good to fix. It
would be even better, I think, if we could have VACUUM proceed with
the rest of vacuuming the table, emitting warnings about each
instance, instead of blowing up when it hits the first bad tuple, but
I think you may have told me sometime that doing so would be, uh, less
than straightforward. We probably should refuse to update
relfrozenxid/relminmxid when this is happening, but I *think* it would
be better to still proceed with dead tuple cleanup as far as we can,
or at least have an option to enable that behavior. I'm not positive
about that, but not being able to complete VACUUM at all is a FAR more
urgent problem than not being able to freeze, even though in the long
run the latter is more severe.

> > 2. In some other, similar situations, e.g. where the tuple data is
> > garbled, it's often possible to get out from under the problem by
> > deleting the tuple at issue. But I think that doesn't necessarily fix
> > anything in this case.
>
> Huh, why not? That worked in the cases I saw.

I'm not sure I've seen a case where that didn't work, but I don't see
a reason why it couldn't happen. Do you think the code is structured
in such a way that a deleted tuple is guaranteed to be pruned even if
the XID is old? What if clog has been truncated so that the xmin can't
be looked up?

> xmax is among the problematic cases IIRC, so yes, it'd be good to fix
> that.

Thanks for the input.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
On Mon, Jul 13, 2020 at 6:15 PM Tom Lane  wrote:
> Yeah, I don't care for that either.  That's a pretty huge violation of our
> normal back-patching rules, and I'm not convinced that it's justified.

I think that our normal back-patching rules are based primarily on the
risk of breaking things, and a new contrib module carries a pretty
negligible risk of breaking anything that works today. I wouldn't
propose to back-patch something on those grounds just as a way of
delivering a new feature more quickly, but that's not the intention
here. At least in my experience, un-VACUUM-able tables have gotten
several orders of magnitude more common since Andres put those changes
in. As far as I can recall, EDB has not had this many instances of
different customers reporting the same problem since the 9.3-era
multixact issues. So far, this does not rise to that level, but it is
by no means a negligible issue, either. I believe it deserves to be
taken quite seriously, especially because the existing options for
helping customers with this kind of problem are so limited.

Now, if this goes into v14, we can certainly stick it up on github, or
put it out there in some other way for users to download,
self-compile, and install, but that seems noticeably less convenient
for people who need it, and I'm not clear what the benefit to the
project is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transactions involving multiple postgres foreign servers, take 2

2020-07-13 Thread Masahiro Ikeda

I've attached the latest version patches. I've incorporated the review
comments I got so far and improved locking strategy.


Thanks for updating the patch!
I have three questions about the v23 patches.


1. messages related to user canceling

In my understanding, there are two messages
which can be output when a user cancels the COMMIT command.

A. When prepare fails, the output shows that the transaction
   committed locally but that some error occurred.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user 
request
DETAIL:  The transaction has already committed locally, but might not 
have been committed on the foreign server.

ERROR:  server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
CONTEXT:  remote SQL command: PREPARE TRANSACTION 
'fx_1020791818_519_16399_10'

```

B. When prepare succeeds,
   the output shows that the transaction committed locally.

```
postgres=*# COMMIT;
^CCancel request sent
WARNING:  canceling wait for resolving foreign transaction due to user 
request
DETAIL:  The transaction has already committed locally, but might not 
have been committed on the foreign server.

COMMIT
```

In case A, I think the "committed locally" message can confuse users,
because although the message says committed, the transaction is
actually "ABORTED".


I think the "committed" message means that the "ABORT" was committed locally.
But is there a possibility of misunderstanding?

In case A, it would be better to change the message to be more user
friendly, wouldn't it?


2. typo

Is "trasnactions" in fdwxact.c a typo?


3. FdwXactGetWaiter in fdwxact.c returns an unused value

FdwXactGetWaiter is called in the FXRslvLoop function.
It returns *waitXid_p, but FXRslvLoop doesn't seem to
use *waitXid_p. Do we need to return it?


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Stale external URL in doc?

2020-07-13 Thread Kyotaro Horiguchi
At Mon, 13 Jul 2020 11:36:17 +0200, Daniel Gustafsson  wrote 
in 
> > On 11 Jul 2020, at 05:25, Thomas Munro  wrote:
> 
> > Is it OK that I see the following warning many times when running
> > "make" under src/backend/utils/mb/Unicode?  It looks like this code is
> > from commit 1de9cc0d.  Horiguchi-san, do you think something changed
> > (input data format, etc) since you wrote it, or maybe some later
> > changes just made our perl scripts more picky about warnings?
> > 
> >  Use of uninitialized value $val in printf at convutils.pm line 612.
> 
> Confirmed here as well, combined with the below ones for a few of the files:
> 
> Use of uninitialized value in hash element at convutils.pm line 448.
> Use of uninitialized value $b1root in printf at convutils.pm line 558.
> Use of uninitialized value $b1_lower in printf at convutils.pm line 560.

Mmm. I see the same, too. I'm looking into that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Andres Freund
Hi,

On 2020-07-13 17:12:18 -0400, Robert Haas wrote:
> 1. There's nothing to identify the tuple that has the problem, and no
> way to know how many more of them there might be. Back-patching
> b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> part of this.

Not fully, I'm afraid. Afaict it doesn't currently tell you the item
pointer offset, just the block number, right? We probably should extend
it to also include the offset...


> 2. In some other, similar situations, e.g. where the tuple data is
> garbled, it's often possible to get out from under the problem by
> deleting the tuple at issue. But I think that doesn't necessarily fix
> anything in this case.

Huh, why not? That worked in the cases I saw.


> Therefore, one of my colleagues has - at my request - created a couple
> of functions called heap_force_kill() and heap_force_freeze() which
> take an array of TIDs. The former truncates them all to dead line
> pointers. The latter resets the infomask and xmin to make the xmin
> frozen. (It should probably handle the xmax too; not sure that the
> current version does that, but it's easily fixed if not.)

xmax is among the problematic cases IIRC, so yes, it'd be good to fix
that.

Greetings,

Andres Freund




Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2020-07-13 Thread David G. Johnston
On Mon, Jul 13, 2020 at 2:12 AM Pavel Stehule 
wrote:

> I am reading this patch. I don't think so text for domains and types are
> correct (or minimally it is little bit messy)
> This case is a little bit more complex - domains are not subset of
> relations. But relations (in Postgres) extends types.
>

Yeah, though in working further on this I've come to dislike the saying "A
composite type is a relation" (see the Glossary and probably other spots).
That a table auto-creates a separate composite type, and depends on it,
manifests a certain link between the two, but the type that represents the
table is not a relation, as it doesn't hold data; it is just a definition.
If a composite type were a relation, then whatever argument you use to
justify that would seem to apply to non-composite types as well.

I'm attaching version 2 as a plain diff (complete) instead of a patch.

New with this version is the addition of tests for drop domain and drop
type, and related documentation changes.  Notably pointing out the fact
that DROP TYPE drops all types, including domains.

To recap, the interesting relation related behaviors these tests
demonstrate are:

A non-failure while performing a DROP "relation" IF EXISTS command means
that a subsequent CREATE "relation" command will not fail due to the name
already existing (other failures are of course possible).

In the presence of multiple schemas, a failure of a DROP "relation" IF
EXISTS command does not necessarily mean that a corresponding CREATE
"relation" command would fail - the found entry could belong to a non-first
schema on the search_path, while creation always places the newly created
object in the first schema.
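
A minimal sketch of that scenario, with made-up schema and object names:

```
CREATE SCHEMA a;
CREATE SCHEMA b;
SET search_path = a, b;
CREATE VIEW b.t AS SELECT 1;

DROP TABLE IF EXISTS t;   -- fails as described above: "t" is found (b.t),
                          -- but it is not a table
CREATE TABLE t ();        -- would nevertheless succeed: it creates a.t,
                          -- because creation targets the first schema
```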

The plain meaning of "DROP ... IF EXISTS" (i.e., it's not an error if the
specified object doesn't exist - just move on) is not what actually happens;
rather, we provide an additional test related to namespace occupation, which
is now documented.

The latter two items are explicitly documented while the first is implicit
and self-evident.

David J.


drop-if-exists-docs-and-tasks-v2.diff
Description: Binary data


Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 2:12 PM Robert Haas  wrote:
> 1. There's nothing to identify the tuple that has the problem, and no
> way to know how many more of them there might be. Back-patching
> b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> part of this.

I am in favor of backpatching such changes in cases where senior
community members feel that it could help with hypothetical
undiscovered data corruption issues -- if they're willing to take
responsibility for the change. It certainly wouldn't be the first
time. A "defense in depth" mindset seems like the right one when it
comes to data corruption bugs. Early detection is really important.

> Moreover, not everyone is as
> interested in an extended debugging exercise as they are in getting
> the system working again, and VACUUM failing repeatedly is a pretty
> serious problem.

That's absolutely consistent with my experience. Most users want to
get back to business as usual now, while letting somebody else do the
hard work of debugging.

> Therefore, one of my colleagues has - at my request - created a couple
> of functions called heap_force_kill() and heap_force_freeze() which
> take an array of TIDs.

> So I have these questions:
>
> - Do people think it would be smart/good/useful to include something
> like this in PostgreSQL?

I'm in favor of it.

> - If so, how? I would propose a new contrib module that we back-patch
> all the way, because the VACUUM errors were back-patched all the way,
> and there seems to be no advantage in making people wait 5 years for a
> new version that has some kind of tooling in this area.

I'm in favor of it being *possible* to backpatch tooling that is
clearly related to correctness in a fundamental way. Obviously this
would mean that we'd be revising our general position on backpatching
to allow some limited exceptions around corruption. I'm not sure that
this meets that standard, though. It's hardly something that we can
expect all that many users to be able to use effectively.

I may be biased, but I'd be inclined to permit it in the case of
something like amcheck, or pg_visibility, on the grounds that they're
more or less the same as the new VACUUM errcontext instrumentation you
mentioned. The same cannot be said of something like this new
heap_force_kill() stuff.

> - Any ideas for additional things we should include, or improvements
> on the sketch above?

Clearly you should work out a way of making it very hard to
accidentally (mis)use. For example, maybe you make the functions check
for the presence of a sentinel file in the data directory.


--
Peter Geoghegan




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Tom Lane
Stephen Frost  writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> - If so, how? I would propose a new contrib module that we back-patch
>> all the way, because the VACUUM errors were back-patched all the way,
>> and there seems to be no advantage in making people wait 5 years for a
>> new version that has some kind of tooling in this area.

> While I agree that this would be a good and useful new contrib module to
> have, I don't think it would be appropriate to back-patch it into PG
> formally.

Yeah, I don't care for that either.  That's a pretty huge violation of our
normal back-patching rules, and I'm not convinced that it's justified.

No objection to adding it as a new contrib module.

regards, tom lane




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> - Do people think it would be smart/good/useful to include something
> like this in PostgreSQL?

Absolutely, yes.

> - If so, how? I would propose a new contrib module that we back-patch
> all the way, because the VACUUM errors were back-patched all the way,
> and there seems to be no advantage in making people wait 5 years for a
> new version that has some kind of tooling in this area.

While I agree that this would be a good and useful new contrib module to
have, I don't think it would be appropriate to back-patch it into PG
formally.

Unfortunately, that gets into the discussion that's cropped up on a few
other threads of late- that we don't have a good place to put extensions
which are well maintained/recommended by core PG hackers, and which are
able to work with lots of different versions of PG, and are versioned
and released independently of PG (and, ideally, built for all the
versions of PG that we distribute through our packages).

Given the lack of such a place today, I'd at least suggest starting with
proposing it as a new contrib module for v14.

> - Any ideas for additional things we should include, or improvements
> on the sketch above?

Not right off-hand, but will think about it, there could certainly be a
lot of very interesting tools in such a toolbox.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pg_dump bug for extension owned tables

2020-07-13 Thread Fabrízio de Royes Mello
On Mon, Jul 13, 2020 at 5:05 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
>
>
> On 7/13/20 2:46 PM, Fabrízio de Royes Mello wrote:
> >
> >
> > >
> > > yeah, that's the fix I came up with too. The only thing I added was
> > > "Assert(tbinfo->attgenerated);" at about line 2097.
> > >
> >
> > Cool.
> >
> > >
> > > Will wait for your TAP tests.
> > >
> >
> > Actually I've sent it already but it seems to have gone to the
> > moderation queue.
> >
> > Anyway attached with your assertion and TAP tests.
> >
> >
>
>
>
> Thanks, that all seems fine. The TAP test changes are a bit of a pain in
> the neck before release 11, so I think I'll just do those back that far,
> but the main fix for all live branches.
>

Sounds good to me.

--
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: Proposal: Automatic partition creation

2020-07-13 Thread Anastasia Lubennikova

On 06.07.2020 17:59, Justin Pryzby wrote:

I think you'd want to have an
ALTER command for that (we would use that to change tables between
daily/monthly based on their current size).  That should also support setting
the MODULUS of a HASH partitioned table, to allow changing the size of its
partitions (currently, the user would have to more or less recreate the table
and move all its data into different partitions, but that's not ideal).

New syntax fits the ALTER command as well:

ALTER TABLE tbl
PARTITION BY HASH (number)
USING (partition_desc)

In simple cases (i.e. range partitioning granularity), it will simply update
the rule of bound generation, saved in the catalog. More complex hash
partitions will require some rebalancing. Though, the syntax is pretty
straightforward for all cases. In the next versions, we can also add a
CONCURRENTLY keyword to cover partitioning of an existing non-partitioned
table with data.


I don't know if it's important for anyone, but it would be interesting to think
about supporting sub-partitioning: partitions which are themselvese partitioned.
Like something => something_ => something__MM => something__MM_DD.
You'd need to specify how to partition each layer of the heirarchy.  In the
most general case, it could be different partition strategy.


I suppose it will be a natural extension of this work. Now we need to ensure
that the proposed syntax is extensible. The Greenplum syntax, which I chose as
an example, provides subpartition syntax as well.


If you have a callback function for partition renaming, I think you'd want to
pass it not just the current name of the partition, but also the "VALUES" used
in partition creation.  Like (2020-04-05)TO(2020-05-06).  Maybe instead, we'd
allow setting a "format" to use to construct the partition name.  Like
"child.foo_bar_%Y_%m_%d".  Ideally, the formats would be fixed-length
(zero-padded, etc), so failures with length can happen at "parse" time of the
statement and not at "run" time of the creation.  You'd still have to handle
the case that the name already exists but isn't a partition (or is a partition
by doesn't handle the incoming tuple for some reason).


In the callback design, I want to use the best parts of pg_pathman's
set_init_callback(). That function accepts a jsonb argument, which contains
all the data about the parent table, bounds, and so on. This information can
be used to construct a name for the partition and generate a RENAME statement.


Also, maybe your "configuration" syntax would allow specifying other values.
Maybe including a retention period (as an INTERVAL for RANGE tables).  That's
useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.

In this version, I got rid of the 'configuration' keyword. Speaking of
retention, I think it would be hard to cover all use cases with a declarative
syntax, while it is relatively easy to implement deletion within a callback
function. See the rotation_callback example in pg_pathman [1].

[1] 
https://github.com/postgrespro/pg_pathman/blob/79e11d94a147095f6e131e980033018c449f8e2e/sql/pathman_callbacks.sql#L107 



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Proposal: Automatic partition creation

2020-07-13 Thread Anastasia Lubennikova

On 06.07.2020 13:45, Anastasia Lubennikova wrote:
The previous discussion of automatic partition creation [1] has 
addressed static and dynamic creation of partitions and ended up with 
several syntax proposals.

In this thread, I want to continue this work.

...
[1] 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre


Syntax proposal v2, which takes the received feedback into account.

I compared the syntax of other databases. You can find an overview here [1].
It seems that there is no industry standard, so every DBMS has its own
implementation. I decided to rely on the Greenplum syntax, as the one most
similar to the original PostgreSQL syntax.
the original PostgreSQL syntax.

New proposal is:

CREATE TABLE numbers(number int)
PARTITION BY partition_method (list_of_columns)
USING (partition_desc)

where partition_desc is:

MODULUS n
| VALUES IN (value_list), [DEFAULT PARTITION part_name]
| START ([datatype] 'start_value')
  END ([datatype] 'end_value')
  EVERY (partition_step), [DEFAULT PARTITION part_name]

where partition_step is:
[datatype] [number | INTERVAL] 'interval_value'
 
example:


CREATE TABLE years(year int)
PARTITION BY RANGE (year)
USING
(START (2006) END (2016) EVERY (1),
DEFAULT PARTITION other_years);

It is less wordy than the previous version. It uses a free keyword option
style. It covers static partitioning for all methods, default partition for
list and range methods, and can be extended to implement dynamic partitioning
for range partitions.
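
For comparison, the hash branch of the same grammar would presumably read like
this (an illustration of the proposed syntax, not an implemented feature; the
MODULUS value is taken to mean "create that many hash partitions"):

```
CREATE TABLE orders(order_id bigint, payload text)
PARTITION BY HASH (order_id)
USING (MODULUS 8);
```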

[1] 
https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Other_DBMS
[2] 
https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements#Proposal_.28is_subject_to_change.29


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



recovering from "found xmin ... from before relfrozenxid ..."

2020-07-13 Thread Robert Haas
Hi,

A number of EDB customers have had this error crop up on their tables for
reasons that we have usually not been able to determine. In many
cases, it's probably down to things like running buggy old releases
for a long time before upgrading, or bad backup and recovery
procedures. It's more than possible that there are still-unfixed
server bugs, but I do not have any compelling evidence of such bugs at
this time. Unfortunately, once you're in this situation, it's kind of
hard to find your way out of it. There are a few problems:

1. There's nothing to identify the tuple that has the problem, and no
way to know how many more of them there might be. Back-patching
b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
part of this.

2. In some other, similar situations, e.g. where the tuple data is
garbled, it's often possible to get out from under the problem by
deleting the tuple at issue. But I think that doesn't necessarily fix
anything in this case.

3. We've had some success with using a PL/pgSQL loop with an
EXCEPTION block to extract all the accessible tuples from the table
(a rough sketch of this approach appears below). Then you can truncate
the original table and reinsert the data. But this is slow, so it stinks
if the table is big, and it's not a viable approach if the table in
question is a system catalog table -- at least not if it's something
critical like pg_class.
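
For concreteness, here is a rough sketch of that kind of loop. The table and
column names are invented, it assumes a surrogate key to batch on, and a real
corruption case may need a different batching strategy:

```
DO $$
DECLARE
    batch_start bigint := 0;
    batch_size  bigint := 10000;
    max_id      bigint;
BEGIN
    SELECT max(id) INTO max_id FROM broken_table;
    WHILE batch_start <= max_id LOOP
        BEGIN
            -- Each BEGIN/EXCEPTION block runs in its own subtransaction,
            -- so one unreadable batch does not abort the whole copy.
            INSERT INTO salvaged_table
            SELECT * FROM broken_table
            WHERE id >= batch_start AND id < batch_start + batch_size;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'skipping ids [%, %): %',
                          batch_start, batch_start + batch_size, SQLERRM;
        END;
        batch_start := batch_start + batch_size;
    END LOOP;
END;
$$;
```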

I realize somebody's probably going to say "well, you shouldn't try to
repair a database that's in this state, you shouldn't let it happen in
the first place, and if it does happen, you should track the root
cause to the ends of the earth." But I think that's a completely
impractical approach. I at least have no idea how I'm supposed to
figure out when and how a bad relfrozenxid ended up in the table, and
by the time the problem is discovered after an upgrade the problem
that caused it may be quite old. Moreover, not everyone is as
interested in an extended debugging exercise as they are in getting
the system working again, and VACUUM failing repeatedly is a pretty
serious problem.

Therefore, one of my colleagues has - at my request - created a couple
of functions called heap_force_kill() and heap_force_freeze() which
take an array of TIDs. The former truncates them all to dead line
pointers. The latter resets the infomask and xmin to make the xmin
frozen. (It should probably handle the xmax too; not sure that the
current version does that, but it's easily fixed if not.) The
intention is that you can use these to either get rid of, or get
access to, tuples whose visibility information is corrupted for
whatever reason. These are pretty sharp tools; you could corrupt a
perfectly-good table by incautious use of them, or destroy a large
amount of data. You could, for example, force-freeze a tuple created
by a transaction which added a column, inserted data, and rolled back;
that would likely be disastrous. However, in the cases that I'm
thinking about, disaster has already struck, and something that you
can use to get things back to a saner state is better than just
leaving the table perpetually broken. Without something like this, the
backup plan is probably to shut down the server and try to edit the
pages using a perl script or something, but that seems clearly worse.
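
To make the intended usage concrete, a hypothetical invocation could look like
the following. The exact signatures are not settled, so the regclass argument
and the tid[] array shown here are assumptions based on the description above:

```
-- Truncate two known-bad tuples to dead line pointers:
SELECT heap_force_kill('public.mytable'::regclass,
                       ARRAY['(1200,5)', '(1200,6)']::tid[]);

-- Or, instead, mark their xmin frozen so the rows become readable again:
SELECT heap_force_freeze('public.mytable'::regclass,
                         ARRAY['(1200,5)', '(1200,6)']::tid[]);
```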

So I have these questions:

- Do people think it would be smart/good/useful to include something
like this in PostgreSQL?

- If so, how? I would propose a new contrib module that we back-patch
all the way, because the VACUUM errors were back-patched all the way,
and there seems to be no advantage in making people wait 5 years for a
new version that has some kind of tooling in this area.

- Any ideas for additional things we should include, or improvements
on the sketch above?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tom Lane
"David G. Johnston"  writes:
> To be clear, by "escape hatch" you mean "add a GUC that instructs the
> PostgreSQL executor to ignore hash_mem when deciding whether to spill the
> contents of the hash table to disk - IOW to never spill the contents of a
> hash table to disk"?  If so that seems separate from whether to add a
> hash_mem GUC to provide finer grained control - people may well want both.

If we define the problem narrowly as "allow people to get back exactly
the pre-v13 behavior", then yeah you'd need an escape hatch of that
sort.  We should not, however, forget that the pre-v13 behavior is
pretty darn problematic.  It's hard to see why anyone would really
want to get back exactly "never spill even if it leads to OOM".

The proposals for allowing a higher-than-work_mem, but not infinite,
spill boundary seem to me to be a reasonable way to accommodate cases
where the old behavior is accidentally preferable to what v13 does
right now.  Moreover, such a knob seems potentially useful in its
own right, at least as a stopgap until we figure out how to generalize
or remove work_mem.  (Which might be a long time.)

I'm not unalterably opposed to providing an escape hatch of the other
sort, but right now I think the evidence for needing it isn't there.
If we get field complaints that can't be resolved with the "raise the
spill threshold by X" approach, we could reconsider.  But that approach
seems a whole lot less brittle than "raise the spill threshold to
infinity", so I think we should start with the former type of fix.

regards, tom lane




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 12:57 PM David G. Johnston
 wrote:
> To be clear, by "escape hatch" you mean "add a GUC that instructs the 
> PostgreSQL executor to ignore hash_mem when deciding whether to spill the 
> contents of the hash table to disk - IOW to never spill the contents of a 
> hash table to disk"?

Yes, that's what that means.

> If so that seems separate from whether to add a hash_mem GUC to provide finer 
> grained control - people may well want both.

They might want the escape hatch too, as an additional measure, but my
assumption is that anybody in favor of the
hash_mem/hash_mem_multiplier proposal takes that position because they
think that it's the principled solution. That's the kind of subtlety
that is bound to get lost when summarizing general sentiment at a high
level. In any case no individual has seriously argued that there is a
simultaneous need for both -- at least not yet.

This thread is already enormous, and very hard to keep up with. I'm
trying to draw a line under the discussion. For my part, I have
compromised on the important question of the default value of
hash_mem_multiplier -- I am writing a new version of the patch that
makes the default 1.0 (i.e. no behavioral changes by default).

> I would prefer DavidJ as an abbreviation - my middle initial can be dropped 
> when referring to me.

Sorry about that.

-- 
Peter Geoghegan




Re: pg_dump bug for extension owned tables

2020-07-13 Thread Andrew Dunstan


On 7/13/20 2:46 PM, Fabrízio de Royes Mello wrote:
>
>
> >
> > yeah, that's the fix I came up with too. The only thing I added was
> > "Assert(tbinfo->attgenerated);" at about line 2097.
> >
>
> Cool.
>
> >
> > Will wait for your TAP tests.
> >
>
> Actually I've sent it already but it seems to have gone to the
> moderation queue.
>
> Anyway attached with your assertion and TAP tests.
>
>



Thanks, that all seems fine. The TAP test changes are a bit of a pain in
the neck before release 11, so I think I'll just do those back that far,
but the main fix for all live branches.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Default setting for enable_hashagg_disk

2020-07-13 Thread David G. Johnston
On Mon, Jul 13, 2020 at 11:50 AM Peter Geoghegan  wrote:

>
> Primarily in favor of escape hatch:
>
> Jeff,
> DavidR,
> Pavel,
> Andres,
> Robert ??,
> Amit ??
>
>
To be clear, by "escape hatch" you mean "add a GUC that instructs the
PostgreSQL executor to ignore hash_mem when deciding whether to spill the
contents of the hash table to disk - IOW to never spill the contents of a
hash table to disk"?  If so that seems separate from whether to add a
hash_mem GUC to provide finer grained control - people may well want both.

Primarily in favor of hash_mem/hash_mem_multiplier:
>
> PeterG,
> Tom,
> Alvaro,
> Tomas,
> Justin,
> DavidG,
> Jonathan Katz
>
>
I would prefer DavidJ as an abbreviation - my middle initial can be dropped
when referring to me.

David J.


Re: pg_dump bug for extension owned tables

2020-07-13 Thread Fabrízio de Royes Mello
On Mon, Jul 13, 2020 at 3:29 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
>
>
> On 7/13/20 10:52 AM, Fabrízio de Royes Mello wrote:
> >
> > On Sat, Jul 11, 2020 at 8:07 PM Andrew Dunstan
> >  > > wrote:
> > >
> > >
> > > On 6/26/20 2:10 PM, Fabrízio de Royes Mello wrote:
> > > >
> > > > On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello
> > > > mailto:fabriziome...@gmail.com>
> > >>
wrote:
> > > > >
> > > > >
> > > > > On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan
> > > >  > 
> > > >  > >> wrote:
> > > > > >
> > > > > >
> > > > > > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > > > > > It appears that for extension owned tables
> > tbinfo.attgenerated isn't
> > > > > > > being properly populated, so line 2050 in REL_12_STABLE, which
> > > > is line
> > > > > > > 2109 in git tip, is failing.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> > > > > >
> > > > >
> > > > > Having a look on it.
> > > > >
> > > >
> > > > Seems when qualify the schemaname the the "tbinfo->interesting"
field
> > > > is not setted for extensions objects, so the getTableAttrs can't
fill
> > > > the attgenerated field properly.
> > > >
> > > > I'm not 100% sure it's the correct way but the attached patch works
> > > > for me and all tests passed. Maybe we should add more TAP tests?
> > > >
> > > >
> > >
> > >
> > > I just tried this patch out on master, with the test case I gave
> > > upthread. It's not working, still getting a segfault.
> > >
> >
> > Ohh really sorry about it... my bad... i completely forgot about it!!!
> >
> > Due to my rush I ended up adding the wrong patch version. Attached the
> > correct version.
> >
> > Will add TAP tests to src/test/modules/test_pg_dump
>
>
> yeah, that's the fix I came up with too. The only thing I added was
> "Assert(tbinfo->attgenerated);" at about line 2097.
>

Cool.

>
> Will wait for your TAP tests.
>

Actually I've sent it already but it seems to have gone to the moderation
queue.

Anyway attached with your assertion and TAP tests.

Regards,

--
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e758b5c50a..baacd7af63 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2094,6 +2094,8 @@ dumpTableData_insert(Archive *fout, void *dcontext)
 			if (nfields == 0)
 continue;
 
+			Assert(tbinfo->attgenerated);
+
 			/* Emit a row heading */
 			if (rows_per_statement == 1)
 archputs(" (", fout);
@@ -18037,6 +18039,8 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
 			configtbl->dataObj->filtercond = pg_strdup(extconditionarray[j]);
 	}
 }
+
+configtbl->interesting = dumpobj;
 			}
 		}
 		if (extconfigarray)
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index ae120a5ee3..78aa07ce51 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -135,6 +135,12 @@ my %pgdump_runs = (
 			"$tempdir/defaults_tar_format.tar",
 		],
 	},
+	extension_schema => {
+		dump_cmd => [
+			'pg_dump', '--schema=public', '--inserts',
+			"--file=$tempdir/extension_schema.sql", 'postgres',
+		],
+	},
 	pg_dumpall_globals => {
 		dump_cmd => [
 			'pg_dumpall', '--no-sync',
@@ -301,8 +307,9 @@ my %tests = (
 			\n/xm,
 		like => {
 			%full_runs,
-			data_only=> 1,
-			section_data => 1,
+			data_only=> 1,
+			section_data => 1,
+			extension_schema => 1,
 		},
 	},
 
@@ -536,6 +543,7 @@ my %tests = (
 		like   => {%pgdump_runs},
 		unlike => {
 			data_only  => 1,
+			extension_schema   => 1,
 			pg_dumpall_globals => 1,
 			section_data   => 1,
 			section_pre_data   => 1,
@@ -549,6 +557,7 @@ my %tests = (
 		like   => {%pgdump_runs},
 		unlike => {
 			data_only  => 1,
+			extension_schema   => 1,
 			pg_dumpall_globals => 1,
 			section_data   => 1,
 			section_pre_data   => 1,
@@ -569,6 +578,17 @@ my %tests = (
 			schema_only  => 1,
 			section_pre_data => 1,
 		},
+	},
+
+	# Dumpable object inside specific schema
+	'INSERT INTO public.regress_table_dumpable VALUES (1);' => {
+		create_sql   => 'INSERT INTO public.regress_table_dumpable VALUES (1);',
+		regexp   => qr/^
+			\QINSERT INTO public.regress_table_dumpable VALUES (1);\E
+			\n/xm,
+		like => {
+			extension_schema => 1,
+		},
 	},);
 
 #
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
index 3ed007a7b1..90e461ed35 100644
--- a/src

Partitioning and postgres_fdw optimisations for multi-tenancy

2020-07-13 Thread Alexey Kondratov

Hi Hackers,

The idea of achieving Postgres scaling via sharding using postgres_fdw + 
partitioning got a lot of attention last years. Many optimisations have 
been done in this direction: partition pruning, partition-wise 
aggregates / joins, postgres_fdw push-down of LIMIT, GROUP BY, etc. In 
many cases they work really nice.


However, there is still a broad class of cases where postgres_fdw + native
partitioning doesn't perform so well: multi-tenant architectures. From the
database perspective this is presented well in this Citus tutorial [1]. The
main idea is that there are a number of tables, all of them sharded /
partitioned by the same key, e.g. company_id. That way, if every company
mostly works within its own data, then every query can be executed
effectively on a single node without any need for internode communication.


I built a simple two-node multi-tenant schema for tests, which can be easily
set up with the attached scripts. It creates three tables (companies, users,
documents) distributed over two nodes. Everything can be found in this Gist
[2] as well.
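
For readers who don't want to open the attachments, here is a simplified
sketch of how one of the three tables is laid out (the column list and
partition bounds are abridged assumptions; users and companies follow the
same pattern, and the user mapping setup is omitted):

```
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER node2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5433');

CREATE TABLE documents (
    id         int,
    user_id    int,
    company_id int
) PARTITION BY HASH (company_id);

-- Local partition on node1, foreign partition living on node2.
CREATE TABLE documents_node1 PARTITION OF documents
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE FOREIGN TABLE documents_node2 PARTITION OF documents
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    SERVER node2;
```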


Some real-life test queries show that such single-node queries aren't
pushed down to the required node as a whole. For example:


SELECT
*
FROM
documents
INNER JOIN users ON documents.user_id = users.id
WHERE
documents.company_id = 5
AND users.company_id = 5;

executed as following

  QUERY PLAN
---
 Nested Loop
   Join Filter: (documents.user_id = users.id)
   ->  Foreign Scan on users_node2 users
   ->  Materialize
 ->  Foreign Scan on documents_node2 documents

i.e. it uses two foreign scans and does the final join locally. However,
once I specify the target partitions explicitly, the entire query is
pushed down to the foreign node:


   QUERY PLAN
-
 Foreign Scan
   Relations: (documents_node2) INNER JOIN (users_node2)
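
(For reference, the explicitly targeted variant of that query looks roughly
like this, using the partition names from the plans above:)

```
SELECT
    *
FROM
    documents_node2 AS documents
    INNER JOIN users_node2 AS users ON documents.user_id = users.id
WHERE
    documents.company_id = 5
    AND users.company_id = 5;
```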

Execution time drops significantly as well — by more than 3 times, even
for this small test database. The situation is the same for simple
queries with aggregates, or joins plus aggregates, followed by the
sharding key filter. Something similar was briefly discussed in this
thread [3].


IIUC, this means that push-down of queries through postgres_fdw works
perfectly well; the problem is with partition-wise operation detection at
planning time. Currently, the partition-wise aggregate routines, for
example, look for a GROUP BY and check whether the sharding key appears
there or not, and based on that a PARTITIONWISE_AGGREGATE_* flag is set.
However, they don't look at the contents of the WHERE clause, so strictly
speaking it isn't a bug; this functionality simply isn't implemented yet.


Actually, sometimes I was able to push down queries with aggregates
simply by adding an additional GROUP BY on the sharding key, like this:


SELECT
count(*)
FROM
documents
WHERE
company_id = 5
GROUP BY company_id;

where this GROUP BY obviously doesn't change the results, it just allows
the planner to choose from more possible paths.


Also, I have tried to hack it a bit and forcibly set
PARTITIONWISE_AGGREGATE_FULL for this particular query. Everything
executed fine and the returned result was correct, which means that all
the underlying machinery is ready.


Thus, I propose a change to the planner, which will check whether the
partitioning key appears in the WHERE clause and will set the
PARTITIONWISE_AGGREGATE_* flags if appropriate. The whole logic may look
like this:


1. If only a single condition on the partitioning key is used (like
above), then it is PARTITIONWISE_AGGREGATE_FULL.
2. If several conditions are used, then it should be
PARTITIONWISE_AGGREGATE_PARTIAL.


I'm aware that the WHERE clause may be extremely complex in general, but
we could narrow this possible optimisation to the same restrictions as
postgres_fdw push-down: "only WHERE clauses using built-in operators and
functions will be considered for execution on the remote server".
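
A very rough sketch of what such a check could look like, just to make the
idea concrete (this is only an illustration against the current planner
data structures, not a patch; a real implementation would also need to
verify the equality strategy, strip RelabelType nodes, check operator
immutability, handle OR clauses, etc.):

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "utils/lsyscache.h"

/*
 * Return true if rel's baserestrictinfo pins every partition key column
 * to a Const using an operator from the partitioning opfamily.  A caller
 * could then consider PARTITIONWISE_AGGREGATE_FULL (or a full push-down)
 * even without a matching GROUP BY.
 */
static bool
rel_is_pinned_to_single_partition(RelOptInfo *rel)
{
    PartitionScheme part_scheme = rel->part_scheme;
    int         keyno;

    if (part_scheme == NULL)
        return false;

    for (keyno = 0; keyno < part_scheme->partnatts; keyno++)
    {
        bool        pinned = false;
        ListCell   *lc;

        if (rel->partexprs[keyno] == NIL)
            return false;

        foreach(lc, rel->baserestrictinfo)
        {
            RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
            OpExpr     *opexpr = (OpExpr *) rinfo->clause;
            Node       *partkey = (Node *) linitial(rel->partexprs[keyno]);
            Node       *other;

            if (!IsA(opexpr, OpExpr) || list_length(opexpr->args) != 2)
                continue;

            /* one side must match the partition key expression */
            if (equal(linitial(opexpr->args), partkey))
                other = (Node *) lsecond(opexpr->args);
            else if (equal(lsecond(opexpr->args), partkey))
                other = (Node *) linitial(opexpr->args);
            else
                continue;

            /*
             * The other side must be a constant, compared with an operator
             * belonging to the partitioning opfamily.
             */
            if (IsA(other, Const) &&
                op_in_opfamily(opexpr->opno,
                               part_scheme->partopfamily[keyno]))
            {
                pinned = true;
                break;
            }
        }

        if (!pinned)
            return false;
    }

    return true;
}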


Although it seems that it would be easier to start with aggregates,
perhaps we should plan for a more general solution from the start? For
example, check that all involved tables are filtered by the partitioning
key and push down the entire query if all of them target the same
foreign server.


Any thoughts?


[1] 
https://docs.citusdata.com/en/v9.3/get_started/tutorial_multi_tenant.html

[2] https://gist.github.com/ololobus/8fba33241f68be2e3765d27bf04882a3
[3] 
https://www.postgresql.org/message-id/flat/CAFT%2BaqL1Tt0qfYqjHH%2BshwPoW8qdFjpJ8vBR5ABoXJDUcHyN1w%40mail.gmail.com


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

DROP TABLE IF EXISTS companies CASCADE;
DROP TABLE IF EXISTS users CASCADE;
DROP TABLE IF EXISTS documents CASCADE;
DROP SERVER IF EXISTS node2 CASCADE;

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER node2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5433');
CREATE US

Re: Proposal: Automatic partition creation

2020-07-13 Thread Tom Lane
Anastasia Lubennikova  writes:
> On 06.07.2020 19:10, Tom Lane wrote:
>> Robert Haas  writes:
>>> I think the big problem here is identifying the operator to use. We
>>> have no way of identifying the "plus" or "minus" operator associated
>>> with a datatype; indeed, that constant doesn't exist.

>> We did indeed solve this in connection with window functions, cf
>> 0a459cec9.  I may be misunderstanding what the problem is here,
>> but I think trying to reuse that infrastructure might help.

> Do we need to introduce a new support function? Is there a reason why we 
> can not rely on '+' operator?

(1) the appropriate operator might not be named '+'
(2) even if it is, it might not be in your search_path
(3) you're vulnerable to security problems from someone capturing the
'+' operator with a better match; since you aren't writing the
operator explicitly, you can't fix that by qualifying it
(4) if the interval constant is written as an undecorated string
literal, the parser may have trouble resolving a match at all

> I understand that the addition operator may lack or be overloaded for
> some complex datatypes, but I haven't found any examples that are useful
> for range partitioning.

"It works for all the built-in data types" isn't really a satisfactory
answer.  But even just in the built-in types, consider "date":

# select oid::regoperator from pg_operator where oprname ='+' and oprleft = 
'date'::regtype;
              oid               
--------------------------------
 +(date,interval)
 +(date,integer)
 +(date,time without time zone)
 +(date,time with time zone)
(4 rows)

It's not that immediately obvious which of these would make sense to use.

But the short answer here is that we did not accept relying on '+' being
the right thing for window function ranges, and I don't see why it is more
acceptable for partitioning ranges.  The existing places where our parser
relies on implicit operator names are, without exception, problematic [1].

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/ffefc172-a487-aa87-a0e7-472bf29735c8%40gmail.com




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 7:25 AM David Rowley  wrote:
> I think it would be good if we could try to move towards getting
> consensus here rather than reiterating our arguments over and over.

+1

> Updated summary:
> * For hash_mem = Tomas [7], Justin [16]
> * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> * For hash_mem_multiplier with default = 1.0 =  PeterG [15][0], Tom [20][24]
> * hash_mem out of scope for PG13 = Bruce [8], Andres [9]
> * hashagg_mem default to -1 meaning use work_mem = DavidR [23] (2nd 
> preference)
> * Escape hatch that can be removed later when we get something better
> = Jeff [11], DavidR [12], Pavel [13], Andres [14], Justin [1]
> * Add enable_hashagg_spill = Tom [2] (I'm unclear on this proposal.
> Does it affect the planner or executor or both?) (updated opinion in
> [20])
> * Maybe do nothing until we see how things go during beta = Bruce [3], Amit 
> [10]
> * Just let users set work_mem = Stephen [21], Alvaro [4] (Alvaro
> changed his mind after Andres pointed out that changes other nodes in
> the plan too [25])
> * Swap enable_hashagg for a GUC that specifies when spilling should
> occur. -1 means work_mem = Robert [17], Amit [18]
> * hash_mem does not solve the problem = Tomas [6] (changed his mind in [7])

I don't think that hashagg_mem needs to be considered here, because
you were the only one that spoke out in favor of that idea, and it's
your second preference in any case (maybe Tom was in favor of such a
thing at one point, but he clearly favors hash_mem/hash_mem_multiplier
now so it hardly matters). I don't think that hashagg_mem represents a
meaningful compromise between the escape hatch and
hash_mem/hash_mem_multiplier in any case. (I would *prefer* the escape
hatch to hashagg_mem, since at least the escape hatch is an "honest"
escape hatch.)

ISTM that there are three basic approaches to resolving this open item
that remain:

1. Do nothing.

2. Add an escape hatch.

3. Add hash_mem/hash_mem_multiplier.
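
(To recap what the third option would mean in practice, a sketch only,
assuming the semantics discussed upthread where the multiplier scales
work_mem for hash-based nodes:

SET work_mem = '64MB';
SET hash_mem_multiplier = 2.0;  -- hash-based nodes allowed ~128MB, sorts still 64MB
)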

Many people (e.g., Tom, Jeff, you, Andres, myself) have clearly
indicated that doing nothing is simply a non-starter. It's not just
that it doesn't get a lot of votes -- it's something that is strongly
opposed. We can rule it out right away.

This is where it gets harder. Many of us have views that won't
easily fit into buckets. For example, even though I myself proposed
hash_mem/hash_mem_multiplier, I've said that I can live with the
escape hatch. Similarly, Jeff favors the escape hatch, but has said
that he can live with hash_mem/hash_mem_multiplier. And, Andres said
to me privately that he thinks that hash_mem could be a good idea,
even though he opposes it now due to release management
considerations.

Even still, I think that it's possible to divide people into two camps
on this without grossly misrepresenting anybody.

Primarily in favor of escape hatch:

Jeff,
DavidR,
Pavel,
Andres,
Robert ??,
Amit ??

Primarily in favor of hash_mem/hash_mem_multiplier:

PeterG,
Tom,
Alvaro,
Tomas,
Justin,
DavidG,
Jonathan Katz

There are clear problems with this summary, including for example the
fact that Robert weighed in before the hash_mem/hash_mem_multiplier
proposal was even on the table. What he actually said about it [1]
seems closer to hash_mem, so I feel that putting him in that bucket is
a conservative assumption on my part. Same goes for Amit, who warmed
to the idea of hash_mem_multiplier recently. (Though I probably got
some detail wrong, in which case please correct me.)

ISTM that there is a majority of opinion in favor of
hash_mem/hash_mem_multiplier. If you assume that I have this wrong,
and that we're simply deadlocked, then it becomes a matter for the
RMT. I strongly doubt that that changes the overall outcome, since
this year's RMT members happen to all be in favor of the
hash_mem/hash_mem_multiplier proposal on an individual basis.

[1] 
https://www.postgresql.org/message-id/ca+tgmobyv9+t-wjx-ctpdqurcgt1thz1ml3v1nxc4m4g-h6...@mail.gmail.com
-- 
Peter Geoghegan




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Justin Pryzby
On Mon, Jul 13, 2020 at 12:47:36PM -0400, Alvaro Herrera wrote:
> On 2020-Jul-13, Jeff Davis wrote:
> 
> > On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> > > Updated summary:
> > > * For hash_mem = Tomas [7], Justin [16]
> > > * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> > > * For hash_mem_multiplier with default = 1.0 =  PeterG [15][0], Tom
> > > [20][24]
> > 
> > I am OK with these options, but I still prefer a simple escape hatch.
> 
> I'm in favor of hash_mem_multiplier.  I think a >1 default is more
> sensible than =1 in the long run, but if strategic vote is what we're
> doing, then I support the =1 option.

I recanted and support hash_mem_multiplier (or something supporting that
behavior, even if it also supports an absolute/scalar value).
https://www.postgresql.org/message-id/20200703145620.gk4...@telsasoft.com

1.0 (or -1) is fine, possibly to be >= 1.0 in master at a later date.

-- 
Justin




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tom Lane
Alvaro Herrera  writes:
> I'm in favor of hash_mem_multiplier.  I think a >1 default is more
> sensible than =1 in the long run, but if strategic vote is what we're
> doing, then I support the =1 option.

FWIW, I also think that we'll eventually end up with >1 default.
But the evidence to support that isn't really there yet, so
I'm good with 1.0 default to start with.

regards, tom lane




Re: pg_dump bug for extension owned tables

2020-07-13 Thread Andrew Dunstan


On 7/13/20 10:52 AM, Fabrízio de Royes Mello wrote:
>
> On Sat, Jul 11, 2020 at 8:07 PM Andrew Dunstan
>  > wrote:
> >
> >
> > On 6/26/20 2:10 PM, Fabrízio de Royes Mello wrote:
> > >
> > > On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello
> > > mailto:fabriziome...@gmail.com>
> >> wrote:
> > > >
> > > >
> > > > On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan
> > >  
> > >  >> wrote:
> > > > >
> > > > >
> > > > > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > > > > It appears that for extension owned tables
> tbinfo.attgenerated isn't
> > > > > > being properly populated, so line 2050 in REL_12_STABLE, which
> > > is line
> > > > > > 2109 in git tip, is failing.
> > > > > >
> > > > > >
> > > > >
> > > > > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> > > > >
> > > >
> > > > Having a look on it.
> > > >
> > >
> > > Seems when qualify the schemaname the the "tbinfo->interesting" field
> > > is not setted for extensions objects, so the getTableAttrs can't fill
> > > the attgenerated field properly.
> > >
> > > I'm not 100% sure it's the correct way but the attached patch works
> > > for me and all tests passed. Maybe we should add more TAP tests?
> > >
> > >
> >
> >
> > I just tried this patch out on master, with the test case I gave
> > upthread. It's not working, still getting a segfault.
> >
>
> Ohh really sorry about it... my bad... i completely forgot about it!!!
>
> Due to my rush I ended up adding the wrong patch version. Attached the
> correct version.
>
> Will add TAP tests to src/test/modules/test_pg_dump


yeah, that's the fix I came up with too. The only thing I added was
"Assert(tbinfo->attgenerated);" at about line 2097.


Will wait for your TAP tests.


thanks


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Proposal: Automatic partition creation

2020-07-13 Thread Anastasia Lubennikova

On 06.07.2020 19:10, Tom Lane wrote:

Robert Haas  writes:

On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova
 wrote:

I am going to implement this via SPI, which allow to simplify checks and
calculations. Do you see any pitfalls in this approach?

I don't really see why we need SPI here.

I would vote against any core facility that is implemented via SPI
queries.  It is just too darn hard to control the semantics completely in
the face of fun stuff like varying search_path.  Look at what a mess the
queries generated by the RI triggers are --- and they only have a very
small set of behaviors to worry about.  I'm still only about 95% confident
they don't have security issues, too.

If you're using SPI to try to look up appropriate operators, I think
the chances of being vulnerable to security problems are 100%.
Good to know, thank you for that. I had doubts about the internal usage 
of SPI,

but didn't know what exactly can go wrong.




I think the big problem here is identifying the operator to use. We
have no way of identifying the "plus" or "minus" operator associated
with a datatype; indeed, that constant doesn't exist.

We did indeed solve this in connection with window functions, cf
0a459cec9.  I may be misunderstanding what the problem is here,
but I think trying to reuse that infrastructure might help.


Do we need to introduce a new support function? Is there a reason why we
cannot rely on the '+' operator? I understand that the addition operator
may lack or be overloaded for some complex datatypes, but I haven't found
any examples that are useful for range partitioning. Both pg_pathman and
pg_partman also use '+' to generate bounds.

I explored the code a bit more and came up with this function, which is
very similar to the generate_series_* functions, but it doesn't use SPI
and looks for the function that implements the '+' operator, instead of
calling it directly:

// almost pseudocode

static Const *
generate_next_bound(Const *start, Const *interval)
{
    ObjectWithArgs *sum_oper_object = makeNode(ObjectWithArgs);
    TypeName   *ltype;
    TypeName   *rtype;
    Oid         sum_oper_oid;
    Oid         oprcode;
    FmgrInfo    opproc;
    Const      *next_bound;

    /* hardcode '+' operator for addition */
    sum_oper_object->objname = list_make1(makeString("+"));

    ltype = makeTypeNameFromOid(start->consttype, start->consttypmod);
    rtype = makeTypeNameFromOid(interval->consttype, interval->consttypmod);
    sum_oper_object->objargs = list_make2(ltype, rtype);

    sum_oper_oid = LookupOperWithArgs(sum_oper_object, false);
    oprcode = get_opcode(sum_oper_oid);
    fmgr_info(oprcode, &opproc);

    /* build the result Const from the start bound's type information */
    next_bound = makeConst(start->consttype, start->consttypmod,
                           start->constcollid, start->constlen,
                           (Datum) 0, false, start->constbyval);
    next_bound->constvalue = FunctionCall2(&opproc,
                                           start->constvalue,
                                           interval->constvalue);
    return next_bound;
}

Thoughts?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Report error position in partition bound check

2020-07-13 Thread Alexandra Wang
> On 2 July 2020, at 06:39, Daniel Gustafsson  wrote:
> > On 10 Apr 2020, at 23:50, Alexandra Wang  wrote:
>
> > On Fri, Apr 10, 2020 at 8:37 AM Ashutosh Bapat 
> > mailto:ashutosh.ba...@2ndquadrant.com>> 
> > wrote:
> > > for a multi-key value the ^
> > > points to the first column and the reader may think that that's the
> > > problematci column. Should it instead point to ( ?
> >
> > I attached a v2 of Amit's 0002 patch to also report the exact column
> > for the partition overlap errors.
>
> This patch fails to apply to HEAD due to conflicts in the create_table 
> expected
> output.  Can you please submit a rebased version?  I'm marking the CF entry
> Waiting on Author in the meantime.

Thank you Daniel. Here's the rebased patch. I also squashed the two
patches into one so it's easier to review.

--
Alex

From 334bb4ea93448073778930201c0b959c5acab924 Mon Sep 17 00:00:00 2001
From: Alexandra Wang 
Date: Mon, 13 Jul 2020 10:28:04 -0700
Subject: [PATCH] Improve check new partition bound error position report

We have been passing a dummy ParseState to ereport(). Without the source
text in the ParseState ereport does not report the error position even
if a error location is supplied. This patch passes a ParseState to
check_new_partition_bound() when it is available.

-- Create parent table
create table foo (a int, b int, c date) partition by range (b,c);

-- Before:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

-- After:
create table foo_part_1 partition of foo for values from (1, date '2007-01-01') to (1, date '2006-01-01');
ERROR:  empty range bound specified for partition "foo_part_1"
LINE 1: ...e foo_part_1 partition of foo for values from (1, date '2007...
 ^
DETAIL:  Specified lower bound (1, '2007-01-01') is greater than or equal to upper bound (1, '2006-01-01').

Co-authored-by: Ashwin Agrawal 
Co-authored-by: Amit Langote 
---
 src/backend/commands/tablecmds.c   | 15 --
 src/backend/parser/parse_utilcmd.c |  3 ++
 src/backend/partitioning/partbounds.c  | 63 ++
 src/include/partitioning/partbounds.h  |  4 +-
 src/test/regress/expected/alter_table.out  | 10 
 src/test/regress/expected/create_table.out | 30 +++
 6 files changed, 98 insertions(+), 27 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ed553f7384..4cd7709d33 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -541,7 +541,8 @@ static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partPa
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel);
 static ObjectAddress ATExecAttachPartition(List **wqueue, Relation rel,
-		   PartitionCmd *cmd);
+		   PartitionCmd *cmd,
+		   AlterTableUtilityContext * context);
 static void AttachPartitionEnsureIndexes(Relation rel, Relation attachrel);
 static void QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
 			   List *partConstraint,
@@ -1005,7 +1006,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * Check first that the new partition's bound is valid and does not
 		 * overlap with any of existing partitions of the parent.
 		 */
-		check_new_partition_bound(relname, parent, bound);
+		check_new_partition_bound(relname, parent, bound, pstate);
 
 		/*
 		 * If the default partition exists, its partition constraints will
@@ -4646,7 +4647,8 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	  cur_pass, context);
 			Assert(cmd != NULL);
 			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def);
+ATExecAttachPartition(wqueue, rel, (PartitionCmd *) cmd->def,
+	  context);
 			else
 ATExecAttachPartitionIdx(wqueue, rel,
 		 ((PartitionCmd *) cmd->def)->name);
@@ -16186,7 +16188,8 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
  * Return the address of the newly attached partition.
  */
 static ObjectAddress
-ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
+ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd,
+	  AlterTableUtilityContext * context)
 {
 	Relation	attachrel,
 catalog;
@@ -16201,6 +16204,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	const char *trigger_name;
 	Oid			defaultPartOid;
 	List	   *partBoundConstraint;
+	ParseState *pstate = make_parsestate(NULL);
 
 	/*
 	 * We must lock the default partition if one exists, because attaching a
@@ -16365,8 +16369,9 @@ ATExecAt

Re: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM

2020-07-13 Thread Justin Pryzby
On Sun, May 31, 2020 at 10:13:39PM +, Bossart, Nathan wrote:
> Here is a rebased version of the patch.

Should bin/vacuumdb support this?

Should vacuumdb have a way to pass an arbitrary option to the server, instead
of tacking on options (which are frequently forgotten on the initial commit to
the backend VACUUM command)?  That has the advantage that vacuumdb could use
new options even when connecting to a newer server version than the client.  I
think it would be safe as long as it avoided characters like ')' and ';'.
Maybe all that's needed is isdigit() || isalpha() || isspace() || c=='_'
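
Something like the following is all the client-side check would need (a
hypothetical sketch for vacuumdb, not part of the attached patch; it just
encodes the character whitelist suggested above):

#include <ctype.h>
#include <stdbool.h>

/*
 * Accept a user-supplied option string only if it consists of characters
 * that cannot terminate or escape the parenthesized option list.
 */
static bool
vacuum_option_is_safe(const char *opt)
{
    const unsigned char *p = (const unsigned char *) opt;

    if (*p == '\0')
        return false;           /* an empty option makes no sense */

    for (; *p; p++)
    {
        if (!(isdigit(*p) || isalpha(*p) || isspace(*p) || *p == '_'))
            return false;
    }
    return true;
}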

+MAIN_RELATION_CLEANUP [ boolean ]
+TOAST_TABLE_CLEANUP [ boolean ]

Maybe should be called TOAST_RELATION_CLEANUP 

See attached.

-- 
Justin
>From 2a4f411b3f2ee0706b1431f006e998503a420e03 Mon Sep 17 00:00:00 2001
From: "Bossart, Nathan" 
Date: Sun, 31 May 2020 22:13:39 +
Subject: [PATCH 1/3] Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP
 options to VACUUM

Here is a rebased version of the patch.

Nathan

>From 27ccbb7af5d1e16da65c819b7fb93955395643d8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sun, 31 May 2020 21:29:39 +
Subject: [PATCH v4 1/1] Add MAIN_RELATION_CLEANUP and TOAST_TABLE_CLEANUP
 options to VACUUM.
---
 doc/src/sgml/ref/vacuum.sgml | 28 ++
 src/backend/commands/vacuum.c| 76 +---
 src/backend/postmaster/autovacuum.c  |  2 +-
 src/bin/psql/tab-complete.c  |  5 +-
 src/include/commands/vacuum.h|  5 +-
 src/test/regress/expected/vacuum.out | 10 
 src/test/regress/sql/vacuum.sql  | 10 
 7 files changed, 112 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index a48f75ad7b..3e15824eaa 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -33,6 +33,8 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 SKIP_LOCKED [ boolean ]
 INDEX_CLEANUP [ boolean ]
+MAIN_RELATION_CLEANUP [ boolean ]
+TOAST_TABLE_CLEANUP [ boolean ]
 TRUNCATE [ boolean ]
 PARALLEL integer
 
@@ -210,6 +212,32 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ defname, "main_relation_cleanup") == 0)
+			main_rel_cleanup = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "toast_table_cleanup") == 0)
+			toast_cleanup = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "truncate") == 0)
 			params.truncate = get_vacopt_ternary_value(opt);
 		else if (strcmp(opt->defname, "parallel") == 0)
@@ -189,13 +198,14 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(analyze ? VACOPT_ANALYZE : 0) |
 		(freeze ? VACOPT_FREEZE : 0) |
 		(full ? VACOPT_FULL : 0) |
-		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0);
+		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) |
+		(main_rel_cleanup ? VACOPT_MAIN_REL_CLEANUP : 0) |
+		(toast_cleanup ? VACOPT_TOAST_CLEANUP : 0);
 
 	/* sanity checks on options */
 	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
 	Assert((params.options & VACOPT_VACUUM) ||
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert(!(params.options & VACOPT_SKIPTOAST));
 
 	if ((params.options & VACOPT_FULL) && params.nworkers > 0)
 		ereport(ERROR,
@@ -318,6 +328,16 @@ vacuum(List *relations, VacuumParams *params,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
+	/*
+	 * Sanity check TOAST_TABLE_CLEANUP option.
+	 */
+	if ((params->options & VACOPT_FULL) != 0 &&
+		(params->options & VACOPT_TOAST_CLEANUP) == 0)
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("VACUUM option TOAST_TABLE_CLEANUP cannot be "
+		"disabled when FULL is specified")));
+
 	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
@@ -446,7 +466,7 @@ vacuum(List *relations, VacuumParams *params,
 
 			if (params->options & VACOPT_VACUUM)
 			{
-if (!vacuum_rel(vrel->oid, vrel->relation, params))
+if (!vacuum_rel(vrel->oid, vrel->relation, params, false))
 	continue;
 			}
 
@@ -1665,7 +1685,10 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
+vacuum_rel(Oid relid,
+		   RangeVar *relation,
+		   VacuumParams *params,
+		   bool processing_toast_table)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1674,6 +1697,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	bool		process_toast;
 
 	Assert(params != NULL);
 
@@ -1841,9 +1865,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/*
 	 * Remember the relation's TOAST relation for later, if the caller asked
 	 * us to process it.  In VACUUM FULL, though, the toast ta

Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Justin Pryzby
On Mon, Jul 13, 2020 at 08:15:42AM +0200, Pavel Stehule wrote:
> > Do you want to add any more documentation ?
> >
> 
> done

Thanks - I think the documentation was maybe excessive.  See attached.

-- 
Justin
>From b6ceedcd7f4395fac822059229cb475aa2805c1e Mon Sep 17 00:00:00 2001
From: Pavel Stehule 
Date: Mon, 13 Jul 2020 10:20:42 +0200
Subject: [PATCH 1/2] proposal: possibility to read dumped table's name from
 file

---
 doc/src/sgml/ref/pg_dump.sgml |  93 +++
 src/bin/pg_dump/pg_dump.c | 215 ++
 2 files changed, 308 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7a37fd8045..2f2bfb4dbf 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -755,6 +755,99 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+This option ensure reading object's filters from specified file.
+If you use "-" as filename, then stdin is used as source. This file
+has to have following line format:
+
+(+|-)[tnfd] objectname
+
+Only one object name can be specified per one line:
+
++t mytable1
++t mytable2
++f some_foreign_table
+-d mytable3
+
+With this file the dump ensures dump table mytable1,
+mytable2. The data of foreign table
+some_foreign_table will be dumped too. And the data
+of mytable3 will not be dumped.
+   
+
+   
+The first char + or - specifies
+if object name will be used as include or exclude filter.
+   
+
+   
+The second char
+t,
+n,
+f,
+d
+specifies a object type.
+
+
+ 
+  t
+  
+   
+In inclusive form (+) it does same work like
+--table. In exclusive form (-)
+it is same like --exclude-table.
+   
+  
+ 
+
+ 
+  n
+  
+   
+In inclusive form (+) it does same work like
+--schema. In exclusive form (-)
+it is same like --exclude-schema.
+   
+  
+ 
+
+ 
+  f
+  
+   
+In inclusive form (+) it does same work like
+--include-foreign-data. The exclusive form
+(-) is not allowed.
+   
+  
+ 
+
+ 
+  d
+  
+   
+The inclusive form (+) is not allowed.
+In exclusive form (-) it is same like
+--exclude-table-data.
+   
+  
+ 
+
+   
+
+   
+The option --filter can be used together with options
+--table, --exclude-table,
+--schema, --exclude-schema,
+--include-foreign-data and
+--exclude-table-data.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e758b5c50a..fd6b7a174a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -290,6 +290,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(TableInfo *tbinfo);
+static void read_patterns_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -364,6 +365,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -603,6 +605,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* filter implementation */
+read_patterns_from_file(optarg, &dopt);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -1022,6 +1028,7 @@ help(const char *progname)
 			 "   access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAMEread object name filter expressions from file\n"));
 	printf(_("  --if-exists  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "   include data of foreign tables on foreign\n"
@@ -18597,3 +18604,211 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse reloptions array");
 }
+
+#define		FILTER_INITIAL_LINE_SIZE		1024
+#define		PG_GETLINE_EXTEND_LINE_SIZE

Re: min_safe_lsn column in pg_replication_slots view

2020-07-13 Thread Alvaro Herrera
On 2020-Jul-13, Alvaro Herrera wrote:

> A much more sensible answer is to initialize the segno to the segment
> currently being written, as in the attached.

Ran the valgrind test locally and it passes.  Pushed it now.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_dump bug for extension owned tables

2020-07-13 Thread Fabrízio de Royes Mello
On Mon, Jul 13, 2020 at 11:52 AM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Sat, Jul 11, 2020 at 8:07 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
> >
> >
> > On 6/26/20 2:10 PM, Fabrízio de Royes Mello wrote:
> > >
> > > On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello
> > > mailto:fabriziome...@gmail.com>> wrote:
> > > >
> > > >
> > > > On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan
> > >  > > > wrote:
> > > > >
> > > > >
> > > > > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > > > > It appears that for extension owned tables tbinfo.attgenerated
isn't
> > > > > > being properly populated, so line 2050 in REL_12_STABLE, which
> > > is line
> > > > > > 2109 in git tip, is failing.
> > > > > >
> > > > > >
> > > > >
> > > > > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> > > > >
> > > >
> > > > Having a look on it.
> > > >
> > >
> > > Seems when qualify the schemaname the the "tbinfo->interesting" field
> > > is not setted for extensions objects, so the getTableAttrs can't fill
> > > the attgenerated field properly.
> > >
> > > I'm not 100% sure it's the correct way but the attached patch works
> > > for me and all tests passed. Maybe we should add more TAP tests?
> > >
> > >
> >
> >
> > I just tried this patch out on master, with the test case I gave
> > upthread. It's not working, still getting a segfault.
> >
>
> Ohh really sorry about it... my bad... i completely forgot about it!!!
>
> Due to my rush I ended up adding the wrong patch version. Attached the
correct version.
>
> Will add TAP tests to src/test/modules/test_pg_dump
>

Attached the patch including TAP tests.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e758b5c50a..a998e5c821 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -18037,6 +18037,8 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
 			configtbl->dataObj->filtercond = pg_strdup(extconditionarray[j]);
 	}
 }
+
+configtbl->interesting = dumpobj;
 			}
 		}
 		if (extconfigarray)
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index ae120a5ee3..78aa07ce51 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -135,6 +135,12 @@ my %pgdump_runs = (
 			"$tempdir/defaults_tar_format.tar",
 		],
 	},
+	extension_schema => {
+		dump_cmd => [
+			'pg_dump', '--schema=public', '--inserts',
+			"--file=$tempdir/extension_schema.sql", 'postgres',
+		],
+	},
 	pg_dumpall_globals => {
 		dump_cmd => [
 			'pg_dumpall', '--no-sync',
@@ -301,8 +307,9 @@ my %tests = (
 			\n/xm,
 		like => {
 			%full_runs,
-			data_only=> 1,
-			section_data => 1,
+			data_only=> 1,
+			section_data => 1,
+			extension_schema => 1,
 		},
 	},
 
@@ -536,6 +543,7 @@ my %tests = (
 		like   => {%pgdump_runs},
 		unlike => {
 			data_only  => 1,
+			extension_schema   => 1,
 			pg_dumpall_globals => 1,
 			section_data   => 1,
 			section_pre_data   => 1,
@@ -549,6 +557,7 @@ my %tests = (
 		like   => {%pgdump_runs},
 		unlike => {
 			data_only  => 1,
+			extension_schema   => 1,
 			pg_dumpall_globals => 1,
 			section_data   => 1,
 			section_pre_data   => 1,
@@ -569,6 +578,17 @@ my %tests = (
 			schema_only  => 1,
 			section_pre_data => 1,
 		},
+	},
+
+	# Dumpable object inside specific schema
+	'INSERT INTO public.regress_table_dumpable VALUES (1);' => {
+		create_sql   => 'INSERT INTO public.regress_table_dumpable VALUES (1);',
+		regexp   => qr/^
+			\QINSERT INTO public.regress_table_dumpable VALUES (1);\E
+			\n/xm,
+		like => {
+			extension_schema => 1,
+		},
 	},);
 
 #
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
index 3ed007a7b1..90e461ed35 100644
--- a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
+++ b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
@@ -13,6 +13,11 @@ CREATE SEQUENCE regress_pg_dump_seq;
 CREATE SEQUENCE regress_seq_dumpable;
 SELECT pg_catalog.pg_extension_config_dump('regress_seq_dumpable', '');
 
+CREATE TABLE regress_table_dumpable (
+	col1 int
+);
+SELECT pg_catalog.pg_extension_config_dump('regress_table_dumpable', '');
+
 CREATE SCHEMA regress_pg_dump_schema;
 
 GRANT USAGE ON regress_pg_dump_seq TO regress_dump_test_role;


Re: Compatible defaults for LEAD/LAG

2020-07-13 Thread Pavel Stehule
Hi

On Sun, May 31, 2020 at 22:02, Vik Fearing wrote:

> On 5/31/20 9:53 PM, Tom Lane wrote:
> > Vik Fearing  writes:
> >>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> >>   postgres-# ORDER BY n;
> >>   ERROR:  function lag(numeric, integer, integer) does not exist
> >>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>^
> >
> > Yeah, we have similar issues elsewhere.
> >
> >> Attached is a patch that fixes this issue using the new anycompatible
> >> pseudotype.  I am hoping this can be slipped into 13 even though it
> >> requires a catversion bump after BETA1.
> >
> > When the anycompatible patch went in, I thought for a little bit about
> > trying to use it with existing built-in functions, but didn't have the
> > time to investigate the issue in detail.  I'm not in favor of hacking
> > things one-function-at-a-time here; we should look through the whole
> > library and see what we've got.
> >
> > The main thing that makes this perhaps-not-trivial is that an
> > anycompatible-ified function will match more cases than it would have
> > before, possibly causing conflicts if the function or operator name
> > is overloaded.  We'd have to look at such cases and decide what we
> > want to do --- one answer would be to drop some of the alternatives
> > and rely on the parser to add casts, but that might slow things down.
> >
> > Anyway, I agree that this is a good direction to pursue, but not in
> > a last-minute-hack-for-v13 way.
>
> Fair enough.  I put it in the commitfest app for version 14.
> https://commitfest.postgresql.org/28/2574/
> --
> Vik Fearing
>

The patch is ok. It is almost trivial. It solves one issue, but maybe it
introduces a new issue.

Other databases try to coerce the default constant expression to the type
of the value expression. I found a description of this behaviour for
MSSQL, Sybase and BigQuery.

I didn't find related documentation for Oracle, and I don't have access to
a running instance of Oracle to test it.

ANSI SQL says the type of the default expression should be compatible with
the value expression, and the declared type of the result should be the
type of the value expression.

IWD 9075-2:201?(E) 6.10  - j) v)

The current implementation is limited (and the behaviour is not user
friendly in all details), but the new behaviour (implemented by the patch)
departs from the standard in half of these cases :(.

These cases are probably not common - they really are corner cases - but I
cannot judge how important this fact is.
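
To make the corner case concrete (an illustration only, assuming I read
the patched signatures correctly; this is not output from a test run):

-- With lag(anycompatible, integer, anycompatible) the integer column and
-- the numeric default are unified to numeric, so the result column is of
-- type numeric; the standard would instead keep the value expression's
-- type (integer) and cast the default to it.
SELECT pg_typeof(lag(n, 1, 2.5) OVER (ORDER BY n))
FROM (VALUES (1), (2), (3)) AS v (n);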

For users, the implemented feature is better, and it is safe. The
implementation is trivial - significantly simpler than an implementation
that is 100% standard, although that should not be a hard problem either
(in the analyze stage it probably needs just a few more lines).

A decision has to be made, because right now we can go in either
direction. Once we choose, there will be no way back.

Regards

Pavel


Re: min_safe_lsn column in pg_replication_slots view

2020-07-13 Thread Alvaro Herrera
A much more sensible answer is to initialize the segno to the segment
currently being written, as in the attached.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 699e3a25e0673353fcb10fa92577f7534e594227 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 13:22:40 -0400
Subject: [PATCH v3] Fix uninitialized value in segno calculation

Remove previous hack in KeepLogSeg that added a case to deal with a
(badly represented) invalid segment number; to replace it, initialize
the segment number to be retreated to the currently being written
segment.

Per valgrind-running buildfarm member skink, and some sparc64 animals.

Discussion: https://postgr.es/m/1724648.1594230...@sss.pgh.pa.us
---
 src/backend/access/transam/xlog.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 28daf72a50..0a97b1d37f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9523,13 +9523,13 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	if (XLogRecPtrIsInvalid(targetLSN))
 		return WALAVAIL_INVALID_LSN;
 
-	currpos = GetXLogWriteRecPtr();
-
 	/*
-	 * calculate the oldest segment currently reserved by all slots,
-	 * considering wal_keep_segments and max_slot_wal_keep_size
+	 * Calculate the oldest segment currently reserved by all slots,
+	 * considering wal_keep_segments and max_slot_wal_keep_size.  Initialize
+	 * oldestSlotSeg to the current segment.
 	 */
-	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+	currpos = GetXLogWriteRecPtr();
+	XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
 	KeepLogSeg(currpos, &oldestSlotSeg);
 
 	/*
@@ -9548,6 +9548,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	else
 		oldestSegMaxWalSize = 1;
 
+	/* the segment we care about */
+	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
+
 	/*
 	 * No point in returning reserved or extended status values if the
 	 * targetSeg is known to be lost.
@@ -9624,7 +9627,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	}
 
 	/* don't delete WAL segments newer than the calculated segment */
-	if (XLogRecPtrIsInvalid(*logSegNo) || segno < *logSegNo)
+	if (segno < *logSegNo)
 		*logSegNo = segno;
 }
 
-- 
2.20.1



Fix header identification

2020-07-13 Thread Jesse Zhang
Hi hackers,

PFA a patch that fixes up the identification for 4 header files.

I did a little archaeology trying to find plausible reasons for why we
committed the wrong identification in the first place, and here's what I
came up with:

jsonfuncs.h was created in ce0425b162d0a to house backend-only json
function declarations previously in jsonapi.h, and the identification
was a copy-pasta from jsonapi.h (then in src/include/utils).

jsonapi.h was wholesale moved to common in beb4699091e9f without
changing identification.

partdesc.h was created in 1bb5e78218107 but I can't find a good excuse
why we made a mistake then, except for (maybe) the prototype for
RelationBuildPartitionDesc() was moved from partcache.h and so we may
have also taken its identification line, although that sounds arbitrary.

llvmjit_emit.h was created with the wrong identification, probably due
to a personal template...

Cheers,
Jesse
From 55c7280ee09ab258efc1c59b4ce6226228bcbab0 Mon Sep 17 00:00:00 2001
From: Jesse Zhang 
Date: Thu, 9 Jul 2020 13:39:39 -0700
Subject: [PATCH] Fix header identification.

jsonfuncs.h was created in ce0425b162d0a to house backend-only json
function declarations previously in jsonapi.h, and the identification
was a copy-pasta from jsonapi.h (then in src/include/utils).

jsonapi.h was wholesale moved to common in beb4699091e9f without
changing identification.

partdesc.h was created in 1bb5e78218107 but I can't find a good excuse
why we made a mistake then, except for (maybe) the prototype for
RelationBuildPartitionDesc() was moved from partcache.h and so we may
have also taken its identification line, although that sounds arbitrary.

llvmjit_emit.h was created with the wrong identification, probably due
to a personal template...
---
 src/include/common/jsonapi.h| 2 +-
 src/include/jit/llvmjit_emit.h  | 2 +-
 src/include/partitioning/partdesc.h | 2 +-
 src/include/utils/jsonfuncs.h   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index bcfd57cc53c46..1fee0ea81e47a 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/common/jsonapi.h
  *
  *-
  */
diff --git a/src/include/jit/llvmjit_emit.h b/src/include/jit/llvmjit_emit.h
index 74607a43770af..1a7d6db7259e0 100644
--- a/src/include/jit/llvmjit_emit.h
+++ b/src/include/jit/llvmjit_emit.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 2018-2020, PostgreSQL Global Development Group
  *
- * src/include/lib/llvmjit_emit.h
+ * src/include/jit/llvmjit_emit.h
  */
 #ifndef LLVMJIT_EMIT_H
 #define LLVMJIT_EMIT_H
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index fb416e073d075..70df764981e70 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 1996-2020, PostgreSQL Global Development Group
  *
- * src/include/utils/partdesc.h
+ * src/include/partitioning/partdesc.h
  *
  *-
  */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 1f1b4029cbf12..4796b2b78bf9a 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/utils/jsonfuncs.h
  *
  *-
  */
-- 
2.27.0



Re: output columns of \dAo and \dAp

2020-07-13 Thread Alexander Korotkov
On Mon, Jul 13, 2020 at 7:54 PM Jonathan S. Katz  wrote:
> On 7/13/20 10:37 AM, Tom Lane wrote:
> > Alexander Korotkov  writes:
> >> Good compromise.  Done as you proposed.
> >
> > I'm OK with this version.
>
> I saw this was committed and the item was adjusted on the Open Items list.

Thank you!

--
Regards,
Alexander Korotkov




Re: output columns of \dAo and \dAp

2020-07-13 Thread Jonathan S. Katz
On 7/13/20 10:37 AM, Tom Lane wrote:
> Alexander Korotkov  writes:
>> Good compromise.  Done as you proposed.
> 
> I'm OK with this version.

I saw this was committed and the item was adjusted on the Open Items list.

Thank you!

Jonathan





PostgreSQL 13 Beta 3 Release Date

2020-07-13 Thread Jonathan S. Katz
Hi,

The PostgreSQL 13 Release Management Team is pleased to announce the
release date of PostgreSQL 13 Beta 3 is set to 2020-08-13, which is the
same day as the cumulative update release[1]. Please be sure to have
your patches committed for PostgreSQL 13 no later than Sunday,
2020-08-09 AOE[2].

We thank everyone for your continued testing and resolution of open
items[3] on the list.

Thanks!

The PostgreSQL 13 RMT

[1] https://www.postgresql.org/developer/roadmap/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth
[3] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items





Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Alvaro Herrera
On 2020-Jul-13, Jeff Davis wrote:

> On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> > Updated summary:
> > * For hash_mem = Tomas [7], Justin [16]
> > * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> > * For hash_mem_multiplier with default = 1.0 =  PeterG [15][0], Tom
> > [20][24]
> 
> I am OK with these options, but I still prefer a simple escape hatch.

I'm in favor of hash_mem_multiplier.  I think a >1 default is more
sensible than =1 in the long run, but if strategic vote is what we're
doing, then I support the =1 option.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost  wrote:
> Yes, increasing work_mem isn't unusual, at all.

It's unusual as a way of avoiding OOMs!

> Eh?  That's not at all what it looks like- they were getting OOM's
> because they set work_mem to be higher than the actual amount of memory
> they had and the Sort before the GroupAgg was actually trying to use all
> that memory.  The HashAgg ended up not needing that much memory because
> the aggregated set wasn't actually that large.  If anything, this shows
> exactly what Jeff's fine work here is (hopefully) going to give us- the
> option to plan a HashAgg in such cases, since we can accept spilling to
> disk if we end up underestimate, or take advantage of that HashAgg
> being entirely in memory if we overestimate.

I very specifically said that it wasn't a case where something like
hash_mem would be expected to make all the difference.

> Having looked back, I'm not sure that I'm really in the minority
> regarding the proposal to add this at this time either- there's been a
> few different comments that it's too late for v13 and/or that we should
> see if we actually end up with users seriously complaining about the
> lack of a separate way to specify the memory for a given node type,
> and/or that if we're going to do this then we should have a broader set
> of options covering other nodes types too, all of which are positions
> that I agree with.

By proposing to do nothing at all, you are very clearly in a small
minority. While (for example) I might have debated the details with
David Rowley a lot recently, and you couldn't exactly say that we're
in agreement, our two positions are nevertheless relatively close
together.

AFAICT, the only other person that has argued that we should do
nothing (have no new GUC) is Bruce, which was a while ago now. (Amit
said something similar, but has since softened his opinion [1]).

[1] 
https://postgr.es/m/CAA4eK1+KMSQuOq5Gsj-g-pYec_8zgGb4K=xrznbcccnaumf...@mail.gmail.com
-- 
Peter Geoghegan




Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Pavel Stehule
On Mon, Jul 13, 2020 at 16:57, Daniel Gustafsson wrote:

> > On 13 Jul 2020, at 13:02, Pavel Stehule  wrote:
>
> > I like JSON format. But why here? For this purpose the JSON is over
> engineered.
>
> I respectfully disagree, JSON is a commonly used and known format in
> systems
> administration and most importantly: we already have code to parse it in
> the
> frontend.
>

I disagree with the idea that if we have a client-side JSON parser we have
to use it everywhere.
For this case, parsing JSON means more code, not less. I checked
parse_manifest.c. Moreover, the JSON API is DOM-style; for this purpose a
SAX-style API would be better. But still, things should be as simple as
possible. There is no necessity to use it here.

JSON is good for a lot of purposes, and it can be a good fit if the
document uses more token types (numeric, ...). But nothing like that is
used here.


> > This input file has  no nested structure - it is just a stream of lines.
>
> Well, it has a set of object types which in turn have objects.  There is
> more
> structure than meets the eye.
>

> Also, the current patch allows arbitrary whitespace before object names,
> but no
> whitespace before comments etc.  Using something where the rules of
> parsing are
> known is rarely a bad thing.
>

As far as I know, JSON has no comments at all.


> > I don't think so introducing JSON here can be a good idea.
>
> Quite possibly it isn't, but not discussing options seems like a worse
> idea so
> I wanted to bring it up.
>
> > It is a really different case than pg_dump manifest file - in this case,
> in this case pg_dump is consument.
>
> Right, as I said these are two different, while tangentially related,
> things.
>

The backup manifest format has non-trivial complexity, so using JSON makes
sense there. The input filter file is trivial - a +/- list of strings (and
it always will be).

In this case I don't see any benefit from JSON on either side (producer or
consumer). It is (a little bit) harder to parse, and (a little bit) harder
to generate.
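
To illustrate the difference, compare the proposed line format with one
possible JSON encoding (the JSON shape here is purely hypothetical, nobody
has proposed a concrete schema):

+t mytable1
+t mytable2
-d mytable3

versus something like

{"include": {"table": ["mytable1", "mytable2"]},
 "exclude": {"table_data": ["mytable3"]}}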

Regards

Pavel


> cheers ./daniel


Re: [PATCH] fix GIN index search sometimes losing results

2020-07-13 Thread Pavel Borisov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hi, all!

It seems that as of now we have two sets of patches for this bug:
1. Tom Lane's: 0001-make-callbacks-ternary.patch and 
0002-remove-calc-not-flag.patch
2. My: gin-gist-weight-patch-v4.diff

There was quite a long discussion above, and I suppose that despite the
differences both of them are suitable and will provide the necessary fix.
So I decided to review both of Tom Lane's patches.

Both of them apply cleanly. Checks are successful. Regression tests are
included and they cover the bug. I also ran checks on my PgList database
and the bug is indeed fixed.

For 0001-make-callbacks-ternary.patch
As mentioned in the discussion, the issue was that in certain cases the
compare function for a single operand in a query should return the
undefined result "MAYBE", which should be propagated towards the root of
the tree. So the patch, in my opinion, addresses the problem in the right
way.

A possible danger of changing the callback from binary to ternary is that
any external modules which still use the binary interface will get
warnings at compile time and will need minor code modifications to comply
with the new interface. I checked it with the RUM index and indeed got
warnings at compile time. In the discussion above it was noted that there
is no way to get correct results in tsearch with NOT without such a
modification anyway, so I'd recommend committing patch 0001.

For 0002-remove-calc-not-flag.patch
The patch changes behavior which is currently considered the default. This
applies to the RUM module and maybe some other tsearch-related modules.
Applying the patch can make the code cleaner, but it will probably not
give any performance gain, and the bug is fixed by patch 0001 anyway.

Overall I'd recommend committing patch 0001-make-callbacks-ternary.patch
and closing the issue.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

The new status of this patch is: Ready for Committer


Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Eisentraut

On 2020-07-13 16:11, Tomas Vondra wrote:

Why is running out of disk space worse experience than running out of
memory?

Sure, it'll take longer and ultimately the query fails (and if it fills
the device used by the WAL then it may also cause shutdown of the main
instance due to inability to write WAL). But that can be prevented by
moving the temp tablespace and/or setting the temp file limit, as
already mentioned.

With OOM, if the kernel OOM killer decides to act, it may easily bring
down the instance too, and there are much less options to prevent that.


Well, that's an interesting point.  Depending on the OS setup, running
out of memory might actually be worse by default if the OOM killer
strikes in an unfortunate way.  That didn't happen to me in my tests, so
the OS must have been configured differently by default.


So maybe a lesson here is that just like we have been teaching users to 
adjust the OOM killer, we have to teach them now that setting the temp 
file limit might become more important.
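
For instance (a minimal sketch; the value is only illustrative, not a
recommendation), the limit can be set cluster-wide and picked up with a
reload:

-- cap the temporary file space a single process may use; a query that
-- exceeds it fails with an error instead of filling the disk
ALTER SYSTEM SET temp_file_limit = '20GB';
SELECT pg_reload_conf();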


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: GSSENC'ed connection stalls while reconnection attempts.

2020-07-13 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 10 Jul 2020 12:01:10 -0400, Tom Lane  wrote in 
>> The attached patch makes this all act more like the way SSL is handled,
>> and for me it resolves the reconnection problem.

> It looks good to me.

OK, thanks.

>>> The reason that psql doesn't notice the error is pqPacketSend returns
>>> STATUS_OK when write error occurred.  That behavior contradicts to the
>>> comment of the function. The function is used only while making
>>> connection so it's ok to error out immediately by write failure.  I
>>> think other usage of pqFlush while making a connection should report
>>> write failure the same way.

>> I'm disinclined to mess with that, because (a) I don't think it's the
>> actual source of the problem, and (b) it would affect way more than
>> just GSS mode.

> If I did that in pqFlush your objection would be right, but
> pqPacketSend is defined as "RETURNS: STATUS_ERROR if the write fails"
> so not doing that is just wrong. (pqSendSome reported write failure in
> this case.) For other parts in authentication code, I don't think it
> doesn't affect badly because authentication should proceed without any
> read/write overlapping.

As the comment for pqSendSome says, we report a write failure immediately
only if we also cannot read.  I don't really see a reason why the behavior
described there isn't fine during initial connection as well.  If you feel
that the comment for pqPacketSend needs adjustment, we can do that.
In any case, I'm quite against changing pqPacketSend's behavior because
"it's only used during initial connection"; there is nothing about the
function that restricts it to that case.

Bottom line here is that I'm suspicious of changing the behavior of
the read/write code on the strength of a bug in the GSS state management
logic.  If there's a reason to change the read/write code, we should be
able to demonstrate it without the GSS bug.

regards, tom lane




Re: [patch] demote

2020-07-13 Thread Jehan-Guillaume de Rorthais
Hi,

Another summary + patch + tests.

This patch supports 2PC. The goal is to keep them safe during demote/promote
actions so they can be committed/rollbacked later on a primary. See tests.

The checkpointer is now shut down after the demote shutdown checkpoint. This
removes some useless code complexity, e.g. having to signal the postmaster from
the checkpointer to keep the demotion going.

Cascaded replication is now supported. WAL senders stay active during
demotion but set their local "am_cascading_walsender = true". It was a
rough debugging session (thank you rr and tests!) on my side, but it was worth
it. I believe they should stay connected during the demote actions for future
features, e.g. triggering a switchover over the replication protocol using an
admin function.

The first tests have been added in "recovery/t/021_promote-demote.pl". I'll add
some more tests in future versions.

I believe the patch is ready for some preliminary tests and advice or
directions.

On my todo:

* study how to only disconnect or cancel active RW backends
  * ...then add pg_demote() admin function
* cancel running checkpoint for fast demote ?
* user documentation
* Robert's concern about snapshot during hot standby
* some more coding style cleanup/refactoring
* anything else reported to me :)

Thanks,

On Fri, 3 Jul 2020 00:12:10 +0200
Jehan-Guillaume de Rorthais  wrote:

> Hi,
> 
> Here is a small activity summary since last report.
> 
> On Thu, 25 Jun 2020 19:27:54 +0200
> Jehan-Guillaume de Rorthais  wrote:
> [...]
> > I hadn't time to investigate Robert's concern about shared memory for
> > snapshot during recovery.
> 
> I hadn't time to dig very far, but I suppose this might be related to the
> comment in ProcArrayShmemSize(). If I'm right, then it seems the space is
> already allocated as long as hot_standby is enabled. I realize it doesn't
> mean we are on the safe side of the fence though. I still have to get a
> better understanding of this.
> 
> > The patch doesn't deal with prepared xact yet. Testing
> > "start->demote->promote" raise an assert if some prepared xact exist. I
> > suppose I will rollback them during demote in next patch version.
> 
> Rolling back all prepared transactions on demote seems easy. However, I
> realized there's no point in cancelling them. After the demote action, they
> might still be committed later on a promoted instance.
> 
> I am currently trying to clean shared memory for existing prepared
> transactions so they are handled by the startup process during recovery.
> I've been able to clean TwoPhaseState and the ProcArray. I'm now in the
> process to clean remaining prepared xact locks.
> 
> Regards,
> 
> 



-- 
Jehan-Guillaume de Rorthais
Dalibo
>From 4470772702273c720cdea942ed229d59f3a70318 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Fri, 10 Apr 2020 18:01:45 +0200
Subject: [PATCH 1/2] Support demoting instance from production to standby

Architecture:

* creates a shutdown checkpoint on demote
* use DB_DEMOTING state in controlfile
* try to handle subsystems init correctly during demote
* keep some sub-processes alive:
  stat collector, bgwriter and optionally archiver or wal senders
* the code currently uses USR1 to signal the wal senders to check
  if they must enable the cascading mode
* ShutdownXLOG takes a boolean arg to handle demote differently
* the checkpointer is restarted for code simplicity

Trivial manual tests:

* make check: OK
* make check-world: OK
* start in production->demote->demote: OK
* start in production->demote->stop: OK
* start in production->demote-> promote: OK
* switch roles between primary and standby (switchover): OK
* commit and check 2PC after demote/promote
* commit and check 2PC after switchover

Discuss/Todo:

* cancel or kill active and idle in xact RW backends
  * keep RO backends
  * pg_demote() function?
* code reviewing, arch, analysis, checks, etc
* investigate snapshots shmem needs/init during recovery compare to
  production
* add doc
* cancel running checkpoint during demote
  * replace with a END_OF_PRODUCTION xlog record?
---
 src/backend/access/transam/twophase.c   |  95 +++
 src/backend/access/transam/xlog.c   | 315 
 src/backend/postmaster/checkpointer.c   |  28 +++
 src/backend/postmaster/postmaster.c | 261 +++-
 src/backend/replication/walsender.c |   1 +
 src/backend/storage/ipc/procarray.c |   2 +
 src/backend/storage/ipc/procsignal.c|   4 +
 src/backend/storage/lmgr/lock.c |  12 +
 src/bin/pg_controldata/pg_controldata.c |   2 +
 src/bin/pg_ctl/pg_ctl.c | 111 +
 src/include/access/twophase.h   |   1 +
 src/include/access/xlog.h   |  19 +-
 src/include/catalog/pg_control.h|   1 +
 src/include/libpq/libpq-be.h|   7 +-
 src/include/postmaster/bgwriter.h   |   1 +
 src/include/storage/lock.h  |   2 +
 src/include/storage/procsignal.h|   1 +
 src/includ

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Bharath Rupireddy
>
> I had one small comment:
> +{
> +   int copied_bytes = 0;
> +
> +#define BUF_BYTES  (cstate->raw_buf_len - cstate->raw_buf_index)
> +#define DRAIN_COPY_RAW_BUF(cstate, dest, nbytes)\
> +   do {\
> +   memcpy((dest), (cstate)->raw_buf +
> (cstate)->raw_buf_index, (nbytes));\
> +   (cstate)->raw_buf_index += (nbytes);\
> +   } while(0)
>
> BUF_BYTES could be used in CopyLoadRawBuf function also.
>

Thanks Vignesh for pointing that out. I've made the change and attached the
v5 patch. The regression tests (make check and make check-world) ran
successfully.
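
In case it helps reviewers, here is a minimal, self-contained sketch of the
drain-then-refill pattern the patch uses; the struct and function names below
are simplified stand-ins, not the actual CopyStateData/CopyReadBinaryData code:

#include <stdio.h>
#include <string.h>

#define RAW_BUF_SIZE 65536

typedef struct BufferedReader
{
	FILE	   *fp;
	char		raw_buf[RAW_BUF_SIZE];
	int			raw_buf_index;	/* next unread byte in raw_buf */
	int			raw_buf_len;	/* total bytes currently in raw_buf */
} BufferedReader;

/* Bytes still unread in the buffer. */
#define BUF_BYTES(r) ((r)->raw_buf_len - (r)->raw_buf_index)

/*
 * Copy nbytes into dest, serving the request from raw_buf and refilling it
 * from the file only when it runs dry.  Returns the number of bytes copied,
 * which is less than nbytes only at end of file.
 */
static int
buffered_read(BufferedReader *r, char *dest, int nbytes)
{
	int			copied = 0;

	while (nbytes > 0)
	{
		int			avail = BUF_BYTES(r);

		if (avail == 0)
		{
			/* Refill: one big fread instead of many tiny ones. */
			r->raw_buf_index = 0;
			r->raw_buf_len = (int) fread(r->raw_buf, 1, RAW_BUF_SIZE, r->fp);
			if (r->raw_buf_len == 0)
				break;			/* EOF (or read error) */
			avail = r->raw_buf_len;
		}

		if (avail > nbytes)
			avail = nbytes;
		memcpy(dest + copied, r->raw_buf + r->raw_buf_index, avail);
		r->raw_buf_index += avail;
		copied += avail;
		nbytes -= avail;
	}
	return copied;
}

Field counts and lengths are then read through the same routine and presumably
converted with pg_ntoh16()/pg_ntoh32(), as the removed CopyGetInt16() and
CopyGetInt32() did, so each former per-field fread becomes a memcpy from the
buffer.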

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 757091a228b8def825f671cf318adde8cc4639ed Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 13 Jul 2020 19:57:57 +0530
Subject: [PATCH v5] Performance Improvement For Copy From Binary Files

Read data from the binary file RAW_BUF_SIZE bytes at a time for the
COPY FROM command, instead of issuing a separate read for each
field count, field size and field datum. This reduces the number of
fread calls significantly and improves the performance of COPY FROM
for binary files.
---
 src/backend/commands/copy.c | 138 +---
 1 file changed, 82 insertions(+), 56 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 99d1457180..7973cde323 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -195,7 +195,8 @@ typedef struct CopyStateData
 	 * the buffer on each cycle.
 	 *
 	 * (In binary COPY FROM, attribute_buf holds the binary data for the
-	 * current field, while the other variables are not used.)
+	 * current field copied from raw_buf, which in turn holds raw data read
+	 * from the file.)
 	 */
 	StringInfoData attribute_buf;
 
@@ -219,8 +220,9 @@ typedef struct CopyStateData
 	 * Finally, raw_buf holds raw data read from the data source (file or
 	 * client connection).  CopyReadLine parses this data sufficiently to
 	 * locate line boundaries, then transfers the data to line_buf and
-	 * converts it.  Note: we guarantee that there is a \0 at
-	 * raw_buf[raw_buf_len].
+	 * converts it.  Also, CopyReadBinaryData() slices the data into
+	 * attributes based on the caller-specified field length in bytes.
+	 * Note: we guarantee that there is a \0 at raw_buf[raw_buf_len].
 	 */
 #define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
 	char	   *raw_buf;
@@ -257,6 +259,10 @@ typedef struct
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+/* Gets the bytes available in raw_buf. */
+
+#define BUF_BYTES	(cstate->raw_buf_len - cstate->raw_buf_index)
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
@@ -373,6 +379,7 @@ static int	CopyReadAttributesCSV(CopyState cstate);
 static Datum CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
 	 Oid typioparam, int32 typmod,
 	 bool *isnull);
+static int	CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
 static void CopyAttributeOutText(CopyState cstate, char *string);
 static void CopyAttributeOutCSV(CopyState cstate, char *string,
 bool use_quote, bool single_attr);
@@ -391,9 +398,7 @@ static void CopySendEndOfRow(CopyState cstate);
 static int	CopyGetData(CopyState cstate, void *databuf,
 		int minread, int maxread);
 static void CopySendInt32(CopyState cstate, int32 val);
-static bool CopyGetInt32(CopyState cstate, int32 *val);
 static void CopySendInt16(CopyState cstate, int16 val);
-static bool CopyGetInt16(CopyState cstate, int16 *val);
 
 
 /*
@@ -732,25 +737,6 @@ CopySendInt32(CopyState cstate, int32 val)
 	CopySendData(cstate, &buf, sizeof(buf));
 }
 
-/*
- * CopyGetInt32 reads an int32 that appears in network byte order
- *
- * Returns true if OK, false if EOF
- */
-static bool
-CopyGetInt32(CopyState cstate, int32 *val)
-{
-	uint32		buf;
-
-	if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
-	{
-		*val = 0;/* suppress compiler warning */
-		return false;
-	}
-	*val = (int32) pg_ntoh32(buf);
-	return true;
-}
-
 /*
  * CopySendInt16 sends an int16 in network byte order
  */
@@ -763,23 +749,6 @@ CopySendInt16(CopyState cstate, int16 val)
 	CopySendData(cstate, &buf, sizeof(buf));
 }
 
-/*
- * CopyGetInt16 reads an int16 that appears in network byte order
- */
-static bool
-CopyGetInt16(CopyState cstate, int16 *val)
-{
-	uint16		buf;
-
-	if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
-	{
-		*val = 0;/* suppress compiler warning */
-		return false;
-	}
-	*val = (int16) pg_ntoh16(buf);
-	return true;
-}
-
 
 /*
  * CopyLoadRawBuf loads some more data into raw_buf
@@ -800,7 +769,7 @@ CopyLoadRawBuf(CopyState cstate)
 	if (cstate->raw_buf_index < cstate->raw_buf_len)
 	{
 		/* Copy down the unprocessed data */
-		nbytes = cstate->raw_buf_len - cstate->raw_buf_index;
+		nbytes = BUF_BYTES;
 		mem

Re: WIP: BRIN multi-range indexes

2020-07-13 Thread Tomas Vondra

On Mon, Jul 13, 2020 at 02:54:56PM +0900, Masahiko Sawada wrote:
> On Mon, 13 Jul 2020 at 09:33, Alvaro Herrera  wrote:
> > On 2020-Jul-13, Tomas Vondra wrote:
> > > On Sun, Jul 12, 2020 at 07:58:54PM -0400, Alvaro Herrera wrote:
> > > > Maybe we can try to handle this with some other function that interprets
> > > > the bytea in 'value' and returns a user-readable text.  I think it'd
> > > > have to be a superuser-only function, because otherwise you could easily
> > > > cause a crash by passing a value of a different opclass.  But since this
> > > > seems a developer-only thing, that restriction seems fine to me.
> > >
> > > Ummm, I disagree a superuser check is sufficient protection from a
> > > segfault or similar issues.
> >
> > My POV there is that it's the user's responsibility to call the right
> > function; and if they fail to do so, it's their fault.  I agree it's not
> > ideal, but frankly these pageinspect things are not critical to get 100%
> > user-friendly.
> >
> > > If we really want to print something nicer, I'd say it needs to be a
> > > special function in the BRIN opclass.
> >
> > If that can be done, then +1.  We just need to ensure that the function
> > knows and can verify the type of index that the value comes from.  I
> > guess we can pass the index OID so that it can extract the opclass from
> > catalogs to verify.
>
> +1 from me, too. Perhaps we can have it as optional. If a BRIN opclass
> doesn't have it, the 'values' can be null.

I'd say that if the opclass does not have it, then we should print the
bytea value (or whatever the opclass uses to store the summary) using
the type functions.
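
To make the idea concrete, here is a rough, purely hypothetical sketch of what
such a helper could look like on the pageinspect side; the function name, the
argument list (index OID, attribute number, raw summary value) and the notion
of an optional per-opclass "summary output" support procedure are assumptions
drawn from this discussion, not existing code:

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"			/* superuser() */
#include "utils/builtins.h"		/* cstring_to_text(), byteaout() */

PG_FUNCTION_INFO_V1(brin_summary_out);

Datum
brin_summary_out(PG_FUNCTION_ARGS)
{
	Oid			indexoid = PG_GETARG_OID(0);	/* index the value came from */
	int32		attno = PG_GETARG_INT32(1);		/* indexed attribute number */
	bytea	   *rawval = PG_GETARG_BYTEA_PP(2); /* summary as stored on disk */
	char	   *str;

	if (!superuser())
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("must be superuser to use raw page functions")));

	/*
	 * A real implementation would look up the opclass of attribute attno of
	 * indexoid, verify that rawval actually belongs to it, and then call an
	 * optional "summary output" support procedure of that opclass.  As a
	 * fallback, when the opclass has no such procedure, just print the raw
	 * bytea with its regular output function.
	 */
	(void) indexoid;
	(void) attno;

	str = DatumGetCString(DirectFunctionCall1(byteaout,
											  PointerGetDatum(rawval)));
	PG_RETURN_TEXT_P(cstring_to_text(str));
}

The interesting part, of course, is the opclass lookup and the new support
procedure number, which is exactly what would need to be designed.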

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Daniel Gustafsson
> On 13 Jul 2020, at 13:02, Pavel Stehule  wrote:

> I like JSON format. But why here? For this purpose the JSON is over 
> engineered.

I respectfully disagree, JSON is a commonly used and known format in systems
administration and most importantly: we already have code to parse it in the
frontend.

> This input file has  no nested structure - it is just a stream of lines. 

Well, it has a set of object types which in turn have objects.  There is more
structure than meets the eye.

Also, the current patch allows arbitrary whitespace before object names, but no
whitespace before comments etc.  Using something where the rules of parsing are
known is rarely a bad thing.

> I don't think so introducing JSON here can be a good idea.

Quite possibly it isn't, but not discussing options seems like a worse idea so
I wanted to bring it up.

> It is a really different case than the pg_dump manifest file - in this case
> pg_dump is the consumer.

Right, as I said these are two different, while tangentially related, things.

cheers ./daniel



Re: pg_dump bug for extension owned tables

2020-07-13 Thread Fabrízio de Royes Mello
On Sat, Jul 11, 2020 at 8:07 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:
>
>
> On 6/26/20 2:10 PM, Fabrízio de Royes Mello wrote:
> >
> > On Fri, Jun 26, 2020 at 11:55 AM Fabrízio de Royes Mello  wrote:
> > >
> > >
> > On Fri, Jun 26, 2020 at 11:24 AM Andrew Dunstan  wrote:
> > > >
> > > >
> > > > On 6/26/20 9:57 AM, Andrew Dunstan wrote:
> > > > > It appears that for extension owned tables tbinfo.attgenerated
isn't
> > > > > being properly populated, so line 2050 in REL_12_STABLE, which
> > is line
> > > > > 2109 in git tip, is failing.
> > > > >
> > > > >
> > > >
> > > > Should have mentioned this is in src/bin/pg_dump/pg_dump.c
> > > >
> > >
> > > Having a look on it.
> > >
> >
> > It seems that when the schema name is qualified, the "tbinfo->interesting"
> > field is not set for extension objects, so getTableAttrs can't fill
> > the attgenerated field properly.
> >
> > I'm not 100% sure it's the correct way but the attached patch works
> > for me and all tests passed. Maybe we should add more TAP tests?
> >
> >
>
>
> I just tried this patch out on master, with the test case I gave
> upthread. It's not working, still getting a segfault.
>

Ohh, really sorry about it... my bad... I completely forgot about it!!!

Due to my rush I ended up adding the wrong patch version. Attached the
correct version.

Will add TAP tests to src/test/modules/test_pg_dump

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e758b5c50a..a998e5c821 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -18037,6 +18037,8 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
 			configtbl->dataObj->filtercond = pg_strdup(extconditionarray[j]);
 	}
 }
+
+configtbl->interesting = dumpobj;
 			}
 		}
 		if (extconfigarray)


Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Jeff Davis
On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote:
> Updated summary:
> * For hash_mem = Tomas [7], Justin [16]
> * For hash_mem_multiplier with a default > 1.0 = DavidG [21]
> * For hash_mem_multiplier with default = 1.0 =  PeterG [15][0], Tom
> [20][24]

I am OK with these options, but I still prefer a simple escape hatch.

> * Maybe do nothing until we see how things go during beta = Bruce
> [3], Amit [10]
> * Just let users set work_mem = Stephen [21], Alvaro [4] (Alvaro
> changed his mind after Andres pointed out that changes other nodes in
> the plan too [25])

I am not on board with these options.

Regards,
Jeff Davis






Re: output columns of \dAo and \dAp

2020-07-13 Thread Tom Lane
Alexander Korotkov  writes:
> Good compromise.  Done as you proposed.

I'm OK with this version.

regards, tom lane




Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread Justin Pryzby
On Mon, Jul 13, 2020 at 12:04:09PM +0200, Daniel Gustafsson wrote:
> Sorry for jumping in late, but thinking about this extension to pg_dump:
> doesn't it make more sense to use an existing file format like JSON for this,
> given that virtually all devops/cd/etc tooling know about JSON already?
> 
> Considering its application and the problem space, I'd expect users to 
> generate
> this file rather than handcraft it with 10 rows of content, and standard file
> formats help there.

I mentioned having tested this patch as we would use it.  But it's likely I
*wouldn't* use it if the format was something which required added complexity
to pipe in from an existing shell script.

> At the very least it seems limiting to not include a file format version
> identifier since we'd otherwise risk running into backwards compat issues
> should we want to expand on this in the future.

Maybe ... I'm not sure.  The patch would of course be extended to handle
additional include/exclude options.  Is there any other future behavior we
might reasonably anticipate?

If at some point we wanted to support another file format, maybe it would look
like: --format=v2:filters.txt (or maybe the old one would be v1:filters.txt)

-- 
Justin




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread David Rowley
On Tue, 14 Jul 2020 at 01:13, Stephen Frost  wrote:
> Yes, increasing work_mem isn't unusual, at all.  What that tweet shows
> that I don't think folks who are suggesting things like setting this
> factor to 2.0 is that people may have a work_mem configured in the
> gigabytes- meaning that a 2.0 value would result in a work_mem of 5GB
> and a hash_mem of 10GB.  Now, I'm all for telling people to review their
> configurations between major versions, but that's a large difference
> that's going to be pretty deeply hidden in a 'multiplier' setting.

Peter seems to be fine with setting the default to 1.0, per [0].

This thread did split off a while back into "Default setting for
enable_hashagg_disk (hash_mem)", I did try and summarise who sits
where on this in [19].

I think it would be good if we could try to move towards getting
consensus here rather than reiterating our arguments over and over.

Updated summary:
* For hash_mem = Tomas [7], Justin [16]
* For hash_mem_multiplier with a default > 1.0 = DavidG [21]
* For hash_mem_multiplier with default = 1.0 =  PeterG [15][0], Tom [20][24]
* hash_mem out of scope for PG13 = Bruce [8], Andres [9]
* hashagg_mem default to -1 meaning use work_mem = DavidR [23] (2nd preference)
* Escape hatch that can be removed later when we get something better
= Jeff [11], DavidR [12], Pavel [13], Andres [14], Justin [1]
* Add enable_hashagg_spill = Tom [2] (I'm unclear on this proposal.
Does it affect the planner or executor or both?) (updated opinion in
[20])
* Maybe do nothing until we see how things go during beta = Bruce [3], Amit [10]
* Just let users set work_mem = Stephen [21], Alvaro [4] (Alvaro
changed his mind after Andres pointed out that changes other nodes in
the plan too [25])
* Swap enable_hashagg for a GUC that specifies when spilling should
occur. -1 means work_mem = Robert [17], Amit [18]
* hash_mem does not solve the problem = Tomas [6] (changed his mind in [7])

Perhaps people who have managed to follow this thread but not chip in
yet can reply quoting the option above that they'd be voting for. Or
if you're ok changing your mind to some option that has more votes
than the one your name is already against. That might help move this
along.

David

[0] 
https://www.postgresql.org/message-id/CAH2-Wz=VV6EKFGUJDsHEqyvRk7pCO36BvEoF5sBQry_O6R2=n...@mail.gmail.com
[1] https://www.postgresql.org/message-id/20200624031443.gv4...@telsasoft.com
[2] https://www.postgresql.org/message-id/2214502.1593019...@sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/20200625182512.gc12...@momjian.us
[4] https://www.postgresql.org/message-id/20200625224422.GA9653@alvherre.pgsql
[5] 
https://www.postgresql.org/message-id/caa4ek1k0cgk_8hryxsvppgoh_z-ny+uztcfwb2we6baj9dx...@mail.gmail.com
[6] 
https://www.postgresql.org/message-id/20200627104141.gq7d3hm2tvoqgjjs@development
[7] 
https://www.postgresql.org/message-id/20200629212229.n3afgzq6xpxrr4cu@development
[8] https://www.postgresql.org/message-id/20200703030001.gd26...@momjian.us
[9] 
https://www.postgresql.org/message-id/20200707171216.jqxrld2jnxwf5...@alap3.anarazel.de
[10] 
https://www.postgresql.org/message-id/CAA4eK1KfPi6iz0hWxBLZzfVOG_NvOVJL=9UQQirWLpaN=ka...@mail.gmail.com
[11] 
https://www.postgresql.org/message-id/8bff2e4e8020c3caa16b61a46918d21b573eaf78.ca...@j-davis.com
[12] 
https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=ukzkq1_fzhy+xzmuzajinj6rwythh4...@mail.gmail.com
[13] 
https://www.postgresql.org/message-id/cafj8prbf1w4ndz-ynd+muptfizfbs7+cpjc4ob8v9d3x0ms...@mail.gmail.com
[14] 
https://www.postgresql.org/message-id/20200624191433.5gnqgrxfmucex...@alap3.anarazel.de
[15] 
https://www.postgresql.org/message-id/CAH2-WzmD+i1pG6rc1+Cjc4V6EaFJ_qSuKCCHVnH=oruqd-z...@mail.gmail.com
[16] https://www.postgresql.org/message-id/20200703024649.gj4...@telsasoft.com
[17] 
https://www.postgresql.org/message-id/ca+tgmobyv9+t-wjx-ctpdqurcgt1thz1ml3v1nxc4m4g-h6...@mail.gmail.com
[18] 
https://www.postgresql.org/message-id/caa4ek1k0cgk_8hryxsvppgoh_z-ny+uztcfwb2we6baj9dx...@mail.gmail.com
[19] 
https://www.postgresql.org/message-id/caaphdvrp1fiev4aql2zscbhi32w+gp01j+qnhwou7y7p-qf...@mail.gmail.com
[20] https://www.postgresql.org/message-id/2107841.1594403...@sss.pgh.pa.us
[21] 
https://www.postgresql.org/message-id/20200710141714.gi12...@tamriel.snowman.net
[22] 
https://www.postgresql.org/message-id/CAKFQuwa2gwLa0b%2BmQv5r5A_Q0XWsA2%3D1zQ%2BZ5m4pQprxh-aM4Q%40mail.gmail.com
[23] 
https://www.postgresql.org/message-id/caaphdvpxbhhp566rrjjwgnfs0yoxr53eztz5lhh-jcekvqd...@mail.gmail.com
[24] https://www.postgresql.org/message-id/2463591.1594514...@sss.pgh.pa.us
[25] 
https://www.postgresql.org/message-id/20200625225853.GA11137%40alvherre.pgsql




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tomas Vondra

On Mon, Jul 13, 2020 at 01:51:42PM +0200, Peter Eisentraut wrote:

> On 2020-04-07 20:20, Jeff Davis wrote:
> > Now that we have Disk-based Hash Aggregation, there are a lot more
> > situations where the planner can choose HashAgg. The
> > enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
> > costing. If false, it only generates a HashAgg path if it thinks it
> > will fit in work_mem, similar to the old behavior (though it will now
> > spill to disk if the planner was wrong about it fitting in work_mem).
> > The current default is true.
>
> I have an anecdote that might be related to this discussion.
>
> I was running an unrelated benchmark suite.  With PostgreSQL 12, one
> query ran out of memory.  With PostgreSQL 13, the same query instead
> ran out of disk space.  I bisected this to the introduction of
> disk-based hash aggregation.  Of course, the very point of that
> feature is to eliminate the out of memory and make use of disk space
> instead.  But running out of disk space is likely to be a worse
> experience than running out of memory.  Also, while it's relatively
> easy to limit memory use both in PostgreSQL and in the kernel, it is
> difficult or impossible to limit disk space use in a similar way.




Why is running out of disk space a worse experience than running out of
memory?

Sure, it'll take longer and ultimately the query fails (and if it fills
the device used by the WAL then it may also cause shutdown of the main
instance due to inability to write WAL). But that can be prevented by
moving the temp tablespace and/or setting the temp file limit, as
already mentioned.

With OOM, if the kernel OOM killer decides to act, it may easily bring
down the instance too, and there are far fewer options to prevent that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Commitfest 2020-07 almost halfway

2020-07-13 Thread Daniel Gustafsson
We are fast approaching mid-July, and with it mid-commitfest.  As has been the
case with most commitfests for a while, this CF had a record number of entries,
with 246 patches. As of this writing, the status breakdown looks like this:

  Needs review: 139
  Waiting on Author: 34

  Ready for Committer: 8
  Committed: 43
  Returned with Feedback: 11
  Rejected: 1
  Withdrawn: 9

  Moved to next CF: 1

which means we have reached closure on ~26% of the patches.  Let's collectively
try to get that number closer to 50% to really jumpstart v14!

All patches which didn't apply when the CF started were notified on their
respective threads, if no new version is posted by the last week of the CF they
will be considered stalled and returned with feedback.

cheers ./daniel



Re: OpenSSL 3.0.0 compatibility

2020-07-13 Thread Tom Lane
Peter Eisentraut  writes:
>>> where would be a good place to define
>>> OPENSSL_API_COMPAT?

> Actually, it would be most formally correct to set it using AC_DEFINE in 
> configure.in, so that configure tests see it.

Yeah, very good point.

regards, tom lane




Re: proposal: possibility to read dumped table's name from file

2020-07-13 Thread vignesh C
On Mon, Jul 13, 2020 at 1:51 PM Pavel Stehule  wrote:
>> Can this be changed to dump objects and data based on the filter
>> expressions from the filter file.
>
>
> I am sorry, I don't understand. This should work for data from objects
> specified by the filter without any modification.
>
I meant can this:
printf(_("  --filter=FILENAMEread object name filter
expressions from file\n"));
be changed to:
printf(_("  --filter=FILENAMEdump objects and data based
on the filter expressions from the filter file\n"));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread vignesh C
On Mon, Jul 13, 2020 at 8:02 AM Amit Langote  wrote:
> By the way, considering the rebase over cd22d3cdb9b, it seemed to me
> that we needed to update the comments in CopyStateData struct
> definition a bit more.  While doing that, I realized
> CopyReadFromRawBuf as a name for the new function might be misleading
> as long as we are only using it for binary data.  Maybe
> CopyReadBinaryData is more appropriate?  See attached v4 with these
> and a few other cosmetic changes.
>

I had one small comment:
+{
+   int copied_bytes = 0;
+
+#define BUF_BYTES  (cstate->raw_buf_len - cstate->raw_buf_index)
+#define DRAIN_COPY_RAW_BUF(cstate, dest, nbytes)\
+   do {\
+   memcpy((dest), (cstate)->raw_buf +
(cstate)->raw_buf_index, (nbytes));\
+   (cstate)->raw_buf_index += (nbytes);\
+   } while(0)

BUF_BYTES could be used in CopyLoadRawBuf function also.
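
For illustration, that reuse would amount to something like this inside
CopyLoadRawBuf() (a rough sketch from memory, not an exact quote of copy.c):

	if (cstate->raw_buf_index < cstate->raw_buf_len)
	{
		/* Copy down the unprocessed data, reusing the new macro */
		nbytes = BUF_BYTES;
		memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
				nbytes);
	}
	else
		nbytes = 0;				/* no data need be saved */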

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost  wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Have you got a better proposal that is reasonably implementable for v13?
> > > (I do not accept the argument that "do nothing" is a better proposal.)
> 
> > So, no, I don't agree that 'do nothing' (except ripping out the one GUC
> > that was already added) is a worse proposal than adding another work_mem
> > like thing that's only for some nodes types.
> 
> The question was "Have you got a better proposal that is reasonably
> implementable for v13?".
> 
> This is anecdotal, but just today somebody on Twitter reported
> *increasing* work_mem to stop getting OOMs from group aggregate +
> sort:
> 
> https://twitter.com/theDressler/status/1281942941133615104

Yes, increasing work_mem isn't unusual, at all.  What that tweet shows,
and what I don't think folks who are suggesting things like setting this
factor to 2.0 appreciate, is that people may have a work_mem configured in
the gigabytes, meaning that a 2.0 value would turn a work_mem of 5GB into
a hash_mem of 10GB.  Now, I'm all for telling people to review their
configurations between major versions, but that's a large difference
that's going to be pretty deeply hidden in a 'multiplier' setting.

I'm still wholly unconvinced that we need such a setting, just to be
clear, but I don't think there's any way it'd be reasonable to have it
set to something other than "whatever work_mem is" by default- and it
needs to actually be "what work_mem is" and not "have the same default
value" or *everyone* would have to configure it.

> It was possible to fix the problem in this instance, since evidently
> there wasn't anything else that really did try to consume ~5 GB of
> work_mem memory. Evidently the memory isn't available in any general
> sense, so there are no OOMs now. Nevertheless, we can expect OOMs on
> this server just as soon as there is a real need to do a ~5GB sort,
> regardless of anything else.

Eh?  That's not at all what it looks like: they were getting OOMs
because they set work_mem higher than the actual amount of memory
they had and the Sort before the GroupAgg was actually trying to use all
that memory.  The HashAgg ended up not needing that much memory because
the aggregated set wasn't actually that large.  If anything, this shows
exactly what Jeff's fine work here is (hopefully) going to give us: the
option to plan a HashAgg in such cases, since we can accept spilling to
disk if we end up underestimating, or take advantage of the HashAgg
being entirely in memory if we overestimate.

> I don't think that this kind of perverse effect is uncommon. Hash
> aggregate can naturally be far faster than group agg + sort, Hash agg
> can naturally use a lot less memory in many cases, and we have every
> reason to think that grouping estimates are regularly totally wrong.

I'm confused as to why we're talking about the relative performance of a
HashAgg vs. a Sort+GroupAgg.  Of course the HashAgg is going to be faster
if it's got enough memory, but that's a constraint we have to consider
and deal with because, otherwise, the query can end up failing and
potentially impacting other queries or activity on the system.  That
includes the entire database system falling over because the OOM killer
fires and kills a process, leaving the database to restart and go through
crash recovery, which is going to be quite a bit worse than performance
maybe not being great.

> You're significantly underestimating the risk.

Of... what?  That we'll end up getting worse performance because we
underestimated the size of the result set and we end up spilling to
disk with the HashAgg?  I think I'm giving that risk the amount of
concern it deserves- which is, frankly, not very much.  Users who run
into that issue, as this tweet *also* showed, are familiar with work_mem
and can tune it to address that.  This reaction to demand a new GUC to
break up work_mem into pieces strikes me as unjustified, and doing so
during beta makes it that much worse.

Having looked back, I'm not sure that I'm really in the minority
regarding the proposal to add this at this time either- there's been a
few different comments that it's too late for v13 and/or that we should
see if we actually end up with users seriously complaining about the
lack of a separate way to specify the memory for a given node type,
and/or that if we're going to do this then we should have a broader set
of options covering other nodes types too, all of which are positions
that I agree with.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Eisentraut

On 2020-07-13 14:16, David Rowley wrote:

> Isn't that what temp_file_limit is for?


Yeah, I guess that is so rarely used that I had forgotten about it.  So 
maybe that is also something that more users will want to be aware of.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Binary support for pgoutput plugin

2020-07-13 Thread Dave Cramer
On Sat, 11 Jul 2020 at 10:20, Tom Lane  wrote:

> Petr Jelinek  writes:
> > On 11/07/2020 14:14, Dave Cramer wrote:
> >> So is there any point in having them as options then ?
>
> > I am guessing this is copied from pglogical, right? We have them there
> > because it can optionally send data in the on-disk format (not the
> > network binary format) and there this matters, but for network binary
> > format they do not matter as Tom says.
>
> Ah, I wondered why that was there at all.  Yes, you should just delete
> all that logic --- it's irrelevant as long as we use the send/recv
> functions.
>
> regards, tom lane
>


Ok,

removed all the unnecessary options.
Added the test case that Daniel had created.
Added a note to the docs.

Note WAL_DEBUG is removed in patch 3. I could rebase that into patch 1 if
required.

Thanks,

Dave


0003-Actually-set-the-default-to-unchanged-as-per-the-com.patch
Description: Binary data


0002-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch
Description: Binary data


0004-Add-psql-support-and-tests.patch
Description: Binary data


0001-Add-binary-protocol-for-publications-and-subscriptio.patch
Description: Binary data

