date:20200526

Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-26 Thread Kyotaro Horiguchi

At Tue, 26 May 2020 22:56:39 -0400, Isaac Morland  
wrote in 
> On Tue, 26 May 2020 at 21:46, Kyotaro Horiguchi 
> wrote:
> 
> > At Tue, 26 May 2020 09:10:40 -0400, Jeff Janes 
> > wrote in
> > > In postgresql.conf, it says:
> > >
> > > #max_slot_wal_keep_size = -1  # measured in bytes; -1 disables
> > >
> > > I don't know if that is describing the dimension of this parameter or the
> > > units of it, but the default units for it are megabytes, not individual
> > > bytes, so I think it is pretty confusing.
> >
> > Agreed. It should be a leftover at the time the unit was changed
> > (before committed) to MB from bytes.  The default value makes the
> > confusion worse.
> >
> > Is the following works?
> >
> > #max_slot_wal_keep_size = -1  # in MB; -1 disables
> 
> 
> Extreme pedant question: Is it MB (10^6 bytes) or MiB (2^20 bytes)?


GUC variables for file/memory sizes are in a traditional
representation, that is, a power of two represented by
SI-prefixes. AFAICS PostgreSQL doesn't use binary-prefixed units.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: password_encryption default

2020-05-26 Thread Peter Eisentraut


On 2020-05-27 08:00, Michael Paquier wrote:

On Tue, May 26, 2020 at 10:25:25AM +0200, Peter Eisentraut wrote:

Yeah, I was too enthusiastic about removing that.  Here is a better patch.


+as an MD5 hash.  (on is also accepted, as an alias
+for md5.)  The default is
+scram-sha-256.
Shouldn't password_encryption = on/true/1/yes be an equivalent of
scram-sha-256 as the default gets changed?


I think these are mostly legacy options anyway, so if we wanted to make 
a change, we should remove them.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-26 Thread Michael Paquier

On Wed, May 27, 2020 at 10:46:27AM +0900, Kyotaro Horiguchi wrote:
> Agreed. It should be a leftover at the time the unit was changed
> (before committed) to MB from bytes.  The default value makes the
> confusion worse.
> 
> Is the following works?
> 
> #max_slot_wal_keep_size = -1  # in MB; -1 disables

Indeed, better to fix that.  The few GUCs using memory units that have
such a mention in their comments use the actual name of the memory
unit, and not its abbreviation (see log_temp_files).  So it seems more
logic to me to just use "in megabytes; -1 disables", that would be
also more consistent with the time-unit-based ones.
--
Michael


signature.asc
Description: PGP signature

Re: segmentation fault using currtid and partitioned tables

2020-05-26 Thread Michael Paquier

On Wed, May 27, 2020 at 12:29:39AM -0500, Jaime Casanova wrote:
> so, currently the patch just installs protections on both currtid_*
> functions and adds some tests... therefore we can consider it as a bug
> fix and let it go in 13? actually also backpatch in 12...

Yes, and it has the advantage to be simple.

> only point to mention is a typo (a missing "l") in an added comment:
> 
> + * currtid_byrename

Oops, thanks.
--
Michael


signature.asc
Description: PGP signature

Re: password_encryption default

2020-05-26 Thread Michael Paquier

On Tue, May 26, 2020 at 10:25:25AM +0200, Peter Eisentraut wrote:
> Yeah, I was too enthusiastic about removing that.  Here is a better patch.

+as an MD5 hash.  (on is also accepted, as an alias
+for md5.)  The default is
+scram-sha-256.
Shouldn't password_encryption = on/true/1/yes be an equivalent of
scram-sha-256 as the default gets changed?
--
Michael


signature.asc
Description: PGP signature

Re: segmentation fault using currtid and partitioned tables

2020-05-26 Thread Jaime Casanova

On Mon, 25 May 2020 at 22:01, Michael Paquier  wrote:
>
> On Mon, May 25, 2020 at 06:29:10PM +0900, Michael Paquier wrote:
> > Perhaps you are right though, and that we don't need to spend this
> > much energy into improving the error messages so I am fine to discard
> > this part.  At the end, in order to remove the crashes, you just need
> > to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
> > rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
> > instead of elog(), and keep the test coverage of the previous patch
> > (including the tests for the aggregates I noticed were missing).
> > Would you be fine with that?
>
> And this means the attached.  Thoughts are welcome.

so, currently the patch just installs protections on both currtid_*
functions and adds some tests... therefore we can consider it as a bug
fix and let it go in 13? actually also backpatch in 12...

patch works, server doesn't crash anymore

only point to mention is a typo (a missing "l") in an added comment:

+ * currtid_byrename

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Default gucs for EXPLAIN

2020-05-26 Thread David G. Johnston

On Tuesday, May 26, 2020, David Rowley  wrote:

> On Tue, 26 May 2020 at 23:59, Vik Fearing  wrote:
> > Are you saying we should have all new EXPLAIN options off forever into
> > the future because apps won't know about the new data?  I guess we
> > should also not ever introduce new plan nodes because those won't be
> > known either.
>
> Another argument against this is that it creates dependency among the
> new GUCs. Many of the options are not compatible with each other. e.g.
>
> postgres=# explain (timing on) select 1;
> ERROR:  EXPLAIN option TIMING requires ANALYZE
>
> Would you propose we just error out in that case, or should we
> silently enable the required option, or disable the conflicting
> option?
>
>
The same thing we do today...ignore options that require analyze if analyze
is not specified.  There are no other options documented that are dependent
with options besides than analyze.  The docs say timing defaults to on, its
only when explicitly specified instead of being treated as a default that
the user message appears.  All the GUCs are doing is changing the default.

David J.

Re: Default gucs for EXPLAIN

2020-05-26 Thread David Rowley

On Tue, 26 May 2020 at 23:59, Vik Fearing  wrote:
> Are you saying we should have all new EXPLAIN options off forever into
> the future because apps won't know about the new data?  I guess we
> should also not ever introduce new plan nodes because those won't be
> known either.

Another argument against this is that it creates dependency among the
new GUCs. Many of the options are not compatible with each other. e.g.

postgres=# explain (timing on) select 1;
ERROR:  EXPLAIN option TIMING requires ANALYZE

Would you propose we just error out in that case, or should we
silently enable the required option, or disable the conflicting
option?

David

Re: Why don't you to document pg_shmem_allocations view's name list?

2020-05-26 Thread Masahiro Ikeda


On 2020-05-26 11:08, Michael Paquier wrote:

On Tue, May 26, 2020 at 10:16:19AM +0900, Masahiro Ikeda wrote:

I think it is more useful if the name list of the
pg_shmem_allocations view is listed in one page.

Why don't you document pg_shmem_allocations view's name list?


Documenting that would create a dependency between the docs and the
backend code, with very high chances of missing doc updates each time
new code that does an extra shared memory allocation is added.  I
think that there would be a point in bringing more sanity and
consistency to the names of the shared memory sections passed down to
ShmemInitStruct() and SimpleLruInit() though.
--
Michael


Thanks for replaying.

I understood the reason why the name list is not documented.
I agree your opinion.

Regards,

--
Masahiro Ikeda

Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-26 Thread Isaac Morland

On Tue, 26 May 2020 at 21:46, Kyotaro Horiguchi 
wrote:

> At Tue, 26 May 2020 09:10:40 -0400, Jeff Janes 
> wrote in
> > In postgresql.conf, it says:
> >
> > #max_slot_wal_keep_size = -1  # measured in bytes; -1 disables
> >
> > I don't know if that is describing the dimension of this parameter or the
> > units of it, but the default units for it are megabytes, not individual
> > bytes, so I think it is pretty confusing.
>
> Agreed. It should be a leftover at the time the unit was changed
> (before committed) to MB from bytes.  The default value makes the
> confusion worse.
>
> Is the following works?
>
> #max_slot_wal_keep_size = -1  # in MB; -1 disables


Extreme pedant question: Is it MB (10^6 bytes) or MiB (2^20 bytes)?

Re: some grammar refactoring

2020-05-26 Thread Tom Lane

Robert Haas  writes:
> On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
>  wrote:
>> Most utility commands don't have an intermediate parse analysis pass.
>> They just go straight from the grammar to the execution.  Maybe that
>> could be rethought, but that's the way it is now.

> I think it can and should be rethought at some point.

The other problem is that the ones that do have explicit parse analysis
tend to be doing it at the wrong time.  I've fixed some ALTER TABLE
problems by rearranging that, but we still have open bugs that are due
to this type of mistake, eg [1].  I agree that we need a rethink, and
we need it badly.

If this patch is changing when any parse-analysis-like actions happen,
then I would say that it needs very careful review --- much more than
the "refactoring" label would suggest.  Maybe it's making things better,
or maybe it doesn't matter; but this area is a minefield.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/16272-6e32da020e9a9381%40postgresql.org

Re: hash join error improvement (old)

2020-05-26 Thread Thomas Munro

On Wed, May 27, 2020 at 1:30 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > There are more uses of BufFileRead that don't bother to distinguish
> > these two cases apart, though -- logtape.c, tuplestore.c,
> > gistbuildbuffers.c all do the same.
>
> Yeah.  I rather suspect that callers of BufFileRead/Write are mostly
> expecting that those functions will throw an ereport() for any interesting
> error condition.  Maybe we should make it so, instead of piecemeal fixing
> the callers?

Yeah.  I proposed that over here:

https://www.postgresql.org/message-id/ca+hukgk0w+gts8advskdvu7cfzse5q+0np_9kpsxg2na1ne...@mail.gmail.com

But I got stuck trying to figure out whether to back-patch (arguably
yes: there are bugs here, but arguably no: the interfaces change).

Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-26 Thread Kyotaro Horiguchi

At Tue, 26 May 2020 09:10:40 -0400, Jeff Janes  wrote in 
> In postgresql.conf, it says:
> 
> #max_slot_wal_keep_size = -1  # measured in bytes; -1 disables
> 
> I don't know if that is describing the dimension of this parameter or the
> units of it, but the default units for it are megabytes, not individual
> bytes, so I think it is pretty confusing.

Agreed. It should be a leftover at the time the unit was changed
(before committed) to MB from bytes.  The default value makes the
confusion worse.

Is the following works?

#max_slot_wal_keep_size = -1  # in MB; -1 disables

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0bd9a7360caaf8a31d71ba6392d9e4b7496a9c02 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 27 May 2020 10:42:24 +0900
Subject: [PATCH] Change a comment in postgresql.conf.sample

Fix a lame comment for max_slot_wal_keep_size so that it suggests the
unit for the value.
---
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 995b6ca155..43340eee71 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -289,7 +289,7 @@
 #max_wal_senders = 10		# max number of walsender processes
 # (change requires restart)
 #wal_keep_segments = 0		# in logfile segments; 0 disables
-#max_slot_wal_keep_size = -1	# measured in bytes; -1 disables
+#max_slot_wal_keep_size = -1	# in MB; -1 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 
 #max_replication_slots = 10	# max number of replication slots
-- 
2.18.2

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Melanie Plageman

On Tue, May 26, 2020 at 5:40 PM Jeff Davis  wrote:

> On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote:
> >
> > As for the tlist fix, I think that's mostly ready too - the one thing
> > we
> > should do is probably only doing it for AGG_HASHED. For AGG_SORTED
> > it's
> > not really necessary.
>
> Melanie previously posted a patch to avoid spilling unneeded columns,
> but it introduced more code:
>
>
>
> https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com
>
> and it seems that Heikki also looked at it. Perhaps we should get an
> acknowledgement from one of them that your one-line change is the right
> approach?
>
>
I spent some time looking at it today, and, it turns out I was wrong.

I thought that there was a case I had found where CP_SMALL_TLIST did not
eliminate as many columns as could be eliminated for the purposes of
spilling, but, that turned out not to be the case.

I changed CP_LABEL_TLIST to CP_SMALL_TLIST in
create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of
different queries and this 2-3 line change worked for all the cases I
tried. Is that where you made the change?
And then are you proposing to set it based on the aggstrategy to either
CP_LABEL_TLIST or CP_SMALL_TLIST here?

-- 
Melanie Plageman

Re: hash join error improvement (old)

2020-05-26 Thread Tom Lane

Alvaro Herrera  writes:
> There are more uses of BufFileRead that don't bother to distinguish
> these two cases apart, though -- logtape.c, tuplestore.c,
> gistbuildbuffers.c all do the same.

Yeah.  I rather suspect that callers of BufFileRead/Write are mostly
expecting that those functions will throw an ereport() for any interesting
error condition.  Maybe we should make it so, instead of piecemeal fixing
the callers?

regards, tom lane

Re: Remove page-read callback from XLogReaderState.

2020-05-26 Thread Kyotaro Horiguchi

Thank you for the comment.

At Tue, 26 May 2020 20:17:47 +0800, Craig Ringer  wrote 
in 
> On Tue, 26 May 2020, 15:40 Kyotaro Horiguchi, 
> wrote:
> 
> >
> > This patch removes all the three callbacks (open/close/page_read) in
> > XL_ROUTINE from XLogReaderState.  It only has "cleanup" callback
> > instead.
> >
> 
> I actually have a use in mind for these callbacks - to support reading WAL
> for logical decoding from a restore_command like tool. So we can archive
> wal when it's no longer required for recovery and reduce the risk of
> filling pg_wal if a standby lags.
> 
> I don't object to your cleanup at all. I'd like it to  be properly
> pluggable, whereas right now it has hard coded callbacks that differ for
> little reason.
>
> Just noting that the idea of a callback here isn't a bad thing.

I agree that plugin is generally not bad as far as it were standalone,
that is, as far as it is not tightly cooperative with the opposite
side of the caller of it.  However, actually it seems to me that the
xlogreader plugins are too-deeply coupled with the callers of
xlogreader in many aspects involving error-handling and
retry-mechanism.

As Alvaro mentioned we may have page-decrypt callback shortly as
another callback of xlogreader.  Xlogreader could be more messy by
introducing such plugins, that actually have no business with
xlogreader at all.

Evidently xlogreader can be a bottom-end module (that is, a module
that doesn't depend on another module). It is I think a good thing to
isolate xlogreader from the changes of its callers and correlated
plugins.

A major problem of this patch is that the state machine used there
might be another mess here, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Jeff Davis

On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote:
> Yeah. I agree prefetching is definitely out of v13 scope. It might be
> interesting to try how useful would it be, if you're willing to spend
> some time on a prototype.

I think a POC would be pretty quick; I'll see if I can hack something
together.

> I think it's pretty much ready to go.

Committed with max of 128 preallocated blocks. Minor revisions.

> 
> As for the tlist fix, I think that's mostly ready too - the one thing
> we
> should do is probably only doing it for AGG_HASHED. For AGG_SORTED
> it's
> not really necessary.

Melanie previously posted a patch to avoid spilling unneeded columns,
but it introduced more code:

https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com

and it seems that Heikki also looked at it. Perhaps we should get an
acknowledgement from one of them that your one-line change is the right
approach?

Regards,
Jeff Davis

Re: Default gucs for EXPLAIN

2020-05-26 Thread David G. Johnston

On Tue, May 26, 2020 at 4:53 PM David Rowley  wrote:

> If we add
> a new executor node then that's something that the server will send to
> the client.  The client does not need knowledge about which version of
> PostreSQL it is connected to. If it receives details about some new
> node type in an EXPLAIN then it can be fairly certain that the server
> supports that node type.
>

The above is basically how I imagine explain handling software works today
- if it sees a specific structure in the output it processes it.  It has
zero expectations about whether a feature with a option knob is set to true
or false.  And its deals with the one non-boolean option by examining the
output text.

> What we're talking about here is the opposite direction. The client is
> sending the command to the server, and the command it'll need to send
> is going to have to be specific to the server version.   Now perhaps
> all such tools already have good infrastructure to change behaviour
> based on version, after all, these tools do also tend to query
> catalogue tables from time to time and those change between versions.
>

I don't see how adding these optional GUCs impacts that materially.  If the
client provides a custom UI to the user and then writes an explain command
itself it will need to possibly understand version differences whether
these GUCs exist or not.

To hammer the point home if that client software is memorizing the choices
made for the various options and then conditions its output based upon
those choices then it should be specifying every one of them explicitly, in
which case the GUCs wouldn't matter.  If it is somehow depending upon the
existing defaults and user choices to figure out the option values then,
yes, the GUCs would be hidden information that may possibly confuse it if,
say, a user has a GUC BUFFERS on but didn't make a choice in the client UI
which defaulted to FALSE mimicking our default and because the default was
chosen didn't output BUFFER off but left the option unspecified and now the
buffers appear, which it for some reason isn't expecting and thus blows
up.  I could care less about that client and certainly wouldn't let its
possible existence hold me back from adding a feature that bare-bones
client users who send their own explain queries would find useful.

David J.

Re: Default gucs for EXPLAIN

2020-05-26 Thread David G. Johnston

On Tue, May 26, 2020 at 4:30 AM David Rowley  wrote:

>
> I imagine the
> authors of those applications might get upset if we create something
> outside of the command that controls what the command does. Perhaps
> the idea here is not quite as bad as that as applications could still
> override the options by mentioning each EXPLAIN option in the command
> they send to the server.
>

I admittedly haven't tried to write an explain output parser but I'm
doubting the conclusion that it is necessary to know the values of the
various options in order to properly parse the output.

The output format type is knowable by observing the actual structure (first
few characters probably) of the output and for everything else (all of the
booleans) any parser worth its salt is going to be able to parse output
where every possible setting is set to on.

I'm inclined to go with having everything except ANALYZE be something that
has a GUC default override.

David J.

Re: New 'pg' consolidated metacommand patch

2020-05-26 Thread David G. Johnston

On Tue, May 26, 2020 at 4:19 PM Mark Dilger 
wrote:

> I'd also appreciate +1 and -1 votes on the overall idea, in case this
> entire feature, regardless of implementation, is simply something the
> community does not want.
>

-1, at least as part of core.  My question would be how much of this is
would be needed if someone were to create an external project that
installed a "pg" command on top of an existing PostgreSQL installation.  Or
put differently, how many of the changes to the existing binaries are
required versus nice-to-have?

David J.

Re: Default gucs for EXPLAIN

2020-05-26 Thread David Rowley

On Tue, 26 May 2020 at 23:59, Vik Fearing  wrote:
>
> On 5/26/20 1:30 PM, David Rowley wrote:
> > On Tue, 26 May 2020 at 13:36, Bruce Momjian  wrote:
> >>
> >> On Sat, May 23, 2020 at 06:16:25PM +, Nikolay Samokhvalov wrote:
> >>> This is a very good improvement! Using information about buffers is my 
> >>> favorite
> >>> way to optimize queries.
> >>>
> >>> Not having BUFFERS enabled by default means that in most cases, when 
> >>> asking for
> >>> help, people send execution plans without buffers info.
> >>>
> >>> And it's simply in on event to type "(ANALYZE, BUFFERS)" all the time.
> >>>
> >>> So I strongly support this change, thank you, Vik.
> >>
> >> I am not excited about this new feature.
> >
> > I'm against adding GUCs to control what EXPLAIN does by default.
> >
> > A few current GUCs come to mind which gives external control to a
> > command's behaviour are:
> >
> > standard_conforming_strings
> > backslash_quote
> > bytea_output
> >
> > It's pretty difficult for application authors to write code that will
> > just work due to these GUCs. We end up with GUCs like
> > escape_string_warning to try and help application authors find areas
> > which may be problematic.
> >
> > It's not an easy thing to search for in the archives, but we've
> > rejected GUCs that have proposed new ways which can break applications
> > in this way. For example [1]. You can see some arguments against that
> > in [2].
> >
> > Now, there are certainly far fewer applications out there that will
> > execute an EXPLAIN, but the number is still above zero. I imagine the
> > authors of those applications might get upset if we create something
> > outside of the command that controls what the command does. Perhaps
> > the idea here is not quite as bad as that as applications could still
> > override the options by mentioning each EXPLAIN option in the command
> > they send to the server. However, we're not done adding new options
> > yet, so by doing this we'd be pretty much insisting that applications
> > that use EXPLAIN know about all EXPLAIN options for the server version
> > they're connected to. That seems like a big demand given that we've
> > been careful to still support the old
> > EXPLAIN syntax after we added the new way to specify the options in 
> > parenthesis.
>
>
> Nah, this argument doesn't hold.  If an app wants something on or off,
> it should say so.  If it doesn't care, then it doesn't care.
>
> Are you saying we should have all new EXPLAIN options off forever into
> the future because apps won't know about the new data?  I guess we
> should also not ever introduce new plan nodes because those won't be
> known either.

I don't think this is a particularly good counter argument.  If we add
a new executor node then that's something that the server will send to
the client.  The client does not need knowledge about which version of
PostreSQL it is connected to. If it receives details about some new
node type in an EXPLAIN then it can be fairly certain that the server
supports that node type.

What we're talking about here is the opposite direction. The client is
sending the command to the server, and the command it'll need to send
is going to have to be specific to the server version.   Now perhaps
all such tools already have good infrastructure to change behaviour
based on version, after all, these tools do also tend to query
catalogue tables from time to time and those change between versions.
Perhaps it would be good to hear from authors of such tools and get
their input.  If they all agree that it's not a problem then that
certainly weakens my argument, but if they don't then perhaps you
should reconsider.

David

Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-26 Thread Andy Fan

You can use the attached sql to reproduce this issue, but I'm not sure you
can
get the above result at the first time that is because when optimizer think
the
2 index scan have the same cost, it will choose the first one it found, the
order
depends on RelationGetIndexList.  If so,  you may try drop and create
j1_i_im5 index.

The sense behind this patch is we still use the cost based optimizer, just
when we
we find out the 2 index scans have the same cost,  we prefer to use the
index which
have more qual filter on Index Cond.  This is implemented by adjust the
qual cost
on index filter slightly higher.

The issue here is not so uncommon in real life.  consider a log based
application, which
has serval indexes on with create_date as a leading column,  when the
create_date
first load the for the given day but before the new statistics is gathered,
that probably run
into this issue.

-- 
Best Regards
Andy Fan


index_choose.sql
Description: Binary data

Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-26 Thread Andy Fan

On Tue, May 26, 2020 at 9:59 PM Ashutosh Bapat 
wrote:

> On Tue, May 26, 2020 at 1:52 PM Andy Fan  wrote:
> >
> >
> > Consider the below example:
> >
> > create table j1(i int, im5 int,  im100 int, im1000 int);
> > insert into j1 select i, i%5, i%100, i%1000 from generate_series(1,
> 1000)i;
> > create index j1_i_im5 on j1(i, im5);
> > create index j1_i_im100 on j1(i, im100);
> > analyze j1;
> > explain select * from j1 where i = 100 and im5 = 5;
> >
> > We may get the plan like this:
> >
> > demo=# explain select  * from  j1 where i = 100 and im5 = 1;
> >   QUERY PLAN
> > --
> >  Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
> >Index Cond: (i = 100)
> >Filter: (im5 = 1)
> > (3 rows)
> >
> > At this case, optimizer can estimate there are only 1 row to return, so
> both
> > indexes have same cost, which one will be choose is un-controlable. This
> is
> > fine for above query based on the estimation is accurate. However
> estimation
> > can't be always accurate in real life. Some inaccurate estimation can
> cause an
> > wrong index choose. As an experience, j1_i_im5 index should always be
> choose
> > for above query.
>
> I think we need a better example where choosing an index makes a
> difference.
>
> An index can be chosen just because it's path was created before some
> other more appropriate index but the cost difference was within fuzzy
> limit. Purely based on the order in which index paths are created.
>

Here is an further example with the above case:

demo=# insert into j1 select 1, 1, 1, 1 from generate_series(1, 10)i;
INSERT 0 10

With the current implementation, it is

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
QUERY PLAN
--
 Index Scan using j1_i_im100 on j1  (cost=0.43..8.44 rows=1 width=16)
(actual time=63.431..63.431 rows=0 loops=1)
   Index Cond: (i = 1)
   Filter: (im5 = 2)
   Rows Removed by Filter: 11
 Planning Time: 0.183 ms
 Execution Time: 63.484 ms
(6 rows)

With the patch above, it can always choose a correct index even the
statistics is inaccurate:

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
  QUERY PLAN
--
 Index Scan using j1_i_im5 on j1  (cost=0.43..8.46 rows=1 width=16) (actual
time=0.030..0.030 rows=0 loops=1)
   Index Cond: ((i = 1) AND (im5 = 2))
 Planning Time: 1.087 ms
 Execution Time: 0.077 ms
(4 rows)

-- 
Best Regards
Andy Fan

Re: hash join error improvement (old)

2020-05-26 Thread Alvaro Herrera

On 2020-May-26, Tom Lane wrote:

> Are you sure you correctly identified the source of the bogus error
> report?

This version's better.  It doesn't touch the write side at all.
On the read side, only report a short read as such if errno's not set.

This error isn't frequently seen.  This page
https://blog.csdn.net/pg_hgdb/article/details/106279303
(A Postgres fork; blames the error on the temp hash files being encrypted,
suggests to increase temp_buffers) is the only one I found.

There are more uses of BufFileRead that don't bother to distinguish
these two cases apart, though -- logtape.c, tuplestore.c,
gistbuildbuffers.c all do the same.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index d13efc4d98..f91e69a09d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -930,9 +930,17 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 		return NULL;
 	}
 	if (nread != sizeof(header))
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not read from hash-join temporary file: %m")));
+	{
+		if (errno == 0)
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not read from hash-join temporary file: read only %d of %d bytes",
+			(int) nread, (int) sizeof(header;
+		else
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not read from hash-join temporary file: %m")));
+	}
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
@@ -940,9 +948,17 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 		(void *) ((char *) tuple + sizeof(uint32)),
 		header[1] - sizeof(uint32));
 	if (nread != header[1] - sizeof(uint32))
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not read from hash-join temporary file: %m")));
+	{
+		if (errno == 0)
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not read from hash-join temporary file: read only %d of %d bytes",
+			(int) nread, (int) (header[1] - sizeof(uint32);
+		else
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not read from hash-join temporary file: %m")));
+	}
 	return ExecStoreMinimalTuple(tuple, tupleSlot, true);
 }

Re: Default gucs for EXPLAIN

2020-05-26 Thread Vik Fearing

On 5/26/20 10:08 PM, Justin Pryzby wrote:
> If you want to change the default, I think that should be a separate 
> patch/thread.

Yes, it will be.
-- 
Vik Fearing

Re: Default gucs for EXPLAIN

2020-05-26 Thread Justin Pryzby

On Sat, May 23, 2020 at 06:33:48PM +0200, Vik Fearing wrote:
> > Do we really want default_explain_analyze ?
> > It sounds like bad news that EXPLAIN DELETE might or might not remove rows
> > depending on the state of a variable.
> 
> I have had sessions where not using ANALYZE was the exception, not the
> rule.  I would much prefer to type  EXPLAIN (ANALYZE OFF)  in those cases.

I suggest that such sessions are themselves exceptional.

> > I think this should be split into two patches:
> > One to make the default explain options configurable, and a separate patch 
> > to
> > change the defaults.
> 
> This patch does not change the defaults, so I'm not sure what you mean here?

Sorry, ignore that; I wrote it before digesting the patch.  

On Sat, May 23, 2020 at 06:16:25PM +, Nikolay Samokhvalov wrote:
> Not having BUFFERS enabled by default means that in most cases, when asking
> for help, people send execution plans without buffers info.

I also presumed that's where this patch was going to lead, but it doesn't
actually change the default.  So doesn't address that, except that if someone
reports a performance problem, we can tell them to run:

|alter system set explain_buffers=on; SELECT pg_reload_conf()

..which is no better, except that it would also affect any *additional* problem
reports which might be made from that cluster.

If you want to change the default, I think that should be a separate 
patch/thread.

-- 
Justin

Re: some grammar refactoring

2020-05-26 Thread Robert Haas

On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
 wrote:
> On 2020-05-25 21:09, Mark Dilger wrote:
> > I don't think it moves the needle too much, either.  But since your patch 
> > is entirely a refactoring patch and not a feature patch, I thought it would 
> > be fair to ask larger questions about how the code should be structured.  I 
> > like using enums and switch statements and getting better error messages, 
> > but there doesn't seem to be any fundamental reason why that should be in 
> > the command execution step.  It feels like a layering violation to me.
>
> Most utility commands don't have an intermediate parse analysis pass.
> They just go straight from the grammar to the execution.  Maybe that
> could be rethought, but that's the way it is now.

I think it can and should be rethought at some point. The present
split leads to a lot of weird coding. We've had security
vulnerabilities that were due to things like passing the same RangeVar
to two different places, leading to two different lookups for the name
that could be induced to return different OIDs. It also leads to a lot
of fuzzy thinking about where locks are taken, in which order, and how
many times, and with what strength. The code for queries seems to have
been thought through a lot more carefully, because the existence of
prepared queries makes mistakes a lot more noticeable. I hope some day
someone will be motivated to improve the situation for DDL as well,
though it will probably be a thankless task.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: hash join error improvement (old)

2020-05-26 Thread Alvaro Herrera

On 2020-May-26, Tom Lane wrote:

> Digging further down, it looks like BufFileWrite calls BufFileDumpBuffer
> which calls FileWrite which takes pains to set errno correctly after a
> short write --- so other than the lack of commentary about these
> functions' error-reporting API, I don't think there's any actual bug here.

Doh, you're right, this patch is completely broken ... aside from
carelessly writing the wrong "if" test, my unfamiliarity with the stdio
fread/fwrite interface is showing.  I'll look more carefully.

> Are you sure you correctly identified the source of the bogus error
> report?

Nope.  And I wish the bogus error report was all there was to it.  The
actual problem is a server crash.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Tomas Vondra


On Tue, May 26, 2020 at 11:40:07AM -0700, Jeff Davis wrote:

On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote:

I'm not familiar with logtape internals but IIRC the blocks are
linked
by each block having a pointer to the prev/next block, which means we
can't prefetch more than one block ahead I think. But maybe I'm
wrong,
or maybe fetching even just one block ahead would help ...


We'd have to get creative. Keeping a directory in the LogicalTape
structure might work, but I'm worried the memory requirements would be
too high.

One idea is to add a "prefetch block" to the TapeBlockTrailer (perhaps
only in the forward direction?). We could modify the prealloc list so
that we always know the next K blocks that will be allocated to the
tape. All for v14, of course, but I'd be happy to hack together a
prototype to collect data.



Yeah. I agree prefetching is definitely out of v13 scope. It might be
interesting to try how useful would it be, if you're willing to spend
some time on a prototype.



Do you have any other thoughts on the current prealloc patch for v13,
or is it about ready for commit?



I think it's pretty much ready to go.

I have some some doubts about the maximum value (128 probably means
read-ahead values above 256 are probably pointless, although I have not
tested that). But it's still a huge improvement with 128, so let's get
that committed.

I've been thinking about actually computing the expected number of
blocks per tape, and tying the maximum to that, somehow. But that's
something we can look at in the future.

As for the tlist fix, I think that's mostly ready too - the one thing we
should do is probably only doing it for AGG_HASHED. For AGG_SORTED it's
not really necessary.


iregards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Trouble with hashagg spill I/O pattern and costing

2020-05-26 Thread Jeff Davis

On Tue, 2020-05-26 at 16:15 +0200, Tomas Vondra wrote:
> I'm not familiar with logtape internals but IIRC the blocks are
> linked
> by each block having a pointer to the prev/next block, which means we
> can't prefetch more than one block ahead I think. But maybe I'm
> wrong,
> or maybe fetching even just one block ahead would help ...

We'd have to get creative. Keeping a directory in the LogicalTape
structure might work, but I'm worried the memory requirements would be
too high.

One idea is to add a "prefetch block" to the TapeBlockTrailer (perhaps
only in the forward direction?). We could modify the prealloc list so
that we always know the next K blocks that will be allocated to the
tape. All for v14, of course, but I'd be happy to hack together a
prototype to collect data.

Do you have any other thoughts on the current prealloc patch for v13,
or is it about ready for commit?

Regards,
Jeff Davis

PG_CRON logging

2020-05-26 Thread Rajin Raj

Hi,

Is there way to insert the cron job execution status to a table?
Or any other method to identify the job status without checking the log
file?

*Regards,*
*Rajin *

Re: race condition when writing pg_control

2020-05-26 Thread Bossart, Nathan

On 5/21/20, 9:52 PM, "Thomas Munro"  wrote:
> Here's a version with a commit message added.  I'll push this to all
> releases in a day or two if there are no objections.

Looks good to me.  Thanks!

Nathan

Re: Default gucs for EXPLAIN

2020-05-26 Thread Guillaume Lelarge

Le mar. 26 mai 2020 à 16:25, Stephen Frost  a écrit :

> Greetings,
>
> * Guillaume Lelarge (guilla...@lelarge.info) wrote:
> > Le mar. 26 mai 2020 à 04:27, Stephen Frost  a écrit
> :
> > > To that end- what if this was done client-side with '\explain' or
> > > similar?  Basically, it'd work like \watch or \g but we'd have options
> > > under pset like "explain_analyze t/f" and such.  I feel like that'd
> also
> > > largely address the concerns about how this might 'feature creep' to
> > > other commands- because those other commands don't work with a query
> > > buffer, so it wouldn't really make sense for them.
> > >
> > > As for the concerns wrt explain UPDATE or explain DETELE actually
> > > running the query, that's what transactions are for, and if you don't
> > > feel comfortable using transactions or using these options- then don't.
> >
> > This means you'll always have to check if the new GUCs are set up in a
> way
> > it will actually execute the query or to open a transaction for the same
> > reason. This is a huge behaviour change where people might lose data.
>
> It's only a behaviour change if you enable it.. and the suggestion I
> made specifically wouldn't even be a regular 'explain', you'd be using
> '\explain' in psql, a new command.
>
> > I really don't like this proposal (the new GUCs).
>
> The proposal you're commenting on (seemingly mine, anyway) didn't
> include adding any new GUCs.
>
>
My bad. I didn't read your email properly, sorry.

I wouldn't complain about a \explain metacommand. The proposal I (still)
dislike is Vik's.


-- 
Guillaume.

Re: hash join error improvement (old)

2020-05-26 Thread Tom Lane

Alvaro Herrera  writes:
> Hmm, right -- I was extending the partial read case to apply to a
> partial write, and we deal with those very differently.  I changed the
> write case to use our standard approach.

Actually ... looking more closely, this proposed change in
ExecHashJoinSaveTuple flat out doesn't work, because it assumes that
BufFileWrite reports errors the same way as write(), which is not the
case.  In particular, written < 0 can't happen; moreover, you've
removed detection of a short write as opposed to a completely failed
write.

Digging further down, it looks like BufFileWrite calls BufFileDumpBuffer
which calls FileWrite which takes pains to set errno correctly after a
short write --- so other than the lack of commentary about these
functions' error-reporting API, I don't think there's any actual bug here.
Are you sure you correctly identified the source of the bogus error
report?

Similarly, I'm afraid you introduced rather than removed problems
in ExecHashJoinGetSavedTuple.  BufFileRead doesn't use negative
return values either.

regards, tom lane

Re: Default gucs for EXPLAIN

2020-05-26 Thread Stephen Frost

Greetings,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> the partial solution can be custom psql statements. Now, it can be just
> workaround
> 
> \set explain 'explain (analyze, buffers)'
> :explain select * from pg_class ;
> 
> and anybody can prepare customized statements how he likes

Yeah, it's really very rudimentary though, unfortunately.  A proper
language in psql would be *really* nice with good ways to reference
variables and such..

I don't view this as really being a good justification to not have a
\explain type of command.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Default gucs for EXPLAIN

2020-05-26 Thread Stephen Frost

Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Monday, May 25, 2020, Stephen Frost  wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> > > On Mon, May 25, 2020 at 09:36:50PM -0400, Bruce Momjian wrote:
> > > > I am not excited about this new feature.  Why do it only for EXPLAIN?
> >
> > Would probably help to understand what your thinking is here regarding
> > how it could be done for everything...?  In particular, what else are
> > you thinking it'd be sensible for?
> 
> COPY comes to mind immediately.

Indeed... and we have a \copy already, so following my proposal, at
least, it seems like we could naturally add in options to have defaults
to be used with \copy is used in psql.  That might end up being a bit
more interesting since we didn't contempalte that idea when \copy was
first written and therefore we might need to change the syntax that the
backend COPY commands to make this work (maybe adopting a similar syntax
to explain, in addition to the existing WITH options after the COPY
command, and then deciding which to prefer when both exist, or thorw an
error in such a case).

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Default gucs for EXPLAIN

2020-05-26 Thread Stephen Frost

Greetings,

* Guillaume Lelarge (guilla...@lelarge.info) wrote:
> Le mar. 26 mai 2020 à 04:27, Stephen Frost  a écrit :
> > To that end- what if this was done client-side with '\explain' or
> > similar?  Basically, it'd work like \watch or \g but we'd have options
> > under pset like "explain_analyze t/f" and such.  I feel like that'd also
> > largely address the concerns about how this might 'feature creep' to
> > other commands- because those other commands don't work with a query
> > buffer, so it wouldn't really make sense for them.
> >
> > As for the concerns wrt explain UPDATE or explain DETELE actually
> > running the query, that's what transactions are for, and if you don't
> > feel comfortable using transactions or using these options- then don't.
>
> This means you'll always have to check if the new GUCs are set up in a way
> it will actually execute the query or to open a transaction for the same
> reason. This is a huge behaviour change where people might lose data.

It's only a behaviour change if you enable it.. and the suggestion I
made specifically wouldn't even be a regular 'explain', you'd be using
'\explain' in psql, a new command.

> I really don't like this proposal (the new GUCs).

The proposal you're commenting on (seemingly mine, anyway) didn't
include adding any new GUCs.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: what can go in root.crt ?

2020-05-26 Thread Chapman Flack

On 05/26/20 09:35, Andrew Dunstan wrote:

> The trouble is I think you have it the wrong way round. It makes sense
> to give less trust to a non-root CA than to one of its up-chain
> authorities, e.g. only trust it for certain domains, or for a lesser
> period of time. But it doesn't seem to make much sense to trust the
> up-chain CA less, since it is what you should base your trust of the
> lower CA on.

I wonder if there might be different meanings of 'trust' in play here
complicating the conversation.

At $work, when I make a certificate request and send it off to our
own in-house bureau of making certificates happen, what you might
expect is that they would be running the first level of CA right
in house (and IIRC that was the case in my early years here).
So I would get back some chain like this:

  WE ARE A PROMINENT GLOBAL ISSUER FOUND IN WEB BROWSER TRUST STORES
WE ISSUE TO LOTS OF FOLKS
  WE ISSUE TO ORGS LIKE YOURS
WE ARE YOUR ORG
  my server cert

In that picture, the question of whether I give more or less trust to
PROMINENT GLOBAL ISSUER because they have larger market cap and their
name in the news, or to WE ARE YOUR ORG because they are my org, seems
to turn on different understandings of trust. There might be a lot of
reasons in general to trust PROMINENT GLOBAL in the sense of putting
their cert in widely distributed web browser trust stores. But there
are excellent reasons to trust WE ARE YOUR ORG as authoritative on
what's a server for my org.

Now in these later days when there is no longer an in-house CA at the
bottom of this chain, the situation's not as clear-cut. WE ISSUE TO ORGS
LIKE YOURS isn't quite authoritative on what's a server for my org.
But there are inked signatures on paper between their honcho and my org's
honcho that don't exist between my org and PROMINENT GLOBAL. And you would
have to work harder to get a spoof cert for one of my servers signed by
them. You would have to talk /them/ into it.

If I have PROMINENT GLOBAL in there, you just have to make offers to
their umpty sub-CAs and their umpty-squared sub-sub-CAs and find just
one that will make a deal.

> to give less trust to a non-root CA than to one of its up-chain
> authorities, e.g. only trust it for certain domains, or for a lesser

That's certainly appropriate, and I'd be delighted if the root.crt file
supported syntax like this:

 *.myorg.org: WE ARE YOUR ORG.crt
 *: PROMINENT GLOBAL ISSUER.crt { show exfiltration/HIPAA/FERPA banner }

Doing the same thing (or some of it) in certificate style, you would
want WE ARE YOUR ORG.crt to be signed with a Name Constraints extension
limiting it to be a signer for .myorg.org certificates. That is indeed
a thing. The history in [1] shows it was at first of limited value
because client libraries didn't all grok it, or would accept certificates
without Subject Alt Name extensions and verify by CN instead, without the
constraint. But I have noticed more recently that mainstream web browsers,
anyway, are no longer tolerant of certs without SAN, and that seems to be
part of a road map to giving the Name Constraints more teeth.

Regards,
-Chap

[1]
https://security.stackexchange.com/questions/31376/can-i-restrict-a-certification-authority-to-signing-certain-domains-only

Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-26 Thread Ashutosh Bapat

On Tue, May 26, 2020 at 1:52 PM Andy Fan  wrote:
>
>
> Consider the below example:
>
> create table j1(i int, im5 int,  im100 int, im1000 int);
> insert into j1 select i, i%5, i%100, i%1000 from generate_series(1, 
> 1000)i;
> create index j1_i_im5 on j1(i, im5);
> create index j1_i_im100 on j1(i, im100);
> analyze j1;
> explain select * from j1 where i = 100 and im5 = 5;
>
> We may get the plan like this:
>
> demo=# explain select  * from  j1 where i = 100 and im5 = 1;
>   QUERY PLAN
> --
>  Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
>Index Cond: (i = 100)
>Filter: (im5 = 1)
> (3 rows)
>
> At this case, optimizer can estimate there are only 1 row to return, so both
> indexes have same cost, which one will be choose is un-controlable. This is
> fine for above query based on the estimation is accurate. However estimation
> can't be always accurate in real life. Some inaccurate estimation can cause an
> wrong index choose. As an experience, j1_i_im5 index should always be choose
> for above query.

I think we need a better example where choosing an index makes a difference.

An index can be chosen just because it's path was created before some
other more appropriate index but the cost difference was within fuzzy
limit. Purely based on the order in which index paths are created.

>
> This one line change is the best method I can think.
>
> -   cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
> +  cpu_per_tuple = cpu_tuple_cost + (qpqual_cost.per_tuple * 1.001);
>
> We make the qual cost on index filter is slightly higher than qual cost in 
> Index
> Cond. This will also good for QUAL (i=x AND m=y AND n=z). Index are (i, m,
> other_col1) and (i, other_col1, other_col2).  But this change also
> changed the relation between the qual cost on index scan and qual cost on seq
> scan. However I think that impact is so tiny that I think we can ignore that 
> (we
> can choose a better factor between 1 and 1.001).
>
> Even the root cause of this issue comes from an inaccurate estimation. but I
> don't think that is an issue easy/possible to fix, however I'm open for
> suggestion on that as well.
>
> Any suggestions?
>
> --
> Best Regards
> Andy Fan



-- 
Best Wishes,
Ashutosh Bapat

Re: hash join error improvement (old)

2020-05-26 Thread Alvaro Herrera

Hi Tom, thanks for looking.

On 2020-May-25, Tom Lane wrote:

> I don't mind if you want to extend that paradigm to also use "wrote only
> %d bytes" wording, but the important point is to get the SQLSTATE set on
> the basis of ENOSPC rather than whatever random value errno will have
> otherwise.

Hmm, right -- I was extending the partial read case to apply to a
partial write, and we deal with those very differently.  I changed the
write case to use our standard approach.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index d13efc4d98..3817c41225 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -882,16 +882,26 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 	}
 
 	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
-	if (written != sizeof(uint32))
+	if (written < 0)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write to hash-join temporary file: %m")));
+	}
 
 	written = BufFileWrite(file, (void *) tuple, tuple->t_len);
-	if (written != tuple->t_len)
+	if (written < 0)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write to hash-join temporary file: %m")));
+	}
 }
 
 /*
@@ -929,20 +939,30 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 		ExecClearTuple(tupleSlot);
 		return NULL;
 	}
-	if (nread != sizeof(header))
+	else if (nread < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not read from hash-join temporary file: %m")));
+	else if (nread != sizeof(header))
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not read from hash-join temporary file: read only %d of %d bytes",
+		(int) nread, (int) sizeof(header;
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
 	nread = BufFileRead(file,
 		(void *) ((char *) tuple + sizeof(uint32)),
 		header[1] - sizeof(uint32));
-	if (nread != header[1] - sizeof(uint32))
+	if (nread < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not read from hash-join temporary file: %m")));
+	else if (nread != header[1] - sizeof(uint32))	/* might read zero bytes */
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not read from hash-join temporary file: read only %d of %d bytes",
+		(int) nread, (int) (header[1] - sizeof(uint32);
 	return ExecStoreMinimalTuple(tuple, tupleSlot, true);
 }

Re: what can go in root.crt ?

2020-05-26 Thread Andrew Dunstan



On 5/25/20 3:32 PM, Chapman Flack wrote:
> On 05/25/20 15:15, Chapman Flack wrote:
>> Does that mean it also would fail if I directly put the server's
>> end-entity cert there?
>>
>> Would I have to put all three of WE ISSUE TO ORGS LIKE YOURS,
>> WE ISSUE TO LOTS, and WE ISSUE TO EVERYBODY in the root.crt file
>> in order for verification to succeed?
>>
>> If I did that, would the effect be any different from simply putting
>> WE ISSUE TO EVERYBODY there, as before? Would it then happily accept
>> a cert with a chain that ended at WE ISSUE TO EVERYBODY via some other
>> path? Is there a way I can accomplish trusting only certs issued by
>> WE ISSUE TO ORGS LIKE YOURS?
> The client library is the PG 10 one that comes with Ubuntu 18.04
> in case it matters.
>
> I think I have just verified that I can't make it work by putting
> the end entity cert there either. It is back working again with only
> the WE ISSUE TO EVERYBODY cert there, but if there is a workable way
> to narrow that grant of trust a teensy little bit, I would be happy
> to do that.
>


The trouble is I think you have it the wrong way round. It makes sense
to give less trust to a non-root CA than to one of its up-chain
authorities, e.g. only trust it for certain domains, or for a lesser
period of time. But it doesn't seem to make much sense to trust the
up-chain CA less, since it is what you should base your trust of the
lower CA on.



cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: what can go in root.crt ?

2020-05-26 Thread Chapman Flack

On 05/26/20 02:05, Craig Ringer wrote:
> The main reason to put intermediate certificates in the root.crt is that it
> allows PostgreSQL to supply the whole certificate chain to a client during

Hold on a sec; you're not talking about what I'm talking about, yet.

Yes, you have make the chain available to the server to serve out with
its own cert so clients can verify. root.crt isn't where you put that,
though. You put that in server.crt (or wherever the ssl_cert_file GUC
points).

> That frees the clients from needing to have local copies
> of the intermediate certificates; they only have to know about WE ISSUE TO
> EVERYBODY.

Bingo. Put WE ISSUE TO EVERYBODY in the root.crt (client-side, libpq) file,
and the clients happily connect to the server. It is easy and convenient.

But if WE STEAL YOUR STUFF gets their certs signed by WE SIGN ANYTHING
FOR A PRICE and their CA is WE'RE SOMETIMES LESS ATTENTIVE THAN YOU HOPE
and /their/ CA is WE ISSUE TO EVERYBODY, then the clients would just as
happily connect to a server of the same name run by WE STEAL YOUR STUFF.

Which brings us around to what I was talking about.

> If you wanted to require that your certs are signed by WE ISSUE TO ORGS
> LIKE YOURS.COM, you must configure your CLIENTS with a restricted root of
> trust that accepts only the intermediate certificate of WE ISSUE TO ORGS
> LIKE YOURS.COM .

Precisely. And the place to configure that restricted root of trust would
have to be ~/.postgresql/root.crt on the client, and the question is,
does that work?

> Assuming the client will accept it; not all clients allow
> you to configure "certificates I trust to sign peers" separately to
> "certificates that sign my trusted roots". Because really, in security
> terms that's nonsensical.

And that's the key question: there are clients that grok that and clients
that don't; so now, libpq is which kind of client?

Could you expand on your "sign _peers_" notion, and on what exactly
you are calling nonsensical?

Each of those intermediate CAs really is a CA; the WE ISSUE TO ORGS
LIKE yours cert does contain these extensions:

X509v3 extensions:
...
X509v3 Key Usage: critical
Digital Signature, Certificate Sign, CRL Sign
X509v3 Basic Constraints: critical
CA:TRUE, pathlen:0
...

My server's end-entity certificate does not have the Certificate Sign,
CRL Sign or CA:TRUE bits, or a URL to a revocation-checking service.
An end entity and a CA are not 'peers' in those respects.

Regards,
-Chap

Re: Default gucs for EXPLAIN

2020-05-26 Thread Pavel Stehule

út 26. 5. 2020 v 4:27 odesílatel Stephen Frost  napsal:

> Greetings,
>
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Mon, May 25, 2020 at 09:36:50PM -0400, Bruce Momjian wrote:
> > > I am not excited about this new feature.  Why do it only for EXPLAIN?
>
> Would probably help to understand what your thinking is here regarding
> how it could be done for everything...?  In particular, what else are
> you thinking it'd be sensible for?
>
> > > That is a log of GUCs.  I can see this becoming a feature creep
> > > disaster.
>
> I'd only view it as a feature creep disaster if we end up extending it
> to things that don't make any sense..  I don't see any particular reason
> why we'd have to do that though.  On the other hand, if there's a clean
> way to do it for everything, that'd be pretty neat.
>
> > FWIW, Neither am I.  This would create an extra maintenance cost, and
> > I would not want such stuff to spread to other commands either, say
> > CLUSTER, VACUUM, REINDEX, etc.  And note that it is always possible to
> > do that with a small extension using the utility hook and some
> > pre-loaded user-settable GUCs.
>
> The suggestion to "go write C code that will be loaded via a utility
> hook" is really entirely inappropriate here.
>
> This strikes me as a pretty reasonable 'creature comfort' kind of idea.
> Inventing GUCs to handle it is maybe not the best approach, but we
> haven't really got anything better right at hand- psql can't parse
> general SQL, today, and hasn't got it's own idea of "how to run
> explain".  On the other hand, I could easily see a similar feature
> being included in pgAdmin4 where running explain is clicking on a button
> instead of typing 'explain'.
>
> To that end- what if this was done client-side with '\explain' or
> similar?  Basically, it'd work like \watch or \g but we'd have options
> under pset like "explain_analyze t/f" and such.  I feel like that'd also
> largely address the concerns about how this might 'feature creep' to
> other commands- because those other commands don't work with a query
> buffer, so it wouldn't really make sense for them.
>
> As for the concerns wrt explain UPDATE or explain DETELE actually
> running the query, that's what transactions are for, and if you don't
> feel comfortable using transactions or using these options- then don't.
>

the partial solution can be custom psql statements. Now, it can be just
workaround

\set explain 'explain (analyze, buffers)'
:explain select * from pg_class ;

and anybody can prepare customized statements how he likes

Regards

Pavel




> Thanks,
>
> Stephen
>

max_slot_wal_keep_size comment in postgresql.conf

2020-05-26 Thread Jeff Janes

In postgresql.conf, it says:

#max_slot_wal_keep_size = -1  # measured in bytes; -1 disables

I don't know if that is describing the dimension of this parameter or the
units of it, but the default units for it are megabytes, not individual
bytes, so I think it is pretty confusing.

Cheers,

Jeff

Re: Two fsync related performance issues?

2020-05-26 Thread Craig Ringer

On Tue, 12 May 2020, 08:42 Paul Guo,  wrote:

> Hello hackers,
>
> 1. StartupXLOG() does fsync on the whole data directory early in the crash
> recovery. I'm wondering if we could skip some directories (at least the
> pg_log/, table directories) since wal, etc could ensure consistency. Here
> is the related code.
>
>   if (ControlFile->state != DB_SHUTDOWNED &&
>   ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
>   {
>   RemoveTempXlogFiles();
>   SyncDataDirectory();
>   }
>

This would actually be a good candidate for a thread pool. Dispatch sync
requests and don't wait. Come back later when they're done.

Unsure if that's at all feasible given that pretty much all the Pg APIs
aren't thread safe though. No palloc, no elog/ereport, etc. However I don't
think we're ready to run bgworkers or use shm_mq etc at that stage.

Of course if OSes would provide asynchronous IO interfaces that weren't
utterly vile and broken, we wouldn't have to worry...

>
> RecreateTwoPhaseFile() writes a state file for a prepared transaction and
> does fsync. It might be good to do fsync for all files once after writing
> them, given the kernel is able to do asynchronous flush when writing those
> file contents. If the TwoPhaseState->numPrepXacts is large we could do
> batching to avoid the fd resource limit. I did not test them yet but this
> should be able to speed up checkpoint/restartpoint a bit.
>

I seem to recall some hints we can set on a FD or mmapped  range that
encourage dirty buffers to be written without blocking us, too. I'll have
to look them up...

>

Re: Remove page-read callback from XLogReaderState.

2020-05-26 Thread Craig Ringer

On Tue, 26 May 2020, 15:40 Kyotaro Horiguchi, 
wrote:

>
> This patch removes all the three callbacks (open/close/page_read) in
> XL_ROUTINE from XLogReaderState.  It only has "cleanup" callback
> instead.
>

I actually have a use in mind for these callbacks - to support reading WAL
for logical decoding from a restore_command like tool. So we can archive
wal when it's no longer required for recovery and reduce the risk of
filling pg_wal if a standby lags.

I don't object to your cleanup at all. I'd like it to  be properly
pluggable, whereas right now it has hard coded callbacks that differ for
little reason.

Just noting that the idea of a callback here isn't a bad thing.

>

Re: Add A Glossary

2020-05-26 Thread Peter Eisentraut


On 2020-04-29 21:55, Corey Huinker wrote:
On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut 
> wrote:


Why are all the glossary terms capitalized?  Seems kind of strange.


They weren't intended to be, and they don't appear to be in the page I'm 
looking at. Are you referring to the anchor like in 
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? 
If so, that all-capping is part of the rendering, as the ids were all 
named in all-lower-case.


Sorry, I meant why is the first letter of each term capitalized.  That 
seems unusual.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Default gucs for EXPLAIN

2020-05-26 Thread Vik Fearing

On 5/26/20 1:30 PM, David Rowley wrote:
> On Tue, 26 May 2020 at 13:36, Bruce Momjian  wrote:
>>
>> On Sat, May 23, 2020 at 06:16:25PM +, Nikolay Samokhvalov wrote:
>>> This is a very good improvement! Using information about buffers is my 
>>> favorite
>>> way to optimize queries.
>>>
>>> Not having BUFFERS enabled by default means that in most cases, when asking 
>>> for
>>> help, people send execution plans without buffers info.
>>>
>>> And it's simply in on event to type "(ANALYZE, BUFFERS)" all the time.
>>>
>>> So I strongly support this change, thank you, Vik.
>>
>> I am not excited about this new feature.
> 
> I'm against adding GUCs to control what EXPLAIN does by default.
> 
> A few current GUCs come to mind which gives external control to a
> command's behaviour are:
> 
> standard_conforming_strings
> backslash_quote
> bytea_output
> 
> It's pretty difficult for application authors to write code that will
> just work due to these GUCs. We end up with GUCs like
> escape_string_warning to try and help application authors find areas
> which may be problematic.
> 
> It's not an easy thing to search for in the archives, but we've
> rejected GUCs that have proposed new ways which can break applications
> in this way. For example [1]. You can see some arguments against that
> in [2].
> 
> Now, there are certainly far fewer applications out there that will
> execute an EXPLAIN, but the number is still above zero. I imagine the
> authors of those applications might get upset if we create something
> outside of the command that controls what the command does. Perhaps
> the idea here is not quite as bad as that as applications could still
> override the options by mentioning each EXPLAIN option in the command
> they send to the server. However, we're not done adding new options
> yet, so by doing this we'd be pretty much insisting that applications
> that use EXPLAIN know about all EXPLAIN options for the server version
> they're connected to. That seems like a big demand given that we've
> been careful to still support the old
> EXPLAIN syntax after we added the new way to specify the options in 
> parenthesis.


Nah, this argument doesn't hold.  If an app wants something on or off,
it should say so.  If it doesn't care, then it doesn't care.

Are you saying we should have all new EXPLAIN options off forever into
the future because apps won't know about the new data?  I guess we
should also not ever introduce new plan nodes because those won't be
known either.

I'm not buying that.
-- 
Vik Fearing

Re: Planning counters in pg_stat_statements (using pgss_store)

2020-05-26 Thread Andy Fan

On Fri, May 22, 2020 at 9:51 PM Fujii Masao 
wrote:

>
>
> On 2020/05/22 15:10, Andy Fan wrote:
> >
> >
> > On Thu, May 21, 2020 at 3:49 PM Julien Rouhaud  > wrote:
> >
> > Le jeu. 21 mai 2020 à 09:17, Michael Paquier  > a écrit :
> >
> > On Thu, May 21, 2020 at 08:49:53AM +0200, Julien Rouhaud wrote:
> >  > On Tue, May 19, 2020 at 4:29 AM Andy Fan <
> zhihui.fan1...@gmail.com > wrote:
> >  >> Thanks for the excellent extension. I want to add 5 more
> fields to satisfy the
> >  >> following requirements.
> >  >>
> >  >> int   subplan; /* No. of subplan in this query */
> >  >> int   subquery; /* No. of subquery */
> >  >> int   joincnt; /* How many relations are joined */
> >  >> bool hasagg; /* if we have agg function in this query */
> >  >> bool hasgroup; /* has group clause */
> >  >
> >  > Most of those fields can be computed using the raw sql
> satements.  Why
> >  > not adding functions like query_has_agg(querytext) to get the
> >  > information from pgss stored query text instead?
> >
> > Yeah I personally find concepts related only to the query string
> > itself not something that needs to be tied to pg_stat_statements.
> > ...
> >
> >
> > Indeed cte will bring additional concerns about the fields
> semantics. That's another good reason to go with external functions so you
> can add extra parameters for that if needed.
> >
> >
> > There are something more we can't get from query string easily. like:
> > 1. view involved.   2. subquery are pulled up so there is not subquery
> > indeed.  3. sublink are pull-up or become as an InitPlan rather than
> subPlan.
> > 4. joins are removed by remove_useless_joins.
>
> If we can store the plan for each statement, e.g., like pg_store_plans
> extension [1] does, rather than such partial information, which would
> be enough for your cases?
>
> That would be helpful if I can search the interested data from it. Oracle
has
v$sql_plan,  where every node in the plan has its own record, so it is easy
to search.

-- 
Best Regards
Andy Fan

Re: Default gucs for EXPLAIN

2020-05-26 Thread David Rowley

On Tue, 26 May 2020 at 13:36, Bruce Momjian  wrote:
>
> On Sat, May 23, 2020 at 06:16:25PM +, Nikolay Samokhvalov wrote:
> > This is a very good improvement! Using information about buffers is my 
> > favorite
> > way to optimize queries.
> >
> > Not having BUFFERS enabled by default means that in most cases, when asking 
> > for
> > help, people send execution plans without buffers info.
> >
> > And it's simply in on event to type "(ANALYZE, BUFFERS)" all the time.
> >
> > So I strongly support this change, thank you, Vik.
>
> I am not excited about this new feature.

I'm against adding GUCs to control what EXPLAIN does by default.

A few current GUCs come to mind which gives external control to a
command's behaviour are:

standard_conforming_strings
backslash_quote
bytea_output

It's pretty difficult for application authors to write code that will
just work due to these GUCs. We end up with GUCs like
escape_string_warning to try and help application authors find areas
which may be problematic.

It's not an easy thing to search for in the archives, but we've
rejected GUCs that have proposed new ways which can break applications
in this way. For example [1]. You can see some arguments against that
in [2].

Now, there are certainly far fewer applications out there that will
execute an EXPLAIN, but the number is still above zero. I imagine the
authors of those applications might get upset if we create something
outside of the command that controls what the command does. Perhaps
the idea here is not quite as bad as that as applications could still
override the options by mentioning each EXPLAIN option in the command
they send to the server. However, we're not done adding new options
yet, so by doing this we'd be pretty much insisting that applications
that use EXPLAIN know about all EXPLAIN options for the server version
they're connected to. That seems like a big demand given that we've
been careful to still support the old
EXPLAIN syntax after we added the new way to specify the options in parenthesis.

[1] 
https://www.postgresql.org/message-id/flat/acf85c502e55a143ab9f4ecfe960660a172...@mailserver2.local.mstarlabs.com
[2] https://www.postgresql.org/message-id/17880.1482648516%40sss.pgh.pa.us

David

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-26 Thread Amit Kapila

On Tue, May 26, 2020 at 2:44 PM Dilip Kumar  wrote:
>
> On Tue, May 26, 2020 at 10:27 AM Amit Kapila  wrote:
> >
> > >
> > > 2. There is a bug fix in handling the stream abort in 0008 (earlier it
> > > was 0006).
> > >
> >
> > The code changes look fine but it is not clear what was the exact
> > issue.  Can you explain?
>
> Basically, in case of an empty subtransaction, we were reading the
> subxacts info but when we could not find the subxid in the subxacts
> info we were not releasing the memory.  So on next subxact_info_read
> it will expect that subxacts should be freed but we did not free it in
> that !found case.
>

Okay, on looking at it again, the same code exists in
subxact_info_write as well.  It is better to have a function for it.
Can we have a structure like SubXactContext for all the variables used
for subxact?  As mentioned earlier I find the allocation/deallocation
of subxacts a bit ad-hoc, so there will always be a chance that we can
forget to free it.  Having it allocated in memory context which we can
reset later might reduce that risk.  One idea could be that we have a
special memory context for start and stop messages which can be used
to allocate the subxacts there.  In case of commit/abort, we can allow
subxacts information to be allocated in ApplyMessageContext which is
reset at the end of each protocol message.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: New Feature Request

2020-05-26 Thread Peter Eisentraut


On 2020-05-26 12:10, Bert Scalzo wrote:
So far QIKR shows about a 
2.5X improvement over the PostgreSQL optimizer when fed bad SQL. I am 
not saying the PotsgrSQL optimizer does a poor job, but rather that 
QIKR was designed for "garbage in, not garbage out" - so QIKR fixes all 
the stupid mistakes that people make which can confuse or even cripple 
an optimizer. Hence why I am looking for this hook - and have come to 
the experts for help. I have two very large PostgreSQL partner 
organizations who have asked me to make QIKR work for PostgreSQL as it 
does for MySQL. Again, I am willing to pay for this hook since it's a 
special request for a special purpose and not generally worthwhile in 
many people's opinions - which I cannot argue with.


Your project seems entirely legitimate as a third-party optional plugin.

I think the post_parse_analyze_hook would work for this.  I suggest you 
start with it and see how far you can take it.


It may turn out that you need a hook after the rewriter, but that should 
be a small change and shouldn't affect your own code very much, since 
you'd get handed the same data structure in each case.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: New Feature Request

2020-05-26 Thread Bert Scalzo

I greatly appreciate all the replies. Thanks. I also fully understand and
appreciate all the points made - especially that this idea may not have
general value or acceptance as worthwhile. No argument from me. Let me
explain why I am looking to do this to see if that changes any opinions. I
have written a product called QIKR for MySQL that leverages the MySQL query
rewrite feature and places a knowledge expert of SQL rewrite rules as a
preprocessor to the MySQL optimizer. I have defined an extensive set of
rules based on my 30 years of doing code reviews for app developers who
write terrible SQL. Right now QIKR does 100% syntactic analysis (hoping to
do semantic analysis in a later version). For MySQL (which has a less
mature and less robust optimizer) the performance gains are huge - in
excess of 10X. So far QIKR shows about a 2.5X improvement over the
PostgreSQL optimizer when fed bad SQL. I am not saying the
PotsgrSQL optimizer does a poor job, but rather that QIKR was designed for
"garbage in, not garbage out" - so QIKR fixes all the stupid mistakes that
people make which can confuse or even cripple an optimizer. Hence why I am
looking for this hook - and have come to the experts for help. I have two
very large PostgreSQL partner organizations who have asked me to make
QIKR work for PostgreSQL as it does for MySQL. Again, I am willing to pay
for this hook since it's a special request for a special purpose and not
generally worthwhile in many people's opinions - which I cannot argue with.

On Tue, May 26, 2020 at 2:17 AM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 26.05.2020 04:47, Tomas Vondra wrote:
> > On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
> >> On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
> >>> I am reposting this from a few months back (see below). I am not
> >>> trying to be a
> >>> pest, just very motivated. I really think this feature has merit,
> >>> and if not
> >>> generally worthwhile, I'd be willing to pay someone to code it for
> >>> me as I
> >>> don't have strong enough C skills to modify the PostgreSQL code
> >>> myself. So
> >>> anyone who might have such skills that would be interested, please
> >>> contact me:
> >>> bertscal...@gmail.com.
> >>
> >> I think your best bet is to try getting someone to write a hook
> >> that will do the replacement so that you don't need to modify too much
> >> of the Postgres core code.  You will need to have the hook updated for
> >> new versions of Postgres, which adds to the complexity.
> >>
> >
> > I don't think we have a hook to tweak the incoming SQL, though. We only
> > have post_parse_analyze_hook, i.e. post-parse, at which point we can't
> > just rewrite the SQL directly. So I guess we'd need new hook.
>
> VOPS extension performs query substitution (replace query to the
> original table with query to projection) using post_parse_analysis_hook
> and SPI. So I do not understand why  some extra hook is needed.
>
> >
> > I do however wonder if an earlier hook is a good idea at all - matching
> > the SQL directly seems like a rather naive approach that'll break easily
> > due to formatting, upper/lower-case, subqueries, and many other things.
> > From this standpoint it seems actually better to inspect and tweak the
> > parse-analyze result. Not sure how to define the rules easily, though.
> >
>
> In some cases we need to know exact parameter value (as in case
> SUBSTRING(column,1,3) = 'ABC').
> Sometime concrete value of parameter is not important...
> Also it is not clear where such pattern-matching transformation should
> be used only for the whole query or for any its subtrees?
>
> > As for the complexity, I think hooks are fairly low-maintenance in
> > practice, we tend not to modify them very often, and when we do it's
> > usually just adding an argument etc.
>
> I am not sure if the proposed approach can really be useful in many cases.
> Bad queries are used to be generated by various ORM tools.
> But them rarely generate exactly the same query. So defining matching
> rules for the whole query tree will rarely work.
>
> It seems to be more useful to have extensible SQL optimizer, which
> allows to add user defined rules (may as transformation patterns).
> This is how it is done in GCC code optimizer.
> Definitely writing such rules is very non-trivial task.
> Very few developers will be able to add their own meaningful rules.
> But in any case it significantly simplify improvement of optimizer,
> although most of problems with choosing optimal plan are
> caused by wrong statistic and rue-based optimization can not help here.
>
>
>
>

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-26 Thread Amit Kapila

On Mon, May 25, 2020 at 8:07 PM Dilip Kumar  wrote:
>
> On Fri, May 22, 2020 at 11:54 AM Amit Kapila  wrote:
> >
> > 4.
> > + * XXX Do we need to allocate it in TopMemoryContext?
> > + */
> > +static void
> > +subxact_info_add(TransactionId xid)
> > {
> > ..
> >
> > For this and other places in a patch like in function
> > stream_open_file(), instead of using TopMemoryContext, can we consider
> > using a new memory context LogicalStreamingContext or something like
> > that. We can create LogicalStreamingContext under TopMemoryContext.  I
> > don't see any need of using TopMemoryContext here.
>
> But, when we will delete/reset the LogicalStreamingContext?
>

Why can't we reset it at each stream stop message?

>  because
> we are planning to keep this memory until the worker is alive so that
> supposed to be the top memory context.
>

Which part of allocation do we want to keep till the worker is alive?
Why we need memory-related to subxacts till the worker is alive?  As
we have now, after reading subxact info (subxact_info_read), we need
to ensure that it is freed after its usage due to which we need to
remember and perform pfree at various places.

I think we should once see the possibility that such that we could
switch to this new context in start stream message and reset it in
stop stream message.  That might help in avoiding
MemoryContextSwitchTo TopMemoryContext at various places.

>  If we create any other context
> with the same life span as TopMemoryContext then what is the point?
>

It is helpful for debugging.  It is recommended that we don't use the
top memory context unless it is really required.  Read about it in
src/backend/utils/mmgr/README.

>
> > 8.
> > + * XXX Maybe we should only include the checksum when the cluster is
> > + * initialized with checksums?
> > + */
> > +static void
> > +subxact_info_write(Oid subid, TransactionId xid)
> >
> > Do we really need to have the checksum for temporary files? I have
> > checked a few other similar cases like SharedFileSet stuff for
> > parallel hash join but didn't find them using checksums.  Can you also
> > once see other usages of temporary files and then let us decide if we
> > see any reason to have checksums for this?
>
> Yeah, even I can see other places checksum is not used.
>

So, unless someone speaks up before you are ready for the next version
of the patch, can we remove it?

> >
> > Another point is we don't seem to be doing this for 'changes' file,
> > see stream_write_change.  So, not sure, there is any sense to write
> > checksum for subxact file.
>
> I can see there are comment atop this function
>
> * XXX The subxact file includes CRC32C of the contents. Maybe we should
> * include something like that here too, but doing so will not be as
> * straighforward, because we write the file in chunks.
>

You can remove this comment as well.  I don't know how advantageous it
is to checksum temporary files.  We can anyway add it later if there
is a reason for doing so.

>
>
> > 12.
> > maybe_send_schema()
> > {
> > ..
> > + if (in_streaming)
> > + {
> > + /*
> > + * TOCHECK: We have to send schema after each catalog change and it may
> > + * occur when streaming already started, so we have to track new catalog
> > + * changes somehow.
> > + */
> > + schema_sent = get_schema_sent_in_streamed_txn(relentry, topxid);
> > ..
> > ..
> > }
> >
> > I think it is good to once verify/test what this comment says but as
> > per code we should be sending the schema after each catalog change as
> > we invalidate the streamed_txns list in rel_sync_cache_relation_cb
> > which must be called during relcache invalidation.  Do we see any
> > problem with that mechanism?
>
> I have tested this, I think we are already sending the schema after
> each catalog change.
>

Then remove "TOCHECK" in the above comment.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-26 Thread Dilip Kumar

On Tue, May 26, 2020 at 10:27 AM Amit Kapila  wrote:
>
> On Fri, May 22, 2020 at 6:21 PM Dilip Kumar  wrote:
> >
> > On Mon, May 18, 2020 at 5:57 PM Amit Kapila  wrote:
> > >
> > >
> > > Few comments on v20-0010-Bugfix-handling-of-incomplete-toast-tuple
> > > 1.
> > > + /*
> > > + * If this is a toast insert then set the corresponding bit.  Otherwise, 
> > > if
> > > + * we have toast insert bit set and this is insert/update then clear the
> > > + * bit.
> > > + */
> > > + if (toast_insert)
> > > + toptxn->txn_flags |= RBTXN_HAS_TOAST_INSERT;
> > > + else if (rbtxn_has_toast_insert(txn) &&
> > > + ChangeIsInsertOrUpdate(change->action))
> > > + {
> > >
> > > Here, it might better to add a comment on why we expect only
> > > Insert/Update?  Also, it might be better that we add an assert for
> > > other operations.
> >
> > I have added comments that why on Insert/Update we clean the flag.
> > But I don't think we only expect insert/update,  we might get the
> > toast delete right? because in toast update we will do toast delete +
> > toast insert.  So when we get toast delete we just don't want to do
> > anything.
> >
>
> Okay, that makes sense.
>
> > >
> > > 2.
> > > @@ -1865,8 +1920,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> > > ReorderBufferTXN *txn,
> > >   * disk.
> > >   */
> > >   dlist_delete(&change->node);
> > > - ReorderBufferToastAppendChunk(rb, txn, relation,
> > > -   change);
> > > + ReorderBufferToastAppendChunk(rb, txn, relation,
> > > +   change);
> > >   }
> > >
> > > This seems to be a spurious change.
> >
> > Done
> >
> > 2. There is a bug fix in handling the stream abort in 0008 (earlier it
> > was 0006).
> >
>
> The code changes look fine but it is not clear what was the exact
> issue.  Can you explain?

Basically, in case of an empty subtransaction, we were reading the
subxacts info but when we could not find the subxid in the subxacts
info we were not releasing the memory.  So on next subxact_info_read
it will expect that subxacts should be freed but we did not free it in
that !found case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: some grammar refactoring

2020-05-26 Thread Peter Eisentraut


On 2020-05-25 21:09, Mark Dilger wrote:

I don't think it moves the needle too much, either.  But since your patch is 
entirely a refactoring patch and not a feature patch, I thought it would be 
fair to ask larger questions about how the code should be structured.  I like 
using enums and switch statements and getting better error messages, but there 
doesn't seem to be any fundamental reason why that should be in the command 
execution step.  It feels like a layering violation to me.


Most utility commands don't have an intermediate parse analysis pass. 
They just go straight from the grammar to the execution.  Maybe that 
could be rethought, but that's the way it is now.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: password_encryption default

2020-05-26 Thread Peter Eisentraut


On 2020-05-25 17:57, Jonathan S. Katz wrote:

I took a look over, it looks good. One question on the initdb.c diff:

-   if (strcmp(authmethodlocal, "scram-sha-256") == 0 ||
-   strcmp(authmethodhost, "scram-sha-256") == 0)
-   {
-   conflines = replace_token(conflines,
- 
"#password_encryption = md5",
- 
"password_encryption = scram-sha-256");
-   }
-

Would we reverse this, i.e. if someone chooses authmethodlocal to be
"md5", we would then set "password_encryption = md5"?


Yeah, I was too enthusiastic about removing that.  Here is a better patch.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fdf1fdd396073307e917a4eaccb58427926f2312 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 26 May 2020 10:08:22 +0200
Subject: [PATCH v2] Change default of password_encryption to scram-sha-256

Discussion: 
https://www.postgresql.org/message-id/flat/d5b0ad33-7d94-bdd1-caac-43a1c782cab2%402ndquadrant.com
---
 doc/src/sgml/config.sgml  | 12 +++-
 src/backend/commands/user.c   |  2 +-
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/initdb/initdb.c   | 14 ++
 5 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a2694e548a..9cbaff0c51 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1013,11 +1013,13 @@ Authentication
   

 When a password is specified in  or
-, this parameter determines the 
algorithm
-to use to encrypt the password. The default value is 
md5,
-which stores the password as an MD5 hash (on is also
-accepted, as alias for md5). Setting this parameter 
to
-scram-sha-256 will encrypt the password with 
SCRAM-SHA-256.
+, this parameter determines the
+algorithm to use to encrypt the password.  Possible values are
+scram-sha-256, which will encrypt the password with
+SCRAM-SHA-256, and md5, which stores the password
+as an MD5 hash.  (on is also accepted, as an alias
+for md5.)  The default is
+scram-sha-256.


 Note that older clients might lack support for the SCRAM authentication
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 1ef00d6e89..9ce9a66921 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -43,7 +43,7 @@ Oid   binary_upgrade_next_pg_authid_oid = 
InvalidOid;
 
 
 /* GUC parameter */
-intPassword_encryption = PASSWORD_TYPE_MD5;
+intPassword_encryption = PASSWORD_TYPE_SCRAM_SHA_256;
 
 /* Hook to check passwords in CreateRole() and AlterRole() */
 check_password_hook_type check_password_hook = NULL;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2f3e0a70e0..390d5d9655 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4735,7 +4735,7 @@ static struct config_enum ConfigureNamesEnum[] =
 "this parameter determines 
whether the password is to be encrypted.")
},
&Password_encryption,
-   PASSWORD_TYPE_MD5, password_encryption_options,
+   PASSWORD_TYPE_SCRAM_SHA_256, password_encryption_options,
NULL, NULL, NULL
},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample 
b/src/backend/utils/misc/postgresql.conf.sample
index 995b6ca155..120a75386c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -88,7 +88,7 @@
 # - Authentication -
 
 #authentication_timeout = 1min # 1s-600s
-#password_encryption = md5 # md5 or scram-sha-256
+#password_encryption = scram-sha-256   # scram-sha-256 or md5
 #db_user_namespace = off
 
 # GSSAPI using Kerberos
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 67021a6dc1..b1f49abe36 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1198,12 +1198,18 @@ setup_config(void)
  
"#update_process_title = off");
 #endif
 
-   if (strcmp(authmethodlocal, "scram-sha-256") == 0 ||
-   strcmp(authmethodhost, "scram-sha-256") == 0)
+   /*
+* Change password_encryption setting to md5 if md5 was chosen as an
+* authentication method, unless scram-sha-256 was also chosen.
+*/
+   if ((strcmp(authmethodlocal, "md5") == 0 &&
+strcmp(authmethodhost, "scram-sha-256") != 0) ||
+   (strcmp(authmethodhost, "md5")

Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-26 Thread Andy Fan

Consider the below example:

create table j1(i int, im5 int,  im100 int, im1000 int);
insert into j1 select i, i%5, i%100, i%1000 from generate_series(1,
1000)i;
create index j1_i_im5 on j1(i, im5);
create index j1_i_im100 on j1(i, im100);
analyze j1;
explain select * from j1 where i = 100 and im5 = 5;

We may get the plan like this:

demo=# explain select  * from  j1 where i = 100 and im5 = 1;
  QUERY PLAN
--
 Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
   Index Cond: (i = 100)
   Filter: (im5 = 1)
(3 rows)

At this case, optimizer can estimate there are only 1 row to return, so both
indexes have same cost, which one will be choose is un-controlable. This is
fine for above query based on the estimation is accurate. However estimation
can't be always accurate in real life. Some inaccurate estimation can cause
an
wrong index choose. As an experience, j1_i_im5 index should always be choose
for above query.

This one line change is the best method I can think.

-   cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+  cpu_per_tuple = cpu_tuple_cost + (qpqual_cost.per_tuple * 1.001);

We make the qual cost on index filter is slightly higher than qual cost in
Index
Cond. This will also good for QUAL (i=x AND m=y AND n=z). Index are (i, m,
other_col1) and (i, other_col1, other_col2).  But this change also
changed the relation between the qual cost on index scan and qual cost on
seq
scan. However I think that impact is so tiny that I think we can ignore
that (we
can choose a better factor between 1 and 1.001).

Even the root cause of this issue comes from an inaccurate estimation. but I
don't think that is an issue easy/possible to fix, however I'm open for
suggestion on that as well.

Any suggestions?

-- 
Best Regards
Andy Fan

Re: SyncRepGetSyncStandbysPriority() vs. SIGHUP

2020-05-26 Thread Noah Misch

On Wed, Feb 05, 2020 at 11:45:52PM -0800, Noah Misch wrote:
> Would anyone like to fix this?  I could add it to my queue, but it would wait
> a year or more.

Commit f332241 fixed this.

Re: Remove page-read callback from XLogReaderState.

2020-05-26 Thread Kyotaro Horiguchi

At Wed, 22 Apr 2020 10:12:46 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> cd12323440 conflicts with this. Rebased.

b060dbe000 is conflicting.  I gave up isolating XLogOpenSegment from
xlogreader.c, since the two are tightly coupled than I thought.

This patch removes all the three callbacks (open/close/page_read) in
XL_ROUTINE from XLogReaderState.  It only has "cleanup" callback
instead.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From fd04ff06de3ab7bcc36f0c80fff42452e49e0259 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Sep 2019 20:21:55 +0900
Subject: [PATCH v10 1/4] Move callback-call from ReadPageInternal to
 XLogReadRecord.

The current WAL record reader reads page data using a call back
function.  Although it is not so problematic alone, it would be a
problem if we are going to do add tasks like encryption which is
performed on page data before WAL reader reads them. To avoid that the
record reader facility has to have a new code path corresponds to
every new callback, this patch separates page reader from WAL record
reading facility by modifying the current WAL record reader to a state
machine.

As the first step of that change, this patch moves the page reader
function out of ReadPageInternal, then the remaining tasks of the
function are taken over by the new function XLogNeedData. As the
result XLogPageRead directly calls the page reader callback function
according to the feedback from XLogNeedData.
---
 src/backend/access/transam/xlog.c   |  16 +-
 src/backend/access/transam/xlogreader.c | 281 ++--
 src/backend/access/transam/xlogutils.c  |  12 +-
 src/backend/replication/walsender.c |  10 +-
 src/bin/pg_rewind/parsexlog.c   |  21 +-
 src/bin/pg_waldump/pg_waldump.c |   8 +-
 src/include/access/xlogreader.h |  25 ++-
 src/include/access/xlogutils.h  |   2 +-
 8 files changed, 222 insertions(+), 153 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca09d81b08..da468598e4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -909,7 +909,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 XLogSource source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
-static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+static bool	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
@@ -4329,7 +4329,6 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 	XLogRecord *record;
 	XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
 
-	/* Pass through parameters to XLogPageRead */
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (xlogreader->ReadRecPtr == InvalidXLogRecPtr);
@@ -11854,7 +11853,7 @@ CancelBackup(void)
  * XLogPageRead() to try fetching the record from another source, or to
  * sleep and retry.
  */
-static int
+static bool
 XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 			 XLogRecPtr targetRecPtr, char *readBuf)
 {
@@ -11913,7 +11912,8 @@ retry:
 			readLen = 0;
 			readSource = XLOG_FROM_ANY;
 
-			return -1;
+			xlogreader->readLen = -1;
+			return false;
 		}
 	}
 
@@ -12008,7 +12008,8 @@ retry:
 		goto next_record_is_invalid;
 	}
 
-	return readLen;
+	xlogreader->readLen = readLen;
+	return true;
 
 next_record_is_invalid:
 	lastSourceFailed = true;
@@ -12022,8 +12023,9 @@ next_record_is_invalid:
 	/* In standby-mode, keep trying */
 	if (StandbyMode)
 		goto retry;
-	else
-		return -1;
+
+	xlogreader->readLen = -1;
+	return false;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 5995798b58..399bad0603 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -36,8 +36,8 @@
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
 			pg_attribute_printf(2, 3);
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
-static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
-			 int reqLen);
+static bool XLogNeedData(XLogReaderState *state, XLogRecPtr pageptr,
+		 int reqLen, bool header_inclusive);
 static void XLogReaderInvalReadState(XLogReaderState *state);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
@@ -274,7 +274,6 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
 	uint32		targetRecOff;
 	uint32		pageHeaderSize;
 	bool		gotheader;
-	int			readOf

Re: New Feature Request

2020-05-26 Thread Konstantin Knizhnik





On 26.05.2020 04:47, Tomas Vondra wrote:

On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:

On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
I am reposting this from a few months back (see below). I am not 
trying to be a
pest, just very motivated. I really think this feature has merit, 
and if not
generally worthwhile, I'd be willing to pay someone to code it for 
me as I
don't have strong enough C skills to modify the PostgreSQL code 
myself. So
anyone who might have such skills that would be interested, please 
contact me:

bertscal...@gmail.com.


I think your best bet is to try getting someone to write a hook
that will do the replacement so that you don't need to modify too much
of the Postgres core code.  You will need to have the hook updated for
new versions of Postgres, which adds to the complexity.



I don't think we have a hook to tweak the incoming SQL, though. We only
have post_parse_analyze_hook, i.e. post-parse, at which point we can't
just rewrite the SQL directly. So I guess we'd need new hook.


VOPS extension performs query substitution (replace query to the 
original table with query to projection) using post_parse_analysis_hook

and SPI. So I do not understand why  some extra hook is needed.



I do however wonder if an earlier hook is a good idea at all - matching
the SQL directly seems like a rather naive approach that'll break easily
due to formatting, upper/lower-case, subqueries, and many other things.
From this standpoint it seems actually better to inspect and tweak the
parse-analyze result. Not sure how to define the rules easily, though.



In some cases we need to know exact parameter value (as in case 
SUBSTRING(column,1,3) = 'ABC').

Sometime concrete value of parameter is not important...
Also it is not clear where such pattern-matching transformation should 
be used only for the whole query or for any its subtrees?



As for the complexity, I think hooks are fairly low-maintenance in
practice, we tend not to modify them very often, and when we do it's
usually just adding an argument etc.


I am not sure if the proposed approach can really be useful in many cases.
Bad queries are used to be generated by various ORM tools.
But them rarely generate exactly the same query. So defining matching 
rules for the whole query tree will rarely work.


It seems to be more useful to have extensible SQL optimizer, which 
allows to add user defined rules (may as transformation patterns).

This is how it is done in GCC code optimizer.
Definitely writing such rules is very non-trivial task.
Very few developers will be able to add their own meaningful rules.
But in any case it significantly simplify improvement of optimizer, 
although most of problems with choosing optimal plan are

caused by wrong statistic and rue-based optimization can not help here.

61 matches

Mail list logo