Re: First steps to being a contributor

2018-08-27 Thread Heikki Linnakangas

On 28/08/18 03:39, Daniel Wood wrote:

Having quit Amazon, where I was doing Postgres development, I've
started looking at various things I might work on for fun. One
thought is to start with something easy like the scalability of
GetSnapshotData(). :-)


Cool! :-)


I recently found it interesting to examine performance while running
near 1 million pgbench selects per sec on a 48 core/96 HT Skylake
box. I noticed that additional sessions trying to connect were timing
out when they got stuck in ProcArrayAdd trying to get the
ProcArrayLock in EXCLUSIVE mode. FYI, scale 1 with 2048 clients.

The question is whether it is possible that the problem with
GetSnapshotData() has reached a critical point, with respect to
snapshot scaling, on the newest high end systems.


Yeah, GetSnapshotData() certainly becomes a bottleneck in certain workloads.


What I'd like is a short cut to any of the current discussions of
various ideas to improve snapshot scaling. I have some of my own
ideas but want to review things before posting them.


The main solution we've been discussing on -hackers over the last few 
years is changing the way snapshots work, to use a Commit Sequence 
Number. If we assign each transaction a CSN, then a snapshot is just a 
single integer, and GetSnapshotData() just needs to read the current 
value of the CSN counter. CSNs have problems of their own, of course 
:-). If you search the archives for "CSN", you'll find several threads 
on that.
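
For a rough illustration of the idea, here is a minimal sketch; every name
below (ShmemCSNCounter, CSNLogGetCSN) is hypothetical rather than an existing
PostgreSQL API:

    typedef uint64 CommitSeqNo;
    #define InvalidCommitSeqNo ((CommitSeqNo) 0)

    /* Taking a snapshot becomes a single atomic read of a shared counter. */
    static CommitSeqNo
    GetSnapshotCSN(void)
    {
        /* no ProcArrayLock, no scan of the proc array */
        return pg_atomic_read_u64(&ShmemCSNCounter->current_csn);
    }

    /* Visibility check: did the xact commit before the snapshot was taken? */
    static bool
    XidVisibleInSnapshot(TransactionId xid, CommitSeqNo snapshot_csn)
    {
        CommitSeqNo xid_csn = CSNLogGetCSN(xid);   /* hypothetical csnlog lookup */

        return xid_csn != InvalidCommitSeqNo && xid_csn <= snapshot_csn;
    }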


Other less invasive ideas have also been thrown around. For example, 
when one backend acquires a snapshot, it could store a copy of that in 
shared memory. The next call to GetSnapshotData() could then just 
memcpy() the cached snapshot. Transaction commit would need to 
invalidate the cached copy. This helps if you have a lot of reads and few 
writes.
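
A minimal sketch of that, assuming an invented shared structure
(snapshotCache and its fields do not exist today):

    typedef struct SharedSnapshotCache
    {
        bool          valid;    /* cleared by every transaction commit */
        TransactionId xmin;
        TransactionId xmax;
        int           xcnt;
        TransactionId xip[FLEXIBLE_ARRAY_MEMBER];
    } SharedSnapshotCache;

    /* Fast path: reuse a snapshot another backend already computed. */
    static bool
    TryGetCachedSnapshot(Snapshot snapshot)
    {
        LWLockAcquire(ProcArrayLock, LW_SHARED);
        if (!snapshotCache->valid)
        {
            LWLockRelease(ProcArrayLock);
            return false;       /* caller falls back to the usual scan */
        }
        snapshot->xmin = snapshotCache->xmin;
        snapshot->xmax = snapshotCache->xmax;
        snapshot->xcnt = snapshotCache->xcnt;
        memcpy(snapshot->xip, snapshotCache->xip,
               snapshotCache->xcnt * sizeof(TransactionId));
        LWLockRelease(ProcArrayLock);
        return true;
    }

Commit would clear snapshotCache->valid while it already holds ProcArrayLock
to clear its xid, so readers either see a still-valid copy or rebuild it.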


- Heikki



Re: Flexible configuration for full-text search

2018-08-27 Thread Aleksandr Parfenov
On Fri, 24 Aug 2018 18:50:38 +0300
Alexander Korotkov  wrote:
>Agreed, backward compatibility is important here.  Probably we should
>leave the old dictionaries for that.  But I just meant that if we
>introduce a new (better) way of stop-word handling and encourage users
>to use it, then it would look strange if the default configurations work
>the old way...

I agree with Alexander. The only drawback I see is that after the
addition of the new dictionaries, there will be three dictionaries for
each language: the old one, a stop-word filter for the language, and a
stemmer dictionary.

Also, the new approach will resolve an ambiguity in the case of the
'simple' dictionary. Currently, it filters stop words for the language
that was selected during DB initialization.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: typcache.c typos

2018-08-27 Thread Amit Langote
Hi,

On 2018/08/28 13:42, David Rowley wrote:
> On 28 August 2018 at 16:40, David Rowley  wrote:
>> I've attached a patch to fix a typo in typcache.c. I ended up also
>> rephrasing the sentence, since "information about data types that is"
>> almost made me want to change "is" to "are", given that "types" is
>> plural. That would have been a mistake, since it's talking about the
>> information and not the types.
> 
> Oops. I mistakenly attached the incorrect patch. The correct one is
> attached to this email.

- * The type cache exists to speed lookup of certain information about data

[ ... ]

+ * The type cache exists to speedup lookups of certain data type information

Sorry if I'm being ignorant, but shouldn't it be "to speed up" or "to
speed-up" instead of "to speedup"?

Thanks,
Amit




Re: typcache.c typos

2018-08-27 Thread David Rowley
On 28 August 2018 at 16:40, David Rowley  wrote:
> I've attached a patch to fix a typo in typcache.c. I ended up also
> rephrasing the sentence, since "information about data types that is"
> almost made me want to change "is" to "are", given that "types" is
> plural. That would have been a mistake, since it's talking about the
> information and not the types.

Oops. I mistakenly attached the incorrect patch. The correct one is
attached to this email.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


typcache_typos_v2.patch
Description: Binary data


typcache.c typos

2018-08-27 Thread David Rowley
I've attached a patch to fix a typo in typcache.c. I ended up also
rephrasing the sentence, since "information about data types that is"
almost made me want to change "is" to "are", given that "types" is
plural. That would have been a mistake, since it's talking about the
information and not the types.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


typcache_typos.patch
Description: Binary data


RE: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-27 Thread Shinoda, Noriyoshi (PN Japan GCS Delivery)
Hi Everyone, thank you for your comment. 

>> I like the direction of your thinking, but it seems to me that this
>> would cause a problem if you want to set search_path=foo,bar.
>... OPTIONS ( host 'remhost1', port '5433', dbname 'demodb', 
>option='option1=foo', option='option2=bar' );

I do not want to allow postgres_fdw to set "all" GUCs, because I think it is 
very difficult to check the validity of all input values.
So I attached a modified patch that makes it easy to add the GUCs that may be 
set through the postgres_fdw module.
If you wish to add more GUC options to the module, add entries to the 
non_libpq_options[] array in the InitPgFdwOptions function,
and add the validator code for the GUC in the postgres_fdw_validator function.
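
For illustration, the kind of change this describes looks roughly like the
fragments below (the work_mem entry and the validation branch are a sketch,
not the patch itself):

    /* In InitPgFdwOptions(): register the GUC as a postgres_fdw option. */
    static const PgFdwOption non_libpq_options[] = {
        /* ... existing entries ... */
        {"work_mem", ForeignServerRelationId, false},   /* new GUC option */
        {NULL, InvalidOid, false}
    };

    /* In postgres_fdw_validator(): sanity-check the value with the GUC parser. */
    else if (strcmp(def->defname, "work_mem") == 0)
    {
        const char *value = defGetString(def);
        int         unused;

        if (!parse_int(value, &unused, GUC_UNIT_KB, NULL))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("invalid value for work_mem: \"%s\"", value)));
    }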

Best Regards,
Noriyoshi Shinoda

-
From: Fabrízio de Royes Mello [mailto:fabriziome...@gmail.com] 
Sent: Tuesday, August 28, 2018 3:53 AM
To: [pgdg] Robert Haas 
Cc: mich...@paquier.xyz; Shinoda, Noriyoshi (PN Japan GCS Delivery) 
; Pgsql Hackers 
Subject: Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module



On Mon, Aug 27, 2018 at 3:35 PM Robert Haas  wrote:
>
> On Mon, Aug 27, 2018 at 1:29 AM Michael Paquier  wrote:
> > It seems to me that you would pass down just a string which gets
> > allocated for "options", and injection risks are something to be careful
> > about.  Another possibility would be an array with comma-separated
> > arguments, say:
> > options = 'option1=foo,option2=bar'
> > There is already some work done with comma-separated arguments for the
> > parameter "extensions", now that's more work.
>
> I like the direction of your thinking, but it seems to me that this
> would cause a problem if you want to set search_path=foo,bar.
>

Maybe we can use multiple "options". Something like:

... OPTIONS ( host 'remhost1', port '5433', dbname 'demodb', 
option='option1=foo', option='option2=bar' );

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


postgres_fdw_work_mem_v3.patch
Description: postgres_fdw_work_mem_v3.patch


Re: Pluggable Storage - Andres's take

2018-08-27 Thread Haribabu Kommi
On Fri, Aug 24, 2018 at 12:50 PM Andres Freund  wrote:

> Hi,
>
> On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
> > On Tue, Aug 21, 2018 at 6:59 PM Andres Freund 
> wrote:
> >
> > > On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > > > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund 
> wrote:
> > > > > I'm currently in the process of rebasing zheap onto the pluggable
> > > > > storage work. The goal, which seems to work surprisingly well, is
> to
> > > > > find issues that the current pluggable storage patch doesn't yet
> deal
> > > > > with.  I plan to push a tree including a lot of fixes and
> improvements
> > > > > soon.
> > > > >
> > > > That's good. Did you find any problems in porting zheap into
> pluggable
> > > > storage? Does it needs any API changes or new API requirement?
> > >
> > > A lot, yes. The big changes are:
> > > - removal of HeapPageScanDesc
> > > - introduction of explicit support functions for tablesample & bitmap
> scans
> > > - introduction of callbacks for vacuum_rel, cluster
> > >
> > > And quite a bit more along those lines.
> > >
> >
> > OK. Those are quite a bit of changes.
>
> I've pushed a current version of that to my git tree to the
> pluggable-storage branch. It's not really a version that I think makes
> sense to review or such, but it's probably more useful if you work based
> on that.  There's also the pluggable-zheap branch, which I found
> extremely useful to develop against.
>

OK. Thanks, will check that also.


> There's a few further changes since last time:
> - Pluggable handlers are now stored in static global variables, and thus
>   do not need to be copied anymore
> - VACUUM FULL / CLUSTER is moved into one callback that does the actual
>   copying. The various previous rewrite callbacks imo integrated at the
>   wrong level.
> - there's a GUC that allows to change the default table AM
> - moving COPY to use slots (roughly based on your / Haribabu's patch)
> - removed the AM specific shmem initialization callbacks - various AMs
>   are going to need the syncscan APIs, so moving that into AM callbacks
>   doesn't make sense.
>

OK.


> Missing:
> - callback for the second scan of CREATE INDEX CONCURRENTLY
> - commands/analyze.c integration (Working on it)
> - fixing your (Haribabu's) slotification of copy patch to compute memory
>   usage somehow
>

I will check it.


> - table creation callback, currently the pluggable-zheap patch has a few
>   conditionals outside of access/zheap for that purpose (see
> RelationTruncate
>

I will check it.


> And then:
> - lotsa cleanups
> - rebasing onto a newer version of the abstract slot patchset
> - splitting out smaller patches
>
>
> You'd moved the bulk insert into tableam callbacks - I don't quite get
> why? There's not really anything AM specific in that code?
>

The main reason for adding them to the AM is just to give the specific
AM control over whether it supports bulk insert or not.

The current framework doesn't allow AM-specific bulk insert state to be
passed from one function to another, and its structure is fixed. This
needs to be enhanced to allow AM-specific private members as well.
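
For illustration, one possible shape of that enhancement; the am_private
field below is invented, while the other two fields are what heapam.h has
today:

    typedef struct BulkInsertStateData
    {
        BufferAccessStrategy strategy;    /* existing: buffer access strategy */
        Buffer      current_buf;          /* existing: current insertion target */
        void       *am_private;           /* hypothetical: opaque AM-specific state */
    } BulkInsertStateData;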


> > > > Does the new TupleTableSlot abstraction patches has fixed any of
> these
> > > > issues in the recent thread [1]? so that I can look into the change
> of
> > > > FDW API to return slot instead of tuple.
> > >
> > > Yea, that'd be a good thing to start with.
> > >
> >
> > I found that only the RefetchForeignRow API needs the change, and have
> > done the same.
> > Along with that, I fixed all the issues found when running make
> > check-world.  Patches for the same are attached.
>
> Thanks, that's really helpful!  I'll try to merge these soon.
>

I can share the rebased patches for the fixes, so that it will be easy to
merge.

> I'm starting to think that we're getting closer to something that
> looks right from a high level, even though there's a lot of details to
> clean up.
>

That's good.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Why hash OIDs?

2018-08-27 Thread Andres Freund
On 2018-08-28 14:45:49 +1200, Thomas Munro wrote:
> On Tue, Aug 28, 2018 at 2:26 PM Tom Lane  wrote:
> > Andres Freund  writes:
> > > On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
> > >> What bad thing would happen if we used OIDs directly as hash values in
> > >> internal hash tables (that is, instead of uint32_hash() we'd use
> > >> uint32_identity(), or somehow optimise it away entirely, as you can
> > >> see in some C++ standard libraries for eg std::hash)?
> >
> > > Oids are very much not equally distributed, so in all likelihood you'd
> > > get cases where you currently have a reasonably well averaged out usage
> > > of the hashtable, not be that anymore.
> >
> > Right.  In particular, most of our hash usages assume that all bits of
> > the hash value are equally "random", so that we can just mask off the
> > lowest N bits of the hash and not get values that are biased towards
> > particular hash buckets.  It's unlikely that raw OIDs would have that
> > property.
> 
> Yeah, it would be a terrible idea as a general hash function for use
> in contexts where the "avalanche effect" assumption is made about
> information being spread out over the bits (the HJ batching code
> wouldn't work for example).  I was wondering specifically about the
> limited case of hash tables that are used to look things up in caches.

I don't understand why that'd be ok there? With a simple 1:1 hash
function, which you seem to advocate, many hash-tables would be much
fuller in the 1000-3000 range (where pg_class etc. all reside) than in any
other range.  A lot of our tables are populated on-demand, so you'd
often end up with most of the data in one or two buckets, and a larger
number largely empty.

Greetings,

Andres Freund



Re: Refactor textToQualifiedNameList()

2018-08-27 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 24 Aug 2018 20:44:12 +0900, Yugo Nagata  wrote in 
<20180824204412.150979ae6b283ddb639f9...@sraoss.co.jp>
> When working on another patch[1], I found there are two almost identical
> functions, texttoQualifiedNameList() and stringToQualifiedNameList().
> The only difference is the argument type, text or char*. I don't know
> why these functions are defined separately, but I think the former
> function can be rewritten using the latter's code, as in the attached patch.
> Is this a reasonable fix?

The functions were introduced within a month of each other, for
different objectives, in March and April 2002. I suppose that they
were intentionally added as separate functions for simplicity, since
the second one was apparently copy-and-pasted from the first one.

commit 5f4745adf4fb2a1f933b25d7a2bc72b39fa9edfd
commit 52200befd04b9fa71da83231c808764867079226

Returning to the patch, its downside is that textToQNL makes an
extra, unused copy of the parameter string. (It's a kind of bug
that it forgets to free rawname.)

Maybe we can separate them into three functions (or one function
and two macros) to get rid of the duplication, but I'm not sure
it's worth doing.
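
For reference, the rewrite under discussion is roughly the following (a
sketch only, with the pfree the review says is missing added):

    List *
    textToQualifiedNameList(text *textval)
    {
        /* text_to_cstring() makes the extra copy mentioned above */
        char   *rawname = text_to_cstring(textval);
        List   *result = stringToQualifiedNameList(rawname);

        pfree(rawname);         /* don't leak the temporary copy */
        return result;
    }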

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Why hash OIDs?

2018-08-27 Thread Thomas Munro
On Tue, Aug 28, 2018 at 2:26 PM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
> >> What bad thing would happen if we used OIDs directly as hash values in
> >> internal hash tables (that is, instead of uint32_hash() we'd use
> >> uint32_identity(), or somehow optimise it away entirely, as you can
> >> see in some C++ standard libraries for eg std::hash)?
>
> > Oids are very much not equally distributed, so in all likelihood you'd
> > get cases where you currently have a reasonably well averaged out usage
> > of the hashtable, not be that anymore.
>
> Right.  In particular, most of our hash usages assume that all bits of
> the hash value are equally "random", so that we can just mask off the
> lowest N bits of the hash and not get values that are biased towards
> particular hash buckets.  It's unlikely that raw OIDs would have that
> property.

Yeah, it would be a terrible idea as a general hash function for use
in contexts where the "avalanche effect" assumption is made about
information being spread out over the bits (the HJ batching code
wouldn't work for example).  I was wondering specifically about the
limited case of hash tables that are used to look things up in caches.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-27 Thread Michael Paquier
On Mon, Aug 27, 2018 at 02:34:31PM -0400, Robert Haas wrote:
> I like the direction of your thinking, but it seems to me that this
> would cause a problem if you want to set search_path=foo,bar.

proconfig already handles such cases; that's what I was getting at.  We
could reuse most of what it does, no?
--
Michael


signature.asc
Description: PGP signature


Re: Postmaster doesn't send SIGTERM to bgworker during fast shutdown when pmState == PM_STARTUP

2018-08-27 Thread Michael Paquier
On Sun, Aug 26, 2018 at 06:00:23PM +0200, Alexander Kukushkin wrote:
> it is possible to start a bgworker with bgw_start_time =
> BgWorkerStart_PostmasterStart, which will be started immediately after
> the postmaster.

Right.

> But if you try to do a fast shutdown while the postmaster is still in
> pmState == PM_STARTUP, the bgworker will never get SIGTERM and the
> postmaster will wait forever.
> At the same time, if you do immediate or smart shutdown, it works fine.
> 
> The problem is in the pmdie function. Proposed fix attached.

That seems like a good catch and a correct fix to me.  The handling of
SIGINT is inconsistent with SIGTERM in pmdie().  I would just add a
comment to mention that at this stage only the startup process is
running, and that it has been signaled already.  I'll commit that
tomorrow.
--
Michael


signature.asc
Description: PGP signature


Re: Why hash OIDs?

2018-08-27 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
>> What bad thing would happen if we used OIDs directly as hash values in
>> internal hash tables (that is, instead of uint32_hash() we'd use
>> uint32_identity(), or somehow optimise it away entirely, as you can
>> see in some C++ standard libraries for eg std::hash)?

> Oids are very much not equally distributed, so in all likelihood you'd
> get cases where you currently have a reasonably well averaged out usage
> of the hashtable, not be that anymore.

Right.  In particular, most of our hash usages assume that all bits of
the hash value are equally "random", so that we can just mask off the
lowest N bits of the hash and not get values that are biased towards
particular hash buckets.  It's unlikely that raw OIDs would have that
property.

regards, tom lane



Re: Why hash OIDs?

2018-08-27 Thread Andres Freund
On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
> I'm curious about something which may qualify as a stupid question.
> 
> What bad thing would happen if we used OIDs directly as hash values in
> internal hash tables (that is, instead of uint32_hash() we'd use
> uint32_identity(), or somehow optimise it away entirely, as you can
> see in some C++ standard libraries for eg std::hash)?

Oids are very much not equally distributed, so in all likelihood you'd
get cases where you currently have a reasonably well averaged out usage
of the hashtable, not be that anymore.

It's also fairly cheap to hash an oid.


> However, as far as I can see OIDs are expected to have an even
> distribution (or at least we don't expect regular sized gaps), so the
> hazard doesn't seem to apply.

Huh? Oids between, say, 1 and FirstNormalObjectId, are vastly more
common than the rest. And even after that, individual tables get large
clusters of sequential values from the global oid counter.


Greetings,

Andres Freund



Why hash OIDs?

2018-08-27 Thread Thomas Munro
Hello,

I'm curious about something which may qualify as a stupid question.

What bad thing would happen if we used OIDs directly as hash values in
internal hash tables (that is, instead of uint32_hash() we'd use
uint32_identity(), or somehow optimise it away entirely, as you can
see in some C++ standard libraries for eg std::hash)?  From what
I know of this subject, the main risk is that, in general, the
distribution of integer keys stored in a hash table might
*accidentally* happen to have regular gaps with a stride that shares a
common factor with the number of buckets leading to an unbalanced hash
table (memory addresses are an example because they tend to have a
stride corresponding to hardware alignment rules, as are some
distributed sequence generation tricks), whereas it probably takes a
non-accidental attack to get higher collision rates with a decent hash
function.  A good hash function would defend against that kind of
thing (but cost cycles to compute), and alternatively a prime number
of buckets would defend against it (but cost more cycles to %).
However, as far as I can see OIDs are expected to have an even
distribution (or at least we don't expect regular sized gaps), so the
hazard doesn't seem to apply.
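
As a toy, standalone demonstration of that stride hazard (nothing
PostgreSQL-specific here): keys with stride 8 masked into 64 buckets land in
only 8 of them under an identity hash, while a mixing hash spreads them out.

    #include <stdio.h>
    #include <stdint.h>

    static uint32_t mix32(uint32_t h)       /* finalizer-style bit mixer */
    {
        h ^= h >> 16; h *= 0x85ebca6b;
        h ^= h >> 13; h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }

    int main(void)
    {
        enum { NBUCKETS = 64, STRIDE = 8, NKEYS = 1000 };
        int ident[NBUCKETS] = {0}, mixed[NBUCKETS] = {0};
        int ident_used = 0, mixed_used = 0;

        for (uint32_t i = 0; i < NKEYS; i++)
        {
            uint32_t key = i * STRIDE;              /* regular gaps */
            ident[key & (NBUCKETS - 1)]++;          /* identity "hash" */
            mixed[mix32(key) & (NBUCKETS - 1)]++;   /* real hash */
        }
        for (int b = 0; b < NBUCKETS; b++)
        {
            ident_used += (ident[b] != 0);
            mixed_used += (mixed[b] != 0);
        }
        printf("identity: %d of %d buckets used\n", ident_used, NBUCKETS);
        printf("mixed:    %d of %d buckets used\n", mixed_used, NBUCKETS);
        return 0;
    }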

One problem could be that, although collisions are not expected with
high probability when the hash table is big enough, when they do
occur, hash tables using linear probing-based collision strategies
(simplehash, but not dynahash) probably work less well if the chance
of non-empty buckets having non-empty neighbours (AKA clustering)
increases.  I'm not sure whether to consider OIDs to be clustered in
general or not (though obviously the ones created by initdb are).

-- 
Thomas Munro
http://www.enterprisedb.com



First steps to being a contributor

2018-08-27 Thread Daniel Wood
Having quit Amazon, where I was doing Postgres development, I've started 
looking at various things I might work on for fun. One thought is to start with 
something easy like the scalability of GetSnapshotData(). :-)


I recently found it interesting to examine performance while running near 1 
million pgbench selects per sec on a 48 core/96 HT Skylake box. I noticed that 
additional sessions trying to connect were timing out when they got stuck in 
ProcArrayAdd trying to get the ProcArrayLock in EXCLUSIVE mode. FYI, scale 
1 with 2048 clients.


The question is whether it is possible that the problem with GetSnapshotData() 
has reached a critical point, with respect to snapshot scaling, on the newest 
high end systems.


I didn't have time to complete my analysis as I lost access to the hardware on 
my last day. It shouldn't cost me much more than about $6 per hour to do 
experiments on a 48 core system.


What I'd like is a short cut to any of the current discussions of various ideas 
to improve snapshot scaling. I have some of my own ideas but want to review 
things before posting them.

Re: simplehash.h comment

2018-08-27 Thread Thomas Munro
On Tue, Aug 28, 2018 at 1:40 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > ... and a patch.  Thoughts?
>
> It's not real obvious why you're adding inclusion guards over just a
> subset of the header.  A short comment about that would be worthwhile.

Right, thanks.  I added a comment and committed.

Obviously the longer term fix for that particular issue is not to
define those functions in that header at all.   If no one beats me to
it, I'd like to have another look at consolidating all our popcount()
and ffs() functions[1], and this fls()/log2(), pow2() stuff seems like
it might be part of the same refactoring expedition.  For now I just
added the missing include guards to unblock some other stuff I'm about
to propose that wants to use simplehash.h.

[1] 
https://www.postgresql.org/message-id/flat/CAEepm%3D3k%2B%2BYtf2LNQCvpP6m1%3DgY9zZHP_cfnn47%3DWTsoCrLCvA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re[2]: doc - improve description of default privileges

2018-08-27 Thread Bradley DeJong

Hi Fabien,

Thanks for writing this up - in particular the psql backslash commands.

comments on the patch ...

1) I think that adding the "This privilege is abbreviated ... when 
displayed." lines to the privilege descriptions is redundant. The 
abbreviations are already listed after the "The entries shown by \dp are 
interpreted thus:" line. Just change that line to something like "The 
entries shown by the psql backslash commands, like \dp, are interpreted 
thus:".


2) I think that the psql command table should go with the current text 
on "Use psql's \dp command to obtain ..." rather than in the Examples 
section. It seems like the "For non-table objects there are other \d 
commands ..." line could be changed to an introductory comment like "The 
following table lists the \d commands that are used for non-table 
objects, along with the default privileges granted to the object's owner 
and PUBLIC."


3) The table title, "Default hardcoded access privileges per object's 
type", seems incomplete because it does not mention the psql commands 
part of the table.



-- Original Message --
From: "Fabien COELHO" 
To: "PostgreSQL Developers" 
Sent: 8/4/2018 4:40:33 AM
Subject: Re: doc - improve description of default privileges



I have not found a convenient presentation of the default privileges 
for different objects, and how to display them (if possible, not 
always).


The information is partly provided within the GRANT description, and 
not very explicit: e.g., it is said that owners have all possible perms, 
but which ones they are is not stated explicitly, although they are 
implied by the different GRANT synopses. Then some objects are given 
perms for PUBLIC.


The attached patch tries to improve the documentation, in particular 
with an added table that summarizes my findings, so that they are 
recorded somewhere.


The attached fixes the tablespace entry that I forgot to fill in full.

-- Fabien.





Re: FailedAssertion on partprune

2018-08-27 Thread Jonathan S. Katz

> On Aug 17, 2018, at 2:49 AM, David Rowley  
> wrote:
> 
> On 17 August 2018 at 06:52, Robert Haas  wrote:
>> I don't know whether there's actually a defect here any more.  I was
>> trying to dispel some perceived confusion on the part of David and Tom
>> about what this code was trying to accomplish, but the fact that the
>> code caused some confusion does not necessarily mean that there's
>> anything wrong with it.  On the other hand, perhaps it does have
>> functional deficiencies, or perhaps the comments need work.  Or maybe
>> this code is fine taken in isolation but there are still problems in
>> how it interacts with partition pruning.  Let's start by deciding what
>> we think the problems are, and then we can decide who should fix them.
> 
> I think Tom and I both agree that the plan mentioned in [1] and [2] is
> not quite as optimal as it could be. The investigation I did in [3]
> highlights where this got broken and the reason why it got broken.
> 
> Tom removed the offending Assert() in 59ef49d26d2, but mentioned
> there's no test case which tries to ensure having a partitioned table
> as an Append child works correctly.  I agree that it would be nice to
> have a test for this, but I did try a couple of times to come up with
> a test case to do this, but I was unable to come up with anything
> suitably compact for the regression tests, and even at that, given how
> difficult it is to get a plan into this shape, I didn't hold out much
> hope of the test's plan remaining stable. Tom seems to agree with this
> as he mentioned in [2].
> 
> I think the best path forward is if you (Robert) could verify the
> behaviour that I mentioned in [3] is expected behaviour, if it is then
> perhaps the path forward is for me to try harder to write a compact
> test for this, but if you discover that the behaviour is unexpected
> then I think the best course of action is to fix it to properly
> flatten the
> Append which will mean there's no need for a test case, and perhaps
> the Assert that was removed can be put back again.
> 
> [1] 
> https://www.postgresql.org/message-id/CAKJS1f9MQd5ntg6xXg7jJe0jB_bWvCDRmaH6yNzZGF%2BTVa5M%3DA%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/8600.1533765148%40sss.pgh.pa.us
> [3] 
> https://www.postgresql.org/message-id/CAKJS1f_j9BSJ7svDF7uid6EMm%2Bfu%2BcAhonZ7hcqiYU4ssv3rtg%40mail.gmail.com
>  
> 

On behalf of the RMT, I just want to make sure this keeps moving along.
It sounds like the next step is for Robert to verify that [3] is the expected
behavior and then David can decide what to do from there.

Thanks!

Jonathan



signature.asc
Description: Message signed with OpenPGP


Re: Memory leak with CALL to Procedure with COMMIT.

2018-08-27 Thread Jonathan S. Katz

> On Aug 27, 2018, at 5:13 PM, Peter Eisentraut 
>  wrote:
> 
> On 23/08/2018 17:35, Jonathan S. Katz wrote:
>> Applied the patch against head. make check passed, all the tests I ran
>> earlier in this thread passed as well.
>> 
>> Reviewed the code - looks to match above reasoning, and comments
>> are clear.
>> 
>> From my perspective it looks good, but happy to hear from someone
>> more familiar than me with this area of the code.
> 
> committed

Awesome, thank you! I’ve removed this from the Open Items list.

Jonathan




signature.asc
Description: Message signed with OpenPGP


Re: Memory leak with CALL to Procedure with COMMIT.

2018-08-27 Thread Peter Eisentraut
On 23/08/2018 17:35, Jonathan S. Katz wrote:
> Applied the patch against head. make check passed, all the tests I ran
> earlier in this thread passed as well.
> 
> Reviewed the code - looks to match above reasoning, and comments
> are clear.
> 
> From my perspective it looks good, but happy to hear from someone
> more familiar than me with this area of the code.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re[3]: Adding a note to protocol.sgml regarding CopyData

2018-08-27 Thread Bradley DeJong

On 2018-08-27, Fabien COELHO wrote ...
> ... references to the protocol version lack homogeneity.
> ... I'd suggest to keep "the vX.0 protocol" for a short version,
> and "the version X.0 protocol" for long ...

I agree. Change made.


protocol.v3.patch
Description: Binary data


Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-27 Thread Fabrízio de Royes Mello
On Mon, Aug 27, 2018 at 3:35 PM Robert Haas  wrote:
>
> On Mon, Aug 27, 2018 at 1:29 AM Michael Paquier 
wrote:
> > It seems to me that you would pass down just a string which gets
> > allocated for "options", and injection risks are something to be careful
> > about.  Another possibility would be an array with comma-separated
> > arguments, say:
> > options = 'option1=foo,option2=bar'
> > There is already some work done with comma-separated arguments for the
> > parameter "extensions", now that's more work.
>
> I like the direction of your thinking, but it seems to me that this
> would cause a problem if you want to set search_path=foo,bar.
>

Maybe we can use multiple "options". Something like:

... OPTIONS ( host 'remhost1', port '5433', dbname 'demodb',
option='option1=foo', option='option2=bar' );

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: More parallel pg_dump bogosities

2018-08-27 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> And lastly, the ROW SECURITY items are ready because they are not marked
> >> with any dependency at all, none, nada.  This seems bad.  In principle
> >> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> >> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> >> practice that can't happen today, because CREATE TABLE commands get
> >> emitted before we've switched into parallel restore mode at all.  But it's
> >> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> >> we've restored the table's data.  Won't that break things?
> 
> > We certainly wouldn't want the ROW SECURITY items to be emitted before
> > the table itself, that wouldn't work.  Emitting ENABLE RLS before
> > restoring the table's data shouldn't actually break anything though as
> > the owner of the table (which is who we're restoring the data as,
> > at that point, right?) will bypass RLS.
> 
> Hmm.  We'd typically be running a restore either as superuser or as the
> table owner ...

Right.

> > Now, if what's actually set is
> > FORCE RLS, then we may have an issue if that is emitted before the table
> > data since that would cause RLS to be in effect even if we're the owner
> > of the table.
> 
> ... but if FORCE RLS is applied to the table, we gotta problem anyway,
> because that command is included right with the initial table creation.
> Once the ENABLE comes out, inserts will fail, because there are no
> policies yet (those *do* have table dependencies that get moved to
> point at the TABLE DATA).

Yeah, I don't think the pg_dump aspect was considered very carefully
when we reworked how the 'force' RLS option works.

> So it's a bug all right, but a sufficiently corner-case one that it's
> not too surprising it hasn't been reported.  Even though the ENABLE RLS
> item is "ready to restore", it'll still be at the end of the work list,
> so you'd need a very high parallel worker count to have much chance of
> it coming out before the table's data item gets processed.

Ok.

> >> I think this is easy enough to fix, just force a dependency on the table
> >> to be attached to a ROW SECURITY item; but I wanted to confirm my
> >> conclusion that we need one.
> 
> > I do think we should have one.
> 
> I'll do that, but ...

Ok.

> > In an ideal world, I think we'd want:
> 
> > CREATE TABLE
> > TABLE DATA
> > ENABLE RLS
> > ADD RLS POLICIES
> > GRANT RIGHTS
> 
> ... as things stand, even with this fix, a parallel restore will emit
> CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not sure
> how much it matters, though.  The GRANTs should come out last anyway.

Yeah, as long as GRANTs come out last, it really shouldn't matter when
the ENABLE happens vs. the CREATE POLICYs, except in the odd FORCE
case.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-27 Thread Robert Haas
On Mon, Aug 27, 2018 at 1:29 AM Michael Paquier  wrote:
> It seems to me that you would pass down just a string which gets
> allocated for "options", and injection risks are something to be careful
> about.  Another possibility would be an array with comma-separated
> arguments, say:
> options = 'option1=foo,option2=bar'
> There is already some work done with comma-separated arguments for the
> parameter "extensions", now that's more work.

I like the direction of your thinking, but it seems to me that this
would cause a problem if you want to set search_path=foo,bar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: More parallel pg_dump bogosities

2018-08-27 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> And lastly, the ROW SECURITY items are ready because they are not marked
>> with any dependency at all, none, nada.  This seems bad.  In principle
>> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
>> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
>> practice that can't happen today, because CREATE TABLE commands get
>> emitted before we've switched into parallel restore mode at all.  But it's
>> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
>> we've restored the table's data.  Won't that break things?

> We certainly wouldn't want the ROW SECURITY items to be emitted before
> the table itself, that wouldn't work.  Emitting ENABLE RLS before
> restoring the table's data shouldn't actually break anything though as
> the owner of the table (which is who we're restoring the data as,
> at that point, right?) will bypass RLS.

Hmm.  We'd typically be running a restore either as superuser or as the
table owner ...

> Now, if what's actually set is
> FORCE RLS, then we may have an issue if that is emitted before the table
> data since that would cause RLS to be in effect even if we're the owner
> of the table.

... but if FORCE RLS is applied to the table, we gotta problem anyway,
because that command is included right with the initial table creation.
Once the ENABLE comes out, inserts will fail, because there are no
policies yet (those *do* have table dependencies that get moved to
point at the TABLE DATA).

So it's a bug all right, but a sufficiently corner-case one that it's
not too surprising it hasn't been reported.  Even though the ENABLE RLS
item is "ready to restore", it'll still be at the end of the work list,
so you'd need a very high parallel worker count to have much chance of
it coming out before the table's data item gets processed.

>> I think this is easy enough to fix, just force a dependency on the table
>> to be attached to a ROW SECURITY item; but I wanted to confirm my
>> conclusion that we need one.

> I do think we should have one.

I'll do that, but ...

> In an ideal world, I think we'd want:

> CREATE TABLE
> TABLE DATA
> ENABLE RLS
> ADD RLS POLICIES
> GRANT RIGHTS

... as things stand, even with this fix, a parallel restore will emit
CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not sure
how much it matters, though.  The GRANTs should come out last anyway.

regards, tom lane



Re: More parallel pg_dump bogosities

2018-08-27 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> So I started poking at the idea of sorting by size during parallel
> restore instead of sorting pg_dump's TOC that way.  While investigating
> just where to do that, I discovered that, using the regression database
> as test case, restore_toc_entries_parallel() finds these objects to
> be *immediately* ready to restore at the start of the parallel phase:
> 
> all TABLE DATA objects --- as expected
> all SEQUENCE SET objects --- as expected
> BLOBS --- as expected
> CONSTRAINT idxpart_another idxpart_another_pkey
> INDEX mvtest_aa
> INDEX mvtest_tm_type
> INDEX mvtest_tvmm_expr
> INDEX mvtest_tvmm_pred
> ROW SECURITY ec1
> ROW SECURITY rls_tbl
> ROW SECURITY rls_tbl_force
> 
> I wasn't expecting any POST_DATA objects to be ready at this point,
> so I dug into the reasons why these other ones are ready, and found
> that:
> 
> idxpart_another_pkey is an index on a partitioned table (new feature
> in v11).  According to the dump archive, it has a dependency on the
> partitioned table.  Normally, repoint_table_dependencies() would change
> an index's table dependency to reference the table's TABLE DATA item,
> preventing it from being restored before the data is loaded.  But a
> partitioned table has no TABLE DATA item, so that doesn't happen.
> I guess this is okay, really, but it's a bit surprising.
> 
> The other four indexes are on materialized views, which likewise don't
> have TABLE DATA items.  This means that when restoring materialized
> views, we make their indexes before we REFRESH the matviews.  I guess
> that's probably functionally okay (the same thing happens in non-parallel
> restores) but it's leaving some parallelism on the table, because it means
> more work gets crammed into the REFRESH action.  Maybe somebody would like
> to fix that.  I'm not volunteering right now, though.

Hrmpf, I agree, that certainly doesn't seem ideal.

> And lastly, the ROW SECURITY items are ready because they are not marked
> with any dependency at all, none, nada.  This seems bad.  In principle
> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> practice that can't happen today, because CREATE TABLE commands get
> emitted before we've switched into parallel restore mode at all.  But it's
> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> we've restored the table's data.  Won't that break things?

We certainly wouldn't want the ROW SECURITY items to be emitted before
the table itself, that wouldn't work.  Emitting ENABLE RLS before
restoring the table's data shouldn't actually break anything though as
the owner of the table (which is who we're restoring the data as,
at that point, right?) will bypass RLS.  Now, if what's actually set is
FORCE RLS, then we may have an issue if that is emitted before the table
data since that would cause RLS to be in effect even if we're the owner
of the table.

> I think this is easy enough to fix, just force a dependency on the table
> to be attached to a ROW SECURITY item; but I wanted to confirm my
> conclusion that we need one.

I do think we should have one.  In an ideal world, I think we'd want:

CREATE TABLE
TABLE DATA
ENABLE RLS
ADD RLS POLICIES
GRANT RIGHTS

Though, ultimately, those last two could be flipped since RLS has a
'default deny' policy if no policies exist on the table but RLS is
enabled.  We really shouldn't issue GRANT statements before ENABLE'ing
RLS though, since that might allow someone to access all the rows in the
table when they really should be limited to only some subset.  This all
only applies in the cases where we aren't running the restore in a
single transaction, of course, but that's implied by talking about
parallel restore.  I'm not sure how much fun it would be to make
that all work that way though.  I do think we need the ENABLE RLS bits
to depend on the table to exist though.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2018-08-27 Thread Alexander Korotkov
Hi!

Thank you for feedback.

On Sun, Aug 26, 2018 at 4:09 AM Robert Haas  wrote:
> On Tue, Aug 21, 2018 at 9:10 AM, Alexander Korotkov
>  wrote:
> > After heap truncation using this algorithm, shared buffers may contain
> > past-EOF buffers.  But those buffers are empty (no used items) and
> > clean.  So, read-only queries shouldn't hint those buffers dirty
> > because there are no used items.  Normally, these buffers will just be
> > evicted from the shared buffer arena.  If relation extension happens
> > shortly after heap truncation, then some of those buffers could be
> > found after relation extension.  I think this situation could be
> > handled.  For instance, we can teach vacuum to mark a page as new once
> > all the tuples are gone.
>
> I think this all sounds pretty dangerous and fragile, especially in
> view of the pluggable storage work.  If we start to add new storage
> formats, deductions based on the specifics of the current heap's
> hint-bit behavior may turn out not to be valid.  Now maybe you could
> speculate that it won't matter because perhaps truncation will work
> differently in other storage formats too, but it doesn't sound to me
> like we'd be wise to bet on it working out that way.

Hmm, I'm not especially concerned about pluggable storages here.
Pluggable storages decide for themselves how they manage vacuum,
including relation truncation if needed.  They might or might not reuse
the relation truncation function we have for heap.  What we should
provide for that relation truncation function is an understandable and
predictable interface.  So, if the relation truncation function cuts off
trailing relation pages that were previously cleaned as new, that looks
fair enough to me.

The aspect I'm more concerned about here is whether we lose the ability
to detect some IO errors if we don't distinguish new pages from pages
whose tuples were removed by vacuum.

> IIRC, Andres had some patches revising shared buffer management that
> allowed the ending block number to be maintained in shared memory.
> I'm waving my hands here, but with that kind of a system you can
> imagine that maybe there could also be a flag bit indicating whether a
> truncation is in progress.  So, reduce the number of page and set the
> bit; then zap all the pages above that value that are still present in
> shared_buffers; then clear the bit.  Or maybe we don't even need the
> bit, but I think we do need some kind of race-free mechanism to make
> sure that we never try to read pages that either have been truncated
> away or in the process of being truncated away.

If we had some per-relation shared memory information, then we could
make this work even without a special write-barrier bit.  Instead, we
could place a mark on the whole relation saying "please hold off on
writes past the following pending truncation point".  Also, having the
ending block number in shared memory could save us from trying to read
blocks past EOF.  So, I'm sure that once Andres's work on revising
shared buffer management is complete, we will be able to solve these
problems better.  But there is also the question of time.  As I
understand it, revising shared buffer management cannot realistically
get committed to PostgreSQL 12.  And we have a pretty nasty set of
problems here.  It would be nice to do something about them during this
release cycle.  But for sure, we should keep in mind how this solution
should be revised once we have the new shared buffer management.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pg_dump test instability

2018-08-27 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> However, at least for the directory-format case (which I think is the
>> only one supported for parallel restore), we could make it compare the
>> file sizes of the TABLE DATA items.  That'd work pretty well as a proxy
>> for both the amount of effort needed for table restore, and the amount
>> of effort needed to build indexes on the tables afterwards.

> Parallel restore also works w/ custom-format dumps.

Really.  Well then the existing code is even more broken, because it
only does this sorting for directory output:

/* If we do a parallel dump, we want the largest tables to go first */
if (archiveFormat == archDirectory && numWorkers > 1)
sortDataAndIndexObjectsBySize(dobjs, numObjs);

so that parallel restore is completely left in the lurch with a
custom-format dump.

But I imagine we can get some measure of table data size out of a custom
dump too.

regards, tom lane



Re: pg_dump test instability

2018-08-27 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Peter Eisentraut  writes:
> > In a non-data_only dump, the order of the tables doesn't matter, because
> > the foreign keys are added at the very end.  In parallel dumps, the
> > tables are in addition sorted by size, so the resultant order is
> > different from a single-threaded dump.  This can be seen by comparing
> > the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
> > But it all happens to pass the tests right now.
> 
> I noticed that business about sorting the TOC by size yesterday.
> I think that's a completely bletcherous hack, and we ought to get
> rid of it in favor of keeping the TOC order the same between parallel
> and non-parallel cases, and instead doing size comparisons during
> parallel worker dispatch.

So instead of dumping things by the order of the TOC, we'll perform the
sorting later on before handing out jobs to workers?  That seems alright
to me for the most part.  One thing I do wonder about is if we should
also be sorting by tablespace and not just size, to try and maximize
throughput (that is, assign out parallel workers to each tablespace,
each going after the largest table in that tablespace, before coming
back around to assigning the next-largest file to the second worker on a
given tablespace, presuming we have more workers than tablespaces);
that's what we've seen work rather well in pgbackrest.

> However, at least for the directory-format case (which I think is the
> only one supported for parallel restore), we could make it compare the
> file sizes of the TABLE DATA items.  That'd work pretty well as a proxy
> for both the amount of effort needed for table restore, and the amount
> of effort needed to build indexes on the tables afterwards.

Parallel restore also works w/ custom-format dumps.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some more error location support

2018-08-27 Thread Peter Eisentraut
On 27/08/2018 10:53, Fabien COELHO wrote:
> There is a "make_parsestate", but no corresponding free. The usual 
> pattern, although there seem to be a few exceptions, is to "make" and 
> "free".
> 
> Even if there is some under-the-hood garbage collection, I'd suggest to 
> add a free after the call to ComputePartitionAttrs.

Hmm, I didn't know about free_parsestate().  It doesn't seem to be used
consistently.  I suppose you'll want to use it when you have a target
relation that will be closed by it, but otherwise, for DDL commands,
it's not all that useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: has_table_privilege for a table in unprivileged schema causes an error

2018-08-27 Thread David G. Johnston
On Sat, Aug 25, 2018 at 8:29 PM, Tom Lane  wrote:

> There's a backwards-compatibility argument for not changing this behavior,
> sure, but I don't buy the other arguments you made here.  And I don't
> think there's all that much to the backwards-compatibility argument,
> considering that the current behavior is to fail.
>

+1; any code using these functions can reasonably be expected to handle
true and false responses.  Converting a present error into a false under
that presumption results in similar, and arguably improved, semantics.

While null is generally something to be avoided, this is one of those
cases where, if we did want a "cannot answer the question because
preconditions are not met" response, I'd strongly consider using null to
represent that response instead of an error, since using coalesce is
considerably easier than performing error handling.  That isn't an option
here, and while there is information loss involved in the proposed
change, it seems worth it to me: it makes using the function for its
primary behavior easier, at the cost of removing a hard-to-use side
effect.  Adding a new default argument or function is not desirable.

To be clear, I do not consider this a backpatchable change.

David J.


simplify index tuple descriptor initialization

2018-08-27 Thread Peter Eisentraut
Whenever some pg_attribute field is added or changed, a lot of
repetitive changes all over the code are necessary.  Here is a small
change to remove one such place.

We have two code paths for initializing the tuple descriptor for a new
index: For a normal index, we copy the tuple descriptor from the table
and reset a number of fields that are not applicable to indexes.  For an
expression index, we make a blank tuple descriptor and fill in the
needed fields based on the provided expressions.  As pg_attribute has
grown over time, the number of fields that we need to reset in the first
case is now bigger than the number of fields we actually want to copy,
so it's sensible to do it the other way around: Make a blank descriptor
and copy just the fields we need.  This also allows more code sharing
between the two branches, and it avoids having to touch this code for
almost every unrelated change to the pg_attribute structure.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7a8d8bd13fd1cdaf18de21f4d45df4269aa71d10 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 27 Aug 2018 15:50:50 +0200
Subject: [PATCH] Simplify index tuple descriptor initialization

We have two code paths for initializing the tuple descriptor for a new
index: For a normal index, we copy the tuple descriptor from the table
and reset a number of fields that are not applicable to indexes.  For an
expression index, we make a blank tuple descriptor and fill in the
needed fields based on the provided expressions.  As pg_attribute has
grown over time, the number of fields that we need to reset in the first
case is now bigger than the number of fields we actually want to copy,
so it's sensible to do it the other way around: Make a blank descriptor
and copy just the fields we need.  This also allows more code sharing
between the two branches, and it avoids having to touch this code for
almost every unrelated change to the pg_attribute structure.
---
 src/backend/catalog/index.c | 57 +++--
 1 file changed, 23 insertions(+), 34 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b256054908..636fe9c740 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -319,9 +319,7 @@ ConstructTupleDescriptor(Relation heapRelation,
indexTupDesc = CreateTemplateTupleDesc(numatts, false);
 
/*
-* For simple index columns, we copy the pg_attribute row from the 
parent
-* relation and modify it as necessary.  For expressions we have to cons
-* up a pg_attribute row the hard way.
+* Fill in the pg_attribute row.
 */
for (i = 0; i < numatts; i++)
{
@@ -332,6 +330,19 @@ ConstructTupleDescriptor(Relation heapRelation,
Form_pg_opclass opclassTup;
Oid keyType;
 
+   MemSet(to, 0, ATTRIBUTE_FIXED_PART_SIZE);
+   to->attnum = i + 1;
+   to->attstattarget = -1;
+   to->attcacheoff = -1;
+   to->attislocal = true;
+   to->attcollation = (i < numkeyatts) ?
+   collationObjectId[i] : InvalidOid;
+
+   /*
+* For simple index columns, we copy some pg_attribute fields 
from the
+* parent relation.  For expressions we have to look at the 
expression
+* result.
+*/
if (atnum != 0)
{
/* Simple index column */
@@ -356,36 +367,20 @@ ConstructTupleDescriptor(Relation heapRelation,
 
AttrNumberGetAttrOffset(atnum));
}
 
-   /*
-* now that we've determined the "from", let's copy the 
tuple desc
-* data...
-*/
-   memcpy(to, from, ATTRIBUTE_FIXED_PART_SIZE);
-
-   /*
-* Fix the stuff that should not be the same as the 
underlying
-* attr
-*/
-   to->attnum = i + 1;
-
-   to->attstattarget = -1;
-   to->attcacheoff = -1;
-   to->attnotnull = false;
-   to->atthasdef = false;
-   to->atthasmissing = false;
-   to->attidentity = '\0';
-   to->attislocal = true;
-   to->attinhcount = 0;
-   to->attcollation = (i < numkeyatts) ?
-   collationObjectId[i] : InvalidOid;
+   namecpy(&to->attname, &from->attname);
+   to->atttypid = from->atttypid;
+   to->attlen = from->attlen;
+   

Re: has_table_privilege for a table in unprivileged schema causes an error

2018-08-27 Thread Tom Lane
Yugo Nagata  writes:
> Tom Lane  wrote:
>> There's a backwards-compatibility argument for not changing this behavior,
>> sure, but I don't buy the other arguments you made here.  And I don't
>> think there's all that much to the backwards-compatibility argument,
>> considering that the current behavior is to fail.

> With regarding to keeping the backwards-compatibility, to add a new paramater
> to has_*_privilege functions is a solution as proposed previously.
>  has_table_privilege(user, table, privilege[, consider_schema=false]) 
> How do you think this proposal?

I think that'd be a disaster, because we already have variants of these
functions with several different parameter counts.  Adding optional
arguments to them will cause ambiguous-function errors where there were
none before.

Also, it's just plain ugly.  We should either decide we want this change,
or decide we don't.  Trying to have it both ways is not going to be
better.

regards, tom lane



Re: pg_dump test instability

2018-08-27 Thread Tom Lane
Peter Eisentraut  writes:
> In a non-data_only dump, the order of the tables doesn't matter, because
> the foreign keys are added at the very end.  In parallel dumps, the
> tables are in addition sorted by size, so the resultant order is
> different from a single-threaded dump.  This can be seen by comparing
> the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
> But it all happens to pass the tests right now.

I noticed that business about sorting the TOC by size yesterday.
I think that's a completely bletcherous hack, and we ought to get
rid of it in favor of keeping the TOC order the same between parallel
and non-parallel cases, and instead doing size comparisons during
parallel worker dispatch.

The primary reason why it's a bletcherous hack is that it's designed
around the assumption that parallel dumps will be restored in parallel
and non-parallel dumps will be restored non-parallel.  That assumption
is wrong on its face.  But it explains why the code sorts both tables
and indexes, even though pg_dump doesn't have any need to parallelize
index dumps: it's expecting that a subsequent parallel restore will have
use for building indexes largest-first.  Of course, if you made the dump
non-parallel, you're out of luck on that.

Another reason why the code is bogus is that it sorts only a consecutive
sequence of DO_INDEX items.  This fails to work at all for indexes that
are constraints, and even for ones that aren't, there's no very good
reason to assume that they aren't interspersed with other sorts of
objects due to dependencies.  So I suspect an awful lot of parallelism
is being left on the table so far as indexes are concerned.

So if we made this less of a quick-n-dirty kluge, we could remove the
hazard of different-looking dump results in parallel and serial cases,
and probably improve restore performance as well.

A small problem with postponing sorting to the restore side is that
I don't think we record relpages info anywhere in the archive format.
However, at least for the directory-format case (which I think is the
only one supported for parallel restore), we could make it compare the
file sizes of the TABLE DATA items.  That'd work pretty well as a proxy
for both the amount of effort needed for table restore, and the amount
of effort needed to build indexes on the tables afterwards.
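
To sketch the idea (a hypothetical helper, not code that exists in pg_dump
today), the dispatcher could rank pending TABLE DATA entries by the size of
their files in the directory archive:

	#include "postgres_fe.h"

	#include <sys/stat.h>

	/*
	 * Hypothetical sketch: use the on-disk size of a directory-format data
	 * file as a proxy for how expensive the corresponding TABLE DATA entry
	 * will be to restore (and to index afterwards).  The dispatcher would
	 * hand out pending entries in descending order of this value.
	 */
	static off_t
	data_file_size(const char *dirname, const char *filename)
	{
		char		path[MAXPGPATH];
		struct stat st;

		snprintf(path, sizeof(path), "%s/%s", dirname, filename);
		if (stat(path, &st) != 0)
			return 0;			/* unknown size: schedule it last */
		return st.st_size;
	}

Sorting the pending items by that value at dispatch time would keep the
archive's TOC order identical between parallel and serial dumps while still
scheduling the biggest tables first.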

regards, tom lane



Re: Built-in connection pooling

2018-08-27 Thread Konstantin Knizhnik


A new version of the built-in connection pool is attached to this mail.
Now the client's startup packet is received by one of the listener workers, so 
the postmaster knows the database/user name of the received connection and is 
able to route it to the proper connection pool. Right now SSL is not 
supported.


Also, I have provided a general mechanism for moving static variables to the 
session context. The file
include/storage/sessionvars.h contains the list of such variables, which are 
saved to the session context on reschedule.
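
Just to illustrate the kind of mechanism meant here, a placeholder sketch (the
names below are made up and not necessarily what sessionvars.h in the patch
actually contains):

	#include "postgres.h"

	/*
	 * Placeholder sketch of a session-variable registry: each entry
	 * describes a backend-local static variable whose value has to be saved
	 * and restored when a pooled backend is rescheduled to another session.
	 */
	typedef struct SessionVariable
	{
		void	   *addr;		/* address of the static variable */
		Size		size;		/* number of bytes to save/restore */
	} SessionVariable;

	static int	example_state;	/* stand-in for some real static variable */

	static SessionVariable session_variables[] =
	{
		{&example_state, sizeof(example_state)},
	};

	static void
	save_session_state(char *buf)
	{
		int			i;

		for (i = 0; i < lengthof(session_variables); i++)
		{
			memcpy(buf, session_variables[i].addr, session_variables[i].size);
			buf += session_variables[i].size;
		}
	}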


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/test_decoding/sql/messages.sql b/contrib/test_decoding/sql/messages.sql
index cf3f773..14c4163 100644
--- a/contrib/test_decoding/sql/messages.sql
+++ b/contrib/test_decoding/sql/messages.sql
@@ -23,6 +23,8 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'for
 
 -- test db filtering
 \set prevdb :DBNAME
+show session_pool_size;
+show session_pool_ports;
 \c template1
 
 SELECT 'otherdb1' FROM pg_logical_emit_message(false, 'test', 'otherdb1');
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5d13e6a..5a93c7e 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -178,7 +178,6 @@ static List *overrideStack = NIL;
  * committed its creation, depending on whether myTempNamespace is valid.
  */
 static Oid	myTempNamespace = InvalidOid;
-
 static Oid	myTempToastNamespace = InvalidOid;
 
 static SubTransactionId myTempNamespaceSubID = InvalidSubTransactionId;
@@ -193,6 +192,7 @@ char	   *namespace_search_path = NULL;
 /* Local functions */
 static void recomputeNamespacePath(void);
 static void InitTempTableNamespace(void);
+static Oid  GetTempTableNamespace(void);
 static void RemoveTempRelations(Oid tempNamespaceId);
 static void RemoveTempRelationsCallback(int code, Datum arg);
 static void NamespaceCallback(Datum arg, int cacheid, uint32 hashvalue);
@@ -460,9 +460,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (strcmp(newRelation->schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(newRelation->schemaname, false);
@@ -471,9 +469,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 	else if (newRelation->relpersistence == RELPERSISTENCE_TEMP)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 	else
 	{
@@ -482,8 +478,7 @@ RangeVarGetCreationNamespace(const RangeVar *newRelation)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -2921,9 +2916,7 @@ LookupCreationNamespace(const char *nspname)
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
 		/* Initialize temp namespace if first time through */
-		if (!OidIsValid(myTempNamespace))
-			InitTempTableNamespace();
-		return myTempNamespace;
+		return GetTempTableNamespace();
 	}
 
 	namespaceId = get_namespace_oid(nspname, false);
@@ -2986,9 +2979,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (strcmp(schemaname, "pg_temp") == 0)
 		{
 			/* Initialize temp namespace if first time through */
-			if (!OidIsValid(myTempNamespace))
-InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		/* use exact schema given */
 		namespaceId = get_namespace_oid(schemaname, false);
@@ -3001,8 +2992,7 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 		if (activeTempCreationPending)
 		{
 			/* Need to initialize temp namespace */
-			InitTempTableNamespace();
-			return myTempNamespace;
+			return GetTempTableNamespace();
 		}
 		namespaceId = activeCreationNamespace;
 		if (!OidIsValid(namespaceId))
@@ -3254,16 +3244,28 @@ int
 GetTempNamespaceBackendId(Oid namespaceId)
 {
 	int			result;
-	char	   *nspname;
+	char	   *nspname,
+			   *addlevel;
 
 	/* See if the namespace name starts with "pg_temp_" or "pg_toast_temp_" */
 	nspname = get_namespace_name(namespaceId);
 	if (!nspname)
 		return InvalidBackendId;	/* no such namespace? */
 	if (strncmp(nspname, "pg_temp_", 8) == 0)
-		result = atoi(nspname + 8);
+	{
+		/* check for session id */
+		if ((addlevel = strstr(nspname + 8, "_")) != NULL)
+			result = atoi(addlevel + 1);
+		else
+			result = atoi(nspname + 8);
+	}
 	else if (strncmp(nspname, "pg_toast_temp_", 14) == 0)
-		result = atoi(nspname + 14);
+	{
+		if ((addlevel = strstr(nspname + 14, "_")) != NULL)
+			result = atoi(addlevel + 1);
+		

Re: simplehash.h comment

2018-08-27 Thread Tom Lane
Thomas Munro  writes:
> ... and a patch.  Thoughts?

It's not real obvious why you're adding inclusion guards over just a
subset of the header.  A short comment about that would be worthwhile.
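
For instance, something along these lines (wording is only a sketch, not the
actual patch text):

	/*
	 * A few definitions below do not depend on the SH_PREFIX /
	 * SH_ELEMENT_TYPE parameters, so they must be compiled only once even
	 * though this header is typically included several times with different
	 * parameters.  That is what this (deliberately partial) include guard
	 * protects.
	 */
	#ifndef SIMPLEHASH_H
	#define SIMPLEHASH_H
	/* ... parameter-independent definitions ... */
	#endif						/* SIMPLEHASH_H */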

regards, tom lane



Re: pg_dump test instability

2018-08-27 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> During the development of an unrelated feature, I have experienced
> failures in a pg_dump test case, specifically
> 
> t/002_pg_dump.pl ... 1825/6052
> #   Failed test 'defaults_parallel: should not dump COPY
> fk_reference_test_table second'
> #   at t/002_pg_dump.pl line 3454.
> 
> This test sets up two tables connected by a foreign key and checks that
> a data_only dump dumps them ordered so that the primary key table comes
> first.
> 
> But because of the way the tests are set up, it also checks that in all
> other dumps (i.e., non-data_only) it does *not* dump them in that order.
>  This is kind of irrelevant to the test, but there is no way in the
> pg_dump tests to express that certain scenarios should not be checked.

Hmmm, yeah, that's a good point.  Most of the checks are set up that way
because it makes writing checks simpler and we have fewer of them, but
in this case we don't want that requirement to be levied on
non-data-only dumps.

> In a non-data_only dump, the order of the tables doesn't matter, because
> the foreign keys are added at the very end.  In parallel dumps, the
> tables are in addition sorted by size, so the resultant order is
> different from a single-threaded dump.  This can be seen by comparing
> the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
> But it all happens to pass the tests right now.

Occasionally I get lucky, apparently. :)  Though I think this might have
been different before I reworked those tests to be simpler.

> In my hacking I have added another test table to the pg_dump test set,
> which seems to have thrown off the sorting and scheduling, so that the
> two tables now happen to come out in primary-key-first order anyway,
> which causes the test to fail.

Ok.

> I have developed the attached rough patch to add a third option to
> pg_dump test cases: besides like and unlike, add a "skip" option to
> disregard the result of the test.

If I read this correctly, the actual test isn't run at all (though, of
course, the tables are created and such).

In any case though, this doesn't completely solve the problem, does it?
Skipping 'defaults' also causes 'defaults_parallel' to be skipped and
therefore avoids the issue for now, but if some other change caused the
ordering to be different in the regular (non-data_only) cases then this
test would blow up again.

Looking back at the 9.6 tests, tests were only run for the runs where
they were explicitly specified, which led to tests being missed that
shouldn't have been, and that led to the approach now used where every
test is run against every run.

As this test should really only ever be applied to the 'data_only' run,
it seems like we should have a 'run' list and for this test that would
be just 'data_only'.  I haven't looked into what's supposed to happen
here in a parallel data-only test, but if we expect the ordering to
still be honored then we should probably have a test for that.

So, in short, I don't really agree with this 'skip' approach as it
doesn't properly solve this problem, but would rather see an 'only_runs'
option or similar where this test is only tried against the data_only
run (and possibly a data_only_parallel_restore one, if one such was
added).  In any case, please be sure to also update the documentation
under 'Definition of the tests to run.' (about line 335) whenever the
set of options that can be specified for the tests is changed, and
let's strongly discourage the use of this feature for most tests as
these kinds of one-offs are quite rare.

Thanks!

Stephen




Re: libpq debug log

2018-08-27 Thread Yugo Nagata
On Fri, 24 Aug 2018 04:38:22 +
"Iwata, Aya"  wrote:

> Hi,
> 
> I'm going to propose libpq debug log for analysis of queries on the 
> application side.
> I think that it is useful to determine whether the cause is on the 
> application side or the server side when a slow query occurs. 

Do you mean that you want to monitor the protocol message exchange on
the client side to analyze performance issues?  Actually, this might be
useful to determine where the problem is, for example, in the
client application, in the backend PostgreSQL, or somewhere in the
network.

Such logging can be implemented in the application, but if libpq
provided a standard way, it would help to resolve such a problem
without modifying the application code.

> The provided information is "date and time" at which execution of processing 
> is started, "query", "application side processing", which is picked up 
> information from PQtrace(), and "connection id", which is for uniquely 
> identifying the connection. 

I can't quite picture what this would look like. Could you show us a sample of
the log lines you have in mind?

> To collect the log, set the connection string or environment variable.
> - logdir or PGLOGDIR : directory where log file written
> - logsize or PGLOGSIZE : maximum log size

How can we specify the log file name?  What should happen if the file size
exceeds PGLOGSIZE?

Regards,
-- 
Yugo Nagata 



Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)

2018-08-27 Thread Yugo Nagata
On Mon, 27 Aug 2018 13:34:12 +0200
Michael Banck  wrote:

> Hi,
> 
> On Mon, Aug 27, 2018 at 07:53:36PM +0900, Yugo Nagata wrote:
> > On Fri, 24 Aug 2018 18:01:09 +0200
> > Peter Eisentraut  wrote:
> > > I'm curious about this option:
> > > 
> > >   -r RELFILENODE check only relation with specified relfilenode
> > > 
> > > but there is no facility to specify a database.
> > > 
> > > Also, referring to the relfilenode of a mapped relation seems a bit
> > > inaccurate.
> > > 
> > > Maybe reframing this in terms of the file name of the file you want
> > > checked would be better?
> > 
> > If we specify 1234 with the -r option, pg_verify_checksums checks not only 1234
> > but also 1234_vm, 1234_fsm, and 1234.1, 1234.2, ... and so on, so I think
> > it makes sense to allow specifying a relfilenode instead of a file name.
> > 
> > I think it is reasonable to add an option to specify a database, although
> > I don't know which letter would be good because both -d and -D are already 
> > used
> 
> Maybe the -d (debug) option should be revisited as well. Mentioning
> every scanned block generates a huge amount of output which might be
> useful during development but does not seem very useful for a stable
> release. AFAICT there is no other debug output for now.
> 
> So it could be renamed to -v (verbose) and only mention each scanned
> file, e.g. (errors/checksum mismatches are still reported of course).
> 
> Then -d could (in the future, I guess that is too late for v11) be used
> for -d/--dbname (or make that only a long option, if the above does not
> work).

I realized after sending the previous post that we cannot specify a database
by name, because pg_verify_checksums runs offline and cannot look up the
OID for the database name.  Also, there are the global and pg_tblspc
directories, not only base/. So it seems better to me to specify a directory
to scan under PGDATA. We could use -d (or --directory?)
for this purpose.


Regards,
-- 
Yugo Nagata 



Re: table_privileges view under information_schema doesn't show privileges on materialized views

2018-08-27 Thread Ashutosh Sharma
On Fri, Aug 24, 2018 at 9:06 PM, Tom Lane  wrote:
> Ashutosh Sharma  writes:
>> Currently, the table_privileges view in information_schema.sql doesn't
>> show privileges on materialized views for currently enabled roles. As
>> per the documentation [1], it should be showing all the privileges
>> granted on tables and views (the documentation doesn't say it has to
>> be a normal view). Shouldn't we allow it to show privileges on
>> materialized views as well?
>
> The spec is quite clear that rows in table_privileges must correspond
> to rows in information_schema.tables, but we don't show materialized
> views there.
>

Okay. In that case, I've changed the patch so that both the tables and
table_privileges views show materialized views. PFA patch. Sorry, I
just missed that point earlier.

> Perhaps there's a case for showing MVs in the "tables" view, and thence
> also in table_privileges, but this patch by itself is flat wrong.
>
> Anyway it seems to me we made that decision already; it's a bit late now
> to be revisiting whether MVs should be treated as tables here.
>
> regards, tom lane
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index f4e69f4..73eb9fe 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1875,7 +1875,7 @@ CREATE VIEW table_privileges AS
  ) AS grantee (oid, rolname)
 
 WHERE c.relnamespace = nc.oid
-  AND c.relkind IN ('r', 'v', 'f', 'p')
+  AND c.relkind IN ('r', 'v', 'f', 'p', 'm')
   AND c.grantee = grantee.oid
   AND c.grantor = u_grantor.oid
   AND c.prtype IN ('INSERT', 'SELECT', 'UPDATE', 'DELETE', 'TRUNCATE', 'REFERENCES', 'TRIGGER')
@@ -1922,6 +1922,7 @@ CREATE VIEW tables AS
   WHEN c.relkind IN ('r', 'p') THEN 'BASE TABLE'
   WHEN c.relkind = 'v' THEN 'VIEW'
   WHEN c.relkind = 'f' THEN 'FOREIGN'
+  WHEN c.relkind = 'm' THEN 'MATERIALIZED VIEW'
   ELSE null END
  AS character_data) AS table_type,
 
@@ -1933,7 +1934,7 @@ CREATE VIEW tables AS
CAST(t.typname AS sql_identifier) AS user_defined_type_name,
 
CAST(CASE WHEN c.relkind IN ('r', 'p') OR
-  (c.relkind IN ('v', 'f') AND
+  (c.relkind IN ('v', 'f', 'm') AND
-- 1 << CMD_INSERT
pg_relation_is_updatable(c.oid, false) & 8 = 8)
 THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_insertable_into,
@@ -1944,7 +1945,7 @@ CREATE VIEW tables AS
 FROM pg_namespace nc JOIN pg_class c ON (nc.oid = c.relnamespace)
LEFT JOIN (pg_type t JOIN pg_namespace nt ON (t.typnamespace = nt.oid)) ON (c.reloftype = t.oid)
 
-WHERE c.relkind IN ('r', 'v', 'f', 'p')
+WHERE c.relkind IN ('r', 'v', 'f', 'p', 'm')
   AND (NOT pg_is_other_temp_schema(nc.oid))
   AND (pg_has_role(c.relowner, 'USAGE')
OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER')


pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)

2018-08-27 Thread Michael Banck
Hi,

On Mon, Aug 27, 2018 at 07:53:36PM +0900, Yugo Nagata wrote:
> On Fri, 24 Aug 2018 18:01:09 +0200
> Peter Eisentraut  wrote:
> > I'm curious about this option:
> > 
> >   -r RELFILENODE check only relation with specified relfilenode
> > 
> > but there is no facility to specify a database.
> > 
> > Also, referring to the relfilenode of a mapped relation seems a bit
> > inaccurate.
> > 
> > Maybe reframing this in terms of the file name of the file you want
> > checked would be better?
> 
> If we specify 1234 with the -r option, pg_verify_checksums checks not only 1234
> but also 1234_vm, 1234_fsm, and 1234.1, 1234.2, ... and so on, so I think
> it makes sense to allow specifying a relfilenode instead of a file name.
> 
> I think it is reasonable to add an option to specify a database, although
> I don't know which letter would be good because both -d and -D are already 
> used

Maybe the -d (debug) option should be revisited as well. Mentioning
every scanned block generates a huge amount of output which might be
useful during development but does not seem very useful for a stable
release. AFAICT there is no other debug output for now.

So it could be renamed to -v (verbose) and only mention each scanned
file, e.g. (errors/checksum mismatches are still reported of course).

Then -d could (in the future, I guess that is too late for v11) be used
for -d/--dbname (or make that only a long option, if the above does not
work).


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz



Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-08-27 Thread Daniel Gustafsson
> On 23 Aug 2018, at 20:53, Tom Lane  wrote:

> I think this is just a timing problem: the signal gets sent,
> but it might or might not get received before the current statement ends.
> It's possible that a signal-sent-to-self can be expected to be received
> synchronously on all Unix platforms, but I wouldn't entirely bet on that
> (in particular, the POSIX text for kill(2) does NOT guarantee it); and
> our Windows signal implementation certainly doesn't guarantee anything
> of the sort.

That makes a lot of sense, I was thinking about it in too simplistic terms.
Thanks for the clarification.

> I don't think there's necessarily anything wrong with the code, but
> you can't test it with such a simplistic test as this and expect
> stable results.  If you feel an urgent need to have a test case,
> try building an isolationtester script.

During hacking on it I preferred to have tests for it.  If this makes it in,
the tests can be omitted per the judgement of the committers of course.  Since
the printed pid will vary between runs it’s hard to test more interesting
scenarios so they are of questionable worth.

cheers ./daniel


Re: pg_verify_checksums -r option

2018-08-27 Thread Yugo Nagata
On Fri, 24 Aug 2018 18:01:09 +0200
Peter Eisentraut  wrote:

> I'm curious about this option:
> 
>   -r RELFILENODE check only relation with specified relfilenode
> 
> but there is no facility to specify a database.
> 
> Also, referring to the relfilenode of a mapped relation seems a bit
> inaccurate.
> 
> Maybe reframing this in terms of the file name of the file you want
> checked would be better?

If we specify 1234 with the -r option, pg_verify_checksums checks not only 1234
but also 1234_vm, 1234_fsm, and 1234.1, 1234.2, ... and so on, so I think
it makes sense to allow specifying a relfilenode instead of a file name.
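
To make that matching rule concrete, here is a rough sketch (a hypothetical
helper, not the actual pg_verify_checksums code):

	#include <stdbool.h>
	#include <string.h>

	/*
	 * Does a file name belong to the relfilenode given with -r?  The main
	 * fork matches exactly; other forks and extra segments carry a '_' or
	 * '.' suffix, e.g. "1234", "1234_vm", "1234_fsm", "1234.1", "1234.2", ...
	 */
	static bool
	file_matches_relfilenode(const char *fn, const char *relfilenode)
	{
		size_t		len = strlen(relfilenode);

		if (strncmp(fn, relfilenode, len) != 0)
			return false;
		return fn[len] == '\0' || fn[len] == '_' || fn[len] == '.';
	}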

I think it is reasonable to add an option to specify a database, although
I don't know which letter would be good because both -d and -D are already used.

Regards,
-- 
Yugo Nagata 



Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-08-27 Thread Daniel Gustafsson
> On 24 Aug 2018, at 03:37, Tom Lane  wrote:
> 
> Thomas Munro  writes:
>> On Fri, Aug 24, 2018 at 6:53 AM, Tom Lane  wrote:
>>> I think this is just a timing problem: the signal gets sent,
>>> but it might or might not get received before the current statement ends.
> 
>> How about we just wait forever if the function manages to return?
>> select case when pg_cancel_backend(pg_backend_pid(), '...') then
>> pg_sleep('infinity') end;
> 
> Hm, that might work.  I'd pick a long but not infinite timeout --- maybe
> 60 sec would be good.

I like that idea, so I’ve updated the patch to see what the cfbot thinks of it.

cheers ./daniel



0001-Refactor-backend-signalling-code-v15.patch
Description: Binary data


0002-Support-optional-message-in-backend-cancel-terminate-v15.patch
Description: Binary data


Re: Fix help option of contrib/oid2name

2018-08-27 Thread Michael Paquier
On Fri, Aug 24, 2018 at 01:32:47PM +0900, Tatsuro Yamada wrote:
> I revised the patch and created new tap tests.

Thanks, I have looked at the patch set.  And here are some notes about
issues found while reviewing.  For oid2name:
- Documentation had some typos, --tablename was used instead of --table.
- Using plural forms for some options makes more sense to me, like
--indexes, --system-objects, --tablespaces
- I have tweaked the description of -H in usage().
- An '=' was missing for --filenode
- I have reordered the options in alphabetical order.
- Added some basic tap tests, and added correct handling of the tests in
oid2name/Makefile.
- Inclusion of getopt_long.h has been missing, this would have caused
failures on Windows.
- getopt_long() has been reordered to be alphabetical.  Same thing for
long_options[].  This has the advantage of easing review and code
reading.
- Some formatting issues with option docs, leading to some noise diffs.

For vacuumlo:
- Some typos in documentation. 
- Documentation format was rather weird for some options, so I made the
whole thing consistent.
- I agree with the option names you put.
- Reorganization of the options.
- Added basic TAP tests, and fixed Makefile.
- default clause was missing from the option set.

Attached are updated patches, which I would be fine to commit.  What do
you think?
--
Michael
From 6f2a0931ddf9ab1b742d0bbc6bac3930d982b60b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 27 Aug 2018 19:00:51 +0900
Subject: [PATCH 1/2] Rework option set of oid2name

oid2name has done little effort to keep an interface consistent with
other binary utilities:
- -H was used instead of -h/-host.  This option is now marked as
deprecated, still its output is accepted to be backward-compatible.
- -P has been removed from the code, and was still documented.
- All options gain long aliases, making connection options more similar
to other binaries.
- Document environment variables which could be used: PGHOST, PGPORT and
PGUSER.

A basic set of TAP tests is added on the way.

Author: Tatsuro Yamada
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/c7e7f25c-1747-cd0f-9335-390bc97b2...@lab.ntt.co.jp
---
 contrib/oid2name/.gitignore |   2 +
 contrib/oid2name/Makefile   |   6 ++
 contrib/oid2name/oid2name.c | 124 ++--
 contrib/oid2name/t/001_basic.pl |  12 
 doc/src/sgml/oid2name.sgml  |  81 +++--
 5 files changed, 151 insertions(+), 74 deletions(-)
 create mode 100644 contrib/oid2name/t/001_basic.pl

diff --git a/contrib/oid2name/.gitignore b/contrib/oid2name/.gitignore
index fdefde108d..0410fb7afa 100644
--- a/contrib/oid2name/.gitignore
+++ b/contrib/oid2name/.gitignore
@@ -1 +1,3 @@
 /oid2name
+
+/tmp_check/
diff --git a/contrib/oid2name/Makefile b/contrib/oid2name/Makefile
index 3eef8f60be..77b72880f5 100644
--- a/contrib/oid2name/Makefile
+++ b/contrib/oid2name/Makefile
@@ -19,3 +19,9 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c
index 63e360c4c5..aa122ca0e9 100644
--- a/contrib/oid2name/oid2name.c
+++ b/contrib/oid2name/oid2name.c
@@ -14,6 +14,8 @@
 #include "fe_utils/connect.h"
 #include "libpq-fe.h"
 #include "pg_getopt.h"
+#include "getopt_long.h"
+
 
 /* an extensible array to keep track of elements to show */
 typedef struct
@@ -60,8 +62,28 @@ void		sql_exec_dumpalltbspc(PGconn *, struct options *);
 void
 get_opts(int argc, char **argv, struct options *my_opts)
 {
+	static struct option long_options[] = {
+		{"dbname", required_argument, NULL, 'd'},
+		{"host", required_argument, NULL, 'h'},
+		{"host", required_argument, NULL, 'H'}, /* deprecated */
+		{"filenode", required_argument, NULL, 'f'},
+		{"indexes", no_argument, NULL, 'i'},
+		{"oid", required_argument, NULL, 'o'},
+		{"port", required_argument, NULL, 'p'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"tablespaces", no_argument, NULL, 's'},
+		{"system-objects", no_argument, NULL, 'S'},
+		{"table", required_argument, NULL, 't'},
+		{"username", required_argument, NULL, 'U'},
+		{"version", no_argument, NULL, 'V'},
+		{"extended", no_argument, NULL, 'x'},
+		{"help", no_argument, NULL, '?'},
+		{NULL, 0, NULL, 0}
+	};
+
 	int			c;
 	const char *progname;
+	int			optindex;
 
 	progname = get_progname(argv[0]);
 
@@ -93,7 +115,7 @@ get_opts(int argc, char **argv, struct options *my_opts)
 	}
 
 	/* get opts */
-	while ((c = getopt(argc, argv, "H:p:U:d:t:o:f:qSxish")) != -1)
+	while ((c = getopt_long(argc, argv, "d:f:h:H:io:p:qsSt:U:x", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -102,54 +124,35 @@ get_opts(int argc, char **argv, struct options *my_opts)
 my_opts->dbname = pg_strdup(optarg);
 break;
 
-/* specify one tablename to show */
-			case 't':
-add_one_elt(optarg, my_opts->tables);
-

Re: Refactor textToQualifiedNameList()

2018-08-27 Thread Yugo Nagata
On Fri, 24 Aug 2018 20:44:12 +0900
Yugo Nagata  wrote:

> Hi,
> 
> When working on another patch [1], I found that there are two almost identical
> functions, textToQualifiedNameList() and stringToQualifiedNameList().
> The only difference is the argument type, text or char*. I don't know
> why these functions are defined separately, but I think the former
> function can be rewritten using the latter's code, as in the attached patch.
> Is this a reasonable fix?

I forgot to add a link to the referred thread.

[1] 
https://www.postgresql.org/message-id/20180817075100.bc99378255943d3c3951ad63%40sraoss.co.jp

> 
> Regards,
> -- 
> Yugo Nagata 


-- 
Yugo Nagata 



Re: has_table_privilege for a table in unprivileged schema causes an error

2018-08-27 Thread Yugo Nagata
On Sat, 25 Aug 2018 23:29:27 -0400
Tom Lane  wrote:

> Robert Haas  writes:
> > I'm not sure that it's a good idea to change this behavior.
> 
> > In the case of an unqualified name, the permissions on the schemas in
> > the search path can affect which table is chosen in the first place.
> > ... So I think this only matters for qualified names.
> 
> Yeah, that agrees with my expectations.

Yes. I consider only the case of qualified names, and the patch doesn't 
change any behavior for unqualified names.

> > Also, the current system generally tries not to reveal any information
> > about the contents of schemas for which you have no permissions.
> 
> I don't think that argument holds up, at least not if this is implemented
> the way I'd expect.  It would change the results for a schema on which
> you lack usage permission from "permission denied for schema a" to
> "false", but it still would not matter whether there is such a table
> inside "a".

Yes, Tom's explanation is right. The proposed functions don't reveal
any information about the contents of unprivileged schemas, either.

> 
> > And if you've got a qualified name, you know what schema it's in.  If
> > you are concerned about a.b, nothing keeps you from writing a test
> > against schema a's permissions as well as a test against table a.b's
> > permissions.  But on the other hand, if for some reason you want to
> > know about pg_class.relacl specifically, then having the function
> > consider both that and the schema's ACL could be awkward.
> 
> Mmm ... maybe, but I don't think that exactly holds water either, given
> that the current behavior is to fail if you lack permission on schema a.
> Yes, you can write "case when has_schema_privilege() then
> has_table_privilege() else false end", but if you're worried that you
> might possibly lack schema privilege, then the current behavior of
> has_table_privilege is useless to you: it doesn't matter whether or not
> you would like to know about pg_class.relacl specifically, because you
> won't be allowed to find out.
> 
> Also, in some use-cases the extra test would require writing code that can
> split a qualified name into pieces, which isn't really all that easy in
> SQL.

This is one reason why we proposed fixing the function. However, with regard to
splitting a qualified name, adding a new SQL function to do this might resolve
it, for example as below.
 
 select case when has_schema_privilege(x.nspname) 
   then has_table_privilege(x.objname)
   else false end
  from pg_split_qualified_name(tablename) x;

> There's a backwards-compatibility argument for not changing this behavior,
> sure, but I don't buy the other arguments you made here.  And I don't
> think there's all that much to the backwards-compatibility argument,
> considering that the current behavior is to fail.

With regard to keeping backwards compatibility, adding a new parameter to
the has_*_privilege functions is a solution, as proposed previously.

 has_table_privilege(user, table, privilege[, consider_schema=false]) 

What do you think of this proposal?

Regards,

-- 
Yugo Nagata 



Re: some more error location support

2018-08-27 Thread Fabien COELHO




Here are three patches to add more detailed error location support to
some parts of CREATE TABLE (defaults, check constraints, partition
specifications) as well as the COPY command.  Examples can be seen in
the regression test output.

The first two are low-hanging fruit, since all the information was
already collected and just needed to be passed through one last hop.
The third one is a bit more invasive; it adds location information to
the Value node, which is used in a variety of commands, so COPY is just
a start here.


About patch 3: applies cleanly independently of the 2 others, compiles, 
"make check" is okay.


A few comments:

There seem to be several somewhat unrelated changes: one about COPY,
one about triggers, and one about constraints? The two latter changes do not 
seem to impact the tests, though.


In "CreateTrigger", you moved "make_parsestate" but removed 
"free_parsestate". I'd rather move it than remove it.


In "value.h", the added location field deserves a "/* token location, or 
-1 if unknown */" comment like others in "parsenode.h", "plannode.h" and 
"primnodes.h".


Copying and comparing values are updated, but the Value in/out functions are 
not updated to read/write the location, although other objects have their 
location serialized. ISTM that it should be done as well.


--
Fabien.



Re: some more error location support

2018-08-27 Thread Fabien COELHO



Hello Peter,


Here are three patches to add more detailed error location support to
some parts of CREATE TABLE (defaults, check constraints, partition
specifications) as well as the COPY command.  Examples can be seen in
the regression test output.

The first two are low-hanging fruit, since all the information was
already collected and just needed to be passed through one last hop.
The third one is a bit more invasive; it adds location information to
the Value node, which is used in a variety of commands, so COPY is just
a start here.


About patch 2: applies cleanly independently of the first one, compiles, 
"make check" is ok.


There is a "make_parsestate", but no corresponding free. The usual 
pattern, although there seems to be a few exception, is to "make" and 
"free".


Even if there is some under-the-hood garbage collection, I'd suggest adding 
a free after the call to ComputePartitionAttrs.
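
Something like this (sketch only; the wrapper function name is made up, while
make_parsestate/free_parsestate and p_sourcetext are the existing parser API):

	#include "postgres.h"
	#include "parser/parse_node.h"

	/* sketch of the usual make/free pairing around the transform calls */
	static void
	transform_partition_spec_sketch(const char *queryString)
	{
		ParseState *pstate = make_parsestate(NULL);

		pstate->p_sourcetext = queryString;
		/* ... ComputePartitionAttrs(pstate, ...) and related transforms ... */
		free_parsestate(pstate);
	}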


--
Fabien.



Re: some more error location support

2018-08-27 Thread Fabien COELHO




I noticed that you provide NULL from "ALTER TABLE" which is calling the
create table machinery:


The structure of the ALTER TABLE code is such that it would be quite
complicated to pass through the required state.  It's doable, but would
be a bigger and independent project.


Ok, so no "easy" way about that.

I'd consider providing a comment about that.

--
Fabien.



Re: some more error location support

2018-08-27 Thread Peter Eisentraut
On 27/08/2018 10:41, Fabien COELHO wrote:
> I noticed that you provide NULL from "ALTER TABLE" which is calling the 
> create table machinery:

The structure of the ALTER TABLE code is such that it would be quite
complicated to pass through the required state.  It's doable, but would
be a bigger and independent project.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: some more error location support

2018-08-27 Thread Fabien COELHO



Hello Peter,


Here are three patches to add more detailed error location support to
some parts of CREATE TABLE (defaults, check constraints, partition
specifications) as well as the COPY command.  Examples can be seen in
the regression test output.

The first two are low-hanging fruit, since all the information was
already collected and just needed to be passed through one last hop.
The third one is a bit more invasive; it adds location information to
the Value node, which is used in a variety of commands, so COPY is just
a start here.


Patch 1 applies cleanly, compiles, "make check" is okay.

I noticed that you provide NULL from "ALTER TABLE" which is calling the 
create table machinery:


 postgres=# CREATE TABLE foo(id SERIAL CHECK (x = 0));
 ERROR:  column "x" does not exist
 LINE 1: CREATE TABLE foo(id SERIAL CHECK (x = 0));
  ^
 postgres=# CREATE TABLE foo();
 CREATE TABLE
 postgres=# ALTER TABLE foo ADD COLUMN id SERIAL CHECK (x = 0);
 ERROR:  column "x" does not exist
 

Would it be easily possible to provide the query in that case as well?

--
Fabien.



Re: Adding a note to protocol.sgml regarding CopyData

2018-08-27 Thread Tatsuo Ishii
> Hello Bradley & Tatsuo-san,
> 
> My 0.02€ on the text:
> 
>> Version 2.0 of the PostgreSQL protocol
>> In the v3.0 protocol,
>> the 3.0 protocol
>> version 3.0 of the copy-in/copy-out sub-protocol
>> the V2.0 protocol.
> 
> While reading nice English (I learned "holdover"), it occurs to me
> that references to the protocol version lack homogeneity.
> 
> Should it use v, V or version?
> 
> I'd suggest keeping "the vX.0 protocol" for the short form, and "the
> version X.0 protocol" for the long form if really desired.

+1

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



some more error location support

2018-08-27 Thread Peter Eisentraut
Here are three patches to add more detailed error location support to
some parts of CREATE TABLE (defaults, check constraints, partition
specifications) as well as the COPY command.  Examples can be seen in
the regression test output.

The first two are low-hanging fruit, since all the information was
already collected and just needed to be passed through one last hop.
The third one is a bit more invasive; it adds location information to
the Value node, which is used in a variety of commands, so COPY is just
a start here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 002dccbbe1df65347034d41a9582093a0cc72ba4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Aug 2018 08:42:49 +0200
Subject: [PATCH 1/3] Error position support for defaults and check constraints

Add support for error position reporting for the expressions contained
in defaults and check constraint definitions.  This currently works only
for CREATE TABLE, not ALTER TABLE, because the latter is not set up to
pass around the original query string.
---
 src/backend/catalog/heap.c | 4 +++-
 src/backend/commands/tablecmds.c   | 9 +
 src/include/catalog/heap.h | 3 ++-
 src/test/regress/output/constraints.source | 2 ++
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ac5a677c5f..910d3db063 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2460,7 +2460,8 @@ AddRelationNewConstraints(Relation rel,
  List *newConstraints,
  bool allow_merge,
  bool is_local,
- bool is_internal)
+ bool is_internal,
+ const char *queryString)
 {
List   *cookedConstraints = NIL;
TupleDesc   tupleDesc;
@@ -2489,6 +2490,7 @@ AddRelationNewConstraints(Relation rel,
 * rangetable entry.  We need a ParseState for transformExpr.
 */
pstate = make_parsestate(NULL);
+   pstate->p_sourcetext = queryString;
 	rte = addRangeTableEntryForRelation(pstate,
 										rel,
 										NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a12d64aeb..4761bb911e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -985,7 +985,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 */
if (rawDefaults || stmt->constraints)
AddRelationNewConstraints(rel, rawDefaults, stmt->constraints,
-								  true, true, false);
+								  true, true, false, queryString);
 
ObjectAddressSet(address, RelationRelationId, relationId);
 
@@ -5587,7 +5587,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 * _list_ of defaults, but we just do one.
 */
AddRelationNewConstraints(rel, list_make1(rawEnt), NIL,
-								  false, true, false);
+								  false, true, false, NULL);
 
/* Make the additional catalog changes visible */
CommandCounterIncrement();
@@ -6178,7 +6178,7 @@ ATExecColumnDefault(Relation rel, const char *colName,
 * _list_ of defaults, but we just do one.
 */
AddRelationNewConstraints(rel, list_make1(rawEnt), NIL,
-								  false, true, false);
+								  false, true, false, NULL);
}
 
ObjectAddressSubSet(address, RelationRelationId,
@@ -7215,7 +7215,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 										list_make1(copyObject(constr)),
 										recursing | is_readd,	/* allow_merge */
 										!recursing, /* is_local */
-										is_readd);	/* is_internal */
+										is_readd,	/* is_internal */
+

Re[2]: Adding a note to protocol.sgml regarding CopyData

2018-08-27 Thread Fabien COELHO


Hello Bradley & Tatsuo-san,

My 0.02€ on the text:


Version 2.0 of the PostgreSQL protocol

In the v3.0 protocol,

the 3.0 protocol

version 3.0 of the copy-in/copy-out sub-protocol

the V2.0 protocol.


While reading nice English (I learned "holdover"), it occurs to me that 
references to the protocol version lack homogeneity.


Should it use v, V or version?

I'd suggest keeping "the vX.0 protocol" for the short form, and "the 
version X.0 protocol" for the long form if really desired.


--
Fabien.

pg_dump test instability

2018-08-27 Thread Peter Eisentraut
During the development of an unrelated feature, I have experienced
failures in a pg_dump test case, specifically

t/002_pg_dump.pl ... 1825/6052
#   Failed test 'defaults_parallel: should not dump COPY
fk_reference_test_table second'
#   at t/002_pg_dump.pl line 3454.

This test sets up two tables connected by a foreign key and checks that
a data_only dump dumps them ordered so that the primary key table comes
first.

But because of the way the tests are set up, it also checks that in all
other dumps (i.e., non-data_only) it does *not* dump them in that order.
 This is kind of irrelevant to the test, but there is no way in the
pg_dump tests to express that certain scenarios should not be checked.

In a non-data_only dump, the order of the tables doesn't matter, because
the foreign keys are added at the very end.  In parallel dumps, the
tables are in addition sorted by size, so the resultant order is
different from a single-threaded dump.  This can be seen by comparing
the dumped TOCs of the defaults_dir_format and defaults_parallel cases.
But it all happens to pass the tests right now.

In my hacking I have added another test table to the pg_dump test set,
which seems to have thrown off the sorting and scheduling, so that the
two tables now happen to come out in primary-key-first order anyway,
which causes the test to fail.

I have developed the attached rough patch to add a third option to
pg_dump test cases: besides like and unlike, add a "skip" option to
disregard the result of the test.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 27196fc3a315d44891f2711bbfd90d72ebe4364f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 27 Aug 2018 08:22:10 +0200
Subject: [PATCH] Add option to skip certain pg_dump tests

---
 src/bin/pg_dump/t/002_pg_dump.pl | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 1dd859f1c5..356a7a3f7a 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -1223,6 +1223,7 @@
\n(?:\d\n){5}\\\.\n
/xms,
like => { data_only => 1, },
+   skip => { defaults => 1, },
},
 
'COPY test_second_table' => {
@@ -3254,6 +3255,7 @@
# Then count all the tests run against each run
foreach my $test (sort keys %tests)
{
+   next if $tests{$test}->{skip}->{$test_key};
 
# postgres is the default database, if it isn't overridden
my $test_db = 'postgres';
@@ -3415,6 +3417,8 @@
 
foreach my $test (sort keys %tests)
{
+   next if $tests{$test}->{skip}->{$test_key};
+
my $test_db = 'postgres';
 
if (defined($pgdump_runs{$run}->{database}))
-- 
2.18.0



Re: document that MergeAppend can now prune

2018-08-27 Thread Amit Langote
On 2018/08/24 22:55, Michael Paquier wrote:
> On Fri, Aug 24, 2018 at 01:29:32PM +0900, Amit Langote wrote:
>> Attached patch fixes that.
> 
> Good catch!  Committed.

Thank you.

Regards,
Amit