Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Rushabh Lathia
On Thu, May 5, 2016 at 5:35 AM, Stephen Frost  wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > Attached patch implements this change to not LOCK the table in cases
> > > where we don't need to.  I'll push this with my other changes to
> pg_dump
> > > tomorrow (and I've included it in an updated, complete, set of patches
> > > sent on the thread where those changes were being discussed already).
> >
> > > Wanted to include it here also for completeness.
> >
> > > Comments welcome, of course.
> >
> > Minor suggestion: instead of putting these comments and hardwired
> > knowledge here, I'd suggest putting them adjacent to the list of
> > DUMP_COMPONENT #defines, creating a symbol along the lines of
> > DUMP_COMPONENTS_REQUIRING_TABLE_LOCK.  That approach would make it
> > far more likely that somebody changing the list of DUMP_COMPONENT
> > elements in future would notice the possible need to adjust the
> > requires-lock list.
>
> Good thought, I'll do that.
>

+1

I like the new approach.  Initially, when I was looking around the code,
I also wondered why we need to hold a lock on an object which we are not
interested in dumping.  That is the reason I suggested a patch adding a
check for DUMP_COMPONENT_DEFINITION & DUMP_COMPONENT_DATA (but of course
that was not perfect).

Tom's suggestion of adding DUMP_COMPONENTS_REQUIRING_TABLE_LOCK is a
nicer way to fix this issue.
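For illustration, a rough sketch of how such a symbol could sit next to
the existing component bits in pg_dump.h (the exact bit assignments and
the members of the lock-requiring set shown here are my reading, not
necessarily the committed definition):

#define DUMP_COMPONENT_NONE			(0)
#define DUMP_COMPONENT_DEFINITION	(1 << 0)
#define DUMP_COMPONENT_DATA			(1 << 1)
#define DUMP_COMPONENT_COMMENT		(1 << 2)
#define DUMP_COMPONENT_SECLABEL		(1 << 3)
#define DUMP_COMPONENT_ACL			(1 << 4)
#define DUMP_COMPONENT_POLICY		(1 << 5)
#define DUMP_COMPONENT_USERMAP		(1 << 6)
#define DUMP_COMPONENT_ALL			(0xFFFF)

/*
 * Components which, when dumped for a table, require the table to be
 * locked first.  Keep this in sync when adding new components above.
 */
#define DUMP_COMPONENTS_REQUIRING_TABLE_LOCK \
	(DUMP_COMPONENT_DEFINITION | DUMP_COMPONENT_DATA | DUMP_COMPONENT_POLICY)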



> Thanks!
>
> Stephen
>



-- 
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] what to revert

2016-05-04 Thread Amit Kapila
On Thu, May 5, 2016 at 8:44 AM, Andres Freund  wrote:
>
> On 2016-05-05 06:08:39 +0300, Ants Aasma wrote:
> > On 5 May 2016 1:28 a.m., "Andres Freund"  wrote:
> > > On 2016-05-04 18:22:27 -0400, Robert Haas wrote:
> > > > How would the semantics change?
> > >
> > > Right now the time for computing the snapshot is relevant; if
> > > maintenance of xids is moved, it'll likely be tied to the time xids
> > > are assigned. That seems perfectly alright, but it'll change behaviour.
> >
> > FWIW moving the maintenance to a clock tick process will not change user
> > visible semantics in any significant way. The change could easily be
> > made in the next release.
>
> I'm not convinced of that - right now the timeout is computed as an
> offset to the time a snapshot with a certain xmin horizon is
> taken.

Are you talking here about the snapshot time (snapshot->whenTaken), which
is updated at the time of GetSnapshotData()?
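
(For reference, the fields in question, roughly as they appear in
src/include/utils/snapshot.h in the 9.6 sources:)

typedef struct SnapshotData
{
	/* ... xmin/xmax and the other visibility fields omitted ... */

	int64		whenTaken;		/* timestamp when snapshot was taken */
	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
} SnapshotData;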


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Tom Lane
Amit Kapila  writes:
> How about using 512 bytes as a write size and perform direct writes rather
> than going via OS buffer cache for control file?

Wouldn't that fail outright under a lot of implementations of direct write;
ie the request needs to be page-aligned, for some not-very-determinate
value of page size?
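
(To make the constraint concrete, a minimal sketch assuming Linux-style
O_DIRECT semantics; the file name and the 4096-byte alignment are
arbitrary choices here, and the "right" alignment varies by filesystem
and device:)

#define _GNU_SOURCE				/* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	void	   *buf;
	int			fd = open("control.tmp", O_WRONLY | O_CREAT | O_DIRECT, 0600);

	if (fd < 0)
	{
		perror("open");			/* some filesystems refuse O_DIRECT outright */
		return 1;
	}

	/*
	 * With O_DIRECT the buffer address, transfer length, and file offset
	 * typically must all be multiples of the device or filesystem block
	 * size; writing 512 bytes from an arbitrary heap pointer commonly
	 * fails with EINVAL.  Hence the aligned allocation and length here.
	 */
	if (posix_memalign(&buf, 4096, 4096) != 0)
		return 1;
	memset(buf, 0, 4096);

	if (write(fd, buf, 4096) < 0)
		perror("write");

	free(buf);
	close(fd);
	return 0;
}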

To repeat, I'm pretty hesitant to change this logic.  While this is not
the first report we've ever heard of loss of pg_control, I believe I could
count those reports without running out of fingers on one hand --- and
that's counting since the last century. It will take quite a lot of
evidence to convince me that some other implementation will be more
reliable.  If you just come and present a patch to use direct write, or
rename, or anything else for that matter, I'm going to reject it out of
hand unless you provide very strong evidence that it's going to be more
reliable than the current code across all the systems we support.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Amit Kapila
On Wed, May 4, 2016 at 8:03 PM, Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov 
> > wrote:
> >> On 03.05.2016 2:17, Tom Lane wrote:
> >>> Writing a single sector ought to be atomic too.
>
> >> pg_control is 8k long (I think it is the length of one page in default
> >> PG compile settings).
>
> > The actual data written is always sizeof(ControlFileData) which should
> > be less than one sector.
>
> Yes.  We don't care what happens to the rest of the file as long as the
> first sector's worth is updated atomically.  See the comments for
> PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
>
> We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> certainly room to argue that reducing it to 512 or 1024 would be more
> efficient.  I think the motivation for setting it at 8K was basically
> "we're already assuming that 8K writes are efficient, so let's assume
> it here too".  But since the file is only written once per checkpoint,
> efficiency is not really a key selling point anyway.  If you could make
> an argument that some other size would reduce the risk of failures,
> it would be interesting --- but I suspect any such argument would be
> very dependent on the quirks of a specific file system.
>

How about using 512 bytes as the write size and performing direct writes
rather than going via the OS buffer cache for the control file?  Alex, is
the issue reproducible (so that if we try to solve it in some way, we have
a way to test it as well)?

>
> One point worth considering is that on most file systems, rewriting
> a fraction of a page is *less* efficient than rewriting a full page,
> because the kernel first has to read in the old contents to fill
> the disk buffer it's going to partially overwrite with new data.
> This motivates against trying to reduce the write size too much.
>

Yes, you are very much right, and I have observed that recently during my
work on WAL Re-Writes [1].  However, I think that won't be an issue if we
use direct writes for the control file.


[1] -
http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jbtjxbfyd67r5dlcqfyomj4f-qmfxbp1...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread David Rowley
On 5 May 2016 at 06:54, Tom Lane  wrote:
> David Rowley  writes:
>> On 4 May 2016 at 09:18, David Rowley  wrote:
>>> On 4 May 2016 at 02:10, Tomas Vondra  wrote:
>>>> There are probably a few reasonably simple things we could do - e.g. ignore
>>>> foreign keys with just a single column, as the primary goal of the patch is
>>>> improving estimates with multi-column foreign keys. I believe that covers a
>>>> vast majority of foreign keys in the wild.
>
>> I've spent a few hours looking at this and I've come up with the
>> attached patch, which flags each ForeignKeyOptInfo to say whether its
>> possible to be referenced in any join condition, with the logic that
>> if the referenced relation is in the simple_rte_array, then it could
>> be referenced.
>
> My fundamental problem with the FK-selectivity patch is that it looks
> an awful lot like a preliminary proof-of-concept that got committed.

I don't disagree that there were some mistakes made. The last time I
saw this work was when I proposed some changes to Tomas' patch, which
was quite a long time ago now. I'd not gotten to look at it since
then.

> I do not like the basic design: it's about as brute-force as could
> possibly be.  It adds code that is executed:
>
>   * at least once per join relation created (hence, significantly more than
>   the number of rels in the query; see also get_joinrel_parampathinfo)
> * for each inner relation in the initial input joinrel pair
>   * for each outer relation in the initial joinrel pair
> * for each foreign key constraint on this inner and outer rel
>   * for each key column in that FK
> * for each join qual for the current input joinrel pair
>   * for each member of the relevant EquivalenceClass
>
> This is at least O(N^3) in the number of baserels in the query, not to
> mention the other multipliers.  I'm not very impressed by tests that scale
> only one of the multipliers (like the number of FK constraints); where the
> pain is going to come in is when all of these factors are large.

Yes, it's quite a lot of nesting. The patch I sent yesterday helps to
reduce the number of foreign keys considered. The part I'm not all
that happy with now is the fact that quite a bit of repeated work gets
done during the join search. It would be nice to cache some of this,
but nothing came to mind, as we need to record the position of each
joinqual so that we only estimate the non-matched ones with the
standard estimation technique. Neither the order of, nor the items in,
that list will be fixed as more relations are added to the mix during
the join search.

> I spent some time trying to make a test case that was impossibly slow,
> without any really impressive result: I saw at most about a 25% growth in
> planning time, for a 27-way join with one or two foreign keys per table.
> I noted however that with a simple FROM list of tables, you don't really
> get the full force of the combinatorial explosion, because
> join_search_one_level prefers to generate left-deep trees first, and so
> the first creation of a given joinrel is always N left-side rels against 1
> right-side rel, causing the second level of looping to always iterate just
> once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
> devise join order constraints that would result in a lot of high-level
> joinrels being formed with many relations on both sides, which would cause
> both of the second and third levels to iterate O(N) times not just once.
> I didn't have much luck, but I have zero confidence that our users won't
> find such cases.

Did you test that with or without my patch from yesterday?

> The reason it's so brute force is that it's rediscovering the same facts
> over and over.  In all simple (non-outer-join) cases, the only clauses
> that are of any interest are derived from EquivalenceClasses, and all
> that you really need to know is "are the vars mentioned on each side
> of this FK present in the same EquivalenceClass?".  ISTM that could be
> determined once per FK per query and cached in or near the EC, not
> expensively rediscovered at each level of joining.  I'm not sure whether
> it'd be worth considering non-EC-derived equalities (ie, outer join
> clauses) at all, and note that the current patch fails to do so anyway;
> see below.

That's interesting, but it requires a good bit of thought as to how it
might work.
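
For concreteness, a rough sketch of the shape such a once-per-query check
could take (all names here are hypothetical, it only handles a
single-column FK, and real code would presumably cache the answer in
ForeignKeyOptInfo rather than recompute it):

/*
 * Hypothetical sketch: do the referencing and referenced columns of a
 * single-column FK appear as plain Vars in the same EquivalenceClass?
 */
static bool
fk_sides_in_same_eclass(PlannerInfo *root,
						Index fkrelid, AttrNumber fkattno,
						Index pkrelid, AttrNumber pkattno)
{
	ListCell   *lc;

	foreach(lc, root->eq_classes)
	{
		EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc);
		bool		found_fk_side = false;
		bool		found_pk_side = false;
		ListCell   *lc2;

		foreach(lc2, ec->ec_members)
		{
			EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
			Var		   *var;

			if (!IsA(em->em_expr, Var))
				continue;		/* only simple Vars are of interest here */
			var = (Var *) em->em_expr;
			if (var->varno == fkrelid && var->varattno == fkattno)
				found_fk_side = true;
			else if (var->varno == pkrelid && var->varattno == pkattno)
				found_pk_side = true;
		}

		if (found_fk_side && found_pk_side)
			return true;		/* both sides equated by this EC */
	}
	return false;
}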

> My other design-level complaint is that basing this on foreign keys is
> fundamentally the wrong thing.  What actually matters is the unique index
> underlying the FK; that is, if we have "a.x = b.y" and there's a
> compatible unique index on b.y, we can conclude that no A row will match
> more than one B row, whether or not an explicit FK relationship has been
> declared.  So we should drive this off unique indexes instead of FKs,
> first because we will find more cases, and second because the planner
> already examines indexes and doesn't need any additional catalog lookups
> to get the required data.  (IOW, the relcache additions that were made in
> this patch series should go away too.)

Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-05-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 5/4/16 2:39 PM, Stephen Frost wrote:
>> These checks are looking for the functions used by the transform in the
>> list of functions that pg_dump has loaded, but in 9.5, we don't load any
>> of the function in pg_catalog, and even with my patches, we only dump
>> the functions in pg_catalog that have an ACL which has been changed from
>> the default.

> This issue is not specific to transforms.  For example, if you create a 
> user-defined cast using a built-in function, the cast won't be dumped. 
> Obviously, this is a problem, but it needs a more general solution.

Actually, I believe it will be dumped.  selectDumpableCast believes it
should dump casts with OID >= FirstNormalObjectId.  That's a kluge no
doubt, but reasonably effective; looks like we've been doing that since
9.0.

pg_dump appears not to have a special-case selectDumpableTransform
routine.  Instead it falls back on the generic selectDumpableObject
for transforms.  That's a bad idea because the only useful bit of
knowledge selectDumpableObject has is to look at the containing
namespace ... and transforms don't have one, just as casts don't.

My recommendation is to clone that cast logic for transforms.
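
A sketch of what that clone might look like, modeled on selectDumpableCast
(the transform-specific names are assumptions; the FirstNormalObjectId
test is the existing kluge carried over):

/*
 * selectDumpableTransform: policy-setting subroutine
 *		Mark a transform as to be dumped or not
 *
 * Transforms do not belong to any particular namespace, so the only
 * practical way to distinguish built-in transforms from user-defined
 * ones is the OID test selectDumpableCast already uses for casts.
 */
static void
selectDumpableTransform(TransformInfo *transform, Archive *fout)
{
	if (checkExtensionMembership(&transform->dobj, fout))
		return;					/* extension membership overrides all else */

	if (transform->dobj.catId.oid < (Oid) FirstNormalObjectId)
		transform->dobj.dump = DUMP_COMPONENT_NONE;
	else
		transform->dobj.dump = fout->dopt->include_everything ?
			DUMP_COMPONENT_ALL : DUMP_COMPONENT_NONE;
}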

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Andres Freund
On 2016-05-05 06:08:39 +0300, Ants Aasma wrote:
> On 5 May 2016 1:28 a.m., "Andres Freund"  wrote:
> > On 2016-05-04 18:22:27 -0400, Robert Haas wrote:
> > > How would the semantics change?
> >
> > Right now the time for computing the snapshot is relevant; if
> > maintenance of xids is moved, it'll likely be tied to the time xids are
> > assigned. That seems perfectly alright, but it'll change behaviour.
> 
> FWIW moving the maintenance to a clock tick process will not change user
> visible semantics in any significant way. The change could easily be made
> in the next release.

I'm not convinced of that - right now the timeout is computed as an
offset to the time a snapshot with a certain xmin horizon is
taken. Moving the computation to GetNewTransactionId() or a clock tick
process will make it relative to the time an xid has been generated
(minus a fuzz factor).  That'll behave differently in a number of cases, no?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Ants Aasma
On 5 May 2016 1:28 a.m., "Andres Freund"  wrote:
> On 2016-05-04 18:22:27 -0400, Robert Haas wrote:
> > How would the semantics change?
>
> Right now the time for computing the snapshot is relevant; if
> maintenance of xids is moved, it'll likely be tied to the time xids are
> assigned. That seems perfectly alright, but it'll change behaviour.

FWIW moving the maintenance to a clock tick process will not change user
visible semantics in any significant way. The change could easily be made
in the next release.

Regards,
Ants Aasma


Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-05-04 Thread Josh berkus
On 05/04/2016 06:56 PM, Robert Haas wrote:
> On Wed, May 4, 2016 at 9:41 PM, Noah Misch  wrote:
>> On Mon, Apr 18, 2016 at 03:37:21PM -0300, Alvaro Herrera wrote:
>>> The RMT will publish aggregate, unattributed results after the poll
>>> closes.
>>
>> Thanks for voting.  Join me in congratulating our top finishers:
>>
>> 1. fd31cd2 Don't vacuum all-frozen pages.
>> 2. "Parallel Query"
>> 3(tie). 3fc6e2d Make the upper part of the planner work by generating and 
>> comparing Paths.
>> 3(tie). 848ef42 Add the "snapshot too old" feature
> 
> Congratulations Kevin, Tom, me, and me!
> 
> I feel like I went to the Olympics and won both the gold *and* silver
> medals in the same event.  Beat that!
> 

Maybe we *should* call this 10.0.  That way people will be ready for
lots of breakage. ;-b

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 9:41 PM, Noah Misch  wrote:
> On Mon, Apr 18, 2016 at 03:37:21PM -0300, Alvaro Herrera wrote:
>> The RMT will publish aggregate, unattributed results after the poll
>> closes.
>
> Thanks for voting.  Join me in congratulating our top finishers:
>
> 1. fd31cd2 Don't vacuum all-frozen pages.
> 2. "Parallel Query"
> 3(tie). 3fc6e2d Make the upper part of the planner work by generating and 
> comparing Paths.
> 3(tie). 848ef42 Add the "snapshot too old" feature

Congratulations Kevin, Tom, me, and me!

I feel like I went to the Olympics and won both the gold *and* silver
medals in the same event.  Beat that!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-05-04 Thread Noah Misch
On Mon, Apr 18, 2016 at 03:37:21PM -0300, Alvaro Herrera wrote:
> The RMT will publish aggregate, unattributed results after the poll
> closes.

Thanks for voting.  Join me in congratulating our top finishers:

1. fd31cd2 Don't vacuum all-frozen pages.
2. "Parallel Query"
3(tie). 3fc6e2d Make the upper part of the planner work by generating and 
comparing Paths.
3(tie). 848ef42 Add the "snapshot too old" feature


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-05-04 Thread Andres Freund
On 2016-05-05 13:30:42 +1200, Thomas Munro wrote:
> That was a red herring.  I was confused because SUSv2 and POSIX call
> this argument 'errorfds' and say that sockets *also* tell you about
> errors this way.  (Many/most real OSs call the argument 'exceptfds'
> instead and only use it to tell you about out-of-band data and
> possibly implementation specific events for devices, pseudo-terminals
> etc.  If you want to know about errors on a socket it's enough to have
> it in readfds/writefds, and insufficient to have it only in
> errorfds/exceptfds unless you can find a computer that actually
> conforms to POSIX.)

Correct, exceptfds is pretty much meaningless for anything we do in
postgres. We rely on select returning a socket as readable/writeable if
the socket has hung up.  That's been the case since *before* the recent
WaitEventSet refactoring, so I think we're fairly solid on relying on
that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-05-04 Thread Thomas Munro
On Tue, Mar 29, 2016 at 8:17 PM, Michael Paquier
 wrote:
> On Tue, Mar 29, 2016 at 1:11 PM, Thomas Munro  
> wrote:
>> (BTW, isn't the select call in libpq_select
>> lacking an exceptfds set, and can't it therefore block forever when
>> there is an error condition on the socket and no timeout?)
>
> Hm. I think you're right here when timeout is NULL... It would loop 
> infinitely.
> @Andres (in CC): your thoughts on that regarding the new
> WaitEventSetWaitBlock()? The same pattern is used there.

That was a red herring.  I was confused because SUSv2 and POSIX call
this argument 'errorfds' and say that sockets *also* tell you about
errors this way.  (Many/most real OSs call the argument 'exceptfds'
instead and only use it to tell you about out-of-band data and
possibly implementation specific events for devices, pseudo-terminals
etc.  If you want to know about errors on a socket it's enough to have
it in readfds/writefds, and insufficient to have it only in
errorfds/exceptfds unless you can find a computer that actually
conforms to POSIX.)
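
(A minimal sketch of the portable pattern described above: to learn about
errors or hangup on a socket, include it in readfds; exceptfds is neither
necessary nor, on real systems, sufficient:)

#include <sys/select.h>

/*
 * Wait for a socket to become readable.  A pending error or hangup also
 * makes the fd readable, and the subsequent recv()/read() then reports
 * the error or EOF, so no exceptfds set is needed for that.
 *
 * Returns select()'s result: >0 readable, 0 timeout, <0 error.
 */
static int
wait_socket_readable(int sock, struct timeval *timeout)
{
	fd_set		readfds;

	FD_ZERO(&readfds);
	FD_SET(sock, &readfds);

	return select(sock + 1, &readfds, NULL, NULL, timeout);
}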

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 6:28 PM, Andres Freund  wrote:
> On 2016-05-04 18:22:27 -0400, Robert Haas wrote:
>> On Wed, May 4, 2016 at 6:06 PM, Andres Freund  wrote:
>> >> Some of the proposals involve fairly small tweaks to call
>> >> MaintainOldSnapshotTimeMapping() from elsewhere or only when
>> >> something changes (like crossing a minute boundary or seeing that a
>> >> new TransactionId has been assigned).  If you can disentangle those
>> >> ideas, it might not look so bad.
>> >
>> > Yea, if we can do that, I'm ok. I'm doubtful about releasing with the
>> > current state, even leaving performance aside, since fixing this will
>> > result in somewhat changed semantics, and I'm doubtful about significant
>> > development at this point of the release process. If it comes down to
>> > either one of those I'm clearly in favor of the latter.
>>
>> How would the semantics change?
>
> Right now the time for computing the snapshot is relevant; if
> maintenance of xids is moved, it'll likely be tied to the time xids are
> assigned. That seems perfectly alright, but it'll change behaviour.

Not following.

>> So, I was worried about this, too.  But I think there is an
>> overwhelming consensus on pgsql-release that getting a beta out early
>> trumps all, and that if that results in somewhat more post-beta1
>> change than we've traditionally had, so be it.
>
> *If* that's the policy - cool!  I just don't want to see the issue
> not being fixed due to only wanting conservative changes. And the
> discussion around fixing spinlock related issues in the patch certainly
> made me think the RMT aimed to be conservative.

Understand that I am conveying what I understand the sentiment of the
community to be, not guaranteeing any specific outcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump dump catalog ACLs

2016-05-04 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Wed, May 04, 2016 at 08:14:55AM -0400, Stephen Frost wrote:
> > * Noah Misch (n...@leadboat.com) wrote:
> > > On Mon, Apr 25, 2016 at 12:39:09AM -0400, Stephen Frost wrote:
> > > > * Noah Misch (n...@leadboat.com) wrote:
> > > > > On Fri, Apr 22, 2016 at 12:31:41PM -0400, Stephen Frost wrote:
> > > > > > After looking through the code a bit, I realized that there are a 
> > > > > > lot of
> > > > > > object types which don't have ACLs at all but which exist in 
> > > > > > pg_catalog
> > > > > > and were being analyzed because the bitmask for pg_catalog included 
> > > > > > ACLs
> > > > > > and therefore was non-zero.
> > > > > > 
> > > > > > Clearing that bit for object types which don't have ACLs improved 
> > > > > > the
> > > > > > performance for empty databases quite a bit (from about 3s to a bit
> > > > > > under 1s on my laptop).  That's a 42-line patch, with comment lines
> > > > > > being half of that, which I'll push once I've looked into the other
> > > > > > concerns which were brought up on this thread.
> > > > > 
> > > > > That's good news.
> > > > 
> > > > Attached patch-set includes this change in patch #2.
> > > 
> > > Timings for the 100-database pg_dumpall:
> > > 
> > > HEAD: 131s
> > > HEAD+patch:  33s
> > > 9.5:  8.6s
> > > 
> > > Nice improvement for such a simple patch.
> > 
> > Patch #2 in the attached patchset includes that improvement and a
> > further one which returns the performance to very close to 9.5.
> 
> What timings did you measure?  (How close?)

On my laptop, with 100 databases, I get:

   9.5: 0m3.306s, 0m3.290s, 0m3.309s, avg: 3.302
HEAD+patch: 0m4.681s, 0m4.681s, 0m4.709s, avg: 4.690

9.5 was installed from Debian packages, HEAD+patch was built with -O2,
no debugging, no casserts, though it's entirely possible I've got
something else enabled in my builds that slows it down a bit.

In any case, as I was saying, that's far closer to 9.5 run-time.  I've
not measured the time added when things like TRANSFORMs were added, but
it wouldn't surprise me if adding a new query for every database to
pg_dump adds something similar to this difference.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Reviewing freeze map code

2016-05-04 Thread Andres Freund
On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
> 77a1d1e Department of second thoughts: remove PD_ALL_FROZEN.

Nothing to say here.


> fd31cd2 Don't vacuum all-frozen pages.

Hm. I do wonder if it's going to bite us that we don't have a way to
actually force vacuuming of the whole table (besides manually rm'ing the
VM). I've more than once seen VACUUM used to try to do some integrity
checking of the database.  How are we actually going to test that the
feature works correctly? They'd have to write checks on top of
pg_visibility to see whether things are borked.


/*
 * Compute whether we actually scanned the whole relation. If we did, we
 * can adjust relfrozenxid and relminmxid.
 *
 * NB: We need to check this before truncating the relation, because that
 * will change ->rel_pages.
 */

Comment is out-of-date now.


-		if (blkno == next_not_all_visible_block)
+		if (blkno == next_unskippable_block)
 		{
-			/* Time to advance next_not_all_visible_block */
-			for (next_not_all_visible_block++;
-				 next_not_all_visible_block < nblocks;
-				 next_not_all_visible_block++)
+			/* Time to advance next_unskippable_block */
+			for (next_unskippable_block++;
+				 next_unskippable_block < nblocks;
+				 next_unskippable_block++)

Hm. So we continue with the course of re-processing pages, even if
they're marked all-frozen. For all-visible there at least can be a
benefit by freezing earlier, but for all-frozen pages there's really no
point.  I don't really buy the arguments for the skipping logic. But
even disregarding that, maybe we should skip processing a block if it's
all-frozen (without preventing the page from being read?); as there's no
possible benefit?  Acquiring the exclusive/content lock and stuff is far
from free.


Not really related to this patch, but the FORCE_CHECK_PAGE is rather
ugly.


+			/*
+			 * The current block is potentially skippable; if we've seen a
+			 * long enough run of skippable blocks to justify skipping it, and
+			 * we're not forced to check it, then go ahead and skip.
+			 * Otherwise, the page must be at least all-visible if not
+			 * all-frozen, so we can set all_visible_according_to_vm = true.
+			 */
+			if (skipping_blocks && !FORCE_CHECK_PAGE())
+			{
+				/*
+				 * Tricky, tricky.  If this is in aggressive vacuum, the page
+				 * must have been all-frozen at the time we checked whether it
+				 * was skippable, but it might not be any more.  We must be
+				 * careful to count it as a skipped all-frozen page in that
+				 * case, or else we'll think we can't update relfrozenxid and
+				 * relminmxid.  If it's not an aggressive vacuum, we don't
+				 * know whether it was all-frozen, so we have to recheck; but
+				 * in this case an approximate answer is OK.
+				 */
+				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
+					vacrelstats->frozenskipped_pages++;
 				continue;
+			}

Hm. This indeed seems a bit tricky.  Not sure how to make it easier
though without just ripping out the SKIP_PAGES_THRESHOLD stuff.

Hm. This also doubles the number of VM accesses. While I guess that's
not noticeable most of the time, it's still not nice; especially when a
large relation is entirely frozen, because it'll mean we'll sequentially
go through the visibility map twice.


I wondered for a minute whether #14057 could cause really bad issues
here
http://www.postgresql.org/message-id/20160331103739.8956.94...@wrigleys.postgresql.org
but I don't see it being more relevant here.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Attached patch implements this change to not LOCK the table in cases
> > where we don't need to.  I'll push this with my other changes to pg_dump
> > tomorrow (and I've included it in an updated, complete, set of patches
> > sent on the thread where those changes were being discussed already).
> 
> > Wanted to include it here also for completeness.
> 
> > Comments welcome, of course.
> 
> Minor suggestion: instead of putting these comments and hardwired
> knowledge here, I'd suggest putting them adjacent to the list of
> DUMP_COMPONENT #defines, creating a symbol along the lines of
> DUMP_COMPONENTS_REQUIRING_TABLE_LOCK.  That approach would make it
> far more likely that somebody changing the list of DUMP_COMPONENT
> elements in future would notice the possible need to adjust the
> requires-lock list.

Good thought, I'll do that.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_dump dump catalog ACLs

2016-05-04 Thread Noah Misch
On Wed, May 04, 2016 at 08:14:55AM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Mon, Apr 25, 2016 at 12:39:09AM -0400, Stephen Frost wrote:
> > > * Noah Misch (n...@leadboat.com) wrote:
> > > > On Fri, Apr 22, 2016 at 12:31:41PM -0400, Stephen Frost wrote:
> > > > > After looking through the code a bit, I realized that there are a lot 
> > > > > of
> > > > > object types which don't have ACLs at all but which exist in 
> > > > > pg_catalog
> > > > > and were being analyzed because the bitmask for pg_catalog included 
> > > > > ACLs
> > > > > and therefore was non-zero.
> > > > > 
> > > > > Clearing that bit for object types which don't have ACLs improved the
> > > > > performance for empty databases quite a bit (from about 3s to a bit
> > > > > under 1s on my laptop).  That's a 42-line patch, with comment lines
> > > > > being half of that, which I'll push once I've looked into the other
> > > > > concerns which were brought up on this thread.
> > > > 
> > > > That's good news.
> > > 
> > > Attached patch-set includes this change in patch #2.
> > 
> > Timings for the 100-database pg_dumpall:
> > 
> > HEAD:   131s
> > HEAD+patch:  33s
> > 9.5:  8.6s
> > 
> > Nice improvement for such a simple patch.
> 
> Patch #2 in the attached patchset includes that improvement and a
> further one which returns the performance to very close to 9.5.

What timings did you measure?  (How close?)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Andres Freund
On 2016-05-04 18:22:27 -0400, Robert Haas wrote:
> On Wed, May 4, 2016 at 6:06 PM, Andres Freund  wrote:
> >> Some of the proposals involve fairly small tweaks to call
> >> MaintainOldSnapshotTimeMapping() from elsewhere or only when
> >> something changes (like crossing a minute boundary or seeing that a
> >> new TransactionId has been assigned).  If you can disentangle those
> >> ideas, it might not look so bad.
> >
> > Yea, if we can do that, I'm ok. I'm doubtful about releasing with the
> > current state, even leaving performance aside, since fixing this will
> > result in somewhat changed semantics, and I'm doubtful about significant
> > development at this point of the release process. If it comes down to
> > either one of those I'm clearly in favor of the latter.
> 
> How would the semantics change?

Right now the time for computing the snapshot is relevant; if
maintenance of xids is moved, it'll likely be tied to the time xids are
assigned. That seems perfectly alright, but it'll change behaviour.


> So, I was worried about this, too.  But I think there is an
> overwhelming consensus on pgsql-release that getting a beta out early
> trumps all, and that if that results in somewhat more post-beta1
> change than we've traditionally had, so be it.

*If* that's the policy - cool!  I just don't want to see the issue
not being fixed due to only wanting conservative changes. And the
discussion around fixing spinlock related issues in the patch certainly
made me think the RMT aimed to be conservative.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 6:06 PM, Andres Freund  wrote:
>> Some of the proposals involve fairly small tweaks to call
>> MaintainOldSnapshotTimeMapping() from elsewhere or only when
>> something changes (like crossing a minute boundary or seeing that a
>> new TransactionId has been assigned).  If you can disentangle those
>> ideas, it might not look so bad.
>
> Yea, if we can do that, I'm ok. I'm doubtful about releasing with the
> current state, even leaving performance aside, since fixing this will
> result in somewhat changed semantics, and I'm doubtful about significant
> development at this point of the release process. If it comes down to
> either one of those I'm clearly in favor of the latter.

How would the semantics change?

> Hm. We're pretty close to a go/no go point, namely beta1.  I don't want
> to be in the situation that we don't fix the issue before release, just
> because beta has passed.

So, I was worried about this, too.  But I think there is an
overwhelming consensus on pgsql-release that getting a beta out early
trumps all, and that if that results in somewhat more post-beta1
change than we've traditionally had, so be it.  And I think that's
pretty fair.  We need to be really careful, as we get closer to
release, about what impact the changes we make might have even on
people not using the features in question, because otherwise we might
invalidate the results of testing already performed.  We also need to
be careful about whether late changes are going to be stable, because
instability at a late date will postpone the release.  But I don't
believe that rules out all meaningful change.  I think we can decide
to revert this feature, or rework it somewhat, after beta1, and that's
OK.

Similarly with the other commits that currently have multiple
outstanding bugs.  If they continue to breed bugs, we can shoot them
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New pgbench functions are misnamed

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 5:41 PM, Tom Lane  wrote:
> I noticed that commit 7e137f846 added functions named max() and min()
> to pgbench's expression syntax.  Unfortunately, these functions have
> zilch to do with what max() and min() do in SQL.  They're actually more
> like the greatest() and least() server-side functions.
>
> While I can't imagine that we'd ever want to implement true aggregates
> in pgbench expressions, it still seems like this is a recipe for
> confusion.  Shouldn't we rename these to greatest() and least()?

Yeah, that's probably a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:
> I spent some time trying to make a test case that was impossibly slow,
> without any really impressive result: I saw at most about a 25% growth in
> planning time, for a 27-way join with one or two foreign keys per table.
> I noted however that with a simple FROM list of tables, you don't really
> get the full force of the combinatorial explosion, because
> join_search_one_level prefers to generate left-deep trees first, and so
> the first creation of a given joinrel is always N left-side rels against 1
> right-side rel, causing the second level of looping to always iterate just
> once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
> devise join order constraints that would result in a lot of high-level
> joinrels being formed with many relations on both sides, which would cause
> both of the second and third levels to iterate O(N) times not just once.
> I didn't have much luck, but I have zero confidence that our users won't
> find such cases.

Have you looked at the patch David Rowley proposed to fix this by
doing some caching?  I am not crazy about accepting even a 25% growth
in planning time on a 27-way join, although maybe 27-way joins are
rare enough and vulnerable enough to bad plans that it would be worth
it if we could convince ourselves that plan quality would go up.  But
if that patch drops it to some much lesser number, we should consider
that as a possible fix.

> Bugs in quals_match_foreign_key():
>
> * Test for clause->opno == fkinfo->conpfeqop[i] fails to consider
> cross-type operators, ie, what's in the clause might be int2 = int4
> while the conpfeqop is int4 = int2.
>
> * parent_ec check isn't really the right thing, since EC-derived clauses
> might not have that set.  I think it may be adequate given that you only
> deal with simple Vars, but at least a comment about that would be good.
>
> * Come to think of it, you could probably dodge the commutator operator
> problem altogether for clauses with nonnull parent_ec, because they must
> contain a relevant equality operator.  (Although if it's redesigned as I
> suggest above, the code path for a clause with parent_ec set would look
> totally different from this anyway.)
>
> * Maintaining the fkmatches bitmapset is useless expense, just use a
> counter of matched keys.  Or for that matter, why not just fail
> immediately if i'th key is not found?

Technically, only the first of these is a clear bug, IMHO.  But it
seems like they should all be fixed.
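
For the first item, the usual shape of a fix would be to also accept the
commutated operator; a hedged sketch, reusing names from the patch plus
get_commutator() from lsyscache.h:

/*
 * Sketch: treat the clause as matching the i'th FK column if its operator
 * equals the FK's equality operator either directly or after commutation,
 * so that cross-type pairs like "int2 = int4" vs. "int4 = int2" match.
 * (get_commutator() returns InvalidOid when there is no commutator, which
 * never equals a valid conpfeqop entry.)
 */
if (clause->opno == fkinfo->conpfeqop[i] ||
	get_commutator(clause->opno) == fkinfo->conpfeqop[i])
{
	/* operators match; now check which Var is on which side */
}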

> find_best_foreign_key_quals():
>
> * Test on enable_fkey_estimates should be one call level up to dodge the
> useless doubly-nested loop in clauselist_join_selectivity (which would
> make the "fast path" exit here pretty much useless)

Yes, that's pretty stupid, and should be fixed.  Coding style is not
per project spec, either.  Also, the header comment for
find_best_foreign_key_quals and in fact the name of the function look
pretty poor.  It seems that the return value is the number of columns
in the foreign key, and that there is an out parameter, joinqualsbitmap,
whose exact meaning doesn't seem to be documented in any comment
anywhere in the patch.

> clauselist_join_selectivity():
>
> * "either could be zero, but not both" is a pretty unhelpful comment given
> the if-test just above it.  What *could* have used some explanation is
> what the next two dozen lines are doing, because they're as opaque as can
> be; and the XXX comment doesn't leave a warm feeling that the author
> understands it either.  I'm not prepared to opine on whether this segment
> of code is correct at all without better commentary.

I'm pretty baffled by this code, too.  I think what the overlap stuff
is doing is trying to calculate selectivity when we match multiple
foreign key constraints, but it doesn't look very principled.
find_best_foreign_key_quals discards "shorter" matches entirely,
picking arbitrarily among longer ones, but then we try to deal with
all of the ones that survive that stage even if they overlap.  It's
hard to judge whether any of this makes sense without more
explanation.

> calc_joinrel_size_estimate():
>
> * why is clauselist_join_selectivity applied to pushed-down clauses and
> not local ones in an outer join?  If that's not an oversight, it at least
> requires an explanatory comment.  Note that this means we are not applying
> FK knowledge for "a left join b on x = y", though it seems like we could.
>
> compute_semi_anti_join_factors isn't using clauselist_join_selectivity
> either.  I don't know whether it should be, but if not, a comment about
> why not seems appropriate.  More generally, I wonder why this logic
> wasn't just folded into clauselist_selectivity.

Good questions.

> guc.c:
> undocumented GUCs are not acceptable

Agreed.

> paths.h:
> patch introduces an extern that's referenced noplace

Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] what to revert

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-04 16:22:41 -0500, Kevin Grittner wrote:
> On Wed, May 4, 2016 at 3:42 PM, Andres Freund  wrote:
> 
> > My concern isn't performing checks at snapshot build time and at buffer
> > access time. That seems fairly reasonable.  My concern is that the
> > time->xid mapping used to determine the xid horizon is built at snapshot
> > access time. The relevant function for that is
> > MaintainOldSnapshotTimeMapping() called by GetSnapshotData(), if you
> > want to look at it.  Building the mapping when building the snapshot
> > requires that an exclusive lwlock is taken.  Given that there's very few
> > access patterns in which more xids are assigned than snapshots taken,
> > that just doesn't make much sense; even leaving the fact that adding an
> > exclusive lock in a function that's already one of the biggest
> > bottlenecks in postgres, isn't a good idea.
> 
> It seems to me that you are confusing the structure with the point
> from which the function to fill it is called (and how often).

Well, you say tomato I say tomato. Sure we'd still use some form of
mapping, but filling it from somewhere else, with somewhat different
contents (xact timestamp presumably), with a somewhat different
replacement policy wouldn't leave all that much of the current structure
in place.  But ok, let's then call it a question of where from and how
the mapping is maintained.


> Some of the proposals involve fairly small tweaks to call
> MaintainOldSnapshotTimeMapping() from elsewhere or only when
> something changes (like crossing a minute boundary or seeing that a
> new TransactionId has been assigned).  If you can disentangle those
> ideas, it might not look so bad.

Yea, if we can do that, I'm ok. I'm doubtful about releasing with the
current state, even leaving performance aside, since fixing this will
result in somewhat changed semantics, and I'm doubtful about significant
development at this point of the release process. If it comes down to
either one of those I'm clearly in favor of the latter.
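
For concreteness, the first of those tweaks might take roughly this shape
(everything here is hypothetical except MaintainOldSnapshotTimeMapping()
itself and USECS_PER_MINUTE):

/*
 * Hypothetical wrapper: only do the (exclusive-lock-taking) map update
 * when the minute bucket has advanced since this backend's last update,
 * instead of on every GetSnapshotData() call.
 */
static int64 last_mapped_minute = -1;

static void
MaybeMaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin)
{
	int64		minute = whenTaken / USECS_PER_MINUTE;

	if (minute == last_mapped_minute)
		return;					/* fast path: same minute, nothing to map */

	MaintainOldSnapshotTimeMapping(whenTaken, xmin);
	last_mapped_minute = minute;
}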


> >> Surely nobody here is blind to the fact that holding back xmin often
> >> creates a lot of bloat for no reason - many or all of the retained old
> >> row versions may never be accessed.
> >
> > Definitely. I *like* the feature.
> 
> And I think we have agreed in off-list discussions that even with
> these NUMA issues the patch, as it stands, would help some -- the
> poor choice of a call site for MaintainOldSnapshotTimeMapping()
> just unnecessarily limits how many workloads can benefit.

Yes, totally.


> Hopefully you can understand the reasons I chose to focus on the
> performance issues when it is disabled, and the hash index issue
> before moving on to this.

Well, somewhat. For me addressing architectural issues (and where to
fill the mapping from is that, independent of whether you call it a data
structure issue) is pretty important too, because I personally don't
believe we can release or even go to beta before that. But I'd not be
all that bothered to release beta1 with a hash index issue that we know
how to address.


> > Sure, it's not a guarantee. But I think a "promise" (signed in blood of
> > course) of a senior contributor makes quite the difference.
> 
> How about we see where we are as we get closer to a go/no go point
> and see where performance has settled and what kind of promise
> would satisfy you.

Hm. We're pretty close to a go/no go point, namely beta1.  I don't want
to be in the situation that we don't fix the issue before release, just
because beta has passed.


> I'm uncomfortable with the demand for a rather
> non-specific promise, and find myself flashing back to some
sections of the novel Catch-22.

Gotta read that one day (ordered).


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] what to revert

2016-05-04 Thread Peter Geoghegan
On Wed, May 4, 2016 at 1:42 PM, Andres Freund  wrote:
>> Surely nobody here is blind to the fact that holding back xmin often
>> creates a lot of bloat for no reason - many or all of the retained old
>> row versions may never be accessed.
>
> Definitely. I *like* the feature.

+1.

>> I do not think it's a show-stopper if you wish he'd worked harder on
>> the performance under heavy concurrency with very short transactions.
>> You could have raised that issue at any time, but instead waited until
>> after feature freeze.
>
> Sorry, I don't buy that argument. There were senior community people
> giving feedback on the patch, the potential for performance regressions
> wasn't called out, the patch was updated pretty late in the CF.  I find
> it really scary that review didn't call out the potential for
> regressions here, at the very least the ones with the feature disabled
> really should have been caught.  Putting exclusive lwlocks and spinlocks
> in critical paths isn't a subtle, easy to miss, issue.

As one of the people that looked at the patch before it went in,
perhaps I can shed some light. I focused exclusively on the
correctness of the core mechanism. It simply didn't occur to me to
check for problems of the type you describe, that affect the system
even when the feature is disabled. I didn't check because Kevin is a
committer who, I assumed, wouldn't have made such an error. Also, time
was limited.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Tom Lane
Stephen Frost  writes:
> Attached patch implements this change to not LOCK the table in cases
> where we don't need to.  I'll push this with my other changes to pg_dump
> tomorrow (and I've included it in an updated, complete, set of patches
> sent on the thread where those changes were being discussed already).

> Wanted to include it here also for completeness.

> Comments welcome, of course.

Minor suggestion: instead of putting these comments and hardwired
knowledge here, I'd suggest putting them adjacent to the list of
DUMP_COMPONENT #defines, creating a symbol along the lines of
DUMP_COMPONENTS_REQUIRING_TABLE_LOCK.  That approach would make it
far more likely that somebody changing the list of DUMP_COMPONENT
elements in future would notice the possible need to adjust the
requires-lock list.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 5:51 PM, Robert Haas  wrote:
> On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:
>> I spent some time trying to make a test case that was impossibly slow,
>> without any really impressive result: I saw at most about a 25% growth in
>> planning time, for a 27-way join with one or two foreign keys per table.
>> I noted however that with a simple FROM list of tables, you don't really
>> get the full force of the combinatorial explosion, because
>> join_search_one_level prefers to generate left-deep trees first, and so
>> the first creation of a given joinrel is always N left-side rels against 1
>> right-side rel, causing the second level of looping to always iterate just
>> once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
>> devise join order constraints that would result in a lot of high-level
>> joinrels being formed with many relations on both sides, which would cause
>> both of the second and third levels to iterate O(N) times not just once.
>> I didn't have much luck, but I have zero confidence that our users won't
>> find such cases.
>
> I'

Well, that wasn't my best-ever email to the list.  Sigh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:
> I spent some time trying to make a test case that was impossibly slow,
> without any really impressive result: I saw at most about a 25% growth in
> planning time, for a 27-way join with one or two foreign keys per table.
> I noted however that with a simple FROM list of tables, you don't really
> get the full force of the combinatorial explosion, because
> join_search_one_level prefers to generate left-deep trees first, and so
> the first creation of a given joinrel is always N left-side rels against 1
> right-side rel, causing the second level of looping to always iterate just
> once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
> devise join order constraints that would result in a lot of high-level
> joinrels being formed with many relations on both sides, which would cause
> both of the second and third levels to iterate O(N) times not just once.
> I didn't have much luck, but I have zero confidence that our users won't
> find such cases.

I'

>
> The reason it's so brute force is that it's rediscovering the same facts
> over and over.  In all simple (non-outer-join) cases, the only clauses
> that are of any interest are derived from EquivalenceClasses, and all
> that you really need to know is "are the vars mentioned on each side
> of this FK present in the same EquivalenceClass?".  ISTM that could be
> determined once per FK per query and cached in or near the EC, not
> expensively rediscovered at each level of joining.  I'm not sure whether
> it'd be worth considering non-EC-derived equalities (ie, outer join
> clauses) at all, and note that the current patch fails to do so anyway;
> see below.
>
> My other design-level complaint is that basing this on foreign keys is
> fundamentally the wrong thing.  What actually matters is the unique index
> underlying the FK; that is, if we have "a.x = b.y" and there's a
> compatible unique index on b.y, we can conclude that no A row will match
> more than one B row, whether or not an explicit FK relationship has been
> declared.  So we should drive this off unique indexes instead of FKs,
> first because we will find more cases, and second because the planner
> already examines indexes and doesn't need any additional catalog lookups
> to get the required data.  (IOW, the relcache additions that were made in
> this patch series should go away too.)
>
> Aside from the design being basically wrong, the code quality seems pretty
> low.  Aside from the missing IsA check that started this thread, I found
> the following problems in a quick once-over of 137805f89:
>
> Bugs in quals_match_foreign_key():
>
> * Test for clause->opno == fkinfo->conpfeqop[i] fails to consider
> cross-type operators, ie, what's in the clause might be int2 = int4
> while the conpfeqop is int4 = int2.
>
> * parent_ec check isn't really the right thing, since EC-derived clauses
> might not have that set.  I think it may be adequate given that you only
> deal with simple Vars, but at least a comment about that would be good.
>
> * Come to think of it, you could probably dodge the commutator operator
> problem altogether for clauses with nonnull parent_ec, because they must
> contain a relevant equality operator.  (Although if it's redesigned as I
> suggest above, the code path for a clause with parent_ec set would look
> totally different from this anyway.)
>
> * Maintaining the fkmatches bitmapset is useless expense, just use a
> counter of matched keys.  Or for that matter, why not just fail
> immediately if i'th key is not found?
>
> find_best_foreign_key_quals():
>
> * Test on enable_fkey_estimates should be one call level up to dodge the
> useless doubly-nested loop in clauselist_join_selectivity (which would
> make the "fast path" exit here pretty much useless)
>
> clauselist_join_selectivity():
>
> * "either could be zero, but not both" is a pretty unhelpful comment given
> the if-test just above it.  What *could* have used some explanation is
> what the next two dozen lines are doing, because they're as opaque as can
> be; and the XXX comment doesn't leave a warm feeling that the author
> understands it either.  I'm not prepared to opine on whether this segment
> of code is correct at all without better commentary.
>
> calc_joinrel_size_estimate():
>
> * why is clauselist_join_selectivity applied to pushed-down clauses and
> not local ones in an outer join?  If that's not an oversight, it at least
> requires an explanatory comment.  Note that this means we are not applying
> FK knowledge for "a left join b on x = y", though it seems like we could.
>
> compute_semi_anti_join_factors isn't using clauselist_join_selectivity
> either.  I don't know whether it should be, but if not, a comment about
> why not seems appropriate.  More generally, I wonder why this logic
> wasn't just folded into clauselist_selectivity.
>
> guc.c:
> undocumented GUCs are not acceptable
>
> paths.h:
> patch introduces an extern that's referenced noplace

Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 5:02 PM, Tom Lane  wrote:
> Very good point, but unless I'm missing something, that is not what the
> current patch does.  I'm not sure offhand whether that's an important
> estimation failure mode currently, or if it is whether it would be
> sensible to try to implement that rule entirely separately from the "at
> most one" aspect, or if it isn't sensible, whether that's a sufficiently
> strong reason to confine the "at most one" logic to working only with FKs
> and not with bare unique indexes.

Tomas seems to feel that that is what the current patch does, and
indeed that it's the main point of the current patch, but you seem to
think that it doesn't do that.  Either I'm misinterpreting what one of
you is saying, or you are missing something, or his patch fails to
accomplish its intended purpose.  It seems important to figure out
which of those things is true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Andres Freund
Hi Jeff,

On 2016-05-04 14:00:01 -0700, Jeff Janes wrote:
> On Tue, May 3, 2016 at 4:05 PM, Andres Freund  wrote:
> > Hm. I appear to have trouble reproducing this issue (continuing to try)
> > on master as of 8826d8507.  Is there any chance you could package up a
> > data directory after the issue hit?
> 
> I'll look into it.  I haven't been saving them, as they are very large
> (tens of GB) by the time the errors happen.

Hm. Any chance that's SSH accessible?


What compiler version & OS are you using, with what exact
CFLAGS/configure input? I'd like to try to replicate the setup as closely
as possible, in the hope of making it locally reproducible.


> In case I can't find a way to transfer that much data, is there
> something I could do in situ to debug it?

Yes. It'd be good to get a look at the borked page/tuple with
pageinspect. That might require some manual search to find the affected
tuple, and possibly the problem is transient.  I was wondering whether
we could just put an Assert() into those error messages, to get a stack
dump. But unfortunately your tooling would likely generate far too many
of those.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New pgbench functions are misnamed

2016-05-04 Thread Tom Lane
I noticed that commit 7e137f846 added functions named max() and min()
to pgbench's expression syntax.  Unfortunately, these functions have
zilch to do with what max() and min() do in SQL.  They're actually more
like the greatest() and least() server-side functions.

While I can't imagine that we'd ever want to implement true aggregates
in pgbench expressions, it still seems like this is a recipe for
confusion.  Shouldn't we rename these to greatest() and least()?

regards, tom lane




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Seems reasonable otherwise.

Attached patch implements this change to not LOCK the table in cases
where we don't need to.  I'll push this with my other changes to pg_dump
tomorrow (and I've included it in an updated, complete, set of patches
sent on the thread where those changes were being discussed already).

Wanted to include it here also for completeness.

Comments welcome, of course.

Thanks!

Stephen
From 32645eac291b546df2acb23685c0a5f123131efa Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Wed, 4 May 2016 13:40:36 -0400
Subject: [PATCH 3/4] Only issue LOCK TABLE commands when necessary

Reviewing the cases where we need to LOCK a given table during a dump,
it was pointed out by Tom that we really don't need to LOCK a table if
we are only looking to dump the ACL for it, or certain other
components.  After reviewing the queries run for all of the component
pieces, a list of components was determined to not require LOCK'ing
of the table.

This implements a check to avoid LOCK'ing those tables.

Initial complaint from Rushabh Lathia, discussed with Robert and Tom,
the patch is mine.
---
 src/bin/pg_dump/pg_dump.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d826b4d..2c75b70 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5947,14 +5947,37 @@ getTables(Archive *fout, int *numTables)
 		 * Read-lock target tables to make sure they aren't DROPPED or altered
 		 * in schema before we get around to dumping them.
 		 *
+		 * Note that some components only require looking at the information
+		 * in the pg_catalog tables and, for those components, we do not need
+		 * to lock the table.  Be careful here, though - some components use
+		 * server-side functions which pull the latest information from
+		 * SysCache and in those cases we *do* need to lock the table.
+		 *
 		 * Note that we don't explicitly lock parents of the target tables; we
 		 * assume our lock on the child is enough to prevent schema
 		 * alterations to parent tables.
 		 *
 		 * NOTE: it'd be kinda nice to lock other relations too, not only
 		 * plain tables, but the backend doesn't presently allow that.
+		 *
+		 * We do not need locks for the COMMENT and SECLABEL components as
+		 * those simply query their associated tables without using any
+		 * server-side functions.  We do not need locks for the ACL component
+		 * as we pull that information from pg_class without using any
+		 * server-side functions that use SysCache.  The USERMAP component
+		 * is only relevant for FOREIGN SERVERs and not tables, so no sense
+		 * locking a table for that either (that can happen if we are going
+		 * to dump "ALL" components for a table).
+		 *
+		 * We DO need locks for DEFINITION, due to various server-side
+		 * functions that are used and POLICY due to pg_get_expr().  We set
+		 * this up to grab the lock except in the cases we know to be safe.
 		 */
-		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION &&
+			(tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT |
+									  DUMP_COMPONENT_SECLABEL |
+									  DUMP_COMPONENT_ACL |
+									  DUMP_COMPONENT_USERMAP)))
 		{
 			resetPQExpBuffer(query);
 			appendPQExpBuffer(query,
-- 
2.5.0
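
One way to make the intent of the new test clearer would be to factor the
lock-exempt components into a named mask kept beside the check.  A sketch
only - the macro name here is made up, and this is not part of the posted
patch:

/*
 * Illustrative only: the components that, per the comment in
 * getTables(), can be dumped without holding a lock on the table.
 */
#define DUMP_COMPONENTS_NOT_REQUIRING_LOCK \
	(DUMP_COMPONENT_COMMENT | \
	 DUMP_COMPONENT_SECLABEL | \
	 DUMP_COMPONENT_ACL | \
	 DUMP_COMPONENT_USERMAP)

		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION &&
			(tblinfo[i].dobj.dump & ~DUMP_COMPONENTS_NOT_REQUIRING_LOCK))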




Re: [HACKERS] pg_dump dump catalog ACLs

2016-05-04 Thread Stephen Frost
Noah, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> The test suite is now covering 57% of pg_dump.c.  I was hoping to get
> that number higher, but time marches on and more tests can certainly be
> added later.

I've managed to get the test suite up another 10%, to 67% of pg_dump.c.
Still quite a bit to do, but it turned up a bug in CREATE TRANSFORM,
which led to Peter discovering a similar issue in CREATE CAST, so I'd
say it's certainly showing that it's worthwhile to have.

Updated patch-set which includes the additional tests and also a patch
to avoid LOCK'ing tables when we don't need to, as discussed with Tom
and Robert on another thread.

Thanks!

Stephen
From cea33c3fdbdfa4ff27c0ff6ed7b708705dfa17fe Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Sun, 24 Apr 2016 23:59:23 -0400
Subject: [PATCH 1/4] Correct pg_dump WHERE clause for functions/aggregates

The query to grab the function/aggregate information is now joining
to pg_init_privs, so we can simplify (and correct) the WHERE clause
used to determine if a given function's ACL has changed from the
initial ACL on the function.

Bug found by Noah, patch by me.
---
 src/bin/pg_dump/pg_dump.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d3f5157..bb33075 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4673,11 +4673,7 @@ getAggregates(Archive *fout, int *numAggs)
 		  "p.pronamespace != "
 		  "(SELECT oid FROM pg_namespace "
 		  "WHERE nspname = 'pg_catalog') OR "
-		  "EXISTS (SELECT * FROM pg_init_privs pip "
-		  "WHERE p.oid = pip.objoid AND pip.classoid = "
-		  "(SELECT oid FROM pg_class "
-		  "WHERE relname = 'pg_proc') "
-		  "AND p.proacl IS DISTINCT FROM pip.initprivs)",
+		  "p.proacl IS DISTINCT FROM pip.initprivs",
 		  username_subquery,
 		  acl_subquery->data,
 		  racl_subquery->data,
@@ -4923,11 +4919,7 @@ getFuncs(Archive *fout, int *numFuncs)
 		  "pronamespace != "
 		  "(SELECT oid FROM pg_namespace "
 		  "WHERE nspname = 'pg_catalog') OR "
-		  "EXISTS (SELECT * FROM pg_init_privs pip "
-		  "WHERE p.oid = pip.objoid AND pip.classoid = "
-		  "(SELECT oid FROM pg_class "
-		  "WHERE relname = 'pg_proc') "
-		  "AND p.proacl IS DISTINCT FROM pip.initprivs)",
+		  "p.proacl IS DISTINCT FROM pip.initprivs",
 		  acl_subquery->data,
 		  racl_subquery->data,
 		  initacl_subquery->data,
-- 
2.5.0


From 707707ba1c2ee601ce893258eaaf5c54341806f6 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 25 Apr 2016 00:00:15 -0400
Subject: [PATCH 2/4] pg_dump performance and other fixes

Do not try to dump objects which do not have ACLs when only ACLs are
being requested.  This results in a significant performance improvement
as we can avoid querying for further information on these objects when
we don't need to.

When limiting the components to dump for an extension, consider what
components have been requested.  Initially, we incorrectly hard-coded
the components of the extension objects to dump, which would mean that
we wouldn't dump some components even when they were asked for and in
other cases we would dump components which weren't requested.

Correct defaultACLs to use 'dump_contains' instead of 'dump'.  The
defaultACL is considered a member of the namespace and should be
dumped based on the same set of components that the other objects in
the schema are, not based on what we're dumping for the namespace
itself (which might not include ACLs, if the namespace has just the
default or initial ACL).

Use DUMP_COMPONENT_ACL for from-initdb objects, to allow users to
change their ACLs, should they wish to.  This just extends what we
are doing for the pg_catalog namespace to objects which are not
members of namespaces.
---
 src/bin/pg_dump/pg_dump.c | 176 +++---
 1 file changed, 149 insertions(+), 27 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bb33075..d826b4d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1297,14 +1297,17 @@ checkExtensionMembership(DumpableObject *dobj, Archive *fout)
 	 * contents rather than replace the extension contents with something
 	 * different.
 	 */
-	if (!fout->dopt->binary_upgrade && fout->remoteVersion >= 90600)
-		dobj->dump = DUMP_COMPONENT_ACL |
-			DUMP_COMPONENT_SECLABEL |
-			DUMP_COMPONENT_POLICY;
-	else if (!fout->dopt->binary_upgrade)
-		dobj->dump = DUMP_COMPONENT_NONE;
-	else
+	if (fout->dopt->binary_upgrade)
 		dobj->dump = ext->dobj.dump;
+	else
+	{
+		if (fout->remoteVersion < 90600)
+			dobj->dump = DUMP_COMPONENT_NONE;
+		else
+			dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL |
+									DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_POLICY);
+
+	}
 
 	return true;
 }
@@ -1452,7 +1455,7 @@ 

Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Tomas Vondra

Hi,

On 05/04/2016 11:02 PM, Tom Lane wrote:

Robert Haas  writes:

On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:

My other design-level complaint is that basing this on foreign keys is
fundamentally the wrong thing.  What actually matters is the unique index
underlying the FK; that is, if we have "a.x = b.y" and there's a
compatible unique index on b.y, we can conclude that no A row will match
more than one B row, whether or not an explicit FK relationship has been
declared.  So we should drive this off unique indexes instead of FKs,
first because we will find more cases, and second because the planner
already examines indexes and doesn't need any additional catalog lookups
to get the required data.  (IOW, the relcache additions that were made in
this patch series should go away too.)



Without prejudice to anything else in this useful and detailed review,
I have a question about this.  A unique index proves that no A row
will match more than one B row, and I agree that deriving that from
unique indexes is sensible.  However, ISTM that an FK provides
additional information: we know that, modulo filter conditions on B,
every A row will match *exactly* one B row, which can prevent us
from *underestimating* the size of the join product.  A unique index
can't do that.


Very good point, but unless I'm missing something, that is not what the
current patch does.  I'm not sure offhand whether that's an important
estimation failure mode currently, or if it is whether it would be
sensible to try to implement that rule entirely separately from the "at
most one" aspect, or if it isn't sensible, whether that's a sufficiently
strong reason to confine the "at most one" logic to working only with FKs
and not with bare unique indexes.


FWIW it's a real-world problem with multi-column FKs. As David pointed 
out upthread, a nice example of this issue is Q9 in the TPC-H bench, 
where the underestimate leads to HashAggregate and then OOM failure.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] what to revert

2016-05-04 Thread Kevin Grittner
On Wed, May 4, 2016 at 3:42 PM, Andres Freund  wrote:

> My concern isn't performing checks at snapshot build time and at buffer
> access time. That seems fairly reasonable.  My concern is that the
> time->xid mapping used to determine the xid horizon is built at snapshot
> access time. The relevant function for that is
> MaintainOldSnapshotTimeMapping() called by GetSnapshotData(), if you
> want to look at it.  Building the mapping when building the snapshot
> requires that an exclusive lwlock is taken.  Given that there's very few
> access patterns in which more xids are assigned than snapshots taken,
> that just doesn't make much sense; even leaving aside the fact that adding an
> exclusive lock in a function that's already one of the biggest
> bottlenecks in postgres, isn't a good idea.

It seems to me that you are confusing the structure with the point
from which the function to fill it is called (and how often).
Some of the proposals involve fairly small tweaks to call
MaintainOldSnapshotTimeMapping() from elsewhere or only when
something changes (like crossing a minute boundary or seeing that a
new TransactionId has been assigned).  If you can disentangle those
ideas, it might not look so bad.

> If we instead built the map somewhere around GetNewTransactionId() we'd
> only pay the cost when advancing an xid. It'd also make it possible to
> avoid another GetCurrentIntegerTimestamp() per snapshot, including the
> acquisition of oldSnapshotControl->mutex_current. In addition the data
> structure would be easier to maintain, because xids grow monotonically
> (yes, yes wraparound, but that's not a problem) - atm we need somewhat
> more complex logic to determine into which bucket an xid/timestamp pair
> has to be mapped to.

Right.

> So I'm really not just concerned with the performance, I think that's
> just fallout from choosing the wrong data representation.

Wrong.  It's a matter of when the calls are made to maintain the
structure.

> If you look at at bottom of
> http://www.postgresql.org/message-id/20160413171955.i53me46fqqhdl...@alap3.anarazel.de
> which shows performance numbers for old_snapshot_threshold=0
> vs. old_snapshot_threshold=10. While that wasn't intended at the time,
> old_snapshot_threshold=0 shows the cost of the feature without
> maintaining the mapping, and old_snapshot_threshold=10 shows it while
> building the mapping. Pretty dramatic difference - that's really the
> cost of map maintenance.

Right.

> FWIW, it's not just me doubting the data structure here, Ants did as
> well.

It is true that the patch he posted added one more field to a struct.

>> Surely nobody here is blind to the fact that holding back xmin often
>> creates a lot of bloat for no reason - many or all of the retained old
>> row versions may never be accessed.
>
> Definitely. I *like* the feature.

And I think we have agreed in off-list discussions that even with
these NUMA issues the patch, as it stands, would help some -- the
poor choice of a call site for MaintainOldSnapshotTimeMapping()
just unnecessarily limits how many workloads can benefit.
Hopefully you can understand the reasons I chose to focus on the
performance issues when it is disabled, and the hash index issue
before moving on to this.

 So I think accepting the promise that this feature would be improved
 in a future release to better support that case is good enough.
>>>
>>> I've not heard any such promise.
>>
>> The question Alvaro is raising isn't whether such a promise has been
>> made but whether it would suffice.  In fairness, such promises are not
>> enforceable.  Kevin can promise to work on this and then be run over
>> by a rogue Mr. Softee truck tomorrow, and the work won't get done; or
>> he can go to work for some non-PostgreSQL-supporting company and
>> vanish.
>
> Sure, it's not a guarantee. But I think a "promise" (signed in blood of
> course) of a senior contributor makes quite the difference.

How about we see where we are as we get closer to a go/no go point
and see where performance has settled and what kind of promise
would satisfy you.  I'm uncomfortable with the demand for a rather
non-specific promise, and find myself flashing back to some
sections of the novel Catch-22.

>> But personally, I generally agree with Alvaro's analysis.  If this
>> patch is slow when turned on, and you don't like that, don't use it.
>
>> If you need to use it and want it to scale better, then you can write
>> a patch to make it do that. Nobody is more qualified than you.
>
> I think that's what ticks me off about this. By introducing a feature
> implemented with the wrong structure, the onus of work is placed on
> others. It's imo perfectly reasonable not to unneccesarily perform
> micro-optimization before benchmarks show a problem, and if it were just
> a question of doing that in 9.7, I'd be fine. But what we're talking
> about is rewriting the central part of the feature.

If you see a need for more than adding a 

Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-05-04 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 5/4/16 2:39 PM, Stephen Frost wrote:
> >These checks are looking for the functions used by the transform in the
> >list of functions that pg_dump has loaded, but in 9.5, we don't load any
> >of the function in pg_catalog, and even with my patches, we only dump
> >the functions in pg_catalog that have an ACL which has been changed from
> >the default.
> 
> This issue is not specific to transforms.  For example, if you
> create a user-defined cast using a built-in function, the cast won't
> be dumped. Obviously, this is a problem, but it needs a more general
> solution.

Ah, I hadn't gotten to casts yet with my test framework.

I'm certainly open to suggestions on how to address this, but I will
point out that we already have a number of cases where we issue
additional queries against the database from the dump*() functions when
we need to collect additional information.  That's not wonderful because
it ends up being slow, but it does work.

Trying to figure out what functions we need at getFuncs() time isn't
terribly easy.  On the other hand, it might not be *that* bad to just
pull in all functions, if we use the same technique that we use for
tables, and mark the ones we aren't dumping as not "interesting."
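
Concretely, that could look loosely like the "interesting" handling used
for tables - a sketch only, with hypothetical names (FuncInfo has no such
flag today, and functionIsNeeded() is made up):

	/*
	 * Sketch: load every function, then mark the ones we are not going
	 * to dump so that later passes can skip them cheaply.
	 */
	for (i = 0; i < numFuncs; i++)
	{
		finfo[i].interesting = functionIsNeeded(fout, &finfo[i]);
		if (!finfo[i].interesting)
			finfo[i].dobj.dump = DUMP_COMPONENT_NONE;
	}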

Thanks!

Stephen




Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:
>> My other design-level complaint is that basing this on foreign keys is
>> fundamentally the wrong thing.  What actually matters is the unique index
>> underlying the FK; that is, if we have "a.x = b.y" and there's a
>> compatible unique index on b.y, we can conclude that no A row will match
>> more than one B row, whether or not an explicit FK relationship has been
>> declared.  So we should drive this off unique indexes instead of FKs,
>> first because we will find more cases, and second because the planner
>> already examines indexes and doesn't need any additional catalog lookups
>> to get the required data.  (IOW, the relcache additions that were made in
>> this patch series should go away too.)

> Without prejudice to anything else in this useful and detailed review,
> I have a question about this.  A unique index proves that no A row
> will match more than one B row, and I agree that deriving that from
> unique indexes is sensible.  However, ISTM that an FK provides
> additional information: we know that, modulo filter conditions on B,
> every A row will match *exactly* one row B row, which can prevent us
> from *underestimating* the size of the join product.  A unique index
> can't do that.

Very good point, but unless I'm missing something, that is not what the
current patch does.  I'm not sure offhand whether that's an important
estimation failure mode currently, or if it is whether it would be
sensible to try to implement that rule entirely separately from the "at
most one" aspect, or if it isn't sensible, whether that's a sufficiently
strong reason to confine the "at most one" logic to working only with FKs
and not with bare unique indexes.

On the whole, that seems like another argument why this needs more time.

regards, tom lane




Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Jeff Janes
On Tue, May 3, 2016 at 4:05 PM, Andres Freund  wrote:
> Hi Jeff,
>
> On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
>> I've bisected the errors I was seeing, discussed in
>> http://www.postgresql.org/message-id/CAMkU=1xqehc0ok4d+tkjfq1nvuho37pyrkhjp6q8oxifmx7...@mail.gmail.com
>>
>> It look like they first appear in:
>>
>> commit 48354581a49c30f5757c203415aa8412d85b0f70
>> Author: Andres Freund 
>> Date:   Sun Apr 10 20:12:32 2016 -0700
>>
>> Allow Pin/UnpinBuffer to operate in a lockfree manner.
>>
>>
>> I get the errors:
>>
>> ERROR:  attempted to delete invisible tuple
>> STATEMENT:  update foo set count=count+1,text_array=$1 where text_array @> $2
>>
>> And also:
>>
>> ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> 85223889 in pg_toast_16424
>> STATEMENT:  update foo set count=count+1 where text_array @> $1
>
> Hm. I appear to have trouble reproducing this issue (continuing to try)
> on master as of 8826d8507.  Is there any chance you could package up a
> data directory after the issue hit?

I'll look into it.  I haven't been saving them, as they are very large
(tens of GB) by the time the errors happen.  In case I can't find a
way to transfer that much data, is there something I could do in situ
to debug it?

>
>
>> (This was all run using Teodor's test-enabling patch
>> gin_alone_cleanup-4.patch, so as not to change horses in midstream.
>> Now that a version of that patch has been committed, I will try to
>> repeat this in HEAD)
>
> Any news on that front?

I couldn't reproduce it in 82881b2b432c9433b45a (which is what HEAD
was at the time).

The last commit I saw the problem in was 8f1911d5e6d5, and in that
commit it took longer than usual to see the error, and I never saw it at
all in one run (which led me down the wrong path in git bisect) but
then saw errors upon another try.  Up until that commit, it seemed to
give the errors like clockwork, always after 8 to 10 hours of running.

I also have never seen the errors with the crashing turned off.  I
even tried it with autovac off and
autovacuum_freeze_max_age=15 (to emulate the way vacuum never
gets a chance to run to completion in the crashing mode) and then I
don't get any errors up to the point I run out of disk space.

Cheers,

Jeff




Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-05-04 Thread Peter Eisentraut

On 5/4/16 2:39 PM, Stephen Frost wrote:

These checks are looking for the functions used by the transform in the
list of functions that pg_dump has loaded, but in 9.5, we don't load any
of the function in pg_catalog, and even with my patches, we only dump
the functions in pg_catalog that have an ACL which has been changed from
the default.


This issue is not specific to transforms.  For example, if you create a 
user-defined cast using a built-in function, the cast won't be dumped. 
Obviously, this is a problem, but it needs a more general solution.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Timeline following for logical slots

2016-05-04 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Here's a proposed revert patch.  Many minor changes (mostly comment
> additions) that were applied as part of this series are kept intact.
> The test_slot_timeline test module and corresponding recovery test
> script are also reverted.

Pushed.

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-04 16:01:18 -0400, Robert Haas wrote:
> On Wed, May 4, 2016 at 3:51 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> The PostgreSQL 9.6 release management team has determined that there
> >> is insufficient consensus at this time to revert any of the patches
> >> mentioned in 
> >> http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
> >> because, with the exception of "snapshot too old", none of those
> >> patches have attracted more than a single vote to revert.  While
> >> "snapshot too old" has attracted three votes to revert (Tom, Bruce,
> >> Andres), one of those was on the grounds of not liking the feature in
> >> general rather than any specific problem with the implementation (Tom)
> >> and another gave no reason at all (Bruce).  When originally proposed,
> >> there was clear consensus that the feature was useful, so any revert
> >> should be on the grounds that the current implementation is flawed.
> >
> > ... which, indeed, is precisely what Andres is asserting, no?  I do
> > not understand your conclusion.
>
> Yes, and "asserting" is the right word, per my complaints in the first
> paragraph of:
>
> http://www.postgresql.org/message-id/ca+tgmoyxb8mx9qtmz-ytaael4svrvqe32yt66cwon3x7kbx...@mail.gmail.com

Uh. I *did* previously explain what I think was wrong (including quoting
the salient code), and I wasn't asked for further details. And the issue
is pretty obvious. I've the growing feeling that people simply aren't
bothering to actually look at what's been pointed out.

I also want to reiterate that I didn't immediately call for a revert,
initially - before recognizing the architectural issue - I offered to
write code to address the regressions due to the spinlocks.

Greetings,

Andres




Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Tomas Vondra

Hi,

On 05/04/2016 08:54 PM, Tom Lane wrote:

David Rowley  writes:

On 4 May 2016 at 09:18, David Rowley  wrote:

On 4 May 2016 at 02:10, Tomas Vondra  wrote:

There are probably a few reasonably simple things we could do - e.g. ignore
foreign keys with just a single column, as the primary goal of the patch is
improving estimates with multi-column foreign keys. I believe that covers a
vast majority of foreign keys in the wild.



I've spent a few hours looking at this and I've come up with the
attached patch, which flags each ForeignKeyOptInfo to say whether its
possible to be referenced in any join condition, with the logic that
if the referenced relation is in the simple_rte_array, then it could
be referenced.


My fundamental problem with the FK-selectivity patch is that it looks
an awful lot like a preliminary proof-of-concept that got committed.

I do not like the basic design: it's about as brute-force as could
possibly be.  It adds code that is executed:

* at least once per join relation created (hence, significantly more than
  the number of rels in the query; see also get_joinrel_parampathinfo)
  * for each inner relation in the initial input joinrel pair
    * for each outer relation in the initial joinrel pair
      * for each foreign key constraint on this inner and outer rel
        * for each key column in that FK
          * for each join qual for the current input joinrel pair
            * for each member of the relevant EquivalenceClass

This is at least O(N^3) in the number of baserels in the query, not
to mention the other multipliers. I'm not very impressed by tests
that scale only one of the multipliers (like the number of FK
constraints); where the pain is going to come in is when all of these
factors are large.
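
Schematically, that nesting amounts to something like this (illustrative
pseudo-C; the names are made up, not the actual planner functions):

	foreach(joinrel, joinrels_built)		/* more than N times */
		foreach(innerrel, initial_input_pair)
			foreach(outerrel, initial_input_pair)
				foreach(fk, fks_between(innerrel, outerrel))
					foreach(keycol, fk_key_columns(fk))
						foreach(qual, join_quals(joinrel))
							foreach(em, ec_members(qual))
								match_fk_column(fk, keycol, em);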

I spent some time trying to make a test case that was impossibly
slow, without any really impressive result: I saw at most about a 25%
growth in planning time, for a 27-way join with one or two foreign
keys per table. I noted however that with a simple FROM list of
tables, you don't really get the full force of the combinatorial
explosion, because join_search_one_level prefers to generate
left-deep trees first, and so the first creation of a given joinrel
is always N left-side rels against 1 right-side rel, causing the
second level of looping to always iterate just once. (GEQO behaves
likewise, I think.) I spent a little time trying to devise join order
constraints that would result in a lot of high-level joinrels being
formed with many relations on both sides, which would cause both of
the second and third levels to iterate O(N) times not just once. I
didn't have much luck, but I have zero confidence that our users
won't find such cases.


Don't know. We haven't found such an extreme example either.



The reason it's so brute force is that it's rediscovering the same
facts over and over. In all simple (non-outer-join) cases, the only
clauses that are of any interest are derived from EquivalenceClasses,
and all that you really need to know is "are the vars mentioned on
each side of this FK present in the same EquivalenceClass?". ISTM
that could be determined once per FK per query and cached in or near
the EC, not expensively rediscovered at each level of joining. I'm
not sure whether it'd be worth considering non-EC-derived equalities
(ie, outer join clauses) at all, and note that the current patch
fails to do so anyway; see below.


I'm not sure it's that simple, as it also depends on the join order, so 
if we only detect that once per query we'll get incorrect estimates for 
the intermediate results. I think the approach with a cache proposed by
David a few days ago is the best approach.
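
To make that concrete, a per-query cache along those lines might look
roughly like this - hypothetical field and function names throughout
(matched_ecs and find_matching_ecs() are made up, not planner API):

	/*
	 * Sketch: once per query, record for each foreign key whether the
	 * vars on both sides of each key column appear in the same
	 * EquivalenceClass, instead of rediscovering that at every level
	 * of joining.
	 */
	static void
	cache_fk_ec_matches(PlannerInfo *root)
	{
		ListCell   *lc;

		foreach(lc, root->fkey_list)
		{
			ForeignKeyOptInfo *fk = (ForeignKeyOptInfo *) lfirst(lc);

			/*
			 * Scan the query's ECs for members mentioning the
			 * referencing and referenced vars of each key column, and
			 * stash the answer on the FK for later join levels.
			 */
			fk->matched_ecs = find_matching_ecs(root, fk);
		}
	}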




My other design-level complaint is that basing this on foreign keys is
fundamentally the wrong thing.  What actually matters is the unique index
underlying the FK; that is, if we have "a.x = b.y" and there's a
compatible unique index on b.y, we can conclude that no A row will match
more than one B row, whether or not an explicit FK relationship has been
declared.  So we should drive this off unique indexes instead of FKs,
first because we will find more cases, and second because the planner
already examines indexes and doesn't need any additional catalog lookups
to get the required data.  (IOW, the relcache additions that were made in
this patch series should go away too.)


No, that's not what the patch does, and it can't use unique indexes 
instead. The patch improves estimation with multi-column foreign keys, 
when the join matches the constraint. Currently we treat the conditions 
as independent and multiply the estimated selectivities, completely 
ignoring the guarantee provided by the FK, which leads to significant 
under-estimates.


So when you have:

CREATE TABLE t1 (a1 INT, a2 INT, primary key (a1,a2));
CREATE TABLE t2 (b1 INT, b2 INT,
 FOREIGN KEY 

Re: [HACKERS] what to revert

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-04 14:25:14 -0400, Robert Haas wrote:
> On Wed, May 4, 2016 at 12:42 PM, Andres Freund  wrote:
> > On 2016-05-04 13:35:02 -0300, Alvaro Herrera wrote:
> >> Honestly, I don't see any strong ground in which to base a revert threat
> >> for this feature.
> >
> > Its datastructures are badly designed. But releasing it, there's no
> > pressure to fix that.  If this were an intrinsic cost - ok. But it's
> > not.
> 
> I don't want to rule out the possibility that you are right, because
> you're frequently right about lots of things.  However, you haven't
> provided much detail.  AFAIK, the closest you've come to articulating
> a case for reverting this patch is to say that the tax ought to be
> paid by the write-side, rather than the read-side.

I think I did go into some more detail than that, but let me explain
here:

My concern isn't performing checks at snapshot build time and at buffer
access time. That seems fairly reasonable.  My concern is that the
time->xid mapping used to determine the xid horizon is built at snapshot
access time. The relevant function for that is
MaintainOldSnapshotTimeMapping() called by GetSnapshotData(), if you
want to look at it.  Building the mapping when building the snapshot
requires that an exclusive lwlock is taken.  Given that there's very few
access patterns in which more xids are assigned than snapshots taken,
that just doesn't make much sense; even leaving aside the fact that adding an
exclusive lock in a function that's already one of the biggest
bottlenecks in postgres, isn't a good idea.

If we instead built the map somewhere around GetNewTransactionId() we'd
only pay the cost when advancing an xid. It'd also make it possible to
avoid another GetCurrentIntegerTimestamp() per snapshot, including the
acquisition of oldSnapshotControl->mutex_current. In addition the data
structure would be easier to maintain, because xids grow monotonically
(yes, yes wraparound, but that's not a problem) - atm we need somewhat
more complex logic to determine into which bucket an xid/timestamp pair
has to be mapped to.
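
To illustrate the shape that becomes possible once the map is keyed off
monotonically advancing xids, a minimal sketch (assumed names, not the
actual old_snapshot_threshold code):

	/*
	 * Sketch: a ring buffer of (xid, timestamp) pairs filled in from
	 * the xid-assignment side, e.g. near GetNewTransactionId().
	 * Because xids only move forward, each new entry goes into the
	 * next slot; no bucket-search logic is needed, and snapshot
	 * builders only ever read the map.
	 */
	#define XID_TIME_MAP_SLOTS 64

	typedef struct XidTimeMap
	{
		int			head;	/* most recently filled slot */
		TransactionId xids[XID_TIME_MAP_SLOTS];
		TimestampTz timestamps[XID_TIME_MAP_SLOTS];
	} XidTimeMap;

	static void
	xid_time_map_advance(XidTimeMap *map, TransactionId xid, TimestampTz now)
	{
		map->head = (map->head + 1) % XID_TIME_MAP_SLOTS;
		map->xids[map->head] = xid;
		map->timestamps[map->head] = now;
	}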

So I'm really not just concerned with the performance, I think that's
just fallout from choosing the wrong data representation.

If you look at at bottom of
http://www.postgresql.org/message-id/20160413171955.i53me46fqqhdl...@alap3.anarazel.de
which shows performance numbers for old_snapshot_threshold=0
vs. old_snapshot_threshold=10. While that wasn't intended at the time,
old_snapshot_threshold=0 shows the cost of the feature without
maintaining the mapping, and old_snapshot_threshold=10 shows it while
building the mapping. Pretty dramatic difference - that's really the
cost of map maintenance.


FWIW, it's not just me doubting the data structure here, Ants did as
well.


> Surely nobody here is blind to the fact that holding back xmin often
> creates a lot of bloat for no reason - many or all of the retained old
> row versions may never be accessed.

Definitely. I *like* the feature.


> So I would like to hear more detail about why you think that the data
> structures are badly designed, and how they could be designed
> differently without changing what the patch does (which amounts to
> wishing that Kevin had implemented a different feature than the one
> you think he should have implemented).

Well, there'd be some minor differences in what determines "too old" based
on the above, but I think it'd just be a bit easier to explain.


> >> It doesn't scale as well as we would like in the case
> >> where a high-level is fully stressed with a read-only load -- okay.  But
> >> that's unlikely to be a case where this feature is put to work.
> >
> > It'll be just the same in a read mostly workload, which is part of the
> > case for this feature.
> 
> So what?  SSI doesn't scale either, and nobody's saying we have to rip
> it out.  It's still useful to people.  pgbench is not the world.

Sure, pgbench is not the world.  But this isn't a particularly pgbench
specific issue.


> >> So I think accepting the promise that this feature would be improved
> >> in a future release to better support that case is good enough.
> >
> > I've not heard any such promise.
> 
> The question Alvaro is raising isn't whether such a promise has been
> made but whether it would suffice.  In fairness, such promises are not
> enforceable.  Kevin can promise to work on this and then be run over
> by a rogue Mr. Softee truck tomorrow, and the work won't get done; or
> he can go to work for some non-PostgreSQL-supporting company and
> vanish.

Sure, it's not a guarantee. But I think a "promise" (signed in blood of
course) of a senior contributor makes quite the difference.


> But personally, I generally agree with Alvaro's analysis.  If this
> patch is slow when turned on, and you don't like that, don't use it.

> If you need to use it and want it to scale better, then you can write
> a patch to make it do that. Nobody is more qualified than you.

I think that's what ticks me 

Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 2:54 PM, Tom Lane  wrote:
> My other design-level complaint is that basing this on foreign keys is
> fundamentally the wrong thing.  What actually matters is the unique index
> underlying the FK; that is, if we have "a.x = b.y" and there's a
> compatible unique index on b.y, we can conclude that no A row will match
> more than one B row, whether or not an explicit FK relationship has been
> declared.  So we should drive this off unique indexes instead of FKs,
> first because we will find more cases, and second because the planner
> already examines indexes and doesn't need any additional catalog lookups
> to get the required data.  (IOW, the relcache additions that were made in
> this patch series should go away too.)

Without prejudice to anything else in this useful and detailed review,
I have a question about this.  A unique index proves that no A row
will match more than one B row, and I agree that deriving that from
unique indexes is sensible.  However, ISTM that an FK provides
additional information: we know that, modulo filter conditions on B,
every A row will match *exactly* one B row, which can prevent us
from *underestimating* the size of the join product.  A unique index
can't do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Joshua D. Drake

On 05/04/2016 01:03 PM, Robert Haas wrote:

On Wed, May 4, 2016 at 4:00 PM, Joshua D. Drake  wrote:

Just my .02, pretty sure the majority of the community says, "TGL just sent
-1, argument over." That may or may not be a good thing but his experience
and depth of knowledge of our code base pretty much seals it for most of us.


Sure, but Tom is also human, and sometimes doesn't like things that
other people still think are good ideas.  Tom questioned whether
parallel query was really a good idea, and also Hot Standby.  If we'd


Like you said, he is human ;)


given up on having those features when Tom opened his mouth, we'd be
worse off today.

That is not to say that we shouldn't defer to Tom's judgement in many
cases.  And I do.


Right, and I agree with that. I was just saying that in this particular
case, I think a lot of people were just accepting the single -1.


JD






--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-04 15:54:32 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Wed, May 4, 2016 at 3:35 PM, Tom Lane  wrote:
> >> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
> >> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
> >> has it).  As long as we have an answer for Windows, it's hard to argue
> >> we can't require poll() elsewhere.

Yea, I think that's the case. I just ran the query for a longer period,
and there's not been any report since 2014 of an !windows animal without
poll support.


> > I don't think we'd necessarily need to completely de-support people
> > who still depend on select().  We'd just need to say, well,
> > WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
> > may not, depending on how modern your platform is.  In the use cases I
> > foresee, that would occasionally result in less-timely detection of
> > FDW connection loss, but nothing worse.  I'm not prepared to get very
> > excited about that.
> 
> I'm not either, but ...

I'm not worried about slightly degraded capabilities there either, more
about the (lack of) ability to test that reasonably.


> > But if we are confident that everything supports poll() and it's
> > always better than select(), another, possibly superior option is to
> > remove support for select() and see if anything breaks.  If not, then
> > we only need to support three platform-specific implementations
> > instead of four, which I would find it difficult to complain about.
> 
> ... the evidence available suggests that the select() code path has
> probably received zero buildfarm testing.  Do we really want to ship
> a fourth implementation that we can't even vouch for?

I mean I added code to make it easier to test the select path in 9.6,
but yea, outside of my machines running the latest linux kernel, that
code path appears to have had barely any testing over the last
few years.


Andres




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 3:54 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 4, 2016 at 3:35 PM, Tom Lane  wrote:
>>> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
>>> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
>>> has it).  As long as we have an answer for Windows, it's hard to argue
>>> we can't require poll() elsewhere.
>
>> I don't think we'd necessarily need to completely de-support people
>> who still depend on select().  We'd just need to say, well,
>> WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
>> may not, depending on how modern your platform is.  In the use cases I
>> foresee, that would occasionally result in less-timely detection of
>> FDW connection loss, but nothing worse.  I'm not prepared to get very
>> excited about that.
>
> I'm not either, but ...
>
>> But if we are confident that everything supports poll() and it's
>> always better than select(), another, possibly superior option is to
>> remove support for select() and see if anything breaks.  If not, then
>> we only need to support three platform-specific implementations
>> instead of four, which I would find it difficult to complain about.
>
> ... the evidence available suggests that the select() code path has
> probably received zero buildfarm testing.  Do we really want to ship
> a fourth implementation that we can't even vouch for?

I'm more than happy to rip it out, either now or after the tree opens
for 9.7 development.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 4:00 PM, Joshua D. Drake  wrote:
> Just my .02, pretty sure the majority of the community says, "TGL just sent
> -1, argument over." That may or may not be a good thing but his experience
> and depth of knowledge of our code base pretty much seals it for most of us.

Sure, but Tom is also human, and sometimes doesn't like things that
other people still think are good ideas.  Tom questioned whether
parallel query was really a good idea, and also Hot Standby.  If we'd
given up on having those features when Tom opened his mouth, we'd be
worse off today.

That is not to say that we shouldn't defer to Tom's judgement in many
cases.  And I do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 3:51 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> The PostgreSQL 9.6 release management team has determined that there
>> is insufficient consensus at this time to revert any of the patches
>> mentioned in 
>> http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
>> because, with the exception of "snapshot too old", none of those
>> patches have attracted more than a single vote to revert.  While
>> "snapshot too old" has attracted three votes to revert (Tom, Bruce,
>> Andres), one of those was on the grounds of not liking the feature in
>> general rather than any specific problem with the implementation (Tom)
>> and another gave no reason at all (Bruce).  When originally proposed,
>> there was clear consensus that the feature was useful, so any revert
>> should be on the grounds that the current implementation is flawed.
>
> ... which, indeed, is precisely what Andres is asserting, no?  I do
> not understand your conclusion.

Yes, and "asserting" is the right word, per my complaints in the first
paragraph of:

http://www.postgresql.org/message-id/ca+tgmoyxb8mx9qtmz-ytaael4svrvqe32yt66cwon3x7kbx...@mail.gmail.com

> If the threshold is "more than one vote to revert", I'm sure that can
> be arranged.  For the most part I think people have assumed that if
> one senior hacker complains about something, it's not really necessary
> for other people to duplicate that person's review effort.  We don't
> have a surplus of manpower available for such things, and I believe
> most of us are going flat out right now anyway trying to get ready
> for beta.  Duplicate reviews are hard to come by.

I don't think that >1 person saying something necessarily constitutes
a consensus, but 1 person saying something and 1 other person offering
a counter-argument sure doesn't.  I have not yet read
http://www.postgresql.org/message-id/16807.1462388...@sss.pgh.pa.us
and that may shed a different light on the situation, but as of a few
minutes ago the arguments thus far advanced for reverting any of the
patches I mentioned in
http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
had convinced zero of three RMT members.

I believe that it is, and has long been, the community's general
policy that in doubtful cases, the discretion of the relevant
committer prevails.  This may or may not be the best policy, but it is
usually how we operate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Joshua D. Drake

On 05/04/2016 12:51 PM, Tom Lane wrote:

Robert Haas  writes:

The PostgreSQL 9.6 release management team has determined that there
is insufficient consensus at this time to revert any of the patches
mentioned in 
http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
because, with the exception of "snapshot too old", none of those
patches have attracted more than a single vote to revert.  While
"snapshot too old" has attracted three votes to revert (Tom, Bruce,
Andres), one of those was on the grounds of not liking the feature in
general rather than any specific problem with the implementation (Tom)
and another gave no reason at all (Bruce).  When originally proposed,
there was clear consensus that the feature was useful, so any revert
should be on the grounds that the current implementation is flawed.


... which, indeed, is precisely what Andres is asserting, no?  I do
not understand your conclusion.

If the threshold is "more than one vote to revert", I'm sure that can
be arranged.  For the most part I think people have assumed that if
one senior hacker complains about something, it's not really necessary
for other people to duplicate that person's review effort.  We don't
have a surplus of manpower available for such things, and I believe
most of us are going flat out right now anyway trying to get ready
for beta.  Duplicate reviews are hard to come by.


Just my .02, pretty sure the majority of the community says, "TGL just 
sent -1, argument over." That may or may not be a good thing but his 
experience and depth of knowledge of our code base pretty much seals it 
for most of us.


Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 4, 2016 at 3:35 PM, Tom Lane  wrote:
>> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
>> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
>> has it).  As long as we have an answer for Windows, it's hard to argue
>> we can't require poll() elsewhere.

> I don't think we'd necessarily need to completely de-support people
> who still depend on select().  We'd just need to say, well,
> WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
> may not, depending on how modern your platform is.  In the use cases I
> foresee, that would occasionally result in less-timely detection of
> FDW connection loss, but nothing worse.  I'm not prepared to get very
> excited about that.

I'm not either, but ...

> But if we are confident that everything supports poll() and it's
> always better than select(), another, possibly superior option is to
> remove support for select() and see if anything breaks.  If not, then
> we only need to support three platform-specific implementations
> instead of four, which I would find it difficult to complain about.

... the evidence available suggests that the select() code path has
probably received zero buildfarm testing.  Do we really want to ship
a fourth implementation that we can't even vouch for?

regards, tom lane




Re: [HACKERS] release management team statement on patch reverts

2016-05-04 Thread Tom Lane
Robert Haas  writes:
> The PostgreSQL 9.6 release management team has determined that there
> is insufficient consensus at this time to revert any of the patches
> mentioned in 
> http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
> because, with the exception of "snapshot too old", none of those
> patches have attracted more than a single vote to revert.  While
> "snapshot too old" has attracted three votes to revert (Tom, Bruce,
> Andres), one of those was on the grounds of not liking the feature in
> general rather than any specific problem with the implementation (Tom)
> and another gave no reason at all (Bruce).  When originally proposed,
> there was clear consensus that the feature was useful, so any revert
> should be on the grounds that the current implementation is flawed.

... which, indeed, is precisely what Andres is asserting, no?  I do
not understand your conclusion.

If the threshold is "more than one vote to revert", I'm sure that can
be arranged.  For the most part I think people have assumed that if
one senior hacker complains about something, it's not really necessary
for other people to duplicate that person's review effort.  We don't
have a surplus of manpower available for such things, and I believe
most of us are going flat out right now anyway trying to get ready
for beta.  Duplicate reviews are hard to come by.

regards, tom lane




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 3:35 PM, Tom Lane  wrote:
> I wrote:
>> Andres Freund  writes:
>>> Given that poll() has been introduced in SVR3 - which IIRC was below our
>>> usual baseline - and windows is not an issue for latch, I think it'd
>>> be ok to rely on it.
>
>> I think it's entirely reasonable to say that "if you want high performance
>> you should have poll(3)".  Failing to build without it would be a harder
>> sell, probably.
>
> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
> has it).  As long as we have an answer for Windows, it's hard to argue
> we can't require poll() elsewhere.

I don't think we'd necessarily need to completely de-support people
who still depend on select().  We'd just need to say, well,
WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
may not, depending on how modern your platform is.  In the use cases I
foresee, that would occasionally result in less-timely detection of
FDW connection loss, but nothing worse.  I'm not prepared to get very
excited about that.

But if we are confident that everything supports poll() and it's
always better than select(), another, possibly superior option is to
remove support for select() and see if anything breaks.  If not, then
we only need to support three platform-specific implementations
instead of four, which I would find it difficult to complain about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] release management team statement on patch reverts

2016-05-04 Thread Robert Haas
The PostgreSQL 9.6 release management team has determined that there
is insufficient consensus at this time to revert any of the patches
mentioned in 
http://www.postgresql.org/message-id/CA+TgmoYOWTtBQEL+Bv=w93bvUjbXSUw3uGnp+R29dduZ==8...@mail.gmail.com
because, with the exception of "snapshot too old", none of those
patches have attracted more than a single vote to revert.  While
"snapshot too old" has attracted three votes to revert (Tom, Bruce,
Andres), one of those was on the grounds of not liking the feature in
general rather than any specific problem with the implementation (Tom)
and another gave no reason at all (Bruce).  When originally proposed,
there was clear consensus that the feature was useful, so any revert
should be on the grounds that the current implementation is flawed.

The release management team notes that, while all of these patches
have multiple reported issues, it currently appears that all of those
issues are being worked on by authors of those patches.  Many of those
issues already have proposed patches, and those patches should be
given due consideration before proceeding with a revert.

The release management team encourages senior hackers and committers
to study these patches and the defects reported against them in detail
and to offer further opinions on whether fixing or reverting is more
appropriate in each case.  The technical reasons for those judgements
should be set forth in detail so that everyone can understand why the
patch is or is not a major risk to the stability of PostgreSQL 9.6.

-- 
Robert Haas
PostgreSQL 9.6 Release Management Team




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Given that poll() has been introduced in SVR3 - which IIRC was below our
>> usual baseline - and windows is not an issue for latch, I think it'd
>> be ok to rely on it.

> I think it's entirely reasonable to say that "if you want high performance
> you should have poll(3)".  Failing to build without it would be a harder
> sell, probably.

Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
been our minimum baseline spec for a long time (even my pet dinosaur HPUX
has it).  As long as we have an answer for Windows, it's hard to argue
we can't require poll() elsewhere.

regards, tom lane




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Merlin Moncure
On Wed, May 4, 2016 at 2:31 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Given that poll() has been introduced in SVR3 - which IIRC was below our
>> usual baseline - and windows is not an issue for latch, I think it'd
>> be ok to rely on it.
>
> I think it's entirely reasonable to say that "if you want high performance
> you should have poll(3)".  Failing to build without it would be a harder
> sell, probably.

There are some decent cross-platform libraries that expose
high-performance polling.  In particular, libev.
http://software.schmorp.de/pkg/libev.html.  I would definitely look
there before contemplating direct epoll calls.  (note, aside from
seeing epoll, I haven't been following this thread in detail).

merlin




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Tom Lane
Andres Freund  writes:
> Given that poll() has been introduced in SVr3 - which IIRC was below our
> usual baseline - and windows is not an issue for latch, I think it'd
> be ok to rely on it.

I think it's entirely reasonable to say that "if you want high performance
you should have poll(3)".  Failing to build without it would be a harder
sell, probably.

regards, tom lane




Re: [HACKERS] modifying WaitEventSets (was: Performance degradation in commit ac1d794)

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-03 23:09:28 -0400, Robert Haas wrote:
> So what's the best API for that?  One idea is to change
> ModifyWaitEvent to accept events = 0 instead of failing an assertion
> inside WaitEventAdjustEpoll.  We don't want to wait for EPOLLERR |
> EPOLLHUP in that case since we'd have to wait event to return if one
> of those things occurred.  It would be easy enough to rejigger that
> code so that we pass epoll_ev.events as 0 in that case, but I think it
> doesn't help: epoll_ctl(2) says epoll_wait(2) always waits for those
> events.  Instead I think we'd have to use EPOLL_CTL_DEL in that case,
> which is a problem: when the user next calls ModifyWaitEvent, we would
> need to use EPOLL_CTL_ADD rather than EPOLL_CTL_MOD, but how would we
> know that?  We'd have to add state for that somewhere.

Right.


> Yet another idea is to have a new event WL_SOCKET_ERROR which waits
> for only EPOLLERR | EPOLLHUP.  So, if you don't care about the socket
> being readable or writeable at the moment, but you still vaguely care
> about the file descriptor, you can do ModifyWaitEvent(set, pos,
> WL_SOCKET_ERROR, NULL).  That's actually kind of nice, because now you
> can still detect error/eof conditions on sockets that you are not
> otherwise waiting for, and report the errors promptly.  At the moment,
> I'm inclined to think this might be the best option...

I generally agree that this is the best option, but there's a noticeable
catch: I'm afraid select() doesn't really generally support this kind of
operation; all the other providers do.
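
For concreteness, here is a small standalone sketch (not the latch code
itself) of epoll reporting EPOLLHUP/EPOLLERR even when no read/write
events were requested; the socketpair is just a stand-in for a client
connection:

/* sketch: epoll reports EPOLLHUP/EPOLLERR even with events == 0 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>

int
main(void)
{
	int			sv[2];
	int			epfd;
	int			n;
	struct epoll_event ev;
	struct epoll_event out;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	epfd = epoll_create1(0);
	memset(&ev, 0, sizeof(ev));
	ev.events = 0;				/* ask for no read/write events at all */
	ev.data.fd = sv[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		return 1;

	close(sv[1]);				/* peer goes away */

	n = epoll_wait(epfd, &out, 1, 1000);
	if (n == 1)
		printf("events: %s%s\n",
			   (out.events & EPOLLHUP) ? "EPOLLHUP " : "",
			   (out.events & EPOLLERR) ? "EPOLLERR " : "");
	else
		printf("no events (n=%d)\n", n);
	return 0;
}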

But after trawling the buildfarm logs, that isn't necessarily a problem:
it appears that we haven't had a single !windows animal reporting a
successful configure run without HAVE_POLL support this year, going by
the buildfarm status logs. I'm running the check over a longer period
now, but it'll take a long while.

It appears that OS X was one of the last platforms to introduce poll()
support, in 10.3.

Given that poll() has been introduced in SVr3 - which IIRC was below our
usual baseline - and windows is not an issue for latch, I think it'd
be ok to rely on it.

Greetings,

Andres Freund




Re: [HACKERS] what to revert

2016-05-04 Thread Kevin Grittner
On Wed, May 4, 2016 at 1:25 PM, Robert Haas  wrote:

> If somebody had even hinted that such a problem might exist, Kevin
> probably would have fixed it before commit, but nobody did.  As soon
> as it was raised, Kevin started working on it.  That's about all you
> can ask, I think.

Right; I have not been ignoring the issue -- but I prioritized it
below fixing correctness issues and performance issues when the
feature is off.  Since there are no known issues in either of those
areas remaining once I push the patch I posted earlier today, I'm
taking a close look at the three proposals from three different
people about ways to address it (along with any other ideas that
come to mind while working through those).  Fortunately, the access
problems to the EDB big NUMA machines have now been solved (by
tweaking firewall settings), so I should have some sort of
meaningful benchmarks of those three alternatives and anything else
the comes to mind within a few days.  (Until now I have been asking
others to do some runs, which doesn't gather the data nearly as
quickly as actually having access.)

Amit's proof-of-concept patch is very small and safe and yielded a
3x to 4x performance improvement with the old_snapshot_threshold =
1 on a big NUMA machine with concurrency in the 32 to 64 client
range.  I don't know whether we can do better before beta1, but it
is something.  I'll be happy to try any other suggestions.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Remove regress-python3-mangle.mk

2016-05-04 Thread Peter Eisentraut

On 4/19/16 10:23 PM, Noah Misch wrote:

>> 3. Because we use sed we do not run tests for plpython3 under Windows. And I
>> have trouble with CMake too.

> Even if removing regress-python3-mangle.mk happened to be good for PL/Python,
> we need build systems flexible enough to implement steps like it without a
> struggle.  Our current build systems, the src/tools/msvc system and the GNU
> make system, have that flexibility.  We could port regress-python3-mangle.mk
> to Perl and run it on Windows; it just hasn't been a priority.


Well, you could write the mangling script in Python, but that's harder 
than it sounds.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> * debugger ability to print variables symbolically
> 
> > I might be misunderstanding what you're getting at here, but if you want
> > to be able to use #define'd values using their name, you can get that by
> > compiling with -g3.  With -g3 and gdb, you can do things like:
> 
> > (gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT |
> > DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL)
> 
> > where all the DUMP_COMPONENT items are #define's.
> 
> Yes, but if you just do "p tblinfo[i].dobj.dump", you're only going to get
> a number, right?  The value-added for an enum type is that the debugger
> knows which symbol to substitute for a numeric value when printing.  But
> that stops working if what's in the variable is some OR of the values the
> debugger knows about.

Ah, yeah, that's true.  -g3, unsurprisingly, doesn't help with
displaying a random int value using some set of #define's.

Thanks!

Stephen




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> * debugger ability to print variables symbolically

> I might be misunderstanding what you're getting at here, but if you want
> to be able to use #define'd values using their name, you can get that by
> compiling with -g3.  With -g3 and gdb, you can do things like:

> (gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT |
> DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL)

> where all the DUMP_COMPONENT items are #define's.

Yes, but if you just do "p tblinfo[i].dobj.dump", you're only going to get
a number, right?  The value-added for an enum type is that the debugger
knows which symbol to substitute for a numeric value when printing.  But
that stops working if what's in the variable is some OR of the values the
debugger knows about.

regards, tom lane




Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-05-04 Thread Tom Lane
David Rowley  writes:
> On 4 May 2016 at 09:18, David Rowley  wrote:
>> On 4 May 2016 at 02:10, Tomas Vondra  wrote:
>>> There are probably a few reasonably simple things we could do - e.g. ignore
>>> foreign keys with just a single column, as the primary goal of the patch is
>>> improving estimates with multi-column foreign keys. I believe that covers a
>>> vast majority of foreign keys in the wild.

> I've spent a few hours looking at this and I've come up with the
> attached patch, which flags each ForeignKeyOptInfo to say whether it's
> possible to be referenced in any join condition, with the logic that
> if the referenced relation is in the simple_rte_array, then it could
> be referenced.

My fundamental problem with the FK-selectivity patch is that it looks
an awful lot like a preliminary proof-of-concept that got committed.

I do not like the basic design: it's about as brute-force as could
possibly be.  It adds code that is executed:

  * at least once per join relation created (hence, significantly more than
  the number of rels in the query; see also get_joinrel_parampathinfo)
* for each inner relation in the initial input joinrel pair
  * for each outer relation in the initial joinrel pair
* for each foreign key constraint on this inner and outer rel
  * for each key column in that FK
* for each join qual for the current input joinrel pair
  * for each member of the relevant EquivalenceClass

This is at least O(N^3) in the number of baserels in the query, not to
mention the other multipliers.  I'm not very impressed by tests that scale
only one of the multipliers (like the number of FK constraints); where the
pain is going to come in is when all of these factors are large.

I spent some time trying to make a test case that was impossibly slow,
without any really impressive result: I saw at most about a 25% growth in
planning time, for a 27-way join with one or two foreign keys per table.
I noted however that with a simple FROM list of tables, you don't really
get the full force of the combinatorial explosion, because
join_search_one_level prefers to generate left-deep trees first, and so
the first creation of a given joinrel is always N left-side rels against 1
right-side rel, causing the second level of looping to always iterate just
once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
devise join order constraints that would result in a lot of high-level
joinrels being formed with many relations on both sides, which would cause
both of the second and third levels to iterate O(N) times not just once.
I didn't have much luck, but I have zero confidence that our users won't
find such cases.

The reason it's so brute force is that it's rediscovering the same facts
over and over.  In all simple (non-outer-join) cases, the only clauses
that are of any interest are derived from EquivalenceClasses, and all
that you really need to know is "are the vars mentioned on each side
of this FK present in the same EquivalenceClass?".  ISTM that could be
determined once per FK per query and cached in or near the EC, not
expensively rediscovered at each level of joining.  I'm not sure whether
it'd be worth considering non-EC-derived equalities (ie, outer join
clauses) at all, and note that the current patch fails to do so anyway;
see below.

My other design-level complaint is that basing this on foreign keys is
fundamentally the wrong thing.  What actually matters is the unique index
underlying the FK; that is, if we have "a.x = b.y" and there's a
compatible unique index on b.y, we can conclude that no A row will match
more than one B row, whether or not an explicit FK relationship has been
declared.  So we should drive this off unique indexes instead of FKs,
first because we will find more cases, and second because the planner
already examines indexes and doesn't need any additional catalog lookups
to get the required data.  (IOW, the relcache additions that were made in
this patch series should go away too.)

Aside from the design being basically wrong, the code quality seems pretty
low.  Aside from the missing IsA check that started this thread, I found
the following problems in a quick once-over of 137805f89:

Bugs in quals_match_foreign_key():

* Test for clause->opno == fkinfo->conpfeqop[i] fails to consider
cross-type operators, ie, what's in the clause might be int2 = int4
while the conpfeqop is int4 = int2.

* parent_ec check isn't really the right thing, since EC-derived clauses
might not have that set.  I think it may be adequate given that you only
deal with simple Vars, but at least a comment about that would be good.

* Come to think of it, you could probably dodge the commutator operator
problem altogether for clauses with nonnull parent_ec, because they must
contain a relevant equality operator.  (Although if it's redesigned as I

[HACKERS] pg_dump vs. TRANSFORMs

2016-05-04 Thread Stephen Frost
Greetings all,

While testing pg_dump, I discovered that there seems to be an issue when
it comes to TRANSFORMs.  I'll be the first to admit that I'm not
terribly familiar with transforms, but I do know that if you create one
using functions from pg_catalog (as our regression tests do), the CREATE
TRANSFORM statement isn't dumped out and therefore we don't test the
dump/restore of transforms, even with our current pg_upgrade test
process.

In the regression tests, we do:

CREATE TRANSFORM FOR int LANGUAGE SQL (
FROM SQL WITH FUNCTION varchar_transform(internal),
TO SQL WITH FUNCTION int4recv(internal));

but if you do that and then run pg_dump, you don't see any mention of
any transforms in the dump file.

This is because there's a couple of checks towards the top of
dumpTransform():

/* Cannot dump if we don't have the transform functions' info */
if (OidIsValid(transform->trffromsql))
{   
fromsqlFuncInfo = findFuncByOid(transform->trffromsql);
if (fromsqlFuncInfo == NULL)
return;
}
if (OidIsValid(transform->trftosql))
{   
tosqlFuncInfo = findFuncByOid(transform->trftosql);
if (tosqlFuncInfo == NULL)
return;
}

These checks are looking for the functions used by the transform in the
list of functions that pg_dump has loaded, but in 9.5, we don't load any
of the functions in pg_catalog, and even with my patches, we only dump
the functions in pg_catalog that have an ACL which has been changed from
the default.

Given that there are lots of functions in pg_catalog, we probably don't
want to just load them all, all the time, but what we should probably do
is grab any functions which are depended on by any transforms.  Or we
could change dumpTransform() to actually issue a query to get the
function information which it needs when we've decided to dump a given
transform.
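
As a rough sketch of the lookup involved, a standalone libpq version
(the column list pg_dump actually needs would be larger; this only
shows the shape of the catalog join):

/* sketch: find the functions referenced by pg_transform entries */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* uses PG* environment variables */
	PGresult   *res;
	int			i;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}
	res = PQexec(conn,
				 "SELECT t.oid AS trfoid, p.oid AS prooid, p.proname "
				 "FROM pg_transform t "
				 "JOIN pg_proc p ON p.oid IN (t.trffromsql, t.trftosql)");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}
	for (i = 0; i < PQntuples(res); i++)
		printf("transform %s uses function %s (oid %s)\n",
			   PQgetvalue(res, i, 0), PQgetvalue(res, i, 2),
			   PQgetvalue(res, i, 1));
	PQclear(res);
	PQfinish(conn);
	return 0;
}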

For my 2c, this is a bug that needs to be fixed and back-patched to 9.5.

I'm going to continue working on my pg_dump test suite for now, but I
can look at fixing this after PGCon if no one else fixes it first.

Thanks!

Stephen




Re: [HACKERS] what to revert

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 12:42 PM, Andres Freund  wrote:
> On 2016-05-04 13:35:02 -0300, Alvaro Herrera wrote:
>> Honestly, I don't see any strong ground on which to base a revert threat
>> for this feature.
>
> Its data structures are badly designed. But once it's released there's no
> pressure to fix that.  If this were an intrinsic cost - ok. But it's
> not.

I don't want to rule out the possibility that you are right, because
you're frequently right about lots of things.  However, you haven't
provided much detail.  AFAIK, the closest you've come to articulating
a case for reverting this patch is to say that the tax ought to be
paid by the write-side, rather than the read-side.   I don't know
exactly what that means, though.  Snapshots are not stored in shared
memory; writers can't iterate through all snapshots and whack the ones
that are problematic - and even if they could, they can't know what
tuples will be read in the future, which determines whether or not
there is an actual problem.  We could dispense with a lot of this
machinery if we simply killed off transactions or sessions that stuck
around too long, but the whole point of this feature is to avoid
having to do that when it isn't really necessary.  Surely nobody here
is blind to the fact that holding back xmin often creates a lot of
bloat for no reason - many or all of the retained old row versions may
never be accessed.  So I would like to hear more detail about why you
think that the data structures are badly designed, and how they could
be designed differently without changing what the patch does (which
amounts to wishing that Kevin had implemented a different feature than
the one you think he should have implemented).

>> It doesn't scale as well as we would like in the case
>> where a high-end machine is fully stressed with a read-only load -- okay.  But
>> that's unlikely to be a case where this feature is put to work.
>
> It'll be just the same in a read mostly workload, which is part of the
> case for this feature.

So what?  SSI doesn't scale either, and nobody's saying we have to rip
it out.  It's still useful to people.  pgbench is not the world.

>> So I think accepting the promise that this feature would be improved
>> in a future release to better support that case is good enough.
>
> I've not heard any such promise.

The question Alvaro is raising isn't whether such a promise has been
made but whether it would suffice.  In fairness, such promises are not
enforceable.  Kevin can promise to work on this and then be run over
by a rogue Mr. Softee truck tomorrow, and the work won't get done; or
he can go to work for some non-PostgreSQL-supporting company and
vanish.  I hope neither of those things happens, but if we accept a
promise to improve it, it's going to be contingent on certain things
happening or not happening which are beyond the ability of the
PostgreSQL community to effect or forestall.  (FWIW, I believe I can
safely say that EnterpriseDB will support his continued work on this
feature for as long as the community has concerns about it and Kevin
works for EnterpriseDB.)

But personally, I generally agree with Alvaro's analysis.  If this
patch is slow when turned on, and you don't like that, don't use it.
If you need to use it and want it to scale better, then you can write
a patch to make it do that.  Nobody is more qualified than you.  I
think it's a show-stopper if the patch regresses performance more than
trivially when turned off, but we need fresh test results before
reaching a conclusion about that.  I also think it's a show-stopper if
you can hold out specific design issues which we can generally agree
are signs of serious flaws in Kevin's thinking.  I do not think it's a
show-stopper if you wish he'd worked harder on the performance under
heavy concurrency with very short transactions.  You could have raised
that issue at any time, but instead waited until after feature freeze.
If somebody had even hinted that such a problem might exist, Kevin
probably would have fixed it before commit, but nobody did.  As soon
as it was raised, Kevin started working on it.  That's about all you
can ask, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Andres Freund
On 2016-05-04 18:12:45 +0300, Teodor Sigaev wrote:
> > > I get the errors:
> > > 
> > > ERROR:  attempted to delete invisible tuple
> > > STATEMENT:  update foo set count=count+1,text_array=$1 where text_array 
> > > @> $2
> > > 
> > > And also:
> > > 
> > > ERROR:  unexpected chunk number 1 (expected 2) for toast value
> > > 85223889 in pg_toast_16424
> > > STATEMENT:  update foo set count=count+1 where text_array @> $1
> > 
> > Hm. I appear to have trouble reproducing this issue (continuing to try)
> > on master as of 8826d8507.  Is there any chance you could package up a
> > data directory after the issue hit?
> 
> I've got
> ERROR:  unexpected chunk number 0 (expected 1) for toast value 10192986 in
> pg_toast_16424
> 
> The test required 10 hours to run on my notebook. postgresql was compiled
> with -O0 --enable-debug --enable-cassert.

Interesting. I just ran a test for a good bit longer, until it failed due
to nearing wraparound, presumably because the crashes are too
frequent for vacuuming to finish.

I did, however, use an optimized build, because Jeff said he couldn't
reproduce with cassert.  I wonder if there's some barrier-related issue,
making this dependent on the compiler's exact code generation. That'd
explain why different people can reproduce it in different
circumstances.

Any chance you could package up that data directory for me to download?


Andres




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> * debugger ability to print variables symbolically

I might be misunderstanding what you're getting at here, but if you want
to be able to use #define'd values using their name, you can get that by
compiling with -g3.  With -g3 and gdb, you can do things like:

(gdb) p tblinfo[i].dobj.dump & ~(DUMP_COMPONENT_COMMENT |
DUMP_COMPONENT_SECLABEL | DUMP_COMPONENT_USERMAP | DUMP_COMPONENT_ACL)

where all the DUMP_COMPONENT items are #define's.

Thanks!

Stephen




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Tom Lane
Fabien COELHO  writes:
> I understood the point and I do not see real disadvantages. The C standard 
> really says that an enum is an int, and compilers just do that.

No, it doesn't say that, and compilers don't just do that.  A compiler
is specifically allowed to store an enum in char or short if the enum's
declared values would all fit into that width.  (Admittedly, if you're
choosing the values as powers of 2, an OR of them would still fit; but
if you think "oh, an enum is just an int", you will get burned.)

More to the point, once you allow OR'd values then none of the practical
benefits of an enum still hold good.  The typical things that I rely on
in an enum that you don't get from a collection of #define's are:

* compiler warnings if you forget some members of the enum in a switch

* debugger ability to print variables symbolically

Those benefits both go up in smoke as soon as you allow OR'd values.
At that point you might as well use the #defines rather than playing
language lawyer about whether what you're doing meets the letter of
the spec.  I note that C99 specifically mentions this as something
a compiler might warn about:

 -- A value is given to an object of an enumeration type
    other than by assignment of an enumeration constant
    that is a member of that type, or an enumeration
    variable that has the same type, or the value of a
    function that returns the same enumeration type
    (6.7.2.2).

which certainly looks like they don't consider
"enumvar = ENUMVAL1 | ENUMVAL2" to be strictly kosher.

regards, tom lane




Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Tom Lane
Stas Kelvich  writes:
>> On 04 May 2016, at 16:58, Tom Lane  wrote:
>> The other ones are not so problematic because they do not conflict with
>> SQL keywords.  It's only delete() and filter() that scare me.

> Okay, so changed functions to ts_setweight, ts_delete, ts_unnest, ts_filter.

Somehow, I don't think you read what I wrote.

Renaming the pre-existing setweight() function to ts_setweight() is
not going to happen; it's been like that for half a dozen years now.
It would make no sense to call the new variant ts_setweight() while
keeping setweight() for the existing function, either.

I also don't see that much point in ts_unnest(), since unnest()
in our implementation is a function not a keyword.  I don't have
a strong opinion about that one, though.

Also, I'd supposed that we'd rename to tsvector_something, since
the same patch also introduced tsvector_to_array() and
array_to_tsvector().  What's the motivation for using ts_ as the
prefix?

regards, tom lane




Re: [HACKERS] what to revert

2016-05-04 Thread Andres Freund
Hi,

On 2016-05-04 13:35:02 -0300, Alvaro Herrera wrote:
> Honestly, I don't see any strong ground on which to base a revert threat
> for this feature.

Its data structures are badly designed. But once it's released there's no
pressure to fix that.  If this were an intrinsic cost - ok. But it's
not.


> It doesn't scale as well as we would like in the case
> where a high-end machine is fully stressed with a read-only load -- okay.  But
> that's unlikely to be a case where this feature is put to work.

It'll be just the same in a read mostly workload, which is part of the
case for this feature.


> So I think accepting the promise that this feature would be improved
> in a future release to better support that case is good enough.

I've not heard any such promise.


Greetings,

Andres Freund




Re: [HACKERS] what to revert

2016-05-04 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-05-04 00:01:20 +0300, Ants Aasma wrote:
> > On Tue, May 3, 2016 at 9:57 PM, Tomas Vondra
> >  wrote:
> > > If you tell me how to best test it, I do have a 4-socket server sitting 
> > > idly
> > > in the corner (well, a corner reachable by SSH). I can get us some 
> > > numbers,
> > > but I haven't been following the snapshot_too_old so I'll need some 
> > > guidance
> > > on what to test.
> > 
> > I worry about two contention points with the current implementation.
> > 
> > The main one is the locking within MaintainOldSnapshotTimeMapping()
> > that gets called every time a snapshot is taken. AFAICS this should
> > show up by setting old_snapshot_threshold to any positive value and
> > then running a simple within shared buffers scale factor read only
> > pgbench at high concurrency (number of CPUs or a small multiple). On a
> > single socket system this does not show up.
> 
> On a two socket system it does, check the bottom of:
> http://archives.postgresql.org/message-id/20160413171955.i53me46fqqhdlztq%40alap3.anarazel.de
> Note that this (accidentally) compares thresholds 0 and 10 (instead of
> -1 and 10),

In other words, we expect that when the feature is disabled, no
performance degradation occurs, because that function is not called at
all.

Honestly, I don't see any strong ground on which to base a revert threat
for this feature.  It doesn't scale as well as we would like in the case
where a high-end machine is fully stressed with a read-only load -- okay.  But
that's unlikely to be a case where this feature is put to work.  So I
think accepting the promise that this feature would be improved in a
future release to better support that case is good enough.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Fabien COELHO


Hello Tom,


>>> There's not really a point in using an enum if you use neither the type
>>> (which you shouldn't, because if you OR the bitmask values you have values
>>> outside the range of the enum), nor the autogenerated numbers.

>> I do not think that there is such a constraint in C, you can use the enum
>> bitfield type, so you should.

> I think you are failing to understand Andres' point.  If you're going
> to OR together some bits, the result is no longer a member of the enum
> type, and the advantages that an enum would have immediately turn into
> disadvantages.


I understood the point and I do not see real disadvantages. The C standard 
really says that an enum is an int, and compilers just do that. I see it 
as a matter of interpretation whether enum members are strictly allowed 
values or just allowed bits, but maybe the standard says otherwise.


Anyway, the good news is that the bug is now fixed.

--
Fabien.




Re: [HACKERS] what to revert

2016-05-04 Thread Alvaro Herrera
Craig Ringer wrote:
> On 4 May 2016 at 13:03, Euler Taveira  wrote:
> 
> > Question is: is the actual code so useless that it can't even be a 1.0
> > release?
> 
> What's committed suffers from a design problem and cannot work correctly,
> nor can it be fixed without an API change and significant revision. The
> revised version I posted yesterday is that fix, but it's new code just
> before beta1. It's pretty much equivalent to reverting the original patch
> and then adding a new, corrected implementation. If considered as a new
> feature it'd never be accepted at this stage of the release.

To make it worse, we don't have test code for a portion of the new
functionality: it turned out that the test module only tests half of it.
And in order to test the other half, we have a pending patch for some
pg_recvlogical changes, but we still don't have the actual test script.
So we would need to

1. commit the pg_recvlogical patch, assuming it's OK now.
2. write the test script to use that
3. commit the fix patch written a few days ago (which is still
unreviewed).

We could also commit the fix without the test, but that doesn't seem a
great idea.

As Craig, I am not happy with this outcome.  But realistically I think
it's the best decision at this point.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Stas Kelvich
> On 04 May 2016, at 16:58, Tom Lane  wrote:
> 
> Stas Kelvich  writes:
>>> On 03 May 2016, at 00:59, David Fetter  wrote:
>>> I suspect that steering that ship would be a good idea starting with
>>> deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?
> 
>> In 9.5 there already were tsvector functions length(), numnode(), strip()
> 
>> Recent commit added setweight(), delete(), unnest(), tsvector_to_array(), 
>> array_to_tsvector(), filter().
> 
>> Last bunch can be painlessly renamed, for example to ts_setweight, 
>> ts_delete, ts_unnest, ts_filter.
> 
>> The question is what to do with old ones? Leave them as is? Rename to ts_* 
>> and create aliases with deprecation warning?
> 
> The other ones are not so problematic because they do not conflict with
> SQL keywords.  It's only delete() and filter() that scare me.
> 
>   regards, tom lane

Okay, so changed functions to ts_setweight, ts_delete, ts_unnest, ts_filter.



tsvector_ops_rename.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > - USERMAP
> 
> >   Uses pg_options_to_table(), but I don't think that actually uses
> >   SysCache at all, it's just taking the array provided and builds a
> >   table out of it, so I think this case is ok.
> 
> USERMAP seems a bit squishy and easily broken, perhaps.  Not sure there's
> an advantage to distinguishing this case --- why did you break it out
> from DEFINITION to start with?  Also, AFAICS, it does not apply to tables
> which are the only things we lock anyway.

When it comes to usermaps, they're dumped as part of FOREIGN SERVERs, so
I broke out the usermap as being an independent component from the
definition of the FOREIGN SERVER itself.  I realize we don't currently
have a way to ask pg_dump to exclude user mappings, but it seemed like
it wouldn't be unreasonable for us to have that in the future as user
mappings include role names and we have similar options for avoiding
specific role names in a dump (eg: --no-owner).

I agree that there's nothing to lock for usermaps though, as they aren't
associated with tables.

> Seems reasonable otherwise.

Ok, I'll write up a patch for that, should be pretty trivial.

> > Of course, the pg_dump would still end up including the ACLs for
> > pg_authid and whatever other tables the user has changed the ACLs on and
> > errors will be thrown during restore if the restore is done with a
> > non-superuser.
> 
> Right, but at least you have the option of ignoring such errors.

Agreed.

Thanks!

Stephen




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Why is it that we need to lock a table at all if we're just going to dump
>> its ACL?

> I think I'm coming around to agree with that, but it seems like it'd be
> better to look at each component and say "we know X is safe, so we won't
> lock the table if we're only doing X" rather than saying "we only need to
> lock the table for case X".

Agreed.  I did not realize you'd broken down the aspects of an object so
finely in pg_dump, but since you have, this is a good way to approach it.

> When considering the components:

> - DEFINITION
> - DATA
>   [ obviously need lock ]

> - COMMENT

>   Shouldn't require a lock, only uses a relatively simple query against
>   pg_description.

> - SECLABEL

>   Similar to COMMENT, shouldn't require a lock.

> - ACL

>   ACL info is collected from pg_class relacl without any server-side
>   functions being used which would impact the result.

> - POLICY

>   Uses pg_get_expr(), which at least gets the relation name from
>   SysCache, so we'll want to lock the table.

> - USERMAP

>   Uses pg_options_to_table(), but I don't think that actually uses
>   SysCache at all, it's just taking the array provided and builds a
>   table out of it, so I think this case is ok.

USERMAP seems a bit squishy and easily broken, perhaps.  Not sure there's
an advantage to distinguishing this case --- why did you break it out
from DEFINITION to start with?  Also, AFAICS, it does not apply to tables
which are the only things we lock anyway.

Seems reasonable otherwise.

> Of course, the pg_dump would still end up including the ACLs for
> pg_authid and whatever other tables the user has changed the ACLs on and
> errors will be thrown during restore if the restore is done with a
> non-superuser.

Right, but at least you have the option of ignoring such errors.

regards, tom lane




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Why is it that we need to lock a table at all if we're just going to dump
> its ACL?  I understand the failure modes that motivate locking when we're
> going to dump data or schema, but the ACL is not really subject to that
> kind of problem: we are going to see a unitary, unchanging view of
> pg_class.relacl in our snapshot, and we aren't relying on any server-side
> logic to interpret that AFAIR.

I think I'm coming around to agree with that, but it seems like it'd be
better to look at each component and say "we know X is safe, so we won't
lock the table if we're only doing X" rather than saying "we only need to
lock the table for case X".

Also, we should document why we need to lock the table in some cases and
not others.  To that end, I'd like to make sure that it's clear what
cases we are considering here.

In particular, dumping the schema definition of a table may involve
calling server-side functions which will see changes to the table which
have happened after our transaction started, primarily due to SysCache
usage in those functions.  Any of the components which only look at the
tables in pg_catalog should be fine and not require us to lock the
table.

When considering the components:

- DEFINITION

  pg_get_viewdef(), pg_get_indexdef(), pg_get_constraintdef(),
  pg_get_triggerdef(), etc...

- DATA

  Depends on the table structure itself not changing.

- COMMENT

  Shouldn't require a lock, only uses a relatively simple query against
  pg_description.

- SECLABEL

  Similar to COMMENT, shouldn't require a lock.

- ACL

  ACL info is collected from pg_class relacl without any server-side
  functions being used which would impact the result.

- POLICY

  Uses pg_get_expr(), which at least gets the relation name from
  SysCache, so we'll want to lock the table.

- USERMAP

  Uses pg_options_to_table(), but I don't think that actually uses
  SysCache at all, it's just taking the array provided and builds a
  table out of it, so I think this case is ok.

If the above looks reasonable to others, I can write up a patch which
will skip locking the table if we are only looking to include components
that don't require a lock.  Since we only ever look at pg_catalog for
ACLs (except if a user explicitly asks to dump one of the tables in
pg_catalog), we shouldn't ever attempt to lock those tables.

Of course, the pg_dump would still end up including the ACLs for
pg_authid and whatever other tables the user has changed the ACLs on and
errors will be thrown during restore if the restore is done with a
non-superuser.

Thanks!

Stephen




Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Teodor Sigaev

>> I get the errors:
>>
>> ERROR:  attempted to delete invisible tuple
>> STATEMENT:  update foo set count=count+1,text_array=$1 where text_array @> $2
>>
>> And also:
>>
>> ERROR:  unexpected chunk number 1 (expected 2) for toast value
>> 85223889 in pg_toast_16424
>> STATEMENT:  update foo set count=count+1 where text_array @> $1

> Hm. I appear to have trouble reproducing this issue (continuing to try)
> on master as of 8826d8507.  Is there any chance you could package up a
> data directory after the issue hit?

I've got
ERROR:  unexpected chunk number 0 (expected 1) for toast value 10192986 in
pg_toast_16424

The test required 10 hours to run on my notebook. postgresql was compiled with
-O0 --enable-debug --enable-cassert.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Tom Lane
Fabien COELHO  writes:
>> There's not really a point in using an enum if you use neither the type 
>> (which you shouldn't, because if you OR the bitmask values you have values
>> outside the range of the enum), nor the autogenerated numbers.

> I do not think that there is such a constraint in C, you can use the enum 
> bitfield type, so you should.

I think you are failing to understand Andres' point.  If you're going
to OR together some bits, the result is no longer a member of the enum
type, and the advantages that an enum would have immediately turn into
disadvantages.

regards, tom lane




Re: [HACKERS] Segmentation fault when max_parallel degree is very High

2016-05-04 Thread Tom Lane
Dilip Kumar  writes:
> When parallel degree is set very high, say 7, there is a segmentation
> fault in parallel code,
> and that is because a type cast is missing in the code.

I'd say the cause is not having a sane range limit on the GUC.

> or corrupt some memory. Need to typecast
> i * PARALLEL_TUPLE_QUEUE_SIZE --> (Size) i * PARALLEL_TUPLE_QUEUE_SIZE and
> this will fix

That might "fix" it on 64-bit machines, but not 32-bit.

regards, tom lane




Re: [HACKERS] tsvector filter problem

2016-05-04 Thread Teodor Sigaev

Thank you, pushed.

Stas Kelvich wrote:

Hi.

As discovered by Oleg Bartunov, the current filter() function for tsvector can
crash the backend.
The bug was caused by erroneous usage of the char type in a memmove argument.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011

2016-05-04 Thread Tom Lane
"David G. Johnston"  writes:
> This is a bit hard to reason about given that our implementation of
> inheritance is non-standard.

Yeah, that's a fairly key point.  We've solved those problems with
respect to inherited CHECK constraints, and it seems like what we
ought to do with NOT NULL is make it work the same as CHECK, rather
than invent some new concepts.

regards, tom lane




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Tom Lane
Stephen Frost  writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> But in Rushabh's example, he's not doing that.  He's trying to do a
>> full-database dump of a database that contains one object which the
>> dump user has rights to access.  Previously, that worked.  Now, it
>> fails with an error about a system catalog.  How is that not broken?

> As I mentioned up-thread, the optimization to skip tables which are not
> "interesting" has been improved in the patch-set posted this morning to
> skip over tables whose ACLs haven't changed from the defaults.  With
> that patch, we will skip over catalog tables whose ACLs haven't been
> changed and Rushabh's command will work as a non-superuser, assuming
> none of the ACLs on tables in pg_catalog have been changed.

> However, if any of the ACLs have been changed on tables in pg_catalog,
> we'll attempt to lock those tables and include those ACLs.  That will
> still work in many cases as you only need SELECT access to be able to
> lock a table in access share mode, but if the permissions on pg_authid
> are changed, the same failure will occur.

I think this is a bad idea, not only because of the issue about
permissions failures, but because the more things pg_dump locks
the greater risk it has of deadlock failures against other sessions.

Why is it that we need to lock a table at all if we're just going to dump
its ACL?  I understand the failure modes that motivate locking when we're
going to dump data or schema, but the ACL is not really subject to that
kind of problem: we are going to see a unitary, unchanging view of
pg_class.relacl in our snapshot, and we aren't relying on any server-side
logic to interpret that AFAIR.

regards, tom lane




Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Tom Lane
Amit Kapila  writes:
> On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov 
> wrote:
>> On 03.05.2016 2:17, Tom Lane wrote:
>>> Writing a single sector ought to be atomic too.

>> pg_control is 8k long (I think it is the length of one page with default PG
>> compile settings).

> The actual data written is always sizeof(ControlFileData) which should be
> less than one sector.

Yes.  We don't care what happens to the rest of the file as long as the
first sector's worth is updated atomically.  See the comments for
PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.

We could change to a different PG_CONTROL_SIZE pretty easily, and there's
certainly room to argue that reducing it to 512 or 1024 would be more
efficient.  I think the motivation for setting it at 8K was basically
"we're already assuming that 8K writes are efficient, so let's assume
it here too".  But since the file is only written once per checkpoint,
efficiency is not really a key selling point anyway.  If you could make
an argument that some other size would reduce the risk of failures,
it would be interesting --- but I suspect any such argument would be
very dependent on the quirks of a specific file system.

One point worth considering is that on most file systems, rewriting
a fraction of a page is *less* efficient than rewriting a full page,
because the kernel first has to read in the old contents to fill
the disk buffer it's going to partially overwrite with new data.
This motivates against trying to reduce the write size too much.
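
For illustration only, the update pattern being discussed amounts to
something like the sketch below (assuming sizeof(ControlFileData) fits
in one 512-byte sector; the real code pads the file out to
PG_CONTROL_SIZE):

/* sketch: rewrite the first sector in place, then flush it */
#include <string.h>
#include <unistd.h>

#define SECTOR_SIZE 512

int
rewrite_first_sector(int fd, const void *data, size_t len)
{
	char		buf[SECTOR_SIZE];

	if (len > SECTOR_SIZE)
		return -1;				/* would no longer be a single-sector write */
	memset(buf, 0, sizeof(buf));
	memcpy(buf, data, len);
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf))
		return -1;
	return fsync(fd);			/* don't trust the update until it is flushed */
}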

regards, tom lane




Re: [HACKERS] 9.5.2: "sql" as reserved word?

2016-05-04 Thread Kevin Grittner
On Wed, May 4, 2016 at 8:58 AM, Marc Mamin  wrote:

> select 'x' sql;
>
> ERROR:  syntax error at or near "sql"
> LINE 1: select 'x' sql;

It's likely that you already know this, but for the benefit of
anyone finding the thread who doesn't -- you can avoid this sort of
error by either inserting the optional AS keyword or quoting the
column label:

select 'x' as sql;

or:

select 'x' "sql";

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] old_snapshot_threshold's interaction with hash index

2016-05-04 Thread Kevin Grittner
On Tue, May 3, 2016 at 11:48 AM, Robert Haas  wrote:

> OK, I see now: the basic idea here is that we can't prune based on the
> newer XID unless the page LSN is guaranteed to advance whenever data
> is removed.  Currently, we attempt to limit bloat in non-unlogged,
> non-catalog tables.  You're saying we can instead attempt to limit
> bloat only in non-unlogged, non-catalog tables without hash indexes,
> and that will fix this issue.  Am I right?

As a first cut, something like the attached.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 4fecece..49a6c81 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -279,7 +279,6 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 		buf = so->hashso_curbuf;
 		Assert(BufferIsValid(buf));
 		page = BufferGetPage(buf);
-		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoffnum = PageGetMaxOffsetNumber(page);
 		for (offnum = ItemPointerGetOffsetNumber(current);
 			 offnum <= maxoffnum;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index eb8c9cd..4825558 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -189,7 +189,6 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
 	page = BufferGetPage(metabuf);
-	TestForOldSnapshot(scan->xs_snapshot, rel, page);
 	metap = HashPageGetMeta(page);
 
 	/*
@@ -243,7 +242,6 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
 	page = BufferGetPage(buf);
-	TestForOldSnapshot(scan->xs_snapshot, rel, page);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
 
@@ -350,7 +348,6 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
	_hash_readnext(rel, &buf, &page, &opaque);
 	if (BufferIsValid(buf))
 	{
-		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoff = PageGetMaxOffsetNumber(page);
 		offnum = _hash_binsearch(page, so->hashso_sk_hash);
 	}
@@ -392,7 +389,6 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
	_hash_readprev(rel, &buf, &page, &opaque);
 	if (BufferIsValid(buf))
 	{
-		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoff = PageGetMaxOffsetNumber(page);
 		offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
 	}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 432feef..79cc3df 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5313,6 +5313,52 @@ RelationIdIsInInitFile(Oid relationId)
 }
 
 /*
+ * Tells whether any index for the relation is unlogged.
+ *
+ * Any index using the hash AM is implicitly unlogged.
+ *
+ * Note: There doesn't seem to be any way to have an unlogged index attached
+ * to a permanent table except to create a hash index, but it seems best to
+ * keep this general so that it returns sensible results even when they seem
+ * obvious (like for an unlogged table) and to handle possible future unlogged
+ * indexes on permanent tables.
+ */
+bool
+RelationHasUnloggedIndex(Relation rel)
+{
+	List		   *indexoidlist;
+	ListCell	   *indexoidscan;
+	bool			result = false;
+
+	indexoidlist = RelationGetIndexList(rel);
+
+	foreach(indexoidscan, indexoidlist)
+	{
+		Oid			indexoid = lfirst_oid(indexoidscan);
+		HeapTuple	tp;
+		Form_pg_class reltup;
+
+		tp = SearchSysCache1(RELOID, ObjectIdGetDatum(indexoid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for relation %u", indexoid);
+		reltup = (Form_pg_class) GETSTRUCT(tp);
+
+		if (reltup->relpersistence == RELPERSISTENCE_UNLOGGED
+			|| reltup->relam == HASH_AM_OID)
+			result = true;
+
+		ReleaseSysCache(tp);
+
+		if (result == true)
+			break;
+	}
+
+	list_free(indexoidlist);
+
+	return result;
+}
+
+/*
  * Invalidate (remove) the init file during commit of a transaction that
  * changed one or more of the relation cache entries that are kept in the
  * local init file.
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 0a9a231..e1551a3 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1590,7 +1590,8 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 		&& old_snapshot_threshold >= 0
 		&& RelationNeedsWAL(relation)
 		&& !IsCatalogRelation(relation)
-		&& !RelationIsAccessibleInLogicalDecoding(relation))
+		&& !RelationIsAccessibleInLogicalDecoding(relation)
+		&& !RelationHasUnloggedIndex(relation))
 	{
 		int64		ts = GetSnapshotCurrentTimestamp();
 		TransactionId xlimit = recentXmin;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 

[HACKERS] Update to contrib/chkpass

2016-05-04 Thread D'Arcy J.M. Cain
After all these years of inactivity I don't think that I have commit
access any more so perhaps someone can review the attached update to
chkpass.c and commit it if it looks OK.

There are two major changes to the existing chkpass type:
 1. It now handles MD5 as well as DES
 2. It has a GUC variable to select which one to use

Cheers.

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 788 2246 (DoD#0082)(eNTP)   |  what's for dinner.
IM: da...@vex.net, VoIP: sip:da...@druid.net
/*
 * PostgreSQL type definitions for chkpass
 * Written by D'Arcy J.M. Cain
 * da...@druid.net
 * http://www.druid.net/darcy/
 *
 * contrib/chkpass/chkpass.c
 * best viewed with tabs set to 4
 */

#include "postgres.h"

#include 
#include 
#ifdef HAVE_CRYPT_H
#include 
#endif

#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/*
 * This type encrypts its input unless the first character is a colon.
 * The output is the encrypted form with a leading colon.  The output
 * format is designed to allow dump and reload operations to work as
 * expected without doing special tricks.
 */


/*
 * CHKPASS is now a variable-width type. The format for an md5-format
 * password is $1$<salt>$<hash>, where the hash is always 22 characters
 * when base-64 encoded. We allocate space for that, plus a leading :
 * plus a trailing null.
 */

typedef struct varlena chkpass;
#define SALTLEN 8
#define CHKPASSLEN (VARHDRSZ + SALTLEN + 6 + 22)

/* controlled by the GUC chkpass.use_md5 */
static bool use_md5 = false;

/*
 * Various forward declarations:
 */

Datum		chkpass_in(PG_FUNCTION_ARGS);
Datum		chkpass_out(PG_FUNCTION_ARGS);
Datum		chkpass_rout(PG_FUNCTION_ARGS);

/* Only equal or not equal make sense */
Datum		chkpass_eq(PG_FUNCTION_ARGS);
Datum		chkpass_ne(PG_FUNCTION_ARGS);


/* This function checks that the password is a good one
 * It's just a placeholder for now */
static int
verify_pass(const char *str)
{
	return 0;
}

/*
 * CHKPASS reader.
 */
PG_FUNCTION_INFO_V1(chkpass_in);
Datum
chkpass_in(PG_FUNCTION_ARGS)
{
	char	   *str = PG_GETARG_CSTRING(0);
	chkpass	   *result;
	char		mysalt[SALTLEN+6];
	int			i;
	static char salt_chars[] =
	"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

	result = (chkpass *) palloc(CHKPASSLEN);
	/* special case to let us enter encrypted passwords */
	if (*str == ':')
	{
		SET_VARSIZE(result, VARHDRSZ + strlen(str)-1);
		memcpy(VARDATA(result), str+1, VARSIZE(result)-VARHDRSZ);
		PG_RETURN_POINTER(result);
	}

	if (verify_pass(str) != 0)
		ereport(ERROR,
				(errcode(ERRCODE_DATA_EXCEPTION),
				 errmsg("password \"%s\" is weak", str)));

	if (use_md5)
	{
		memcpy(mysalt, "$1$", 3);
		for (i = 3; i < (3+SALTLEN); i++)
			mysalt[i] = salt_chars[random() & 0x3f];
		mysalt[3+SALTLEN] = '$';
		mysalt[4+SALTLEN] = 0;
	}
	else
	{
		mysalt[0] = salt_chars[random() & 0x3f];
		mysalt[1] = salt_chars[random() & 0x3f];
		mysalt[2] = 0;	/* technically the terminator is not necessary
		 * but I like to play safe */
	}
	strcpy(result->vl_dat, crypt(str, mysalt));
	SET_VARSIZE(result, VARHDRSZ + strlen(result->vl_dat));
	PG_RETURN_POINTER(result);
}

/*
 * CHKPASS output function.
 */

/* common output code */
static char *
out(chkpass *password)
{
	char	   *str;
	size_t	  sz;

	sz = VARSIZE(password) - VARHDRSZ;
	str = palloc(sz + 1);
	memcpy(str, VARDATA(password), sz);
	str[sz] = 0;
	return str;
}

PG_FUNCTION_INFO_V1(chkpass_out);
Datum
chkpass_out(PG_FUNCTION_ARGS)
{
	char	   *str, *result;

	str = out((chkpass *) PG_GETARG_POINTER(0));
	result = (char *) palloc(strlen(str) + 2);
	result[0] = ':';
	strcpy(result + 1, str);

	PG_RETURN_CSTRING(result);
}


/*
 * special output function that doesn't output the colon
 */

PG_FUNCTION_INFO_V1(chkpass_rout);
Datum
chkpass_rout(PG_FUNCTION_ARGS)
{
	char	   *str;

	str = out((chkpass *) PG_GETARG_POINTER(0));
	PG_RETURN_TEXT_P(cstring_to_text(str));
}


/*
 * Boolean tests
 */

PG_FUNCTION_INFO_V1(chkpass_eq);
Datum
chkpass_eq(PG_FUNCTION_ARGS)
{
	chkpass	   *a1 = (chkpass *) PG_GETARG_POINTER(0);
	text	  *a2 = PG_GETARG_TEXT_P(1);
	char	  *str;
	char	  *t;

	str = palloc(VARSIZE(a2)-VARHDRSZ+1);
	memcpy(str, VARDATA(a2), VARSIZE(a2)-VARHDRSZ);
	str[VARSIZE(a2)-VARHDRSZ] = 0;
	t = crypt(str, VARDATA(a1));

	PG_RETURN_BOOL(memcmp(VARDATA(a1), t, VARSIZE(a1)-VARHDRSZ) == 0);
}

PG_FUNCTION_INFO_V1(chkpass_ne);
Datum
chkpass_ne(PG_FUNCTION_ARGS)
{
	chkpass	*a1 = (chkpass *) PG_GETARG_POINTER(0);
	text	   *a2 = (text *) PG_GETARG_TEXT_P(1);
	char	   *str;
	char	   *t;

	str = palloc(VARSIZE(a2)-VARHDRSZ+1);
	memcpy(str, VARDATA(a2), VARSIZE(a2)-VARHDRSZ);
	str[VARSIZE(a2)-VARHDRSZ] = 0;
	t = crypt(str, a1->vl_dat);

	PG_RETURN_BOOL(memcmp(VARDATA(a1), t, VARSIZE(a1)-VARHDRSZ) != 0);
}

/* define a custom variable to control md5 hashing of passwords.
 * This should require a SIGHUP because it would be administratively
 * disastrous to allow 
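
For reference, a minimal sketch of how the type above behaves from SQL
(assuming the module is built and installed; the table and values are
illustrative, and chkpass.use_md5 is the GUC sketched in the truncated
comment above):

    CREATE EXTENSION chkpass;

    CREATE TABLE accounts (username text, pw chkpass);

    -- Plain input is run through crypt(); input starting with ':' is
    -- taken as already encrypted and stored verbatim, which is what
    -- lets dump and reload round-trip without re-encrypting.
    INSERT INTO accounts VALUES ('alice', 'secret');

    SELECT pw FROM accounts;              -- hash with a leading ':'
    SELECT pw = 'secret' FROM accounts;   -- true: re-runs crypt()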

Re: [HACKERS] 9.5.2: "sql" as reserved word?

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 9:58 AM, Marc Mamin  wrote:
> This query is working in 9.4.x, but not in 9.5.2:
>
> select 'x' sql;
>
> ERROR:  syntax error at or near "sql"
> LINE 1: select 'x' sql;
>
> Is this expected or a known issue?
> I could neither find any hint about it in the incompatibility list of the
> 9.5 release notes,
> nor is "sql" listed as a reserved keyword.

Sadly, that fails for ANY keyword, not just reserved keywords.
EnterpriseDB has gotten a few customer complaints about this, but it's
not obvious how to fix it.

This commit made SQL a keyword:

commit cac76582053ef8ea07df65fed0757f352da23705
Author: Peter Eisentraut 
Date:   Sun Apr 26 10:33:14 2015 -0400

Add transforms feature

This provides a mechanism for specifying conversions between SQL data
types and procedural languages.  As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.

reviews by Pavel Stěhule and Andres Freund

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
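
To illustrate the breakage and the available workarounds (a hedged
sketch of the behavior described above):

    select 'x' sql;       -- 9.4: OK; 9.5+: syntax error, "sql" is now a keyword
    select 'x' AS sql;    -- works: an alias after AS may be any keyword
    select 'x' "sql";     -- works: quoting yields an ordinary identifier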




[HACKERS] 9.5.2: "sql" as reserved word?

2016-05-04 Thread Marc Mamin
Hello,

This query is working in 9.4.x, but not in 9.5.2:

select 'x' sql;

ERROR:  syntax error at or near "sql"
LINE 1: select 'x' sql;


Is this expected or a known issue?
I could neither find any hint about it in the incompatibility list of the 9.5 
release notes,
nor is "sql" listed as a reserved keyword.



best regards,

Marc Mamin



Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Amit Kapila
On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov wrote:

>
>
> On 03.05.2016 2:17, Tom Lane wrote:
>
>> Alex Ignatov  writes:
>>
>>> I think that rename can help a little bit. At least on some FS it is
>>> atomic operation.
>>>
>>
>> Writing a single sector ought to be atomic too.  I'm very skeptical that
>> it'll be an improvement to just move the risk from one filesystem
>> operation to another; especially not to one where there's not even a
>> terribly portable way to request fsync.
>>
>> regards, tom lane
>>
>
> pg_control is 8k long (I think it is the length of one page in default
> PG compile settings).
> I also think that an 8k write can be atomic. Even if writing one sector
> is atomic, nobody can say which sector of an 8k write of pg_control
> will be written first. It could be the last sector, or say sector
> number 10 of 16.


The actual data written is always sizeof(ControlFileData), which should be
less than one sector.  I think a torn write of pg_control is only possible
if, while writing and fsyncing, the filesystem maps that data to different
sectors.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Tom Lane
Stas Kelvich  writes:
>> On 03 May 2016, at 00:59, David Fetter  wrote:
>> I suspect that steering that ship would be a good idea starting with
>> deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?

> In 9.5 there already were tsvector functions length(), numnode(), strip()

> Recent commit added setweight(), delete(), unnest(), tsvector_to_array(), 
> array_to_tsvector(), filter().

> Last bunch can be painlessly renamed, for example to ts_setweight, ts_delete, 
> ts_unnest, ts_filter.

> The question is what to do with old ones? Leave them as is? Rename to ts_* 
> and create aliases with deprecation warning?

The other ones are not so problematic because they do not conflict with
SQL keywords.  It's only delete() and filter() that scare me.

regards, tom lane
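
The worry about filter() is concrete: FILTER already appears in
aggregate expressions, so an unqualified function of that name is one
grammar extension away from ambiguity.  A small illustration (the table
and column names are made up):

    -- FILTER as part of aggregate syntax, valid since 9.4:
    SELECT count(*) FILTER (WHERE score > 0) FROM results;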




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 9:35 AM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, May 4, 2016 at 5:29 AM, Stephen Frost  wrote:
>> > There is no such limitation on using pg_dump as a non-superuser.  It's
>> > always been the case that you need to be able to LOCK the table that
>> > you're dumping.  If you don't have rights to LOCK a certain table then
>> > pg_dump is going to throw an error just like the one you saw.
>>
>> But in Rushabh's example, he's not doing that.  He's trying to do a
>> full-database dump of a database that contains one object which the
>> dump user has rights to access.  Previously, that worked.  Now, it
>> fails with an error about a system catalog.  How is that not broken?
>
> As I mentioned up-thread, the optimization to skip tables which are not
> "interesting" has been improved in the patch-set posted this morning to
> skip over tables whose ACLs haven't changed from the defaults.  With
> that patch, we will skip over catalog tables whose ACLs haven't been
> changed and Rushabh's command will work as a non-superuser, assuming
> none of the ACLs on tables in pg_catalog have been changed.

OK.

> However, if any of the ACLs have been changed on tables in pg_catalog,
> we'll attempt to lock those tables and include those ACLs.  That will
> still work in many cases as you only need SELECT access to be able to
> lock a table in access share mode, but if the permissions on pg_authid
> are changed, the same failure will occur.

Hmm, I guess that's reasonable.  It might cause problems for some
users, but then again it might not.  I guess that's what a beta is
for.

> I wouldn't recommend depending on those tables being excluded in backup
> scripts.  If you wish to dump everything except pg_catalog, the -N
> switch can be used to exclude that schema.

Maybe that recommendation should be added to the documentation and/or
release notes someplace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Stas Kelvich
> On 03 May 2016, at 00:59, David Fetter  wrote:
> 
> On Mon, May 02, 2016 at 01:58:11PM -0400, Tom Lane wrote:
>> I wrote:
>>> I think we'd be better off to rename these to tsvector_delete()
>>> and tsvector_filter() while we still can.
>> 
>> ... although I now notice that hstore already exposes a function
>> named delete(), so that ship may have sailed already.  But I'm more
>> troubled by filter() anyhow, since that keyword can appear in
>> expressions --- it seems much more likely that that would pose a
>> parsing conflict after future SQL extensions.
> 
> I suspect that steering that ship would be a good idea starting with
> deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?
> 
> Cheers,
> David.


In 9.5 there already were tsvector functions length(), numnode(), strip()

Recent commit added setweight(), delete(), unnest(), tsvector_to_array(), 
array_to_tsvector(), filter().

Last bunch can be painlessly renamed, for example to ts_setweight, ts_delete, 
ts_unnest, ts_filter.

The question is what to do with old ones? Leave them as is? Rename to ts_* and 
create aliases with deprecation warning?

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, May 4, 2016 at 5:29 AM, Stephen Frost  wrote:
> > There is no such limitation on using pg_dump as a non-superuser.  It's
> > always been the case that you need to be able to LOCK the table that
> > you're dumping.  If you don't have rights to LOCK a certain table then
> > pg_dump is going to throw an error just like the one you saw.
> 
> But in Rushabh's example, he's not doing that.  He's trying to do a
> full-database dump of a database that contains one object which the
> dump user has rights to access.  Previously, that worked.  Now, it
> fails with an error about a system catalog.  How is that not broken?

As I mentioned up-thread, the optimization to skip tables which are not
"interesting" has been improved in the patch-set posted this morning to
skip over tables whose ACLs haven't changed from the defaults.  With
that patch, we will skip over catalog tables whose ACLs haven't been
changed and Rushabh's command will work as a non-superuser, assuming
none of the ACLs on tables in pg_catalog have been changed.

However, if any of the ACLs have been changed on tables in pg_catalog,
we'll attempt to lock those tables and include those ACLs.  That will
still work in many cases as you only need SELECT access to be able to
lock a table in access share mode, but if the permissions on pg_authid
are changed, the same failure will occur.

I wouldn't recommend depending on those tables being excluded in backup
scripts.  If you wish to dump everything except pg_catalog, the -N
switch can be used to exclude that schema.

Thanks!

Stephen
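
A hedged illustration of the failure mode described above (the role
name is made up):

    -- As superuser, change an ACL on a catalog table:
    GRANT SELECT ON pg_catalog.pg_authid TO audit_role;

    -- A later pg_dump run by a non-superuser must now lock pg_authid
    -- to dump that ACL, and fails if it lacks SELECT on pg_authid.
    -- Excluding the catalog schema avoids this:
    --   pg_dump -N pg_catalog mydb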




[HACKERS] tsvector filter problem

2016-05-04 Thread Stas Kelvich
Hi.

As discovered by Oleg Bartunov, the current filter() function for tsvector
can crash the backend.
The bug was caused by erroneous usage of the char type in a memmove
argument.



tsvector_bugfix_type.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] pg_dump broken for non-super user

2016-05-04 Thread Robert Haas
On Wed, May 4, 2016 at 5:29 AM, Stephen Frost  wrote:
> There is no such limitation on using pg_dump as a non-superuser.  It's
> always been the case that you need to be able to LOCK the table that
> you're dumping.  If you don't have rights to LOCK a certain table then
> pg_dump is going to throw an error just like the one you saw.

But in Rushabh's example, he's not doing that.  He's trying to do a
full-database dump of a database that contains one object which the
dump user has rights to access.  Previously, that worked.  Now, it
fails with an error about a system catalog.  How is that not broken?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2016-05-04 Thread Peter Eisentraut

On 5/4/16 3:21 AM, Dean Rasheed wrote:
> Well, appendStringLiteralAH() takes both, so that sets a precedent.

Works for me.  Could you supply an updated patch with a static function
instead of a macro?  Then I think this is good to go.

(bonus points for some tests)

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump dump catalog ACLs

2016-05-04 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Mon, Apr 25, 2016 at 12:39:09AM -0400, Stephen Frost wrote:
> > * Noah Misch (n...@leadboat.com) wrote:
> > > On Fri, Apr 22, 2016 at 12:31:41PM -0400, Stephen Frost wrote:
> > > > After looking through the code a bit, I realized that there are a lot of
> > > > object types which don't have ACLs at all but which exist in pg_catalog
> > > > and were being analyzed because the bitmask for pg_catalog included ACLs
> > > > and therefore was non-zero.
> > > > 
> > > > Clearing that bit for object types which don't have ACLs improved the
> > > > performance for empty databases quite a bit (from about 3s to a bit
> > > > under 1s on my laptop).  That's a 42-line patch, with comment lines
> > > > being half of that, which I'll push once I've looked into the other
> > > > concerns which were brought up on this thread.
> > > 
> > > That's good news.
> > 
> > Attached patch-set includes this change in patch #2.
> 
> Timings for the 100-database pg_dumpall:
> 
> HEAD: 131s
> HEAD+patch:  33s
> 9.5:8.6s
> 
> Nice improvement for such a simple patch.

Patch #2 in the attached patchset includes that improvement and a
further one which returns the performance to very close to 9.5.  This is
done by checking for any changed ACLs for a given table (the table-level
and column-level ones) and removing the ACL component if there haven't
been any changes.

> > Patch #1 is the fix for the incorrect WHERE clause.
> 
> I confirmed that this fixed dumping of the function ACL in my test case.

This patch is unchanged.

> > For languages, I believe that means that we simply need to modify the
> > selectDumpableProcLang() function to use the same default we use for the
> > 'pg_catalog' namespace- DUMP_COMPONENT_ACL, instead of
> > DUMP_COMPONENT_NONE.
> 
> Makes sense.

This is now being tested in the TAP testing for pg_dump that I've been
working on (patch #3).  Fixing this issue was a bit more complicated
because of cases like plpgsql, which is a from-initdb extension, but we
still want to dump out any changes to plpgsql that the user has made, so
we have to dump ACLs for from-initdb extension objects, if they've been
changed.
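
For instance, a hedged sketch of the plpgsql case (the role name is
made up):

    -- plpgsql is installed by initdb, but user changes to its ACL
    -- should still be dumped:
    REVOKE USAGE ON LANGUAGE plpgsql FROM public;
    GRANT USAGE ON LANGUAGE plpgsql TO app_owner;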

Those fixes are included in patch #2 in the patchset.

> > What's not clear to me is what, if any, issue there is with namespaces.
> 
> As far as I know, none.  The current behavior doesn't match the commit
> message, but I think the current behavior is better.

Great.  No changes have been made to how namespaces are handled.

> > Certainly, in my testing at least, if you do:
> > 
> > REVOKE CREATE ON SCHEMA public FROM public;
> > 
> > Then you get the appropriate commands from pg_dump to implement the
> > resulting ACLs on the public schema.  If you change the permissions back
> > to match what is there at initdb-time (or you just don't change them),
> > then there aren't any GRANT or REVOKE commands from pg_dump, but that's
> > correct, isn't it?
> 
> Good question.  I think it's fine and possibly even optimal.  One can imagine
> other designs that remember whether any GRANT or REVOKE has happened since
> initdb, but I see no definite reason to prefer that alternative.

Neither do I.

> > In the attached patch-set, patch #3 includes support for
> > 
> > src/bin/pg_dump: make check
> > 
> > implemented using the TAP testing system.  There are a total of 360
> > tests, generally covering:
> 
> I like the structure of this test suite.

Great.  I've added a whole ton more tests, added more options to various
pg_dump runs to test more options with fewer runs (though there are
still quite a few...) and added more objects to be created.  Also added
a bunch of comments to describe how the test suite is set up.

> > +# Start with 1 because of non-existant database test below
> > +# Test connecting to a non-existant database
> 
> Spelling.

Fixed.

The test suite is now covering 57% of pg_dump.c.  I was hoping to get
that number higher, but time marches on and more tests can certainly be
added later.

I'm planning to do another review of this patch-set and do testing
against back-branches back to 7.4.  Barring any issues or objections,
I'll commit this tomorrow to address the performance concerns and to add
in the test suite.

On my system, the test suite takes about 15 seconds to run, which
includes setting up the cluster, creating all the objects, and then
running the pg_dump runs and checking the results.  All told, in those
15 seconds, 1,213 tests are run.

Thanks!

Stephen
From d8eebb233187bb4fa46c5e90d75db680d991f801 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Sun, 24 Apr 2016 23:59:23 -0400
Subject: [PATCH 1/3] Correct pg_dump WHERE clause for functions/aggregates

---
 src/bin/pg_dump/pg_dump.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d3f5157..bb33075 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ 

Re: [HACKERS] atomic pin/unpin causing errors

2016-05-04 Thread Alexander Korotkov
On Wed, May 4, 2016 at 2:05 AM, Andres Freund  wrote:

> On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
> > I've bisected the errors I was seeing, discussed in
> >
> http://www.postgresql.org/message-id/CAMkU=1xqehc0ok4d+tkjfq1nvuho37pyrkhjp6q8oxifmx7...@mail.gmail.com
> >
> > It look like they first appear in:
> >
> > commit 48354581a49c30f5757c203415aa8412d85b0f70
> > Author: Andres Freund 
> > Date:   Sun Apr 10 20:12:32 2016 -0700
> >
> > Allow Pin/UnpinBuffer to operate in a lockfree manner.
> >
> >
> > I get the errors:
> >
> > ERROR:  attempted to delete invisible tuple
> > STATEMENT:  update foo set count=count+1,text_array=$1 where text_array
> @> $2
> >
> > And also:
> >
> > ERROR:  unexpected chunk number 1 (expected 2) for toast value
> > 85223889 in pg_toast_16424
> > STATEMENT:  update foo set count=count+1 where text_array @> $1
>
> Hm. I appear to have trouble reproducing this issue (continuing to try)
> on master as of 8826d8507.  Is there any chance you could package up a
> data directory after the issue hit?
>

FWIW, I'm also trying to reproduce it on a big x86 machine at 9888b34f.
I'll write about the results when I get any.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-05-04 Thread Thom Brown
On 4 May 2016 at 09:59, Andres Freund  wrote:

> On 2016-04-28 17:41:29 +0100, Thom Brown wrote:
> > I've noticed another breakage, which I can reproduce consistently.
>
> Thanks for the testing!  I pushed a fix for this. This wasn't actually
> an issue in the original patch, but a too strict test added in my fix.
>

Re-testing shows that this appears to have solved the issue.

Thanks

Thom


Re: [HACKERS] Is pg_control file crashsafe?

2016-05-04 Thread Alex Ignatov



On 03.05.2016 2:17, Tom Lane wrote:
> Alex Ignatov  writes:
>> I think that rename can help a little bit. At least on some FS it is
>> atomic operation.
>
> Writing a single sector ought to be atomic too.  I'm very skeptical that
> it'll be an improvement to just move the risk from one filesystem
> operation to another; especially not to one where there's not even a
> terribly portable way to request fsync.
>
> regards, tom lane


pg_control is 8k long (I think it is the length of one page in default
PG compile settings).
I also think that an 8k write can be atomic. Even if writing one sector
is atomic, nobody can say which sector of an 8k write of pg_control
will be written first. It could be the last sector, or say sector
number 10 of 16. That is why I mentioned renaming from a tmp file to
pg_control. Renaming in a filesystem is usually an atomic operation, so
after a power loss we have either the old version of pg_control or the
new one, but not a torn pg_control file.



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




  1   2   >