Re: [HACKERS] Potential GIN vacuum bug

2015-10-04 Thread Jeff Janes
On Thu, Sep 3, 2015 at 10:42 AM, Jeff Janes  wrote:

> On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes  wrote:
>
>> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:
>>
>>> Jeff Janes  writes:
>>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
>>>
>> > But we would still have to deal with the
>>> > fact that unconditional acquire attempt by the backends will cause a
>>> vacuum
>>> > to cancel itself, which is undesirable.
>>>
>>> Good point.
>>>
>>> > If we define a new namespace for
>>> > this lock (like the relation extension lock has its own namespace) then
>>> > perhaps the cancellation code could be made to not cancel on that
>>> > condition.  But that too seems like a lot of work to backpatch.
>>>
>>> We could possibly teach the autocancel logic to distinguish this lock
>>> type
>>> from others without using a new namespace.  That seems a bit klugy, but
>>> maybe better than adding a new namespace.  (For example, there are
>>> probably only a couple of modes in which we take page-level locks at
>>> present.  Choosing a currently unused, but self-exclusive, mode for
>>> taking
>>> such a lock might serve.)
>>>
>>
>> Like the attached?  (The conditional variant for user backends was
>> unceremoniously yanked out.)
>>
>
> A problem here is that now we have the user backends waiting on vacuum to
> do the clean up, but vacuum is using throttled IO and so taking its sweet
> time at it.  Under the old code, the user backend could pass up the vacuum
> while it was sleeping.
>
> Maybe we could have the vacuum detect when someone is waiting on it, and
> temporarily suspend throttling and just run at full speed.  I don't believe
> there is any precedent for that, but I do think there are other places
> where such a technique could be useful.  That is kind of a scary change to
> backpatch.
>
> I am running out of ideas.
>


Teodor published a patch in another thread:
http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru but I
thought it would be best to discuss it here.

It is similar to my most recent patch.

He removes the parts of the code that anticipate concurrent cleanup, and
replaces them with asserts, which I was too lazy to do until we have a
final design.

He uses a different lock mode (ExclusiveLock, rather than
ShareUpdateExclusiveLock) when heavy-locking the metapage.  It doesn't make
a difference, as long as it is self-exclusive and no one else uses it in a
way that causes false sharing (which is currently the case--the only other
user of PageLocks is the hash index code).

He always does conditional locks in regular backends.  That means he
doesn't have to hack the lmgr to prevent vacuum from canceling itself, the
way I did.  It also avoids the "priority inversion" I mentioned
above, where a user backend blocks on vacuum, but vacuum is intentionally
throttling itself.

On the other hand, using conditional locks for normal backends does mean
that the size of the pending list can increase without limit, as there is
nothing to throttle the user backends when they add tuples faster than they
can be cleaned up.  Perhaps worse, it can pin down a vacuum worker without
limit, as vacuum keeps finding that more pages have been added by the time it
finishes the prior set of pages.  I actually do see this in my (not very
realistic) testing.

I think that for correctness, vacuum only needs to clean the part of the
pending list which existed as of the time vacuum started.  So as long as it
gets that far, it can just be done even if more pages have since been
added.  I'm not sure of the best way to implement that; I guess you remember
the blkno of the tail page from when you started, and set a flag once you have
truncated away the page with that same blkno.  That would solve the issue of
pinning down a vacuum worker for an unlimited amount of time, but would not
solve the unlimited growth of the pending list.
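
For illustration only, a rough sketch of that idea.  The loop helpers here
(gin_next_pending_page, gin_process_and_truncate_page) are invented names
standing in for the existing machinery in ginInsertCleanup(); only the
stop-at-the-original-tail logic is the point:

#include "postgres.h"
#include "access/gin_private.h"   /* GinMetaPageData */
#include "storage/block.h"        /* BlockNumber, InvalidBlockNumber */
#include "utils/rel.h"

/* Hypothetical stand-ins for the real pending-list machinery. */
extern BlockNumber gin_next_pending_page(Relation index);
extern void gin_process_and_truncate_page(Relation index, BlockNumber blkno);

/*
 * Sketch: clean only the portion of the pending list that existed when this
 * pass started, by remembering the tail block number up front and stopping
 * once that page has been processed and truncated away.
 */
static void
clean_pending_list_bounded(Relation index, GinMetaPageData *metadata)
{
    BlockNumber stop_at = metadata->tail;   /* tail as of the start of the pass */
    BlockNumber blkno;

    while ((blkno = gin_next_pending_page(index)) != InvalidBlockNumber)
    {
        gin_process_and_truncate_page(index, blkno);

        if (blkno == stop_at)
            break;      /* anything beyond this was appended after we started */
    }
}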

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-10-02 Thread Robert Haas
On Thu, Oct 1, 2015 at 4:44 PM, Jeff Janes  wrote:
> Is the locking around indexes summarized someplace?

Not to my knowledge.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential GIN vacuum bug

2015-10-01 Thread Jeff Janes
On Tue, Sep 1, 2015 at 8:05 AM, Robert Haas  wrote:

> On Sun, Aug 30, 2015 at 6:57 PM, Tom Lane  wrote:
> >> But we would still have to deal with the
> >> fact that unconditional acquire attempt by the backends will cause a
> vacuum
> >> to cancel itself, which is undesirable.
> >
> > Good point.
> >
> >> If we define a new namespace for
> >> this lock (like the relation extension lock has its own namespace) then
> >> perhaps the cancellation code could be made to not cancel on that
> >> condition.  But that too seems like a lot of work to backpatch.
> >
> > We could possibly teach the autocancel logic to distinguish this lock
> type
> > from others without using a new namespace.  That seems a bit klugy, but
> > maybe better than adding a new namespace.  (For example, there are
> > probably only a couple of modes in which we take page-level locks at
> > present.  Choosing a currently unused, but self-exclusive, mode for
> taking
> > such a lock might serve.)
>
> That seems like a pretty big kludge to me; it will work until it doesn't.
>
> Adding a new lock type (similar to "relation extension") would address
> a lot of my concern with the heavyweight lock approach.  It strikes me
> that trying to grab a lock on the index in what's basically a pretty
> low-level operation here could have a variety of unintended
> consequences.  The autovacuum thing is one; but consider also the risk
> of new deadlock scenarios resulting from a lock upgrade.  Those
> deadlocks would likely be, to use Peter Geoghegan's term,
> unprincipled.  The locking regime around indexes is already pretty
> complicated, and I'm skeptical about the idea that we can complicate
> it further without any blowback.
>

Is the locking around indexes summarized someplace?  About the best thing I
could come up with was to do a "git grep LockRelat" and then look for lines
where the first argument had a name that seemed likely to refer to an index.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-09-03 Thread Jeff Janes
On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes  wrote:

> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:
>
>> Jeff Janes  writes:
>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
>>
> > But we would still have to deal with the
>> > fact that unconditional acquire attempt by the backends will cause a
>> vacuum
>> > to cancel itself, which is undesirable.
>>
>> Good point.
>>
>> > If we define a new namespace for
>> > this lock (like the relation extension lock has its own namespace) then
>> > perhaps the cancellation code could be made to not cancel on that
>> > condition.  But that too seems like a lot of work to backpatch.
>>
>> We could possibly teach the autocancel logic to distinguish this lock type
>> from others without using a new namespace.  That seems a bit klugy, but
>> maybe better than adding a new namespace.  (For example, there are
>> probably only a couple of modes in which we take page-level locks at
>> present.  Choosing a currently unused, but self-exclusive, mode for taking
>> such a lock might serve.)
>>
>
> Like the attached?  (The conditional variant for user backends was
> unceremoniously yanked out.)
>

A problem here is that now we have the user backends waiting on vacuum to
do the clean up, but vacuum is using throttled IO and so taking its sweet
time at it.  Under the old code, the user backend could pass up the vacuum
while it was sleeping.

Maybe we could have the vacuum detect when someone is waiting on it, and
temporarily suspend throttling and just run at full speed.  I don't believe
there is any precedent for that, but I do think there are other places
where such a technique could be useful.  That is kind of a scary change to
backpatch.
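
Purely to illustrate the idea (nothing like this exists today, and
gin_cleanup_has_waiters is an invented helper): the cleanup loop could skip
the cost-based sleep whenever another backend is blocked behind it:

#include "postgres.h"
#include "commands/vacuum.h"   /* vacuum_delay_point() */
#include "utils/rel.h"

/* Hypothetical helper: does any backend currently wait on our cleanup lock? */
extern bool gin_cleanup_has_waiters(Relation index);

static void
pending_cleanup_delay(Relation index)
{
    if (gin_cleanup_has_waiters(index))
        return;                 /* someone is waiting: run at full speed for now */

    vacuum_delay_point();       /* otherwise honor the usual cost-based throttling */
}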

I am running out of ideas.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-09-01 Thread Robert Haas
On Sun, Aug 30, 2015 at 6:57 PM, Tom Lane  wrote:
>> But we would still have to deal with the
>> fact that unconditional acquire attempt by the backends will cause a vacuum
>> to cancel itself, which is undesirable.
>
> Good point.
>
>> If we define a new namespace for
>> this lock (like the relation extension lock has its own namespace) then
>> perhaps the cancellation code could be made to not cancel on that
>> condition.  But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace.  That seems a bit klugy, but
> maybe better than adding a new namespace.  (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present.  Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)

That seems like a pretty big kludge to me; it will work until it doesn't.

Adding a new lock type (similar to "relation extension") would address
a lot of my concern with the heavyweight lock approach.  It strikes me
that trying to grab a lock on the index in what's basically a pretty
low-level operation here could have a variety of unintended
consequences.  The autovacuum thing is one; but consider also the risk
of new deadlock scenarios resulting from a lock upgrade.  Those
deadlocks would likely be, to use Peter Geoghegan's term,
unprincipled.  The locking regime around indexes is already pretty
complicated, and I'm skeptical about the idea that we can complicate
it further without any blowback.

If we use a new lock type, it's a lot easier to reason about the
interactions, I think.  We know all of the things that will take that
lock type.  And we can be reasonably confident that future code
changes won't introduce any new ones, or that the current ones will be
considered before making changes.

It's not great to have to back-patch such a change, but in practice
the blast radius should be pretty limited.  People who are looking at
pg_locks might start to see a new lock type show up that they're not
expecting, but a lot of people either aren't looking at that data at
all, or are looking at it but not doing anything programmatic with it
and therefore won't really be impacted by something new showing up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential GIN vacuum bug

2015-08-31 Thread Jeff Janes
On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
> >> Your earlier point about how the current design throttles insertions to
> >> keep the pending list from growing without bound seems like a bigger
> deal
> >> to worry about.  I think we'd like to have some substitute for that.
> >> ...
>
> > If the goal is to not change existing behavior (like for back patching)
> the
> > margin should be 1, always wait.
>
> The current behavior is buggy, both as to performance and correctness,
> so I'm not sure how come "don't change the behavior" should be a
> requirement.
>

I'm not confident the new behavior, with regard to performance, is an
absolute win.  We usually don't backpatch performance changes unless they
have little or no trade-off.  The only margin I can confidently say that for
is a margin of 1.0.


>
> > But we would still have to deal with the
> > fact that unconditional acquire attempt by the backends will cause a
> vacuum
> > to cancel itself, which is undesirable.
>
> Good point.
>
> > If we define a new namespace for
> > this lock (like the relation extension lock has its own namespace) then
> > perhaps the cancellation code could be made to not cancel on that
> > condition.  But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace.  That seems a bit klugy, but
> maybe better than adding a new namespace.  (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present.  Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)
>

Like the attached?  (The conditional variant for user backends was
unceremoniously yanked out.)


>
> > Would we bother to back-patch a theoretical bug which there is no
> evidence
> > is triggering in the field?
>
> What's theoretical about it?  You seemed pretty sure that the issue in
>
> http://www.postgresql.org/message-id/flat/CA+bfosGVGVQhMAa=0-mue6coo7dbsgayxb-xsnr5vm-s39h...@mail.gmail.com
> was exactly this.
>

I was adamant that the broken concurrency was not helping him
performance-wise, and that it also introduces correctness bugs.  But I don't
think the unfortunate performance is itself a bug, just an issue closely
related to one that is a bug.  I don't think a margin of 2, or even 20, would
help him.  It would just build a bigger time bomb with a longer fuse.  What he
needs is to turn fastupdate off, or get an SSD, or get some other improvements
that aren't going to be backpatched.  If we don't know what setting would fix
one person's performance problem, I'd rather set it to something that is at
least known not to cause other people problems that they didn't have before.


>
> > If we want to improve the current behavior rather than fix a bug, then I
> > think that if the list is greater than threshold*2 and the cleaning lock
> is
> > unavailable, what it should do is proceed to insert the tuple's keys into
> > the index itself, as if fastupdate = off.  That would require some major
> > surgery to the existing code, as by the time it invokes the clean up, it
> is
> > too late to not insert into the pending list.
>
> Meh.  That's introducing a whole new behavioral regime, which quite aside
> from correctness bugs might introduce new performance bugs of its own.
> It certainly doesn't sound like a better back-patch candidate than the
> other thing.
>

Right, the start of the paragraph was meant to transition from backpatch to
forward looking.

Continuing the forward looking part:

I've given up on fastupdate for 9.4, and turned it off for my indexes.  As
implemented it seems like a rather niche solution.  So it is kind of
unfortunate that it is on by default, and that there is no way to turn it
off except for each individual index separately.  Hopefully we can make it
less niche.  Or maybe the niche is what I (and apparently Mr. Kehlet)
are trying to do.

There seem to be two ways for fastupdate to help:

1) Let someone else do it, in the background.

That is pretty much useless from my perspective, because there is no way to
get someone else to actually do it as often as it needs to be done.  I can
create an extension to expose the cleanup call to SQL, and then set up a
cron job (or a bgworker) to run it very frequently, or poll frequently to
decide whether it should be run.  That works, kind of, if this type of thing
is important enough for you to go through all that (it is not important
enough to me, for production use, at this point; I'd rather just set
fastupdate to off, but at least it is available if I need it).  Except this
is still a serial job, and there is no way around that without turning
fastupdate off.  You can have a RAID of 30 disks, and the clean up process is still
going to have 1 IO outstanding at a time.  With 

Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 Attached is a patch to deal with this without the heavyweight locks.
 I realized that using the clean up lock on the meta page, rather than the
 pending head page, would be easier to implement as a pin was already held
 on the meta page throughout, and the pending head page can change during
 the cleanup if we need multiple passes.
 ...
 I still prefer the heavy-weight approach.  The buffer clean up lock for
 vacuuming seems fragile to start with, and abusing it for other purposes
 doesn't improve on that.

FWIW, I would go with the heavyweight lock approach as well.  The extra
cycles needed for a heavyweight lock don't seem significant in this
context, and you have far more control over which other operations
conflict or don't conflict with the lock.  Taking a buffer cleanup lock on
the metapage sounds really scary from that viewpoint; it's basically going
to conflict with everything else, even if the other operations only take
it for short intervals, and you have no freedom to adjust that.

Your earlier point about how the current design throttles insertions to
keep the pending list from growing without bound seems like a bigger deal
to worry about.  I think we'd like to have some substitute for that.
Perhaps we could make the logic in insertion be something like

if (pending-list-size > threshold)
{
    if (conditional-lock-acquire(...))
    {
        do-pending-list-cleanup;
        lock-release;
    }
    else if (pending-list-size > threshold * 2)
    {
        unconditional-lock-acquire(...);
        if (pending-list-size > threshold)
            do-pending-list-cleanup;
        lock-release;
    }
}

so that once the pending list got too big, incoming insertions would wait
for it to be cleared.  Whether to use a 2x safety margin or something else
could be a subject for debate, of course.
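
For illustration, a rough C rendering of that pseudocode, assuming the
heavyweight lock is a page lock on the GIN metapage (as in the patches
discussed in this thread); gin_pending_list_size() and
gin_do_pending_cleanup() are invented stand-ins for reading nPendingPages
and for ginInsertCleanup():

#include "postgres.h"
#include "access/gin_private.h"   /* GIN_METAPAGE_BLKNO */
#include "storage/lmgr.h"         /* LockPage, ConditionalLockPage, UnlockPage */

/* Hypothetical helpers standing in for the real code. */
extern Size gin_pending_list_size(Relation index);
extern void gin_do_pending_cleanup(Relation index);

static void
maybe_clean_pending_list(Relation index, Size threshold)
{
    if (gin_pending_list_size(index) <= threshold)
        return;

    if (ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
    {
        /* nobody else is cleaning; do it ourselves */
        gin_do_pending_cleanup(index);
        UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
    }
    else if (gin_pending_list_size(index) > threshold * 2)
    {
        /* list is far over the limit: wait for the current cleaner to finish */
        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
        if (gin_pending_list_size(index) > threshold)   /* may have shrunk while we waited */
            gin_do_pending_cleanup(index);
        UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
    }
}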

regards, tom lane




Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Jeff Janes
On Sat, Aug 22, 2015 at 11:25 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  User backends attempt to take the lock conditionally, because otherwise
 they
  would cause an autovacuum already holding the lock to cancel itself,
 which
  seems quite bad.
 
  Note that this is a substantial behavior change, in that with this code the
 user
  backends which find the list already being cleaned will just add to the
 end
  of the pending list and go about their business.  So if things are
 added to
  the list faster than they can be cleaned up, the size of the pending
 list
  can increase without bound.
 
  Under the existing code each concurrent user backend will try to clean
 the
  pending list at the same time.  The work doesn't parallelize, so doing
 this
  just burns CPU (and possibly consumes up to maintenance_work_mem for
  *each* backend), but it does serve to throttle the insertion rate and so
  keep the list from growing without bound.
 
  This is just a proof-of-concept patch, because I don't know if this
 approach
  is the right approach.

 I'm not sure if this is the right approach, but I'm a little wary of
 involving the heavyweight lock manager in this.  If pending list
 cleanups are frequent, this could involve a lot of additional lock
 manager traffic, which could be bad for performance.



 Even if they are
 infrequent, it seems like it would be more natural to handle this
 with some regime of locks and pins and buffer cleanup locks on the
 buffers that are storing the pending list, rather than a heavyweight
 lock on the whole relation.  But I am just waving my hands wildly
 here.


 I also thought of a buffer clean up lock on the pending list head buffer
 to represent the right to do a clean up.  But with the proviso that once
 you have obtained the clean up lock, you can then drop the exclusive buffer
 content lock and continue to hold the conceptual lock just by maintaining
 the pin.  I think that this would be semantically correct, but backends
 doing a cleanup would have to get the lock conditionally, and I think you
 would have too many chances for false failures where it bails out when the
 other party simply holds a pin.  I guess I could implement it and see how
 it fares in my test case.


Attached is a patch to deal with this without the heavyweight locks.

I realized that using the clean up lock on the meta page, rather than the
pending head page, would be easier to implement as a pin was already held
on the meta page throughout, and the pending head page can change during
the cleanup if we need multiple passes.

Also, I think the clean up lock on the metapage should actually be easier to get.
All queries need to visit the pending head, and they hold it long enough to
check all the keys (possibly hundreds, checked with arbitrarily slow SQL
functions) on that page.  The metapage is only checked for a few variables
which are C types.

I thought of checking for metadata->head == InvalidBlockNumber with a
share lock before getting the clean-up lock and then again after, but decided
against it, as the user backends already check that immediately before
calling this function, and wouldn't call it if there was no pending list
as of that check.

I exchange the exclusive content lock given to us by
LockBufferForCleanup for a share content lock, so as to not hold the
exclusive lock over the IO possibly needed to read the pending head page
into buffers.  I don't know that this is actually a win.
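
A minimal sketch of that locking dance (not the attached patch itself):
vacuum waits for a cleanup lock on the already-pinned metapage buffer and
then downgrades to a share content lock, while a user backend would use the
conditional variant and simply bail out:

#include "postgres.h"
#include "storage/bufmgr.h"   /* LockBufferForCleanup, ConditionalLockBufferForCleanup, LockBuffer */

static bool
acquire_pending_cleanup_right(Buffer metabuffer, bool is_vacuum)
{
    if (is_vacuum)
        LockBufferForCleanup(metabuffer);          /* wait; the pin is held throughout */
    else if (!ConditionalLockBufferForCleanup(metabuffer))
        return false;                              /* someone else holds a pin; give up */

    /* Keep the pin (the conceptual cleanup right), but trade the exclusive
     * content lock for a share lock so we don't hold exclusive over I/O. */
    LockBuffer(metabuffer, BUFFER_LOCK_UNLOCK);
    LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
    return true;
}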

This fixed the same problem I was seeing that was fixed by the previous
heavy-weight lock patch.

I still prefer the heavy-weight approach.  The buffer clean up lock for
vacuuming seems fragile to start with, and abusing it for other purposes
doesn't improve on that.

Whichever approach is taken, more work is needed on the comments.  And the
code that currently checks for concurrent cleanup and bails out needs to be
changed to throw errors or something instead.  But I don't want to make too
many changes until I know which approach to take, and whether it will be
back-patched.

Cheers,

Jeff


gin_pending_lwlock.patch
Description: Binary data



Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Jeff Janes
On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Janes jeff.ja...@gmail.com writes:

 Your earlier point about how the current design throttles insertions to
 keep the pending list from growing without bound seems like a bigger deal
 to worry about.  I think we'd like to have some substitute for that.
 Perhaps we could make the logic in insertion be something like

 if (pending-list-size > threshold)
 {
     if (conditional-lock-acquire(...))
     {
         do-pending-list-cleanup;
         lock-release;
     }
     else if (pending-list-size > threshold * 2)
     {
         unconditional-lock-acquire(...);
         if (pending-list-size > threshold)
             do-pending-list-cleanup;
         lock-release;
     }
 }

 so that once the pending list got too big, incoming insertions would wait
 for it to be cleared.  Whether to use a 2x safety margin or something else
 could be a subject for debate, of course.


If the goal is to not change existing behavior (like for back patching) the
margin should be 1, always wait.  But we would still have to deal with the
fact that an unconditional acquire attempt by the backends will cause a vacuum
to cancel itself, which is undesirable.  If we define a new namespace for
this lock (like the relation extension lock has its own namespace) then
perhaps the cancellation code could be made to not cancel on that
condition.  But that too seems like a lot of work to backpatch.

Would we bother to back-patch a theoretical bug which there is no evidence
is triggering in the field?  Of course, if people are getting bit by this,
they probably wouldn't know.  You search for malevolent unicorns, get no
hits, and just assume there really are none, without scouring the table and
seeing it is an index problem.  Or if you do realize it is an index
problem, you would probably never trace it back to the cause of the
problem.  There are quite a few reports of mysterious index corruptions
which never get resolved.

If we want to improve the current behavior rather than fix a bug, then I
think that if the list is greater than threshold*2 and the cleaning lock is
unavailable, what it should do is proceed to insert the tuple's keys into
the index itself, as if fastupdate = off.  That would require some major
surgery to the existing code, as by the time it invokes the clean up, it is
too late to not insert into the pending list.
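
For illustration only, the shape that fallback might take (all names here are
invented; as noted, the real code would need major surgery to bypass the
pending list at this point):

#include "postgres.h"
#include "access/itup.h"
#include "utils/rel.h"

/* Hypothetical helpers; the real paths live in ginfast.c and gininsert.c. */
extern Size gin_pending_list_size(Relation index);
extern bool gin_try_cleanup_lock(Relation index);
extern void gin_release_cleanup_lock(Relation index);
extern void gin_do_pending_cleanup(Relation index);
extern void gin_append_to_pending_list(Relation index, IndexTuple itup);
extern void gin_insert_directly(Relation index, IndexTuple itup);   /* as if fastupdate = off */

static void
gin_insert_with_fallback(Relation index, IndexTuple itup, Size threshold)
{
    if (gin_pending_list_size(index) > threshold)
    {
        if (gin_try_cleanup_lock(index))
        {
            gin_do_pending_cleanup(index);
            gin_release_cleanup_lock(index);
        }
        else if (gin_pending_list_size(index) > threshold * 2)
        {
            /* cleaner is busy and the list is way oversized: bypass the list */
            gin_insert_directly(index, itup);
            return;
        }
    }

    gin_append_to_pending_list(index, itup);
}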


Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-08-30 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Your earlier point about how the current design throttles insertions to
 keep the pending list from growing without bound seems like a bigger deal
 to worry about.  I think we'd like to have some substitute for that.
 ...

 If the goal is to not change existing behavior (like for back patching) the
 margin should be 1, always wait.

The current behavior is buggy, both as to performance and correctness,
so I'm not sure how come "don't change the behavior" should be a
requirement.

 But we would still have to deal with the
 fact that unconditional acquire attempt by the backends will cause a vacuum
 to cancel itself, which is undesirable.

Good point.

 If we define a new namespace for
 this lock (like the relation extension lock has its own namespace) then
 perhaps the cancellation code could be made to not cancel on that
 condition.  But that too seems like a lot of work to backpatch.

We could possibly teach the autocancel logic to distinguish this lock type
from others without using a new namespace.  That seems a bit klugy, but
maybe better than adding a new namespace.  (For example, there are
probably only a couple of modes in which we take page-level locks at
present.  Choosing a currently unused, but self-exclusive, mode for taking
such a lock might serve.)

 Would we bother to back-patch a theoretical bug which there is no evidence
 is triggering in the field?

What's theoretical about it?  You seemed pretty sure that the issue in
http://www.postgresql.org/message-id/flat/CA+bfosGVGVQhMAa=0-mue6coo7dbsgayxb-xsnr5vm-s39h...@mail.gmail.com
was exactly this.

 If we want to improve the current behavior rather than fix a bug, then I
 think that if the list is greater than threshold*2 and the cleaning lock is
 unavailable, what it should do is proceed to insert the tuple's keys into
 the index itself, as if fastupdate = off.  That would require some major
 surgery to the existing code, as by the time it invokes the clean up, it is
 too late to not insert into the pending list.

Meh.  That's introducing a whole new behavioral regime, which quite aside
from correctness bugs might introduce new performance bugs of its own.
It certainly doesn't sound like a better back-patch candidate than the
other thing.

regards, tom lane




Re: [HACKERS] Potential GIN vacuum bug

2015-08-22 Thread Jeff Janes
On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  User backends attempt to take the lock conditionally, because otherwise
 they
  would cause an autovacuum already holding the lock to cancel itself,
 which
  seems quite bad.
 
  Note that this is a substantial behavior change, in that with this code the
 user
  backends which find the list already being cleaned will just add to the
 end
  of the pending list and go about their business.  So if things are added
 to
  the list faster than they can be cleaned up, the size of the pending list
  can increase without bound.
 
  Under the existing code each concurrent user backend will try to clean
 the
  pending list at the same time.  The work doesn't parallelize, so doing
 this
  just burns CPU (and possibly consumes up to maintenance_work_mem for
  *each* backend), but it does serve to throttle the insertion rate and so
  keep the list from growing without bound.
 
  This is just a proof-of-concept patch, because I don't know if this
 approach
  is the right approach.

 I'm not sure if this is the right approach, but I'm a little wary of
 involving the heavyweight lock manager in this.  If pending list
 cleanups are frequent, this could involve a lot of additional lock
 manager traffic, which could be bad for performance.


I don't think 10 cleanups a second should be a problem (and most of those
would probably fail to acquire the lock, but I don't know if that would
make a difference).  If there were several hundred per second, I think you
would have bigger problems than traffic through the lock manager.  In that
case, it is time to either turn off fastupdate, or increase the pending
list size.

As a mini-vacuum, it seems natural to me to hold a lock of the same nature
as a regular vacuum, but just on the one index involved rather than the
whole table.


 Even if they are
 infrequent, it seems like it would be more natural to handle this
 with some regime of locks and pins and buffer cleanup locks on the
 buffers that are storing the pending list, rather than a heavyweight
 lock on the whole relation.  But I am just waving my hands wildly
 here.


I also thought of a buffer clean up lock on the pending list head buffer to
represent the right to do a clean up.  But with the proviso that once you
have obtained the clean up lock, you can then drop the exclusive buffer
content lock and continue to hold the conceptual lock just by maintaining
the pin.  I think that this would be semantically correct, but backends
doing a cleanup would have to get the lock conditionally, and I think you
would have too many chances for false failures where it bails out when the
other party simply holds a pin.  I guess I could implement it and see how
it fares in my test case.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-08-18 Thread Robert Haas
On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 User backends attempt to take the lock conditionally, because otherwise they
 would cause an autovacuum already holding the lock to cancel itself, which
 seems quite bad.

 Note that this is a substantial behavior change, in that with this code the user
 backends which find the list already being cleaned will just add to the end
 of the pending list and go about their business.  So if things are added to
 the list faster than they can be cleaned up, the size of the pending list
 can increase without bound.

 Under the existing code each concurrent user backend will try to clean the
 pending list at the same time.  The work doesn't parallelize, so doing this
 just burns CPU (and possibly consumes up to maintenance_work_mem for
 *each* backend), but it does serve to throttle the insertion rate and so
 keep the list from growing without bound.

 This is just a proof-of-concept patch, because I don't know if this approach
 is the right approach.

I'm not sure if this is the right approach, but I'm a little wary of
involving the heavyweight lock manager in this.  If pending list
cleanups are frequent, this could involve a lot of additional lock
manager traffic, which could be bad for performance.  Even if they are
infrequent, it seems like it would be more natural to handle this
with some regime of locks and pins and buffer cleanup locks on the
buffers that are storing the pending list, rather than a heavyweight
lock on the whole relation.  But I am just waving my hands wildly
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential GIN vacuum bug

2015-08-17 Thread Jeff Janes
On Mon, Aug 17, 2015 at 6:23 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 On Aug 16, 2015 11:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 
  On 08/16/2015 12:58 AM, Jeff Janes wrote:
 
  When ginbulkdelete gets called for the first time in a VACUUM (i.e.
 stats
  == NULL), one of the first things it does is call ginInsertCleanup to
 get
  rid of the pending list.  It does this in lieu of vacuuming the pending
  list.
 
  This is important because if there are any dead tids still in the
 Pending
  list, someone else could come along during the vacuum and post the dead
  tids into a part of the index that VACUUM has already passed over.
 
  The potential bug is that ginInsertCleanup exits early (ginfast.c lines
  796, 860, 898) if it detects that someone else is cleaning up the
 pending
  list, without waiting for that someone else to finish the job.
 
  Isn't this a problem?
 
 
  Yep, I think you're right. When that code runs as part of VACUUM, it
 should not give up like that.
 
  Hmm, I see other race conditions in that code too. Even if VACUUM wins
 the race you spotted, and performs all the insertions, reaches the end of
 the pending items list, and deletes the pending list pages, it's possible
 that another backend started earlier, and is still processing the same
 items from the pending items list. It will add them to the tree, and after
 it's finished with that it will see that the pending list page was already
 deleted, and bail out. But if there is a dead tuple in the pending items
 list, you have trouble. The other backend will re-insert it, and that might
 happen after VACUUM had already removed it from the tree.

 Could the right to clean the pending list be represented by a
 self-conflicting heavyweight lock on the index?  Vacuum could block on it,
 while user back-ends could try to get it conditionally and just give up on
 the cleanup if it is not available.

 
  Also, ginInsertCleanup() seems to assume that if another backend has
 just finished cleaning up the same page, it will see the page marked as
 deleted. But what if the page is not only marked as deleted, but also
 reused for something else already?

 Yeah.  Which is possible but pretty unlikely now; but would be far more
 likely if we added these pages to the FSM more aggressively.


The attached patch takes a ShareUpdateExclusiveLock lock on the index in
order to clean the pending list.

This fixes the problem I had been seeing when testing
https://commitfest.postgresql.org/6/322/ (which was probably caused by the
deleted page situation you mentioned, not by tids getting revived issue I
originally brought up.)

User backends attempt to take the lock conditionally, because otherwise
they would cause an autovacuum already holding the lock to cancel itself,
which seems quite bad.
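
As a rough sketch of that locking discipline (not the attached patch
verbatim; gin_do_pending_cleanup is an invented stand-in for the existing
cleanup routine):

#include "postgres.h"
#include "storage/lmgr.h"   /* LockRelation, ConditionalLockRelation, UnlockRelation */

extern void gin_do_pending_cleanup(Relation index);   /* hypothetical stand-in */

static void
clean_pending_list(Relation index, bool is_vacuum)
{
    if (is_vacuum)
        LockRelation(index, ShareUpdateExclusiveLock);            /* block until acquired */
    else if (!ConditionalLockRelation(index, ShareUpdateExclusiveLock))
        return;               /* someone else is cleaning; just keep appending */

    gin_do_pending_cleanup(index);
    UnlockRelation(index, ShareUpdateExclusiveLock);
}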

Note that this is a substantial behavior change, in that with this code the
user backends which find the list already being cleaned will just add to
the end of the pending list and go about their business.  So if things are
added to the list faster than they can be cleaned up, the size of the
pending list can increase without bound.

Under the existing code each concurrent user backend will try to clean the
pending list at the same time.  The work doesn't parallelize, so doing this
just burns CPU (and possibly consumes up to maintenance_work_mem for
*each* backend), but it does serve to throttle the insertion rate and so
keep the list from growing without bound.

This is just a proof-of-concept patch, because I don't know if this
approach is the right approach.

One potential problem is how it will interact with create index
concurrently.

Cheers,

Jeff


gin_pending_lock.patch
Description: Binary data



Re: [HACKERS] Potential GIN vacuum bug

2015-08-17 Thread Alvaro Herrera
Jeff Janes wrote:

 The attached patch takes a ShareUpdateExclusiveLock lock on the index in
 order to clean the pending list.

Does it take a lock on the table also?  Because if not ...

 One potential problem is how it will interact with create index
 concurrently.

... then I don't understand how you could have a problem here.  Surely
no pending list cleanup can happen concurrently with the index being
created?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Potential GIN vacuum bug

2015-08-17 Thread Jeff Janes
On Mon, Aug 17, 2015 at 3:02 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Jeff Janes wrote:

  The attached patch takes a ShareUpdateExclusiveLock lock on the index in
  order to clean the pending list.

 Does it take a lock on the table also?  Because if not ...


There must be some kind of lock held on the table already (at least
RowExclusive for user backends, or ShareUpdateExclusive for vacuum
backends), but I don't do anything in this patch with it.



  One potential problem is how it will interact with create index
  concurrently.

 ... then I don't understand how you could have a problem here.  Surely
 no pending list cleanup can happen concurrently with the index being
 created?


While grepping through the code looking for precedent, I noticed that
create index concurrently takes a ShareUpdateExclusiveLock on both table
and index.  I don't know why it needs one on the index; I didn't investigate
it thoroughly.

During the last stage of the create index concurrently, inserters and
updaters are obliged to maintain the index even though they can't use it
yet.  So that would mean adding to the pending list, which might get big
enough to need cleaning.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-08-17 Thread Heikki Linnakangas

On 08/16/2015 12:58 AM, Jeff Janes wrote:

When ginbulkdelete gets called for the first time in a VACUUM (i.e. stats
== NULL), one of the first things it does is call ginInsertCleanup to get
rid of the pending list.  It does this in lieu of vacuuming the pending
list.

This is important because if there are any dead tids still in the Pending
list, someone else could come along during the vacuum and post the dead
tids into a part of the index that VACUUM has already passed over.

The potential bug is that ginInsertCleanup exits early (ginfast.c lines
796, 860, 898) if it detects that someone else is cleaning up the pending
list, without waiting for that someone else to finish the job.

Isn't this a problem?


Yep, I think you're right. When that code runs as part of VACUUM, it 
should not give up like that.


Hmm, I see other race conditions in that code too. Even if VACUUM wins 
the race you spotted, and performs all the insertions, reaches the end 
of the pending items list, and deletes the pending list pages, it's 
possible that another backend started earlier, and is still processing 
the same items from the pending items list. It will add them to the 
tree, and after it's finished with that it will see that the pending 
list page was already deleted, and bail out. But if there is a dead 
tuple in the pending items list, you have trouble. The other backend 
will re-insert it, and that might happen after VACUUM had already 
removed it from the tree.


Also, ginInsertCleanup() seems to assume that if another backend has 
just finished cleaning up the same page, it will see the page marked as 
deleted. But what if the page is not only marked as deleted, but also 
reused for something else already?


- Heikki





[HACKERS] Potential GIN vacuum bug

2015-08-15 Thread Jeff Janes
When ginbulkdelete gets called for the first time in a VACUUM (i.e. stats
== NULL), one of the first things it does is call ginInsertCleanup to get
rid of the pending list.  It does this in lieu of vacuuming the pending
list.

This is important because if there are any dead tids still in the Pending
list, someone else could come along during the vacuum and post the dead
tids into a part of the index that VACUUM has already passed over.

The potential bug is that ginInsertCleanup exits early (ginfast.c lines
796, 860, 898) if it detects that someone else is cleaning up the pending
list, without waiting for that someone else to finish the job.

Isn't this a problem?

Cheers,

Jeff