Heikki Linnakangas wrote:
> I finally got around to look at this. Attached patch adds a
> HASH_FIXED_SIZE flag, which disables the allocation of new entries
> after the initial allocation. I believe we have consensus to make
> the predicate lock hash tables fixed-size, so that there's no
> comp
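The fixed-size behavior proposed above can be illustrated with a small model: all entries are preallocated once, and when the freelist runs dry a request fails instead of growing the table. This is a hedged Python sketch of the idea only, not dynahash; `FixedPool` and its methods are invented names.

```python
class FixedPool:
    """Toy model of a fixed-size shared hash table: entries are
    preallocated up front; once the freelist is empty, acquire()
    fails instead of allocating more memory."""

    def __init__(self, nelem):
        self.freelist = [dict() for _ in range(nelem)]  # preallocated entries
        self.table = {}

    def acquire(self, key):
        if key in self.table:
            return self.table[key]
        if not self.freelist:      # fixed-size: never allocate past the pool
            return None            # caller reports "out of shared memory"
        entry = self.freelist.pop()
        self.table[key] = entry
        return entry

    def release(self, key):
        # freed entries return to the pool rather than the allocator
        self.freelist.append(self.table.pop(key))
```

The point of the design is predictability: the structure can never consume more shared memory than was reserved at startup.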
On 03.04.2011 09:16, Dan Ports wrote:
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
cleared.
The attached patch corrects this:
On 11.04.2011 11:33, Heikki Linnakangas wrote:
On 31.03.2011 22:06, Kevin Grittner wrote:
Heikki Linnakangas wrote:
That's not enough. The hash tables can grow beyond the maximum
size you specify in ShmemInitHash. It's just a hint to size the
directory within the hash table.
We'll need to teach dynahash not to allocate any more entries
after the preallocation.
On 31.03.2011 22:06, Kevin Grittner wrote:
Heikki Linnakangas wrote:
That's not enough. The hash tables can grow beyond the maximum
size you specify in ShmemInitHash. It's just a hint to size the
directory within the hash table.
We'll need to teach dynahash not to allocate any more entries
after the preallocation.
On 11.04.2011 11:33, Heikki Linnakangas wrote:
I also noticed that there's a few hash_search(HASH_ENTER) calls in
predicate.c followed by check for a NULL result. But with HASH_ENTER,
hash_search never returns NULL, it throws an "out of shared memory"
error internally. I changed those calls to use HASH_ENTER_NULL.
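The distinction above can be sketched as two insertion modes: one raises internally on exhaustion (so a NULL check after it is dead code), the other returns a null result and lets the caller raise a tailored error. A hedged Python analogue (the names mirror the C flags but this is not the real dynahash API):

```python
class SharedHashFull(Exception):
    """Stands in for the generic 'out of shared memory' error."""

class ShmemHash:
    """Toy capacity-limited table with two insertion modes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}

    def search(self, key, mode):
        if key in self.table:
            return self.table[key]
        if len(self.table) >= self.capacity:
            if mode == "HASH_ENTER":
                # raised internally: the caller never sees a null result
                raise SharedHashFull("out of shared memory")
            return None  # HASH_ENTER_NULL: caller decides what to report
        self.table[key] = object()
        return self.table[key]
```

Returning the null result is what lets a caller such as predicate.c attach its own hint (e.g. about raising a configuration limit) instead of the generic error.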
hi,
> hi,
>
>> I think I see what is going on now. We are sometimes failing to set the
>> commitSeqNo correctly on the lock. In particular, if a lock assigned to
>> OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
>> cleared.
>>
>> The attached patch corrects this:
>> TransferPredicateLocksToNewTarget should
hi,
> YAMAMOTO Takashi wrote:
>
>> LOG: could not truncate directory "pg_serial": apparent
>> wraparound
>
> Did you get a warning with this text?:
>
> memory for serializable conflict tracking is nearly exhausted
there is no such warning near the above "apparent wraparound" record.
I wrote:
> YAMAMOTO Takashi wrote:
>
>> LOG: could not truncate directory "pg_serial": apparent
>> wraparound
> there's some sort of cleanup bug to fix in the predicate
> locking's use of SLRU. It may be benign, but we won't really know
> until we find it. I'm investigating.
I'm pretty sure
YAMAMOTO Takashi wrote:
> LOG: could not truncate directory "pg_serial": apparent
> wraparound
Did you get a warning with this text?:
memory for serializable conflict tracking is nearly exhausted
If not, there's some sort of cleanup bug to fix in the predicate
locking's use of SLRU. It may be benign, but we won't really know
until we find it.
hi,
> I think I see what is going on now. We are sometimes failing to set the
> commitSeqNo correctly on the lock. In particular, if a lock assigned to
> OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
> cleared.
>
> The attached patch corrects this:
> TransferPredicateLocksToNewTarget should
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
cleared.
The attached patch corrects this:
TransferPredicateLocksToNewTarget should
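The leak described above can be modeled simply: cleanup releases a lock once every interested reader is past its commitSeqNo, so a lock left at the "invalid" sentinel behaves as never-expired and survives every cleanup pass. This is a hedged Python sketch; the constant and field names are simplified stand-ins for the predicate.c structures, not the real code.

```python
INVALID_SER_COMMIT_SEQ_NO = 0  # sentinel meaning "never set"

def cleanup(locks, can_clear_through):
    """Return the locks that survive a cleanup pass.

    A lock is releasable once its commitSeqNo has been set and the
    global cleanup horizon has passed it. A lock whose commitSeqNo was
    never assigned compares as not-yet-clearable on every pass, so it
    leaks forever -- the bug the patch above addresses.
    """
    return [
        lock for lock in locks
        if lock["commitSeqNo"] == INVALID_SER_COMMIT_SEQ_NO  # leaks
        or lock["commitSeqNo"] > can_clear_through           # not yet old
    ]
```

The fix, per the description, is to ensure the commitSeqNo is always assigned when a lock is handed to the old-committed transaction, so the first branch can never be taken.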
Heikki Linnakangas wrote:
> That's not enough. The hash tables can grow beyond the maximum
> size you specify in ShmemInitHash. It's just a hint to size the
> directory within the hash table.
>
> We'll need to teach dynahash not to allocate any more entries
after the preallocation. A new HASH_FIXED_SIZE flag disables the
allocation of new entries after the initial allocation.
On 31.03.2011 21:23, Kevin Grittner wrote:
Dan Ports wrote:
On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
The only thing I've been on the fence about is whether it
makes more sense to allocate it all up front or to continue to
allow
incremental allocation but set a hard limit on the number of entries
allocated for each shared memory structure.
Dan Ports wrote:
> On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
>> The only thing I've been on the fence about is whether it
>> makes more sense to allocate it all up front or to continue to allow
>> incremental allocation but set a hard limit on the number of entries
>> allocated for each shared memory structure.
On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
> The only thing I've been on the fence about is whether it
> makes more sense to allocate it all up front or to continue to allow
> incremental allocation but set a hard limit on the number of entries
> allocated for each shared memory structure.
Heikki Linnakangas wrote:
> Did we get anywhere with the sizing of the various shared memory
> structures? Did we find the cause of the "out of shared memory"
> warnings?
The patch you just committed is related to that. Some tuple locks
for summarized transactions were not getting cleaned up
On 31.03.2011 16:31, Kevin Grittner wrote:
I've stared at the code for hours and have only come up with one
race condition which can cause this, although the window is so small
it's hard to believe that you would get this volume of orphaned
locks. I'll keep looking, but if you could try this to
YAMAMOTO Takashi wrote:
> hoge=# select locktype,count(*) from pg_locks group by locktype;
> -[ RECORD 1 ]
> locktype | virtualxid
> count    | 1
> -[ RECORD 2 ]
> locktype | relation
> count    | 1
> -[ RECORD 3 ]
> locktype | tuple
> count    | 7061
I've stared at the code for hours and have only come up with one
race condition which can cause this.
YAMAMOTO Takashi wrote:
>>> [no residual SIReadLock]
>
> i read it as there are many (7057) SIReadLocks somehow leaked.
> am i wrong?
No, I am. Could you send the full SELECT * of pg_locks when this is
manifest? (Probably best to do that off-list.)
-Kevin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
hi,
>> [no residual SIReadLock]
i read it as there are many (7057) SIReadLocks somehow leaked.
am i wrong?
YAMAMOTO Takashi
hi,
>>> (6) Does the application continue to run relatively sanely, or
>>> does it fall over at this point?
>>
>> my application just exits on the error.
>>
>> if i re-run the application without rebooting postgres, it seems
>> that i will get the error sooner than the first run. (but it might
>
YAMAMOTO Takashi wrote:
> this psql session was the only activity to the server at this
> point.
> [no residual SIReadLock]
>> Right, that's because we were using HASH_ENTER instead of
>> HASH_ENTER_NULL. I've posted a patch which should correct that.
> sure, with your patch it seems that
YAMAMOTO Takashi wrote:
> Kevin Grittner wrote:
>> (1) Could you post the non-default configuration settings?
>
> none. it can happen with just initdb+createdb'ed database.
>
>> (2) How many connections are in use in your testing?
>
> 4.
>
>> (3) Can you give a rough categorization of how
Tom Lane wrote:
> There might perhaps be some value in adding a warning like this if
> it were enabled per-table (and not enabled by default).
It only fires where a maximum has been declared and is exceeded.
Most HTABs don't declare a maximum -- they leave it at zero. These
are ignored. Whe
Robert Haas wrote:
> I don't see much advantage in changing these to asserts - in a
> debug build, that will promote ERROR to PANIC; whereas in a
> production build, they'll cause a random failure somewhere
> downstream.
The reason Assert is appropriate is that it is *impossible* to hit
that c
On Fri, Mar 25, 2011 at 04:06:30PM -0400, Tom Lane wrote:
> Up to now, I believe the lockmgr's lock table is the only shared hash
> table that is expected to grow past the declared size; that can happen
> anytime a session exceeds max_locks_per_transaction, which we consider
> to be only a soft limit.
On Fri, Mar 25, 2011 at 4:06 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
>> wrote:
I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of entries declared as its
max_size when it was created.
Robert Haas writes:
> On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
> wrote:
>>> I'm still looking at whether it's sane to try to issue a warning
>>> when an HTAB exceeds the number of entries declared as its
>>> max_size when it was created.
> I don't think it's too late to commit something l
On Fri, Mar 18, 2011 at 4:51 PM, Kevin Grittner
wrote:
> Dan Ports wrote:
>
>> I am surprised to see that error message without SSI's hint about
>> increasing max_predicate_locks_per_xact.
>
> After reviewing this, I think something along the following lines
> might be needed, for a start. I'm not sure the Asserts are actually needed.
On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
wrote:
> "Kevin Grittner" wrote:
>
>> I'm still looking at whether it's sane to try to issue a warning
>> when an HTAB exceeds the number of entries declared as its
>> max_size when it was created.
>
> I think this does it.
>
> If nothing else, it might be instructive to use it while testing the
> SSI patch.
hi,
> YAMAMOTO Takashi wrote:
>
>> thanks for quickly fixing problems.
>
> Thanks for the rigorous testing. :-)
>
>> i tested the later version
>> (a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and only errors i got
>> was "out of shared memory". i'm not sure if it was caused by SSI
>> activities or not.
"Kevin Grittner" wrote:
> I'm still looking at whether it's sane to try to issue a warning
> when an HTAB exceeds the number of entries declared as its
> max_size when it was created.
I think this does it.
If nothing else, it might be instructive to use it while testing the
SSI patch. Would
Dan Ports wrote:
> I am surprised to see that error message without SSI's hint about
> increasing max_predicate_locks_per_xact.
After reviewing this, I think something along the following lines
might be needed, for a start. I'm not sure the Asserts are actually
needed; they basically are chec
It would probably also be worth monitoring the size of pg_locks to see
how many predicate locks are being held.
On Fri, Mar 18, 2011 at 12:50:16PM -0500, Kevin Grittner wrote:
> Even with the above information it may be far from clear where
> allocations are going past their maximum, since one HTAB
YAMAMOTO Takashi wrote:
> thanks for quickly fixing problems.
Thanks for the rigorous testing. :-)
> i tested the later version
> (a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and only errors i got
> was "out of shared memory". i'm not sure if it was caused by SSI
> activities or not.
> PG_
hi,
thanks for quickly fixing problems.
i tested the later version (a2eb9e0c08ee73208b5419f5a53a6eba55809b92)
and the only errors i got were "out of shared memory". i'm not sure if
they were caused by SSI activities or not.
YAMAMOTO Takashi
the following is a snippet from my application log:
PG_DIAG_
On Tue, Mar 01, 2011 at 07:07:42PM +0200, Heikki Linnakangas wrote:
> Was there test cases for any of the issues fixed by this patch that we
> should add to the suite?
Some of these issues are tricky to test, e.g. some of the code about
transferring predicate locks to a new target doesn't get exe
Heikki Linnakangas wrote:
> committed with minor changes.
Thanks!
> The ordering of the fields in PREDICATELOCKTAG was bizarre, so I
> just expanded the offsetnumber fields to an uint32, instead of
> having the padding field. I think that's a lot more readable.
I can understand that, but I
On 01.03.2011 02:03, Dan Ports wrote:
An updated patch to address this issue is attached. It fixes a couple
issues related to use of the backend-local lock table hint:
- CheckSingleTargetForConflictsIn now correctly handles the case
where a lock that's being held is not reflected in the local lock table.
An updated patch to address this issue is attached. It fixes a couple
issues related to use of the backend-local lock table hint:
- CheckSingleTargetForConflictsIn now correctly handles the case
where a lock that's being held is not reflected in the local lock
table. This fixes the assertion failure.
Heikki Linnakangas wrote:
> On 23.02.2011 07:20, Kevin Grittner wrote:
>> Dan Ports wrote:
>>
>>> The obvious solution to me is to just keep the lock on both the
>>> old and new page.
>>
>> That's the creative thinking I was failing to do. Keeping the
>> old lock will generate some false positives, but it will be rare
>> and those don't compromise correctness.
On 23.02.2011 07:20, Kevin Grittner wrote:
Dan Ports wrote:
The obvious solution to me is to just keep the lock on both the old
and new page.
That's the creative thinking I was failing to do. Keeping the old
lock will generate some false positives, but it will be rare and
those don't compromise correctness
Dan Ports wrote:
> The obvious solution to me is to just keep the lock on both the old
> and new page.
That's the creative thinking I was failing to do. Keeping the old
lock will generate some false positives, but it will be rare and
those don't compromise correctness -- they just carry the c
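The conservative choice discussed above can be sketched in a few lines: when an index page splits, predicate locks on the old page are copied to the new page rather than moved, so a reader's lock still covers whichever page a conflicting write lands on. The cost is an occasional false positive, which only triggers an unnecessary serialization failure, never a missed conflict. This is pure illustration under assumed semantics, not the predicate.c implementation.

```python
def split_page(locks_by_page, old_page, new_page):
    """On a page split, keep predicate locks on BOTH pages.

    locks_by_page maps a page id to the set of transactions holding
    SIRead locks on it. Copying (not moving) the lock set errs on the
    side of false positives, which is safe for SSI; dropping the old
    page's locks could produce a false negative, which is not.
    """
    locks_by_page[new_page] = set(locks_by_page.get(old_page, set()))
    # the old page keeps its locks too -- correctness over precision
    return locks_by_page
```

A false positive here means a transaction may be cancelled and retried needlessly; a false negative would mean a serialization anomaly slips through, so the asymmetry justifies the extra locks.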
On Tue, Feb 22, 2011 at 05:54:49PM -0600, Kevin Grittner wrote:
> I'm not sure it's safe to assume that the index page won't get
> reused before the local lock information is cleared. In the absence
> of a clear proof that it is safe, or some enforcement mechanism to
> ensure that it is, I don't t
Dan Ports wrote:
> On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:
> The theory was before that the local lock table would only have
> false negatives, i.e. if it says we hold a lock then we really do.
> That makes it a useful heuristic because we can bail out quickly
> if we're
On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:
> Dan Ports wrote:
>
> > It looks like CheckTargetForConflictsIn is making the assumption
> > that the backend-local lock table is accurate, which was probably
> > even true at the time it was written.
>
> I remember we decided th
hi,
> "Kevin Grittner" wrote:
>
>> I'm proceeding on this basis.
>
> Result attached. I found myself passing around the tuple xmin value
> just about everywhere that the predicate lock target tag was being
> passed, so it finally dawned on me that this logically belonged as
> part of the targe
Dan Ports wrote:
> It looks like CheckTargetForConflictsIn is making the assumption
> that the backend-local lock table is accurate, which was probably
> even true at the time it was written.
I remember we decided that it could only be false in certain ways
which allowed us to use it as a "lossy cache".
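The property being relied on above is asymmetric: a hit in the backend-local table can be trusted (if it says we hold a lock, we really do), but a miss cannot, because another backend may have promoted or transferred our locks without updating our local copy. A hedged Python sketch of that fast-path discipline (names are illustrative, not the real predicate.c structures):

```python
def holds_lock(local_table, shared_table, backend, target):
    """Check lock ownership using a lossy local cache.

    local_table: set of targets this backend believes it has locked.
    shared_table: authoritative map of target -> set of lock holders.
    Only a local HIT may short-circuit; a local MISS must fall through
    to the shared table, since the local copy can be stale.
    """
    if target in local_table:   # hit: guaranteed accurate -> fast path
        return True
    # miss: possibly a false negative; consult the shared table
    return backend in shared_table.get(target, set())
```

The bug under discussion arose when code treated a local miss as authoritative, i.e. skipped the fall-through to the shared table.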
On Mon, Feb 21, 2011 at 11:42:36PM +, YAMAMOTO Takashi wrote:
> i tested ede45e90dd1992bfd3e1e61ce87bad494b81f54d + ssi-multi-update-1.patch
> with my application and got the following assertion failure.
>
> #4 0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
> at predicate.c
"Kevin Grittner" wrote:
> I'm proceeding on this basis.
Result attached. I found myself passing around the tuple xmin value
just about everywhere that the predicate lock target tag was being
passed, so it finally dawned on me that this logically belonged as
part of the target tag. That simplifi
> Heikki Linnakangas wrote:
> On 14.02.2011 20:10, Kevin Grittner wrote:
>> Promotion of the lock granularity on the prior tuple is where we
>> have problems. If the two tuple versions are in separate pages
>> then the second UPDATE could miss the conflict. My first thought
>> was to fix that by
On Thu, Feb 17, 2011 at 23:11, Kevin Grittner
wrote:
> Dan Ports wrote:
>
>> Oops. Those are both definitely bugs (and my fault). Your patch
>> looks correct. Thanks for catching that!
>
> Could a committer please apply the slightly modified version here?:
>
> http://archives.postgresql.org/messa
Dan Ports wrote:
> Oops. Those are both definitely bugs (and my fault). Your patch
> looks correct. Thanks for catching that!
Could a committer please apply the slightly modified version here?:
http://archives.postgresql.org/message-id/4d5c46bb02250003a...@gw.wicourts.gov
It is a prett
On Wed, Feb 16, 2011 at 10:13:35PM +, YAMAMOTO Takashi wrote:
> i got the following SEGV when running vacuum on a table.
> (the line numbers in predicate.c is different as i have local modifications.)
> oldlocktag.myTarget was NULL.
> it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
hi,
might be unrelated to the loop problem, but...
i got the following SEGV when running vacuum on a table.
(the line numbers in predicate.c is different as i have local modifications.)
oldlocktag.myTarget was NULL.
it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
for
hi,
> YAMAMOTO Takashi wrote:
>
>> might be unrelated to the loop problem, but...
>
> Aha! I think it *is* related. There were several places where data
> was uninitialized here; mostly because Dan was working on this piece
> while I was working on separate issues which added the new fields
hi,
> YAMAMOTO Takashi wrote:
>
>> with your previous patch or not?
>
> With, thanks.
i tried. unfortunately i can still reproduce the original loop problem.
WARNING: [0] target 0xbb51ef18 tag 4000:4017:7e3:78:0 prior 0xbb51f148 next 0xbb51edb0
WARNING: [1] target 0xbb51f148 tag 4000:40
YAMAMOTO Takashi wrote:
> with your previous patch or not?
With, thanks.
-Kevin
YAMAMOTO Takashi wrote:
> might be unrelated to the loop problem, but...
Aha! I think it *is* related. There were several places where data
was uninitialized here; mostly because Dan was working on this piece
while I was working on separate issues which added the new fields.
I missed the int
YAMAMOTO Takashi wrote:
> might be unrelated to the loop problem, but...
>
> i got the following SEGV when running vacuum on a table.
> vacuum on the table succeeded with the attached patch.
Thanks! I appreciate the heavy testing and excellent diagnostics.
On the face of it, this doesn't
On 14.02.2011 20:10, Kevin Grittner wrote:
Promotion of the lock granularity on the prior tuple is where we
have problems. If the two tuple versions are in separate pages then
the second UPDATE could miss the conflict. My first thought was to
fix that by requiring promotion of a predicate lock o
YAMAMOTO Takashi wrote:
>> Did you notice whether the loop involved multiple tuples within a
>> single page?
>
> if i understand correctly, yes.
>
> the following is a snippet of my debug code (dump targets when
> triggerCheckTargetForConflictsIn loops >1000 times) and its
> output. the same lo
Heikki Linnakangas wrote:
> Looking at the prior/next version chaining, aside from the
> looping issue, isn't it broken by lock promotion too? There's a
> check in RemoveTargetIfNoLongerUsed() so that we don't release a
> lock target if its priorVersionOfRow is set, but what if the tuple
> lock
Looking at the prior/next version chaining, aside from the looping
issue, isn't it broken by lock promotion too? There's a check in
RemoveTargetIfNoLongerUsed() so that we don't release a lock target if
its priorVersionOfRow is set, but what if the tuple lock is promoted to
a page level lock first?
hi,
all of the following answers are with the patch you provided in
other mail applied.
> YAMAMOTO Takashi wrote:
>
>> i have seen this actually happen. i've confirmed the creation of
>> the loop with the attached patch. it's easily reproducable with
>> my application. i can provide the full source code of my
>> application if you want.
hi,
> I wrote:
>
>> it seems likely that such a cycle might be related to this new
>> code not properly allowing for some aspect of tuple cleanup.
>
> I found a couple places where cleanup could let these fall through
> the cracks long enough to get stale and still be around when a tuple
> ID
I wrote:
> it seems likely that such a cycle might be related to this new
> code not properly allowing for some aspect of tuple cleanup.
I found a couple places where cleanup could let these fall through
the cracks long enough to get stale and still be around when a tuple
ID is re-used, causing
YAMAMOTO Takashi wrote:
> i have seen this actually happen. i've confirmed the creation of
> the loop with the attached patch. it's easily reproducable with
> my application. i can provide the full source code of my
> application if you want. (but it isn't easy to run unless you are
> familiar
hi,
> YAMAMOTO Takashi wrote:
>
>> it seems that PredicateLockTupleRowVersionLink sometimes creates
>> a loop of targets (it finds an existing 'newtarget' whose
>> nextVersionOfRow chain points to the 'oldtarget') and it later
>> causes CheckTargetForConflictsIn to loop forever.
>
> Is this a hypothetical risk
YAMAMOTO Takashi wrote:
> it seems that PredicateLockTupleRowVersionLink sometimes creates
> a loop of targets (it finds an existing 'newtarget' whose
> nextVersionOfRow chain points to the 'oldtarget') and it later
> causes CheckTargetForConflictsIn to loop forever.
Is this a hypothetical risk b
hi,
it seems that PredicateLockTupleRowVersionLink sometimes creates
a loop of targets (it finds an existing 'newtarget' whose nextVersionOfRow
chain points to the 'oldtarget') and it later causes
CheckTargetForConflictsIn to loop forever.
YAMAMOTO Takashi
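The hang reported here comes from a cycle in the nextVersionOfRow links: a walker following the chain never reaches its end. A standard tortoise-and-hare check detects such a cycle without extra memory; this Python sketch is illustrative diagnostics only, not the fix that went into predicate.c.

```python
def chain_has_cycle(target, next_version):
    """Detect a loop in a version chain.

    next_version maps a lock target to its nextVersionOfRow successor;
    a target absent from the map ends the chain. Floyd's algorithm:
    advance one pointer by one link and another by two; they can only
    meet again if the chain loops back on itself.
    """
    slow = fast = target
    while fast is not None:
        fast = next_version.get(fast)       # first hop
        if fast is None:
            return False                    # reached the end: no cycle
        fast = next_version.get(fast)       # second hop
        slow = next_version.get(slow)       # single hop
        if slow is not None and slow == fast:
            return True                     # pointers met inside a loop
    return False
```

The reported failure mode, where the new target's chain points back at the old target, is exactly the two-element cycle this check catches.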