hi,
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitSeqNo, it will never be
cleared.
The attached patch corrects this:
On 11.04.2011 11:33, Heikki Linnakangas wrote:
I also noticed that there are a few hash_search(HASH_ENTER) calls in
predicate.c followed by a check for a NULL result. But with HASH_ENTER,
hash_search never returns NULL; it throws an out of shared memory
error internally. I changed those calls to use
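Presumably the calls were switched to HASH_ENTER_NULL, as a later message in this thread confirms. A minimal sketch of the difference, assuming a shared-memory table (the helper name is illustrative):

#include "postgres.h"
#include "utils/hsearch.h"

static void *
enter_or_report(HTAB *htab, const void *key)
{
    bool        found;
    void       *entry;

    /*
     * With HASH_ENTER on a shared-memory table, hash_search() never returns
     * NULL; it reports "out of shared memory" itself, so a NULL check after
     * the call is dead code.
     *
     * With HASH_ENTER_NULL, NULL is returned instead, and the caller decides
     * how to report the failure (and can attach a more useful hint).
     */
    entry = hash_search(htab, key, HASH_ENTER_NULL, &found);
    if (entry == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of shared memory")));

    return entry;
}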
On 31.03.2011 22:06, Kevin Grittner wrote:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
That's not enough. The hash tables can grow beyond the maximum
size you specify in ShmemInitHash. It's just a hint to size the
directory within the hash table.
We'll need to teach dynahash
On 11.04.2011 11:33, Heikki Linnakangas wrote:
On 31.03.2011 22:06, Kevin Grittner wrote:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
That's not enough. The hash tables can grow beyond the maximum
size you specify in ShmemInitHash. It's just a hint to size the
directory
On 03.04.2011 09:16, Dan Ports wrote:
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitSeqNo, it will never be
cleared.
The attached patch corrects this:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
I finally got around to looking at this. Attached patch adds a
HASH_FIXED_SIZE flag, which disables the allocation of new entries
after the initial allocation. I believe we have consensus to make
the predicate lock hash tables
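For reference, a sketch of how a shared table might be sized with such a flag; the table name, entry type, and the HASH_BLOBS flag (which postdates this thread) are illustrative, not the exact predicate.c code:

#include "postgres.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

typedef struct ExampleEntry
{
    uint32      key;
    uint32      payload;
} ExampleEntry;

static HTAB *
init_fixed_size_table(long max_table_size)
{
    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = sizeof(uint32);
    info.entrysize = sizeof(ExampleEntry);

    /*
     * HASH_FIXED_SIZE: every entry is allocated up front, and dynahash will
     * not grab additional shared memory once the table fills up, so max_size
     * becomes a hard limit rather than a directory-sizing hint.
     */
    return ShmemInitHash("example fixed-size hash",
                         max_table_size,    /* init_size */
                         max_table_size,    /* max_size */
                         &info,
                         HASH_ELEM | HASH_BLOBS | HASH_FIXED_SIZE);
}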
hi,
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
LOG: could not truncate directory pg_serial: apparent wraparound
Did you get a warning with this text?:
memory for serializable conflict tracking is nearly exhausted
there is no such warning near the above apparent wraparound
hi,
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitSeqNo, it will never be
cleared.
The attached patch corrects this:
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
LOG: could not truncate directory pg_serial: apparent wraparound
Did you get a warning with this text?:
memory for serializable conflict tracking is nearly exhausted
If not, there's some sort of cleanup bug to fix in the predicate
locking's
I wrote:
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
LOG: could not truncate directory pg_serial: apparent wraparound
there's some sort of cleanup bug to fix in the predicate
locking's use of SLRU. It may be benign, but we won't really know
until we find it. I'm investigating.
I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitSeqNo, it will never be
cleared.
The attached patch corrects this:
TransferPredicateLocksToNewTarget
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
hoge=# select locktype,count(*) from pg_locks group by locktype;
-[ RECORD 1 ]---------
locktype | virtualxid
count    | 1
-[ RECORD 2 ]---------
locktype | relation
count    | 1
-[ RECORD 3 ]---------
locktype | tuple
count    | 7061
On 31.03.2011 16:31, Kevin Grittner wrote:
I've stared at the code for hours and have only come up with one
race condition which can cause this, although the window is so small
it's hard to believe that you would get this volume of orphaned
locks. I'll keep looking, but if you could try this to
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
Did we get anywhere with the sizing of the various shared memory
structures? Did we find the cause of the out of shared memory
warnings?
The patch you just committed is related to that. Some tuple locks
for summarized
On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
The only thing I've been on the fence about is whether it
makes more sense to allocate it all up front or to continue to allow
incremental allocation but set a hard limit on the number of entries
allocated for each shared memory
Dan Ports d...@csail.mit.edu wrote:
On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
The only thing I've been on the fence about is whether it
makes more sense to allocate it all up front or to continue to
allow
incremental allocation but set a hard limit on the number of
On 31.03.2011 21:23, Kevin Grittner wrote:
Dan Ports d...@csail.mit.edu wrote:
On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
The only thing I've been on the fence about is whether it
makes more sense to allocate it all up front or to continue to
allow
incremental allocation
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
That's not enough. The hash tables can grow beyond the maximum
size you specify in ShmemInitHash. It's just a hint to size the
directory within the hash table.
We'll need to teach dynahash not to allocate any more entries
after
hi,
[no residual SIReadLock]
i read it as: there are many (7057) SIReadLocks that somehow leaked.
am i wrong?
YAMAMOTO Takashi
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
[no residual SIReadLock]
i read it as: there are many (7057) SIReadLocks that somehow leaked.
am i wrong?
No, I am. Could you send the full SELECT * of pg_locks when this is
manifest? (Probably best to do that off-list.)
-Kevin
YAMAMOTO Takashi wrote:
this psql session was the only activity to the server at this
point.
[no residual SIReadLock]
Right, that's because we were using HASH_ENTER instead of
HASH_ENTER_NULL. I've posted a patch which should correct that.
sure, with your patch it seems that they
hi,
(6) Does the application continue to run relatively sanely, or
does it fall over at this point?
my application just exits on the error.
if i re-run the application without rebooting postgres, it seems
that i will get the error sooner than the first run. (but it might
be just a
Robert Haas wrote:
I don't see much advantage in changing these to asserts - in a
debug build, that will promote ERROR to PANIC; whereas in a
production build, they'll cause a random failure somewhere
downstream.
The reason Assert is appropriate is that it is *impossible* to hit
that
Tom Lane wrote:
There might perhaps be some value in adding a warning like this if
it were enabled per-table (and not enabled by default).
It only fires where a maximum has been declared and is exceeded.
Most HTABs don't declare a maximum -- they leave it at zero. These
are ignored.
YAMAMOTO Takashi wrote:
Kevin Grittner wrote:
(1) Could you post the non-default configuration settings?
none. it can happen with just an initdb+createdb'ed database.
(2) How many connections are in use in your testing?
4.
(3) Can you give a rough categorization of how many of what
On Fri, Mar 25, 2011 at 04:06:30PM -0400, Tom Lane wrote:
Up to now, I believe the lockmgr's lock table is the only shared hash
table that is expected to grow past the declared size; that can happen
anytime a session exceeds max_locks_per_transaction, which we consider
to be only a soft limit.
hi,
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
thanks for quickly fixing problems.
Thanks for the rigorous testing. :-)
i tested the later version
(a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and the only errors i got
were out of shared memory. i'm not sure if they were caused by SSI
On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of entries declared as its
max_size when it was created.
I
On Fri, Mar 18, 2011 at 4:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Dan Ports d...@csail.mit.edu wrote:
I am surprised to see that error message without SSI's hint about
increasing max_predicate_locks_per_xact.
After reviewing this, I think something along the following lines
Robert Haas robertmh...@gmail.com writes:
On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of entries declared as its
max_size when it was created.
I don't
On Fri, Mar 25, 2011 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of
hi,
thanks for quickly fixing problems.
i tested the later version (a2eb9e0c08ee73208b5419f5a53a6eba55809b92)
and the only errors i got were out of shared memory. i'm not sure if
they were caused by SSI activities or not.
YAMAMOTO Takashi
the following is a snippet from my application log:
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
thanks for quickly fixing problems.
Thanks for the rigorous testing. :-)
i tested the later version
(a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and the only errors i got
were out of shared memory. i'm not sure if they were caused by SSI
activities
It would probably also be worth monitoring the size of pg_locks to see
how many predicate locks are being held.
On Fri, Mar 18, 2011 at 12:50:16PM -0500, Kevin Grittner wrote:
Even with the above information it may be far from clear where
allocations are going past their maximum, since one
Dan Ports d...@csail.mit.edu wrote:
I am surprised to see that error message without SSI's hint about
increasing max_predicate_locks_per_xact.
After reviewing this, I think something along the following lines
might be needed, for a start. I'm not sure the Asserts are actually
needed; they
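For context, the kind of report being discussed is roughly the following; the hint wording is recalled from memory and should be treated as approximate rather than the committed text:

#include "postgres.h"

static void
report_predicate_lock_exhaustion(void)
{
    ereport(ERROR,
            (errcode(ERRCODE_OUT_OF_MEMORY),
             errmsg("out of shared memory"),
             errhint("You might need to increase max_pred_locks_per_transaction.")));
}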
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of entries declared as its
max_size when it was created.
I think this does it.
If nothing else, it might be instructive to use it while
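The attachment itself isn't shown here, but the idea can be sketched from the caller's side using only existing dynahash facilities (the helper name and message wording are made up; dynahash does not retain the max_size passed to ShmemInitHash, so the caller has to remember it):

#include "postgres.h"
#include "utils/hsearch.h"

static void
warn_if_over_declared_max(HTAB *htab, const char *name, long declared_max)
{
    long        entries = hash_get_num_entries(htab);

    /* A declared maximum of zero means "no maximum declared"; skip those. */
    if (declared_max > 0 && entries > declared_max)
        elog(WARNING,
             "hash table \"%s\" has %ld entries, exceeding its declared maximum of %ld",
             name, entries, declared_max);
}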
On 01.03.2011 02:03, Dan Ports wrote:
An updated patch to address this issue is attached. It fixes a couple
issues related to use of the backend-local lock table hint:
- CheckSingleTargetForConflictsIn now correctly handles the case
where a lock that's being held is not reflected in the
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
committed with minor changes.
Thanks!
The ordering of the fields in PREDICATELOCKTAG was bizarre, so I
just expanded the offsetnumber fields to a uint32, instead of
having the padding field. I think that's a lot more
On Tue, Mar 01, 2011 at 07:07:42PM +0200, Heikki Linnakangas wrote:
Were there test cases for any of the issues fixed by this patch that we
should add to the suite?
Some of these issues are tricky to test, e.g. some of the code about
transferring predicate locks to a new target doesn't get
An updated patch to address this issue is attached. It fixes a couple
issues related to use of the backend-local lock table hint:
- CheckSingleTargetForConflictsIn now correctly handles the case
where a lock that's being held is not reflected in the local lock
table. This fixes the
On 23.02.2011 07:20, Kevin Grittner wrote:
Dan Ports wrote:
The obvious solution to me is to just keep the lock on both the old
and new page.
That's the creative thinking I was failing to do. Keeping the old
lock will generate some false positives, but it will be rare and
those don't
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
On 23.02.2011 07:20, Kevin Grittner wrote:
Dan Ports wrote:
The obvious solution to me is to just keep the lock on both the
old and new page.
That's the creative thinking I was failing to do. Keeping the
old lock will generate
Dan Ports d...@csail.mit.edu wrote:
It looks like CheckTargetForConflictsIn is making the assumption
that the backend-local lock table is accurate, which was probably
even true at the time it was written.
I remember we decided that it could only be false in certain ways
which allowed us to
hi,
Kevin Grittner wrote:
I'm proceeding on this basis.
Result attached. I found myself passing around the tuple xmin value
just about everywhere that the predicate lock target tag was being
passed, so it finally dawned on me that this logically belonged as
part of the target tag.
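A rough illustration of that idea follows; the struct, field names, and macro are assumptions for this sketch, not the attached patch:

#include "postgres.h"

/* Sketch only: a predicate lock target tag that carries the tuple xmin,
 * so callers no longer have to pass xmin alongside the tag. */
typedef struct EXAMPLE_PREDICATELOCKTARGETTAG
{
    uint32      locktag_field1;     /* database OID */
    uint32      locktag_field2;     /* relation OID */
    uint32      locktag_field3;     /* block number */
    uint32      locktag_field4;     /* offset number */
    uint32      locktag_field5;     /* tuple xmin, for tuple-level locks */
} EXAMPLE_PREDICATELOCKTARGETTAG;

#define EXAMPLE_SET_TUPLE_TAG(tag, dboid, reloid, blocknum, offnum, xmin) \
    ((tag).locktag_field1 = (dboid), \
     (tag).locktag_field2 = (reloid), \
     (tag).locktag_field3 = (blocknum), \
     (tag).locktag_field4 = (offnum), \
     (tag).locktag_field5 = (xmin))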
On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:
Dan Ports d...@csail.mit.edu wrote:
It looks like CheckTargetForConflictsIn is making the assumption
that the backend-local lock table is accurate, which was probably
even true at the time it was written.
I remember we
Dan Ports d...@csail.mit.edu wrote:
On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:
The theory was before that the local lock table would only have
false negatives, i.e. if it says we hold a lock then we really do.
That makes it a useful heuristic because we can bail out
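A minimal sketch of that heuristic, with names chosen for illustration; the point is that only a positive answer from the backend-local table can be trusted:

#include "postgres.h"
#include "utils/hsearch.h"

/* Backend-local hint table, keyed by predicate lock target tag. */
static HTAB *LocalPredicateLockHash = NULL;

static bool
local_hint_says_lock_held(const void *targettag)
{
    bool        found;

    (void) hash_search(LocalPredicateLockHash, targettag, HASH_FIND, &found);

    /*
     * The local table can miss locks this backend actually holds (a false
     * negative), so "not found" only means "go look at the shared table";
     * it must never be read as "no lock, no conflict".
     */
    return found;
}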
On Tue, Feb 22, 2011 at 05:54:49PM -0600, Kevin Grittner wrote:
I'm not sure it's safe to assume that the index page won't get
reused before the local lock information is cleared. In the absence
of a clear proof that it is safe, or some enforcement mechanism to
ensure that it is, I don't
Dan Ports wrote:
The obvious solution to me is to just keep the lock on both the old
and new page.
That's the creative thinking I was failing to do. Keeping the old
lock will generate some false positives, but it will be rare and
those don't compromise correctness -- they just carry the
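That is essentially the behavior exposed by PredicateLockPageSplit(), which, as far as I can tell, keeps the old page's SIRead locks and copies them to the new page. A sketch of how an index AM's split path might call it, with illustrative buffer names:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "utils/relcache.h"

/*
 * Sketch of an index split path: SIRead locks stay on the original page and
 * are also attached to the page the moved tuples ended up on, trading a few
 * false positives for never missing a conflict.
 */
static void
example_after_split(Relation rel, Buffer origbuf, Buffer newbuf)
{
    PredicateLockPageSplit(rel,
                           BufferGetBlockNumber(origbuf),
                           BufferGetBlockNumber(newbuf));
}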
On Mon, Feb 21, 2011 at 11:42:36PM +, YAMAMOTO Takashi wrote:
i tested ede45e90dd1992bfd3e1e61ce87bad494b81f54d + ssi-multi-update-1.patch
with my application and got the following assertion failure.
#4 0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
at
Kevin Grittner wrote:
I'm proceeding on this basis.
Result attached. I found myself passing around the tuple xmin value
just about everywhere that the predicate lock target tag was being
passed, so it finally dawned on me that this logically belonged as
part of the target tag. That simplified
Heikki Linnakangas wrote:
On 14.02.2011 20:10, Kevin Grittner wrote:
Promotion of the lock granularity on the prior tuple is where we
have problems. If the two tuple versions are in separate pages
then the second UPDATE could miss the conflict. My first thought
was to fix that by
On Thu, Feb 17, 2011 at 23:11, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Dan Ports d...@csail.mit.edu wrote:
Oops. Those are both definitely bugs (and my fault). Your patch
looks correct. Thanks for catching that!
Could a committer please apply the slightly modified version here?:
hi,
YAMAMOTO Takashi wrote:
with your previous patch or not?
With, thanks.
i tried. unfortunately i can still reproduce the original loop problem.
WARNING: [0] target 0xbb51ef18 tag 4000:4017:7e3:78:0 prior 0xbb51f148 next 0xbb51edb0
WARNING: [1] target 0xbb51f148 tag
hi,
YAMAMOTO Takashi wrote:
might be unrelated to the loop problem, but...
Aha! I think it *is* related. There were several places where data
was uninitialized here; mostly because Dan was working on this piece
while I was working on separate issues which added the new fields.
I
hi,
might be unrelated to the loop problem, but...
i got the following SEGV when running vacuum on a table.
(the line numbers in predicate.c are different because i have local modifications.)
oldlocktag.myTarget was NULL.
it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
for
On Wed, Feb 16, 2011 at 10:13:35PM +, YAMAMOTO Takashi wrote:
i got the following SEGV when running vacuum on a table.
(the line numbers in predicate.c are different because i have local modifications.)
oldlocktag.myTarget was NULL.
it seems that TransferPredicateLocksToNewTarget sometimes uses
Dan Ports d...@csail.mit.edu wrote:
Oops. Those are both definitely bugs (and my fault). Your patch
looks correct. Thanks for catching that!
Could a committer please apply the slightly modified version here?:
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
might be unrelated to the loop problem, but...
i got the following SEGV when running vacuum on a table.
vacuum on the table succeeded with the attached patch.
Thanks! I appreciate the heavy testing and excellent diagnostics.
On the face
YAMAMOTO Takashi wrote:
might be unrelated to the loop problem, but...
Aha! I think it *is* related. There were several places where data
was uninitialized here; mostly because Dan was working on this piece
while I was working on separate issues which added the new fields.
I missed the
YAMAMOTO Takashi wrote:
with your previous patch or not?
With, thanks.
-Kevin
On 14.02.2011 20:10, Kevin Grittner wrote:
Promotion of the lock granularity on the prior tuple is where we
have problems. If the two tuple versions are in separate pages then
the second UPDATE could miss the conflict. My first thought was to
fix that by requiring promotion of a predicate lock
Looking at the prior/next version chaining, aside from the looping
issue, isn't it broken by lock promotion too? There's a check in
RemoveTargetIfNoLongerUsed() so that we don't release a lock target if
its priorVersionOfRow is set, but what if the tuple lock is promoted to
a page level lock
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
Looking at the prior/next version chaining, aside from the
looping issue, isn't it broken by lock promotion too? There's a
check in RemoveTargetIfNoLongerUsed() so that we don't release a
lock target if its priorVersionOfRow is
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
Did you notice whether the loop involved multiple tuples within a
single page?
if i understand correctly, yes.
the following is a snippet of my debug code (dump targets when
CheckTargetForConflictsIn loops 1000 times) and its
hi,
I wrote:
it seems likely that such a cycle might be related to this new
code not properly allowing for some aspect of tuple cleanup.
I found a couple places where cleanup could let these fall through
the cracks long enough to get stale and still be around when a tuple
ID is
hi,
all of the following answers are with the patch you provided in
other mail applied.
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
i have seen this actually happen. i've confirmed the creation of
the loop with the attached patch. it's easily reproducible with
my application. i can
hi,
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
it seems that PredicateLockTupleRowVersionLink sometimes creates
a loop of targets (it finds an existing 'newtarget' whose
nextVersionOfRow chain points to the 'oldtarget') and it later
causes CheckTargetForConflictsIn to loop forever.
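One way to view the bug: nothing stops the link about to be created from closing a cycle. A hypothetical guard (not the actual fix, and written against the in-development structure discussed here; the version-chaining fields were later removed) would walk the chain first:

#include "postgres.h"
#include "storage/predicate_internals.h"

/*
 * Hypothetical guard, not the committed fix: before linking oldtarget's
 * nextVersionOfRow to newtarget, check whether newtarget's chain already
 * leads back to oldtarget, which would create exactly the cycle that makes
 * CheckTargetForConflictsIn spin forever.
 */
static bool
link_would_create_cycle(PREDICATELOCKTARGET *oldtarget,
                        PREDICATELOCKTARGET *newtarget)
{
    PREDICATELOCKTARGET *t;

    for (t = newtarget; t != NULL; t = t->nextVersionOfRow)
    {
        if (t == oldtarget)
            return true;
    }
    return false;
}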
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
i have seen this actually happen. i've confirmed the creation of
the loop with the attached patch. it's easily reproducible with
my application. i can provide the full source code of my
application if you want. (but it isn't easy to run unless
I wrote:
it seems likely that such a cycle might be related to this new
code not properly allowing for some aspect of tuple cleanup.
I found a couple places where cleanup could let these fall through
the cracks long enough to get stale and still be around when a tuple
ID is re-used, causing
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
it seems that PredicateLockTupleRowVersionLink sometimes creates
a loop of targets (it finds an existing 'newtarget' whose
nextVersionOfRow chain points to the 'oldtarget') and it later
causes CheckTargetForConflictsIn to loop forever.
Is this a