Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-07 Thread Tom Lane
I went ahead and pushed this, since the window for getting buildfarm
testing done before Monday's wrap is closing fast.  We can always
improve on it later, but I think beta3 ought to carry some fix
for the problem.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Tom Lane
Andres Freund  writes:
> Could you attach the current version of the patch, or were there no
> meaningful changes?

No changes.

>> So I took that as license to proceed, but while doing a final round of
>> testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
>> because now that's an infinite recursion.  On reflection it's a bit
>> surprising that it wasn't so all along.  What I'm inclined to do about
>> it is to adjust AcceptInvalidationMessages so that there's a finite
>> recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
>> already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
>> would be enough.

> Hm, but wouldn't that pretty much mean that the risk exposed in this
> thread would still be present?  The reason your approach fixes things is
> that if we're processing invalidation event N, which depends on
> invalidation changes at N+y being processed, the recursion leads to N+y
> being processed before the invalidation processing of N finishes.

Right, which makes the maximum recursion depth equal to basically whatever
you think the "backlog" of relevant catalog inval messages could be.  It's
finite, because we do after all have lock on one more catalog at each
level, and eventually we'd hold lock on every system catalog or index
that could be referenced inside AcceptInvalidationMessages.  In practice
I doubt the worst-case nesting level is more than two or three, but
it's not clear how to establish just what it is.

CLOBBER_CACHE_RECURSIVELY breaks this because it recurses whether or not
any relevant inval activity occurred.  I think that's good for verifying
that recursion of AcceptInvalidationMessages works at all, but it's
not relevant to whether infinite recursion could happen without that.
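
To make the depth-limit idea concrete, here is a minimal standalone C
sketch (illustrative only, not PostgreSQL code; the names and the limit
of 3 are invented).  The forced, debug-only recursion is capped, while
recursion driven by real invalidation traffic would be unaffected:

/*
 * Standalone illustration: a debug-only forced recursion, as in
 * CLOBBER_CACHE_RECURSIVELY, capped at a fixed depth so it still
 * exercises recursive invalidation handling without recursing forever.
 */
#include <stdio.h>

#define FORCED_RECURSION_LIMIT 3

static int forced_depth = 0;

static void
accept_invalidation_messages(void)
{
	/* ... normally: drain and process the real sinval queue here ... */

	/*
	 * Debug build only: recurse unconditionally to stress recursive
	 * invalidation handling, but stop at a finite depth, since this
	 * recursion is not driven by any real pending invalidation.
	 */
	if (forced_depth < FORCED_RECURSION_LIMIT)
	{
		forced_depth++;
		printf("forced recursion, depth %d\n", forced_depth);
		accept_invalidation_messages();
		forced_depth--;
	}
}

int
main(void)
{
	accept_invalidation_messages();
	return 0;
}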

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Andres Freund
On 2018-09-06 17:38:38 -0400, Tom Lane wrote:
> I wrote:
> > So where are we on this?  Should I proceed with my patch, or are we
> > going to do further investigation?  Does anyone want to do an actual
> > patch review?
> 
> [ crickets... ]

Sorry, bit busy with postgres open, and a few people being in town due
to that.

Could you attach the current version of the patch, or were there no
meaningful changes?

> So I took that as license to proceed, but while doing a final round of
> testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
> because now that's an infinite recursion.  On reflection it's a bit
> surprising that it wasn't so all along.  What I'm inclined to do about
> it is to adjust AcceptInvalidationMessages so that there's a finite
> recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
> already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
> would be enough.

Hm, but wouldn't that pretty much mean that the risk exposed in this
thread would still be present?  The reason your approach fixes things is
that if we're processing invalidation event N, which depends on
invalidation changes at N+y being processed, the recursion leads to N+y
being processed before the invalidation processing of N finishes.  Now
you can argue that your restriction would
only apply to CLOBBER_CACHE_RECURSIVELY, and thus not in normal builds,
but that still makes me uncomfortable.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Tom Lane
I wrote:
> So where are we on this?  Should I proceed with my patch, or are we
> going to do further investigation?  Does anyone want to do an actual
> patch review?

[ crickets... ]

So I took that as license to proceed, but while doing a final round of
testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
because now that's an infinite recursion.  On reflection it's a bit
surprising that it wasn't so all along.  What I'm inclined to do about
it is to adjust AcceptInvalidationMessages so that there's a finite
recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
would be enough.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-05 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> One concern I have with your approach is that it isn't particularly
>> bullet-proof for cases where the rebuild is triggered by something that
>> doesn't hold a conflicting lock.

> Wouldn't that be a bug in the something-else?

So where are we on this?  Should I proceed with my patch, or are we
going to do further investigation?  Does anyone want to do an actual
patch review?

I think it's important to have some fix in place in time for the next
11beta release, so time grows short ...

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-01 Thread Tom Lane
Andres Freund  writes:
> One concern I have with your approach is that it isn't particularly
> bullet-proof for cases where the rebuild is triggered by something that
> doesn't hold a conflicting lock.

Wouldn't that be a bug in the something-else?  The entire relation cache
system is based on the assumptions that (a) if you hold lock, you can read
a consistent and valid set of information about the rel from the catalogs,
and (b) anyone changing that info must hold a conflicting lock and send an
SINVAL message *before* releasing said lock.  I'm not prepared to consider
a redesign of those assumptions, especially not for back-patching.
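
As a standalone illustration of assumptions (a) and (b), here is a toy
single-process model (invented names, not PostgreSQL code): the writer
queues its invalidation before releasing the conflicting lock, so a
reader that takes the lock and then drains the queue can never use a
stale copy.

/*
 * Toy model of the relcache locking/invalidation protocol, reduced to
 * one relation and a one-bit "sinval queue".  All names are invented.
 */
#include <stdbool.h>
#include <stdio.h>

static int	catalog_version = 1;	/* "the catalogs" */
static int	cached_version = 1;		/* our relcache copy */
static bool inval_pending = false;	/* the sinval queue, reduced to a bit */

static void
writer_alter_relation(void)
{
	/* (b) holds a conflicting lock here ... */
	catalog_version++;			/* change the catalog row */
	inval_pending = true;		/* send SINVAL *before* releasing the lock */
	/* ... and only now releases the lock */
}

static void
reader_open_relation(void)
{
	/* (a) acquire lock, then absorb pending invalidations */
	if (inval_pending)
	{
		cached_version = catalog_version;	/* rebuild from the catalogs */
		inval_pending = false;
	}
	printf("reader sees version %d (catalog is %d)\n",
		   cached_version, catalog_version);
}

int
main(void)
{
	reader_open_relation();		/* sees version 1 */
	writer_alter_relation();
	reader_open_relation();		/* sees version 2, never a stale 1 */
	return 0;
}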

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Andres Freund
Hi,

On 2018-08-31 19:53:43 -0400, Tom Lane wrote:
> My thought is to do (and back-patch) my change, and then work on yours
> as a performance improvement for HEAD only.

That does make sense.

> I don't believe that yours would make mine redundant, either --- they
> are good complementary changes to make real sure we have no remaining
> bugs of this ilk.  (In particular, no matter how much de-duplication
> we do, we'll still have scenarios with recursive cache flushes; so I'm
> not quite convinced that your solution provides a 100% fix by itself.)

Yea this is a fair concern.

One concern I have with your approach is that it isn't particularly
bullet-proof for cases where the rebuild is triggered by something that
doesn't hold a conflicting lock.  If there's an independent cause to
reload - something that very commonly happens - we might not necessarily
re-trigger the recursive reloads.  That *should* be safe due to such
changes hopefully being "harmless", but it's a bit disconcerting
nonetheless.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Tom Lane
Andres Freund  writes:
> Leaving that aside, I think there's one architectural aspect of my
> approach that I prefer over yours: Deduplicating eager cache rebuilds
> like my approach does seems quite advantageous.

That is attractive, for sure, but the other side of the coin is that
getting there seems to require a lot of ticklish redesign.  We would
certainly not consider back-patching such a change normally, and I'm
unconvinced that we should do so in this case.

My thought is to do (and back-patch) my change, and then work on yours
as a performance improvement for HEAD only.  I don't believe that yours
would make mine redundant, either --- they are good complementary changes
to make real sure we have no remaining bugs of this ilk.  (In particular,
no matter how much de-duplication we do, we'll still have scenarios with
recursive cache flushes; so I'm not quite convinced that your solution
provides a 100% fix by itself.)

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-31 Thread Andres Freund
On 2018-08-29 17:58:19 -0400, Tom Lane wrote:
> I wrote:
> > We could perhaps fix this with a less invasive change than what you
> > suggest here, by attacking the missed-call-due-to-recursion aspect
> > rather than monkeying with how relcache rebuild itself works.
> 
> Seeing that rearranging the relcache rebuild logic is looking less than
> trivial, I thought it'd be a good idea to investigate this alternative
> approach.
> 
> Essentially, the problem here is that lmgr.c supposes that
> AcceptInvalidationMessages is an atomic operation, which it most
> certainly isn't.  In reality, it's unsafe to suppose that we can skip
> reading incoming inval messages until we have *completed*
> AcceptInvalidationMessages, not just invoked it.
> 
> However, we can fix that very easily, by retaining one additional bit
> of state in each LOCALLOCK entry, which records whether we know we have
> completed AcceptInvalidationMessages at least once subsequent to obtaining
> that lock.  In the attached draft patch, I refer to that state as having
> "cleared" the lock, but I'm not super pleased with that terminology ---
> anybody have a better idea?
> 
> A small problem with the lock.c API as it stands is that we'd have to do
> an additional hashtable lookup to re-find the relevant LOCALLOCK entry.
> In the attached, I fixed that by extending LockAcquireExtended's API
> to allow the caller to obtain a pointer to the LOCALLOCK entry, making
> it trivially cheap to set the flag after we finish
> AcceptInvalidationMessages.  LockAcquireExtended has only one external
> caller right now, which makes me think it's OK to change its API rather
> than introduce YA variant entry point, but that could be argued.
> 
> There are two ways we could adjust the return value from
> LockAcquire(Extended) to cope with this requirement.  What I did here
> was to add an additional LockAcquireResult code, so that "already held"
> can be distinguished from "already held and cleared".  But we don't
> really need to make that distinction, at least not in the core code.
> So another way would be to redefine LOCKACQUIRE_ALREADY_HELD as meaning
> "held and cleared", and then just return LOCKACQUIRE_OK if you haven't
> called MarkLockClear for the lock.  I'm not entirely sure which way is
> less likely to break any third-party code that might be inspecting the
> result of LockAcquire.  You could probably argue it either way depending
> on what you think the third-party code is doing with the result.
> 
> Anyway, the attached appears to fix the problem: Andres' test script
> fails in fractions of a second with HEAD on my workstation, but it's
> run error-free for quite awhile now with this patch.  And this is sure
> a lot simpler than any relcache rebuild refactoring we're likely to come
> up with.
> 
> Discuss.

This is a neat idea.

I've one intuitive correctness concern that I can't quite nail down, I
plan to spend more time on those aspects (Partially around previously
held locks and invalidations generated with non-conflicting lock-modes).

Leaving that aside, I think there's one architectural aspect of my
approach that I prefer over yours: Deduplicating eager cache rebuilds
like my approach does seems quite advantageous. There's a lot of cases
where we end up sending out many cache rebuilds that invalidate the same
entries - as we do not defer rebuilds, that leads to repeated
rebuilds. My approach avoids overhead associated with that, as long as
the invalidations are queued close enough together.

We could obviously just do both, but that seems unnecessary.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
I wrote:
> We could perhaps fix this with a less invasive change than what you
> suggest here, by attacking the missed-call-due-to-recursion aspect
> rather than monkeying with how relcache rebuild itself works.

Seeing that rearranging the relcache rebuild logic is looking less than
trivial, I thought it'd be a good idea to investigate this alternative
approach.

Essentially, the problem here is that lmgr.c supposes that
AcceptInvalidationMessages is an atomic operation, which it most
certainly isn't.  In reality, it's unsafe to suppose that we can skip
reading incoming inval messages until we have *completed*
AcceptInvalidationMessages, not just invoked it.

However, we can fix that very easily, by retaining one additional bit
of state in each LOCALLOCK entry, which records whether we know we have
completed AcceptInvalidationMessages at least once subsequent to obtaining
that lock.  In the attached draft patch, I refer to that state as having
"cleared" the lock, but I'm not super pleased with that terminology ---
anybody have a better idea?

A small problem with the lock.c API as it stands is that we'd have to do
an additional hashtable lookup to re-find the relevant LOCALLOCK entry.
In the attached, I fixed that by extending LockAcquireExtended's API
to allow the caller to obtain a pointer to the LOCALLOCK entry, making
it trivially cheap to set the flag after we finish
AcceptInvalidationMessages.  LockAcquireExtended has only one external
caller right now, which makes me think it's OK to change its API rather
than introduce YA variant entry point, but that could be argued.
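
To illustrate the shape of the idea, here is a standalone toy model
(invented names and types; not the actual patch, which is attached
below): each local lock entry remembers whether invalidation messages
have been fully absorbed at least once since it was acquired, and only
a lock that is both already held and already "cleared" lets the caller
skip AcceptInvalidationMessages.

/*
 * Toy model of the "cleared" flag on a local lock entry.  All names are
 * invented for illustration; the real patch touches lock.c and lmgr.c.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	ACQUIRE_OK,					/* lock newly acquired */
	ACQUIRE_ALREADY_HELD,		/* held, but not known cleared */
	ACQUIRE_ALREADY_CLEAR		/* held and invals already absorbed */
} AcquireResult;

typedef struct
{
	bool		held;
	bool		cleared;
} ToyLocalLock;

static AcquireResult
toy_lock_acquire(ToyLocalLock *lock)
{
	if (lock->held)
		return lock->cleared ? ACQUIRE_ALREADY_CLEAR : ACQUIRE_ALREADY_HELD;
	lock->held = true;
	lock->cleared = false;
	return ACQUIRE_OK;
}

static void
toy_accept_invalidation_messages(void)
{
	printf("absorbing invalidation messages\n");
}

static void
toy_lock_relation(ToyLocalLock *lock)
{
	AcquireResult res = toy_lock_acquire(lock);

	if (res != ACQUIRE_ALREADY_CLEAR)
	{
		toy_accept_invalidation_messages();
		lock->cleared = true;	/* MarkLockClear() in the real patch */
	}
}

int
main(void)
{
	ToyLocalLock lock = {false, false};

	toy_lock_relation(&lock);	/* absorbs invals, marks the lock cleared */
	toy_lock_relation(&lock);	/* ALREADY_CLEAR: safe to skip absorbing */
	return 0;
}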

There are two ways we could adjust the return value from
LockAcquire(Extended) to cope with this requirement.  What I did here
was to add an additional LockAcquireResult code, so that "already held"
can be distinguished from "already held and cleared".  But we don't
really need to make that distinction, at least not in the core code.
So another way would be to redefine LOCKACQUIRE_ALREADY_HELD as meaning
"held and cleared", and then just return LOCKACQUIRE_OK if you haven't
called MarkLockClear for the lock.  I'm not entirely sure which way is
less likely to break any third-party code that might be inspecting the
result of LockAcquire.  You could probably argue it either way depending
on what you think the third-party code is doing with the result.

Anyway, the attached appears to fix the problem: Andres' test script
fails in fractions of a second with HEAD on my workstation, but it's
run error-free for quite awhile now with this patch.  And this is sure
a lot simpler than any relcache rebuild refactoring we're likely to come
up with.

Discuss.

regards, tom lane

PS: this also fixes what seems to me to be a buglet in the fast-path
locks stuff: there are failure paths out of LockAcquire that don't
remove the failed LOCALLOCK entry.  Not any more.

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 2e07702..e924c96 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -661,7 +661,8 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
 
 	SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
 
-	LockAcquireExtended(&locktag, AccessExclusiveLock, true, false, false);
+	LockAcquireExtended(&locktag, AccessExclusiveLock, true, false,
+						false, NULL);
 }
 
 static void
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 7b2dcb6..dc0a439 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -105,11 +105,12 @@ void
 LockRelationOid(Oid relid, LOCKMODE lockmode)
 {
 	LOCKTAG		tag;
+	LOCALLOCK  *locallock;
 	LockAcquireResult res;
 
	SetLocktagRelationOid(&tag, relid);
 
-	res = LockAcquire(&tag, lockmode, false, false);
+	res = LockAcquireExtended(&tag, lockmode, false, false, true, &locallock);
 
 	/*
 	 * Now that we have the lock, check for invalidation messages, so that we
@@ -120,9 +121,18 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	 * relcache entry in an undesirable way.  (In the case where our own xact
 	 * modifies the rel, the relcache update happens via
 	 * CommandCounterIncrement, not here.)
+	 *
+	 * However, in corner cases where code acts on tables (usually catalogs)
+	 * recursively, we might get here while still processing invalidation
+	 * messages in some outer execution of this function or a sibling.  The
+	 * "cleared" status of the lock tells us whether we really are done
+	 * absorbing relevant inval messages.
 	 */
-	if (res != LOCKACQUIRE_ALREADY_HELD)
+	if (res != LOCKACQUIRE_ALREADY_CLEAR)
+	{
 		AcceptInvalidationMessages();
+		MarkLockClear(locallock);
+	}
 }
 
 /*
@@ -138,11 +148,12 @@ bool
 ConditionalLockRelationOid(Oid relid, LOCKMODE lockmode)
 {
 	LOCKTAG		tag;
+	LOCALLOCK  *locallock;
 	LockAcquireResult res;
 
	SetLocktagRelationOid(&tag, relid);
 
-	res = LockAcquire(&tag, lockmode, false, true);
+	res = LockAcquireExtended(&tag, lockmode, 

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-29 14:00:12 -0400, Tom Lane wrote:
>> 2. I think we may need to address the same order-of-operations hazards
>> as RelationCacheInvalidate() worries about.  Alternatively, maybe we
>> could simplify that function by making it use the same
>> delayed-revalidation logic as we're going to develop for this.

> Hm, just to make sure I understand correctly: You're thinking about
> having to rebuild various nailed entries in a specific order [2]?

> [2] Hm, I'm right now a bit confused as to why "Other nailed relations
> go to the front of rebuildList" is a good idea.

I'm not recalling the reason right now either, but I'm pretty sure that
logic was added out of necessity ... it hasn't always been like that.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-29 12:56:07 -0400, Tom Lane wrote:
>> BTW, I now have a theory for why we suddenly started seeing this problem
>> in mid-June: commits a54e1f158 et al added a ScanPgRelation call where
>> there had been none before (in RelationReloadNailed, for non-index rels).
>> That didn't create the problem, but it probably increased the odds of
>> seeing it happen.

> Yea.  Doesn't explain why it's only really visible on the BF in
> 11/master though :/

Probably we changed around the tests that run in parallel with vacuum.sql
enough during v11 to make it happen with noticeable probability.  I think
this is sufficient evidence that it's happening now in the back branches:

https://www.postgresql.org/message-id/20180829140149.go23...@telsasoft.com

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Andres Freund
Hi,

On 2018-08-29 14:00:12 -0400, Tom Lane wrote:
> A couple thoughts after reading and reflecting for awhile:

Thanks. This definitely is too complicated for a single brain :(


> 1. I don't much like the pending_rebuilds list, mainly because of this
> consideration: what happens if we hit an OOM error trying to add an entry
> to that list?  As you've drafted the patch, we don't even mark the
> relevant relcache entry rd_invalid before that fails, so that's surely
> bad.  Now, I'm not sure how bulletproof relcache inval is in general
> with respect to OOM failures, but let's not add more hazards.

Obviously you're right and we first need to mark the entry as invalid -
and I think that should also mostly [1] address the OOM issue. If we
error out there's no need to eagerly rebuild the entry during
invalidation - subsequent RelationIdGetRelation() calls will trigger the
necessary rebuild.

But it sounds like your concern might be larger?  I don't quite see how
we could get around without such a list?  I mean I think it'd better if
it were an array, for efficiency, but that seems like a fairly minor
point?

[1] I'm kinda doubtful that's entirely bulletproof in case an OOM, or
similar, error is caught with a savepoint, and then execution
continues outside. That in turn might access the relcache entry
without a RelationIdGetRelation call. But that doesn't seem new.


> 2. I think we may need to address the same order-of-operations hazards
> as RelationCacheInvalidate() worries about.  Alternatively, maybe we
> could simplify that function by making it use the same
> delayed-revalidation logic as we're going to develop for this.

Hm, just to make sure I understand correctly: You're thinking about
having to rebuild various nailed entries in a specific order [2]?  I'm
not quite sure I really see a problem here - in contrast to the current
RelationCacheInvalidate() logic, what I'm proposing marks the entries as
invalid in the first phase, so any later access guarantees will rebuild
the entry as necessary.

[2] Hm, I'm right now a bit confused as to why "Other nailed relations
go to the front of rebuildList" is a good idea. That means they might be
built while ClassOidIndexId isn't yet rebuilt?  Practically I don't
think it matters, because the lookup for RelationRelationId will rebuild
it, but I don't quite understand what that logic is precisely getting
at.


> 3. I don't at all like the ProcessPendingRelcacheRebuilds call you added
> to ProcessInvalidationMessages.  That's effectively assuming that the
> "func" *must* be LocalExecuteInvalidationMessage and not anything else;
> likewise, the lack of such a call inside ProcessInvalidationMessagesMulti
> presumes that that one is never called to actually execute invalidations.
> (While those are true statements, it's a horrible violation of modularity
> for these two functions to know it.)  Probably better to put this into the
> callers, which will know what the actual semantics are.

Yea, I mostly hacked this together quickly to have a proposal on the
table, I'm not happy with this as it stands.


> 4. The call added to the middle of ReceiveSharedInvalidMessages doesn't
> seem all that safe either; the problem is its relationship to the
> "catchup" processing.  We are caught up at the moment we exit the loop,
> but are we sure we still would be after doing assorted work for relcache
> rebuild?  Swapping the order of the two steps might help, but then we
> have to consider what happens if we error out from SICleanupQueue.

I'm not sure I understand the problem here. If there's new invalidation
processing inside the rebuilds, that'd mean there was another
ReceiveSharedInvalidMessages, which then would have triggered processing
of the new pending rebuilds; that's why I made
ProcessPendingRelcacheRebuilds() unlink the queue it processes first.  I
think locking should prevent entries that are currently being accessed
during a rebuild from being re-invalidated.

Did I misunderstand?

As the comment in ProcessPendingRelcacheRebuilds() notes, I'm concerned
what happens if we error out after unlinking the current pending
items. But I think, if we handle the memory leak with a PG_CATCH, that
should be ok, because after the error new accesses should trigger the
rebuilds.


> (In general, the hard part of all this stuff is being sure that sane
> things happen if you error out part way through ...)

Yea :/

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Andres Freund
Hi,

On 2018-08-29 12:56:07 -0400, Tom Lane wrote:
> I wrote:
> > * We now recursively enter ScanPgRelation, which (again) needs to do a
> > search using pg_class_oid_index, so it (again) opens and locks that.
> > BUT: LockRelationOid sees that *this process already has share lock on
> > pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.
> 
> BTW, I now have a theory for why we suddenly started seeing this problem
> in mid-June: commits a54e1f158 et al added a ScanPgRelation call where
> there had been none before (in RelationReloadNailed, for non-index rels).
> That didn't create the problem, but it probably increased the odds of
> seeing it happen.

Yea.  Doesn't explain why it's only really visible on the BF in
11/master though :/


> Also ... isn't the last "relation->rd_isvalid = true" in
> RelationReloadNailed wrong?  If it got cleared during ScanPgRelation,
> I do not think we want to believe that we got an up-to-date row.

I don't really think so - note how a normal relcache inval essentially
does the same. RelationClearRelation() first marks the entry as invalid,
then goes and builds a new entry that's *not* hooked into the hashtable
(therefore doesn't receive new invals), and then moves the contents
over. That overwrites rd_isvalid to true, as that's guaranteed to be set
by RelationBuildDesc(). During the move no new invalidations are
accepted.  So this really is just behaving equivalently.
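
For illustration, here is a standalone toy version of that
rebuild-by-swap pattern (invented names, not relcache.c itself): mark
the entry invalid, build a fresh descriptor off to the side, then copy
its contents over the old entry, which thereby becomes valid again.

/*
 * Toy model of rebuild-by-swap.  The fresh descriptor is never reachable
 * from a hash table; freshly built descriptors are always valid.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
	unsigned	relfilenode;
	bool		isvalid;
} ToyRelcacheEntry;

static unsigned current_relfilenode = 1281520;	/* what the catalogs say */

static ToyRelcacheEntry
build_descriptor(void)
{
	ToyRelcacheEntry fresh = {current_relfilenode, true};

	return fresh;
}

static void
clear_relation(ToyRelcacheEntry *entry)
{
	ToyRelcacheEntry fresh;

	entry->isvalid = false;			/* step 1: mark the entry invalid */
	fresh = build_descriptor();		/* step 2: rebuild off to the side */
	*entry = fresh;					/* step 3: move the contents over */
}

int
main(void)
{
	ToyRelcacheEntry entry = {1281520, true};

	current_relfilenode = 1281526;	/* e.g. pg_class after a VACUUM FULL */
	clear_relation(&entry);
	printf("entry now points at %u, valid=%d\n",
		   entry.relfilenode, (int) entry.isvalid);
	return 0;
}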

The harder question is why that's safe. I think I convinced myself that
it is a couple times over the years, but I don't think we've properly
documented it. As the header says:
 *  The following code contains many undocumented hacks.  Please be
 *  careful

We definitely relied on RelationClearRelation() always returning a valid
record for a while, c.f. RelationIdGetRelation()'s rd_isvalid assertion,
and the lack of a loop in that function.

(There's no coffee in this hotel at 4am. Shame.)

Ah, yes.  This assumption is currently safe because the locking on
relations being looked up guarantees that there are no critical changes
to relcache entries while the entry is being rebuilt.

I think we'd also run into trouble with clobber cache recursively etc
without it.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
Andres Freund  writes:
> A bit of food, a coke and a talk later, here's a first draft *prototype*
> of how this could be solved. ...
> Obviously this is far from clean enough, but what do you think about the
> basic approach?  It does, in my limited testing, indeed solve the "could
> not read block" issue.

A couple thoughts after reading and reflecting for awhile:

1. I don't much like the pending_rebuilds list, mainly because of this
consideration: what happens if we hit an OOM error trying to add an entry
to that list?  As you've drafted the patch, we don't even mark the
relevant relcache entry rd_invalid before that fails, so that's surely
bad.  Now, I'm not sure how bulletproof relcache inval is in general
with respect to OOM failures, but let's not add more hazards.

2. I think we may need to address the same order-of-operations hazards
as RelationCacheInvalidate() worries about.  Alternatively, maybe we
could simplify that function by making it use the same
delayed-revalidation logic as we're going to develop for this.

3. I don't at all like the ProcessPendingRelcacheRebuilds call you added
to ProcessInvalidationMessages.  That's effectively assuming that the
"func" *must* be LocalExecuteInvalidationMessage and not anything else;
likewise, the lack of such a call inside ProcessInvalidationMessagesMulti
presumes that that one is never called to actually execute invalidations.
(While those are true statements, it's a horrible violation of modularity
for these two functions to know it.)  Probably better to put this into the
callers, which will know what the actual semantics are.

4. The call added to the middle of ReceiveSharedInvalidMessages doesn't
seem all that safe either; the problem is its relationship to the
"catchup" processing.  We are caught up at the moment we exit the loop,
but are we sure we still would be after doing assorted work for relcache
rebuild?  Swapping the order of the two steps might help, but then we
have to consider what happens if we error out from SICleanupQueue.

(In general, the hard part of all this stuff is being sure that sane
things happen if you error out part way through ...)

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
I wrote:
> * We now recursively enter ScanPgRelation, which (again) needs to do a
> search using pg_class_oid_index, so it (again) opens and locks that.
> BUT: LockRelationOid sees that *this process already has share lock on
> pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.

BTW, I now have a theory for why we suddenly started seeing this problem
in mid-June: commits a54e1f158 et al added a ScanPgRelation call where
there had been none before (in RelationReloadNailed, for non-index rels).
That didn't create the problem, but it probably increased the odds of
seeing it happen.

Also ... isn't the last "relation->rd_isvalid = true" in
RelationReloadNailed wrong?  If it got cleared during ScanPgRelation,
I do not think we want to believe that we got an up-to-date row.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Tom Lane
Andres Freund  writes:
> It's not OK to rebuild relcache entries in the middle of
> ReceiveSharedInvalidMessages() - a later entry in the invalidation queue
> might be relmapper invalidation, and thus immediately processing a
> relcache invalidation might attempt to scan a relation that does not
> exist anymore.

Check, but the bug is actually more subtle than that.  I've reproduced the
problem using your scripts, changed the ERROR to PANIC, and studied the
stack traces of a few core dumps, and here's what I find is the scenario:

* Code tries to open some relation or other.  After acquiring lock, it
goes off to do AcceptInvalidationMessages (cf. LockRelationOid).  That
reads a relcache inval message for "t", and because "t"'s relcache entry
is already open, it decides to rebuild it immediately.

* Therefore it calls ScanPgRelation, which needs to do a search using
pg_class_oid_index, so it opens and locks that.  Again, having acquired
the lock in LockRelationOid, it goes off to do AcceptInvalidationMessages
(recursively).

* Now we read another relcache inval (for "t_pkey", in the cases I've
traced through in detail).  Again, that's open and we decide we'd better
do RelationReloadIndexInfo for it.

* We now recursively enter ScanPgRelation, which (again) needs to do a
search using pg_class_oid_index, so it (again) opens and locks that.
BUT: LockRelationOid sees that *this process already has share lock on
pg_class_oid_index*, so it figures it can skip AcceptInvalidationMessages.

Had we done AcceptInvalidationMessages at this point, we'd have found
the relcache inval message that's waiting for us about pg_class_oid_index,
and we'd have processed it before attempting to read pg_class_oid_index,
and all would be well.  It's the missed AIM call due to the recursive
entry to ScanPgRelation that is causing us to fail to read all available
inval messages before believing that our info about pg_class_oid_index
is up to date.
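
The control-flow hole can be shown with a standalone toy sketch
(invented names, not PostgreSQL code): the recursive scan re-locks the
same index, sees the lock already held, skips absorbing invalidations,
and so uses the index while a relevant message is still unread.

/*
 * Toy model of the missed AcceptInvalidationMessages call.  Absorbing
 * invalidations happens only on the first acquisition of the index lock.
 */
#include <stdbool.h>
#include <stdio.h>

static int	unread_inval_messages = 0;	/* pending sinval messages */
static bool index_lock_held = false;

static void
absorb_invals(void)
{
	unread_inval_messages = 0;
}

static void
scan_pg_relation(int depth)
{
	bool		first_acquisition = !index_lock_held;

	index_lock_held = true;
	if (first_acquisition)
		absorb_invals();		/* skipped when the lock was already held */

	if (unread_inval_messages > 0)
		printf("depth %d: using the index with %d unread inval(s): stale info\n",
			   depth, unread_inval_messages);
	else
		printf("depth %d: index info is up to date\n", depth);

	if (depth == 1)
	{
		/*
		 * Rebuilding some other relcache entry recurses into here; in the
		 * meantime an inval for the index itself has arrived.
		 */
		unread_inval_messages++;
		scan_pg_relation(depth + 1);
	}
}

int
main(void)
{
	scan_pg_relation(1);
	return 0;
}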

> Instead, receiving a relcache invalidation now just queues an entry onto
> a new "pending rebuilds" list, that is then processed in a second stage,
> after emptying the entire sinval queue.  At that stage we're guaranteed
> that the relmapper is up2date.

More to the point, we're guaranteed that we've seen all inval messages
that were sent before we acquired whatever lock we just acquired.
(In the cases I've looked at, the local relmapper *is* up to date,
it's the relcache entry for pg_class_oid_index that is not.)

My first gut feeling after understanding the recursion aspect was that
what you're suggesting would not be enough to be bulletproof.  We will
still have the hazard that while we're rebuilding relcache entries,
we may end up doing recursive ScanPgRelation calls and the inner ones
will decide they needn't call AcceptInvalidationMessages.  So we might
well still be rebuilding stuff without having seen every available
inval message.  But it should be OK, because any inval messages we could
possibly need to worry about will have been scanned and queued by the
outer call.

We could perhaps fix this with a less invasive change than what you
suggest here, by attacking the missed-call-due-to-recursion aspect
rather than monkeying with how relcache rebuild itself works.  But I'm
thinking this way might be a good idea anyhow, in order to reduce the
depth of recursion that occurs in scenarios like this.

I've not read your patch yet, so no comment on specifics.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-29 Thread Andres Freund
On 2018-08-28 20:29:08 -0700, Andres Freund wrote:
> On 2018-08-28 20:27:14 -0700, Andres Freund wrote:
> > Locally that triggers the problem within usually a few seconds.
> 
> FWIW, it does so including versions as old as 9.2.
> 
> Now I need to look for power for my laptop and some for me ;)

A bit of food, a coke and a talk later, here's a first draft *prototype*
of how this could be solved.

It's not OK to rebuild relcache entries in the middle of
ReceiveSharedInvalidMessages() - a later entry in the invalidation queue
might be relmapper invalidation, and thus immediately processing a
relcache invalidation might attempt to scan a relation that does not
exist anymore.

Instead, receiving a relcache invalidation now just queues an entry onto
a new "pending rebuilds" list, that is then processed in a second stage,
after emptying the entire sinval queue.  At that stage we're guaranteed
that the relmapper is up2date.

This might actually have performance benefits in some realistic
scenarios - while RelationFlushRelation() marks the relcache entry
invalid for each of the received entries,
ProcessPendingRelcacheRebuilds() can skip rebuilding if the entry has
since been rebuilt.
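
A standalone toy sketch of the two-phase idea (invented names, a
fixed-size array instead of a List; not the attached patch):
invalidations only queue the OID of the entry to rebuild, duplicates
collapse, and the rebuilds run once the whole queue has been drained.

/*
 * Toy model of deferred, de-duplicated relcache rebuilds.
 */
#include <stdio.h>

#define MAX_PENDING 32

static unsigned pending[MAX_PENDING];
static int	npending = 0;

static void
queue_rebuild(unsigned relid)
{
	for (int i = 0; i < npending; i++)
		if (pending[i] == relid)
			return;				/* already queued: deduplicated */
	if (npending < MAX_PENDING)
		pending[npending++] = relid;
}

static void
process_pending_rebuilds(void)
{
	/* unlink the queue first, so errors or recursion start from empty */
	int			n = npending;

	npending = 0;
	for (int i = 0; i < n; i++)
		printf("rebuilding relcache entry for relation %u\n", pending[i]);
}

int
main(void)
{
	/* phase 1: drain the invalidation queue, only marking/queueing */
	queue_rebuild(1259);		/* pg_class */
	queue_rebuild(2662);		/* pg_class_oid_index */
	queue_rebuild(1259);		/* repeated inval: no extra rebuild */

	/* phase 2: relmapper is now up to date; rebuild once per relation */
	process_pending_rebuilds();
	return 0;
}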


Obviously this is far from clean enough, but what do you think about the
basic approach?  It does, in my limited testing, indeed solve the "could
not read block" issue.


Greetings,

Andres Freund
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5f4ae1291c6..d7037d02a75 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2096,6 +2096,7 @@ ReorderBufferExecuteInvalidations(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 	for (i = 0; i < txn->ninvalidations; i++)
		LocalExecuteInvalidationMessage(&txn->invalidations[i]);
+	ProcessPendingRelcacheRebuilds();
 }
 
 /*
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 563e2906e36..4528a17a9f7 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -127,6 +127,8 @@ ReceiveSharedInvalidMessages(
 		 */
 	} while (nummsgs == MAXINVALMSGS);
 
+	ProcessPendingRelcacheRebuilds();
+
 	/*
 	 * We are now caught up.  If we received a catchup signal, reset that
 	 * flag, and call SICleanupQueue().  This is not so much because we need
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 16d86a29390..e4f70428999 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -456,6 +456,8 @@ ProcessInvalidationMessages(InvalidationListHeader *hdr,
 {
 	ProcessMessageList(hdr->cclist, func(msg));
 	ProcessMessageList(hdr->rclist, func(msg));
+
+	ProcessPendingRelcacheRebuilds();
 }
 
 /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6125421d39a..927b8cef4d9 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -144,6 +144,14 @@ bool		criticalSharedRelcachesBuilt = false;
  */
 static long relcacheInvalsReceived = 0L;
 
+/*
+ * List of oids for entries that need to be rebuilt. Entries get queued onto
+ * it while receiving invalidations, and are processed at the end. This both
+ * avoids errors due to relcache rebuilds relying on an up2date relmapper, and
+ * avoids redundant rebuilds.
+ */
+static List *pending_rebuilds = NIL;
+
 /*
  * eoxact_list[] stores the OIDs of relations that (might) need AtEOXact
  * cleanup work.  This list intentionally has limited size; if it overflows,
@@ -251,7 +259,7 @@ static void RelationClearRelation(Relation relation, bool rebuild);
 
 static void RelationReloadIndexInfo(Relation relation);
 static void RelationReloadNailed(Relation relation);
-static void RelationFlushRelation(Relation relation);
+static void RelationFlushRelation(Relation relation, bool immed);
 static void RememberToFreeTupleDescAtEOX(TupleDesc td);
 static void AtEOXact_cleanup(Relation relation, bool isCommit);
 static void AtEOSubXact_cleanup(Relation relation, bool isCommit,
@@ -2556,7 +2564,7 @@ RelationClearRelation(Relation relation, bool rebuild)
  *	 This is used when we receive a cache invalidation event for the rel.
  */
 static void
-RelationFlushRelation(Relation relation)
+RelationFlushRelation(Relation relation, bool immed)
 {
 	if (relation->rd_createSubid != InvalidSubTransactionId ||
 		relation->rd_newRelfilenodeSubid != InvalidSubTransactionId)
@@ -2581,10 +2589,69 @@ RelationFlushRelation(Relation relation)
 		 */
 		bool		rebuild = !RelationHasReferenceCountZero(relation);
 
-		RelationClearRelation(relation, rebuild);
+		/*
+		 * Can't immediately rebuild entry - in the middle of receiving
+		 * invalidations relmapper might be out of date and point to an older
+		 * version of required catalogs. Defer rebuilds until the queue has
+		 * been emptied.
+		 */
+		if (rebuild && !immed)
+		{
+			MemoryContext oldcontext;
+
+			oldcontext = 

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
On 2018-08-28 23:32:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-08-28 20:27:14 -0700, Andres Freund wrote:
> >> Locally that triggers the problem within usually a few seconds.
> 
> > FWIW, it does so including versions as old as 9.2.

9.0 as well, so it's definitely not some recently backpatched change.


> Interesting.  One thing I'd like to know is why this only started
> showing up in the buildfarm a few weeks ago.  Maybe that's just a
> timing effect, but it seems odd.

I suspect it's largely timing and invalidation traffic.  It's been
really hard to reproduce with core regression tests after all.  If my
theory is right - and I'm getting more and more certain - that we're
trying to access a remapped relation during invalidation, before reading
the relevant relmapper invalidation, that also makes sense: You need to
be unlucky enough that there's a relmapper invalidation *after* an
invalidation for a currently "open" relation (otherwise we'd not do
inval processing). Most of the time we'll likely just be "too fast" to
process all the invalidations (thereby not getting to the point that
pg_class has been remapped later in the queue).

I think it might just be related to some modifed tests changing when
exactly autovacuum is triggered, which then in turn triggers the
invalidation.

Unfortunately the only "real" fix I can think of is to change the
relcache invalidation logic to only ever mark entries as 'invalid', and
not do any eager rebuilding of the entries.  That might be a good idea
for performance reasons anyway, but it is quite scary to imagine doing in a
minor release.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-28 20:27:14 -0700, Andres Freund wrote:
>> Locally that triggers the problem within usually a few seconds.

> FWIW, it does so including versions as old as 9.2.

Interesting.  One thing I'd like to know is why this only started
showing up in the buildfarm a few weeks ago.  Maybe that's just a
timing effect, but it seems odd.

> Now I need to look for power for my laptop and some for me ;)

Yeah, I'm not going to look harder till tomorrow either.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
On 2018-08-28 20:27:14 -0700, Andres Freund wrote:
> Locally that triggers the problem within usually a few seconds.

FWIW, it does so including versions as old as 9.2.

Now I need to look for power for my laptop and some for me ;)



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
On 2018-08-28 23:18:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Tom, I think this could use your eyes.
> 
> I've had no luck reproducing it locally ... do you have a recipe
> for that?

I can reproduce it reliably with the three scripts attached:

psql -c 'drop table if exists t; create table t (a serial primary key, b int, c int);' && pgbench -n -c 4 -T300 -f /tmp/insert.sql
pgbench -n -c 4 -T300 -f /tmp/vacuum.sql
pgbench -n -c 4 -T300 -f /tmp/analyze.sql

Locally that triggers the problem within usually a few seconds.

Greetings,

Andres Freund


insert.sql
Description: application/sql


vacuum.sql
Description: application/sql


analyze.sql
Description: application/sql


Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Tom Lane
Andres Freund  writes:
> Tom, I think this could use your eyes.

I've had no luck reproducing it locally ... do you have a recipe
for that?

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
On 2018-08-28 19:56:58 -0700, Andres Freund wrote:
> Hi Everyone,
> 
> 
> Tom, I think this could use your eyes.
> 
> 
> On 2018-08-28 00:52:13 -0700, Andres Freund wrote:
> > I've a few leads that I'm currently testing out. One observation I think
> > might be crucial is that the problem, in Tomas' testcase with just
> > VACUUM FULL of pg_class and INSERTs into another rel, appears to only
> > happen if autovacuum is enabled. The crashes are always immediately
> > after autovacuum did an analyze of the relation.
> 
> I've run this with autovacuum disabled for quite a while, without the
> problem re-occurring. It does however re-occur with pgbench running an
> ANALYZE on the user defined table.
> 
> 
> > What I see on the apply side, I think, is that invalidation records for
> > contents of pg_class and pg_class itself are intermixed.
> 
> I've not 100% figured this out yet, and I need to switch to another task
> for a bit, but what I think is happening here is that we have a
> relatively fundamental issue with how rewrites of catalog tables are
> handled.  A slightly instrumented version of postgres shows the
> following:
> 
> 2018-08-29 04:30:17.515 CEST [19551] LOG:  statement: INSERT INTO t(b,c) 
> SELECT i,i FROM generate_series(1,59) s(i);
> ...
> 2018-08-29 04:30:17.559 CEST [19554] LOG:  statement: vacuum full pg_class;
> 2018-08-29 04:30:17.571 CEST [19557] LOG:  swapping mapped 1259/1281520 w 
> 1281526/1281526
> 2018-08-29 04:30:17.571 CEST [19557] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.572 CEST [19557] LOG:  reindexing 2662 to 1281529 on 
> 1259/1281526
> 2018-08-29 04:30:17.572 CEST [19557] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.576 CEST [19557] LOG:  reindexing 2663 to 1281530 on 
> 1259/1281526
> 2018-08-29 04:30:17.576 CEST [19557] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.578 CEST [19557] LOG:  reindexing 3455 to 1281531 on 
> 1259/1281526
> 2018-08-29 04:30:17.578 CEST [19557] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.582 CEST [19557] LOG:  releasing lock on 1259
> 2018-08-29 04:30:17.582 CEST [19557] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.582 CEST [19557] LOG:  duration: 87.389 ms
> 2018-08-29 04:30:17.582 CEST [19557] LOG:  statement: vacuum full pg_class;
> 2018-08-29 04:30:17.583 CEST [19563] LOG:  duration: 630.481 ms
> 
> 2018-08-29 04:30:17.584 CEST [19563] LOG:  statement: analyze t;
> 2018-08-29 04:30:17.597 CEST [19555] LOG:  swapping mapped 1259/1281526 w 
> 1281532/1281532
> 2018-08-29 04:30:17.597 CEST [19555] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.597 CEST [19555] LOG:  reindexing 2662 to 1281535 on 
> 1259/1281532
> 2018-08-29 04:30:17.597 CEST [19555] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.599 CEST [19555] LOG:  reindexing 2663 to 1281536 on 
> 1259/1281532
> 2018-08-29 04:30:17.599 CEST [19555] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.601 CEST [19555] LOG:  reindexing 3455 to 1281537 on 
> 1259/1281532
> 2018-08-29 04:30:17.601 CEST [19555] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.605 CEST [19555] LOG:  releasing lock on 1259
> 2018-08-29 04:30:17.605 CEST [19555] STATEMENT:  vacuum full pg_class;
> 2018-08-29 04:30:17.605 CEST [19551] WARNING:  could not read block 3 in file 
> "base/16384/1281529": read only 0 of 8192 bytes
> (it's a warning because the process SIGSTOPs itself afterwards, for
> easier debugging)
> 
> So what we can see is that a first VACUUM FULL of pg_class builds a new
> pg_class_oid_index/2662 index under relfilenode 1281526.  Then a
> *second* VACUUM FULL again reindexes 2663, this time to 1281532.
> 
> But *after* the VACUUM FULL has finished and released the lock (that's
> printed just before the UnlockRelationIdForSession in vacuum_rel()), we
> can see that the INSERT from pid 19551 still accesses the OLD index,
> from *before* the second VACUUM FULL.  Which at that point doesn't exist
> anymore.
> 
> I think the reason for that is that an invalidation for the relmapper
> changes due to the second VACUUM has been queued, but currently the
> invalidation processing hasn't yet gotten to that entry yet. Instead
> currently the entry for the user defined table is being rebuilt, which
> then tries to access pg_class, but under its "old" relfilenode. Which
> then fails.  The fact that index scans trigger this problem isn't
> relevant - the heap entry for pg_class also refers to the old entry, as
> I've verified using gdb.
> 
> 
> I'm unclear on how we can fix this, as long as we rebuild cache entries
> during invalidation, rather than during access. For most cache entries
> it's not a problem to rebuild them relying on other outdated content,
> we'll just rebuild them again later.  But for things like the relation
> mapper that's not true, unfortunately.

Fwiw, if I use gdb to issue InvalidateSystemCaches(),
RelationInitPhysicalAddr(sysscan->irel) makes the relcache entry have
the correct relfilenode, and restarting 

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
Hi Everyone,


Tom, I think this could use your eyes.


On 2018-08-28 00:52:13 -0700, Andres Freund wrote:
> I've a few leads that I'm currently testing out. One observation I think
> might be crucial is that the problem, in Tomas' testcase with just
> VACUUM FULL of pg_class and INSERTs into another rel, appears to only
> happen if autovacuum is enabled. The crashes are always immediately
> after autovacuum did an analyze of the relation.

I've run this with autovacuum disabled for quite a while, without the
problem re-occurring. It does however re-occur with pgbench running an
ANALYZE on the user defined table.


> What I see on the apply side, I think, is that invalidation records for
> contents of pg_class and pg_class itself are intermixed.

I've not 100% figured this out yet, and I need to switch to another task
for a bit, but what I think is happening here is that we have a
relatively fundamental issue with how rewrites of catalog tables are
handled.  A slightly instrumented version of postgres shows the
following:

2018-08-29 04:30:17.515 CEST [19551] LOG:  statement: INSERT INTO t(b,c) SELECT 
i,i FROM generate_series(1,59) s(i);
...
2018-08-29 04:30:17.559 CEST [19554] LOG:  statement: vacuum full pg_class;
2018-08-29 04:30:17.571 CEST [19557] LOG:  swapping mapped 1259/1281520 w 
1281526/1281526
2018-08-29 04:30:17.571 CEST [19557] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.572 CEST [19557] LOG:  reindexing 2662 to 1281529 on 
1259/1281526
2018-08-29 04:30:17.572 CEST [19557] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.576 CEST [19557] LOG:  reindexing 2663 to 1281530 on 
1259/1281526
2018-08-29 04:30:17.576 CEST [19557] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.578 CEST [19557] LOG:  reindexing 3455 to 1281531 on 
1259/1281526
2018-08-29 04:30:17.578 CEST [19557] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.582 CEST [19557] LOG:  releasing lock on 1259
2018-08-29 04:30:17.582 CEST [19557] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.582 CEST [19557] LOG:  duration: 87.389 ms
2018-08-29 04:30:17.582 CEST [19557] LOG:  statement: vacuum full pg_class;
2018-08-29 04:30:17.583 CEST [19563] LOG:  duration: 630.481 ms

2018-08-29 04:30:17.584 CEST [19563] LOG:  statement: analyze t;
2018-08-29 04:30:17.597 CEST [19555] LOG:  swapping mapped 1259/1281526 w 
1281532/1281532
2018-08-29 04:30:17.597 CEST [19555] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.597 CEST [19555] LOG:  reindexing 2662 to 1281535 on 
1259/1281532
2018-08-29 04:30:17.597 CEST [19555] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.599 CEST [19555] LOG:  reindexing 2663 to 1281536 on 
1259/1281532
2018-08-29 04:30:17.599 CEST [19555] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.601 CEST [19555] LOG:  reindexing 3455 to 1281537 on 
1259/1281532
2018-08-29 04:30:17.601 CEST [19555] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.605 CEST [19555] LOG:  releasing lock on 1259
2018-08-29 04:30:17.605 CEST [19555] STATEMENT:  vacuum full pg_class;
2018-08-29 04:30:17.605 CEST [19551] WARNING:  could not read block 3 in file 
"base/16384/1281529": read only 0 of 8192 bytes
(it's a warning because the process SIGSTOPs itself afterwards, for
easier debugging)

So what we can see is that a first VACUUM FULL of pg_class builds a new
pg_class_oid_index/2662 index under relfilenode 1281526.  Then a
*second* VACUUM FULL again reindexes 2663, this time to 1281532.

But *after* the VACUUM FULL has finished and released the lock (that's
printed just before the UnlockRelationIdForSession in vacuum_rel()), we
can see that the INSERT from pid 19551 still accesses the OLD index,
from *before* the second VACUUM FULL.  Which at that point doesn't exist
anymore.

I think the reason for that is that an invalidation for the relmapper
changes due to the second VACUUM has been queued, but currently the
invalidation processing hasn't yet gotten to that entry yet. Instead
currently the entry for the user defined table is being rebuilt, which
then tries to access pg_class, but under its "old" relfilenode. Which
then fails.  The fact that index scans trigger this problem isn't
relevant - the heap entry for pg_class also refers to the old entry, as
I've verified using gdb.


I'm unclear on how we can fix this, as long as we rebuild cache entries
during invalidation, rather than during access. For most cache entries
it's not a problem to rebuild them relying on other outdated content,
we'll just rebuild them again later.  But for things like the relation
mapper that's not true, unfortunately.
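
A standalone toy sketch of that ordering problem (invented names; the
relation and file numbers are taken from the log above, not PostgreSQL
code): the queue holds a relcache inval for the user table followed by
the relmapper update for pg_class, and rebuilding eagerly consults the
local relmapper copy before the later message has been applied.

/*
 * Toy model: an eager rebuild runs while the relmapper update is still
 * sitting later in the queue, so it scans pg_class under the old file.
 */
#include <stdio.h>

typedef enum
{
	MSG_RELCACHE_INVAL,
	MSG_RELMAP_UPDATE
} MsgKind;

typedef struct
{
	MsgKind		kind;
	unsigned	arg;			/* relation OID, or new pg_class filenode */
} Msg;

static unsigned pg_class_filenode = 1281526;	/* local relmapper copy */

static void
rebuild_entry(unsigned relid)
{
	printf("rebuilding %u: scanning pg_class in file %u%s\n",
		   relid, pg_class_filenode,
		   pg_class_filenode == 1281532 ? "" : "  <-- stale, file is gone");
}

int
main(void)
{
	Msg			queue[] = {
		{MSG_RELCACHE_INVAL, 16391},	/* inval for user table "t" */
		{MSG_RELMAP_UPDATE, 1281532}	/* pg_class remapped by VACUUM FULL */
	};

	for (int i = 0; i < 2; i++)
	{
		if (queue[i].kind == MSG_RELCACHE_INVAL)
			rebuild_entry(queue[i].arg);	/* eager rebuild: too early */
		else
			pg_class_filenode = queue[i].arg;
	}
	return 0;
}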


Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-28 Thread Andres Freund
Hi,

Tomas provided me with a machine where the problem was reproducible
(Thanks again!). I first started to make sure a54e1f158 is unrelated -
and indeed, the problem appears independently.

I've a few leads that I'm currently testing out. One observation I think
might be crucial is that the problem, in Tomas' testcase with just
VACUUM FULL of pg_class and INSERTs into another rel, appears to only
happen if autovacuum is enabled. The crashes are always immediately
after autovacuum did an analyze of the relation.

What I see on the apply side, I think, is that invalidation records for
contents of pg_class and pg_class itself are intermixed.

I'll continue looking into this.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-14 Thread Peter Geoghegan
On Tue, Aug 14, 2018 at 2:07 PM, Todd A. Cook  wrote:
> Sorry, I just noticed this.  Mantid is my animal, so I can set 
> "min_parallel_table_scan_size = 0"
> on it if that would be helpful.  (Please reply directly if so; I'm not able 
> to keep up with pgsql-hackers
> right now.)

We've already been able to rule that out as a factor here. Thanks all
the same, though.

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-14 Thread Todd A. Cook
On 8/9/18, 12:56 AM, "Peter Geoghegan"  wrote:

On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:
>> Anyway, "VACUUM FULL pg_class" should be expected to corrupt
>> pg_class_oid_index when we happen to get a parallel build, since
>> pg_class is a mapped relation, and I've identified that as a problem
>> for parallel CREATE INDEX [2]. If that was the ultimate cause of the
>> issue, it would explain why only REL_11_STABLE and master are
>> involved.
>
> Oooh ... but pg_class wouldn't be big enough to get a parallel
> index rebuild during that test, would it?

Typically not, but I don't think that we can rule it out right away.
"min_parallel_table_scan_size = 0" will make it happen, and that
happens in quite a few regression tests. Though that doesn't include
vacuum.sql.

The animals mandrill, mantid and lapwing are all
"force_parallel_mode=regress", according to their notes. That actually
isn't enough to force parallel CREATE INDEX, though I tend to think it
should be. I don't see anything interesting in their "extra_config",
but perhaps "min_parallel_table_scan_size = 0" got in some other way.
I don't know all that much about the buildfarm client code, and it's
late.

Sorry, I just noticed this.  Mantid is my animal, so I can set 
"min_parallel_table_scan_size = 0"
on it if that would be helpful.  (Please reply directly if so; I'm not able to 
keep up with pgsql-hackers
right now.)

-- todd




Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-11 Thread Tomas Vondra
On 08/11/2018 04:08 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-08-11 15:40:19 +0200, Tomas Vondra wrote:
>> For the record, I can actually reproduce this on 9.6 (haven't tried
>> older releases, but I suspect it's there too). Instead of using the
>> failing subscription, I've used another pgbench script doing this:
> 
>>   SET statement_timeout = 5;
>>   COPY t TO '/dev/null';
>>
>> and doing something like:
>>
>>pgbench -n -c 20 -T 300 -f copy.sql test
> 
> Just to confirm: That's with the vacuum full and insert running
> concurrently? And then just restarting the above copy.sql (as pgbench
> errors out after the timeouts) until you get the error?
> 

Yes, pretty much.

> I'm a bit confused what the copy + timeout is doing here? It shouldn't
> trigger any invalidations itself, and the backtrace appears to be from
> the insert.sql you posted earlier? Unclear why a copy to /dev/null
> should trigger anything like this?
> 

My goal was to simulate the failing subscription sync, which does COPY
and fails because of duplicate data. The statement_timeout seemed like a
good approximation of that. It may be unnecessary, I don't know.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-11 Thread Andres Freund
Hi,

On 2018-08-11 15:40:19 +0200, Tomas Vondra wrote:
> For the record, I can actually reproduce this on 9.6 (haven't tried
> older releases, but I suspect it's there too). Instead of using the
> failing subscription, I've used another pgbench script doing this:

>   SET statement_timeout = 5;
>   COPY t TO '/dev/null';
> 
> and doing something like:
> 
>pgbench -n -c 20 -T 300 -f copy.sql test

Just to confirm: That's with the vacuum full and insert running
concurrently? And then just restarting the above copy.sql (as pgbench
errors out after the timeouts) until you get the error?

I'm a bit confused what the copy + timeout is doing here? It shouldn't
trigger any invalidations itself, and the backtrace appears to be from
the insert.sql you posted earlier? Unclear why a copy to /dev/null
should trigger anything like this?

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-11 Thread Tomas Vondra
On 08/11/2018 03:16 PM, Tomas Vondra wrote:
> On 08/11/2018 05:02 AM, Tom Lane wrote:
>> Peter Geoghegan  writes:
>>> I'm concerned that this item has the potential to delay the release,
>>> since, as you said, we're back to the drawing board.
>>
>> Me too.  I will absolutely not vote to release 11.0 before we've
>> solved this ...
>>
> 
> Not sure. I can easily reproduce that on REL_10_STABLE, so it's not a
> new issue specific to 11.0.
> 

For the record, I can actually reproduce this on 9.6 (haven't tried
older releases, but I suspect it's there too). Instead of using the
failing subscription, I've used another pgbench script doing this:

  SET statement_timeout = 5;
  COPY t TO '/dev/null';

and doing something like:

   pgbench -n -c 20 -T 300 -f copy.sql test

I don't think the 20-client COPY concurrency is necessary; it just makes
restarts after the statement_timeout faster.
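
Roughly the whole setup, for anyone who wants to try it (the table
definition and the exact insert/vacuum statements are my guesses; the
insert.sql posted earlier may well differ):

  -- one-time setup
  CREATE TABLE t (a int, b text);

  -- insert.sql (assumed shape), e.g.:  pgbench -n -c 4 -T 300 -f insert.sql test
  INSERT INTO t VALUES (1, 'x');

  -- vacuum.sql, run concurrently:      pgbench -n -c 1 -T 300 -f vacuum.sql test
  VACUUM FULL pg_class;

  -- copy.sql, run concurrently:        pgbench -n -c 20 -T 300 -f copy.sql test
  SET statement_timeout = 5;
  COPY t TO '/dev/null';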

This produces about the same backtrace as reported on the other thread.
Attached are both a plain 'bt' and a 'bt full'.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Core was generated by `postgres: user test [local] INSERT  '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  }
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x778bc7f4b800 in __GI_abort () at abort.c:89
#2  0x00962df3 in errfinish (dummy=0) at elog.c:557
#3  0x008091b9 in mdread (reln=0x161a940, forknum=MAIN_FORKNUM, 
blocknum=1, buffer=0x778bb9a38080 "") at md.c:784
#4  0x0080b8bd in smgrread (reln=0x161a940, forknum=MAIN_FORKNUM, 
blocknum=1, buffer=0x778bb9a38080 "") at smgr.c:628
#5  0x007cf93f in ReadBuffer_common (smgr=0x161a940, relpersistence=112 
'p', forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, 
hit=0x7fff20f9131b "") at bufmgr.c:890
#6  0x007cf272 in ReadBufferExtended (reln=0x15f2318, 
forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0) at bufmgr.c:664
#7  0x007cf14e in ReadBuffer (reln=0x15f2318, blockNum=1) at 
bufmgr.c:596
#8  0x004e0953 in _bt_getbuf (rel=0x15f2318, blkno=1, access=1) at 
nbtpage.c:576
#9  0x004df73a in _bt_getroot (rel=0x15f2318, access=1) at nbtpage.c:132
#10 0x004e6adb in _bt_search (rel=0x15f2318, keysz=1, 
scankey=0x7fff20f91dd0, nextkey=0 '\000', bufP=0x7fff20f927d4, access=1, 
snapshot=0xe45920 ) at nbtsearch.c:99
#11 0x004e80a9 in _bt_first (scan=0x16454b8, dir=ForwardScanDirection) 
at nbtsearch.c:983
#12 0x004e4eb5 in btgettuple (scan=0x16454b8, dir=ForwardScanDirection) 
at nbtree.c:326
#13 0x004d9321 in index_getnext_tid (scan=0x16454b8, 
direction=ForwardScanDirection) at indexam.c:415
#14 0x004d9744 in index_getnext (scan=0x16454b8, 
direction=ForwardScanDirection) at indexam.c:553
#15 0x004d81c3 in systable_getnext (sysscan=0x16455d0) at genam.c:416
#16 0x0094c114 in ScanPgRelation (targetRelId=16391, indexOK=1 '\001', 
force_non_historic=0 '\000') at relcache.c:347
#17 0x0094f6a3 in RelationReloadIndexInfo (relation=0x778bc8bdcfc8) at 
relcache.c:1936
#18 0x0094fd8e in RelationClearRelation (relation=0x778bc8bdcfc8, 
rebuild=1 '\001') at relcache.c:2199
#19 0x0095023d in RelationFlushRelation (relation=0x778bc8bdcfc8) at 
relcache.c:2403
#20 0x00950351 in RelationCacheInvalidateEntry (relationId=16391) at 
relcache.c:2455
#21 0x0094865e in LocalExecuteInvalidationMessage (msg=0x7fff20f92cc0) 
at inval.c:578
#22 0x007eaf7e in ReceiveSharedInvalidMessages (invalFunction=0x94856d 
, resetFunction=0x9487b3 
) at sinval.c:91
#23 0x0094887d in AcceptInvalidationMessages () at inval.c:672
#24 0x007eee5c in LockRelationOid (relid=2662, lockmode=1) at lmgr.c:125
#25 0x004bad33 in relation_open (relationId=2662, lockmode=1) at 
heapam.c:1124
#26 0x004d888f in index_open (relationId=2662, lockmode=1) at 
indexam.c:150
#27 0x004d7f8d in systable_beginscan (heapRelation=0x15ed518, 
indexId=2662, indexOK=1 '\001', snapshot=0xe45920 , 
nkeys=1, key=0x7fff20f92e20) at genam.c:334
#28 0x0094c104 in ScanPgRelation (targetRelId=16387, indexOK=1 '\001', 
force_non_historic=0 '\000') at relcache.c:342
#29 0x0094d569 in RelationBuildDesc (targetRelId=16387, insertIt=0 
'\000') at relcache.c:949
#30 0x0094fe4c in RelationClearRelation (relation=0x778bc8bdb390, 
rebuild=1 '\001') at relcache.c:2281
#31 0x0095023d in RelationFlushRelation (relation=0x778bc8bdb390) at 
relcache.c:2403
#32 0x00950351 in RelationCacheInvalidateEntry (relationId=16387) at 
relcache.c:2455
#33 0x0094865e in LocalExecuteInvalidationMessage (msg=0x7fff20f93190) 
at inval.c:578
#34 0x007eb057 in ReceiveSharedInvalidMessages (invalFunction=0x94856d 
, 

Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-11 Thread Tomas Vondra
On 08/11/2018 05:02 AM, Tom Lane wrote:
> Peter Geoghegan  writes:
>> I'm concerned that this item has the potential to delay the release,
>> since, as you said, we're back to the drawing board.
> 
> Me too.  I will absolutely not vote to release 11.0 before we've
> solved this ...
> 

Not sure. I can easily reproduce that on REL_10_STABLE, so it's not a
new issue specific to 11.0.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-10 Thread Peter Geoghegan
On Fri, Aug 10, 2018 at 8:02 PM, Tom Lane  wrote:
> Me too.  I will absolutely not vote to release 11.0 before we've
> solved this ...

I believe that that's the right call, assuming things don't change.

This is spooky in a way that creates a lot of doubts in my mind. I
don't think it's at all advisable to make an executive decision to
release, while still not having the foggiest idea what's really going
on.

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-10 Thread Tom Lane
Peter Geoghegan  writes:
> I'm concerned that this item has the potential to delay the release,
> since, as you said, we're back to the drawing board.

Me too.  I will absolutely not vote to release 11.0 before we've
solved this ...

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-10 Thread Peter Geoghegan
On Fri, Aug 10, 2018 at 7:45 PM, Tom Lane  wrote:
> Didn't take long to show that the relmapper issue wasn't it:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2018-08-10%2021%3A21%3A40
>
> So we're back to square one.  Although Tomas' recent report might
> give us something new to chase.

Thanks for pointing out that Tomas had a lead - I missed that.

I'm concerned that this item has the potential to delay the release,
since, as you said, we're back to the drawing board.

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-10 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:
>> Oooh ... but pg_class wouldn't be big enough to get a parallel
>> index rebuild during that test, would it?

> Typically not, but I don't think that we can rule it out right away.

Didn't take long to show that the relmapper issue wasn't it:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2018-08-10%2021%3A21%3A40

So we're back to square one.  Although Tomas' recent report might
give us something new to chase.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-09 Thread Peter Geoghegan
On Wed, Aug 8, 2018 at 10:08 PM, Peter Geoghegan  wrote:
>> Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
>> a parallel build happened or not, so that we can check the buildfarm logs
>> next time we see that failure?
>
> I can do that tomorrow. Of course, it might be easier to just push the
> pending fix for the mapped relation bug, and see if that makes the
> issue go away. That would take longer, but it's also the only thing
> that's likely to be definitive anyway.

I'm going to push my fix for the relmapper.c parallel CREATE INDEX bug
tomorrow, and see what happens. I have a hard time imagining how there
could be a parallel index build on pg_class, now that Andrew has
indicated there is no reason to think that
"min_parallel_table_scan_size = 0" could slip in.

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-09 Thread Andrew Dunstan




On 08/09/2018 01:03 AM, Tom Lane wrote:
> Peter Geoghegan  writes:
>> On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:
>>> Oooh ... but pg_class wouldn't be big enough to get a parallel
>>> index rebuild during that test, would it?
>
>> Typically not, but I don't think that we can rule it out right away.
>
> Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
> a parallel build happened or not, so that we can check the buildfarm logs
> next time we see that failure?
>
>> I don't know all that much about the buildfarm client code, and it's
>> late.
>
> It doesn't really stick in any undocumented configuration changes,
> AFAIK.  Possibly Andrew would have some more insight.

No, everything should be visible in the config. Hidden things are what I
try to avoid.


The only things the configure()  adds are the prefix and pgport 
settings, and the cache-file. make() only adds a "-j jobs" if so 
configured. make_check() normally just runs "make NO_LOCALE=1 check".




cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-08 Thread Peter Geoghegan
On Wed, Aug 8, 2018 at 10:03 PM, Tom Lane  wrote:
>> Typically not, but I don't think that we can rule it out right away.
>
> Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
> a parallel build happened or not, so that we can check the buildfarm logs
> next time we see that failure?

I can do that tomorrow. Of course, it might be easier to just push the
pending fix for the mapped relation bug, and see if that makes the
issue go away. That would take longer, but it's also the only thing
that's likely to be definitive anyway.

If you want to go ahead with it yourself, I suggest modifying the
DEBUG1 elog()s within index_build() to be LOG elevel.
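
For a quick interactive check (as opposed to the buildfarm logs), those
existing messages can also be surfaced without touching the code; a
minimal sketch:

  -- show DEBUG1-level messages, including index_build()'s, in this session only
  SET client_min_messages = debug1;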

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:
>> Oooh ... but pg_class wouldn't be big enough to get a parallel
>> index rebuild during that test, would it?

> Typically not, but I don't think that we can rule it out right away.

Hmmm ... maybe we should temporarily stick in an elog(LOG) showing whether
a parallel build happened or not, so that we can check the buildfarm logs
next time we see that failure?

> I don't know all that much about the buildfarm client code, and it's
> late.

It doesn't really stick in any undocumented configuration changes,
AFAIK.  Possibly Andrew would have some more insight.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-08 Thread Peter Geoghegan
On Wed, Aug 8, 2018 at 7:40 PM, Tom Lane  wrote:
>> Anyway, "VACUUM FULL pg_class" should be expected to corrupt
>> pg_class_oid_index when we happen to get a parallel build, since
>> pg_class is a mapped relation, and I've identified that as a problem
>> for parallel CREATE INDEX [2]. If that was the ultimate cause of the
>> issue, it would explain why only REL_11_STABLE and master are
>> involved.
>
> Oooh ... but pg_class wouldn't be big enough to get a parallel
> index rebuild during that test, would it?

Typically not, but I don't think that we can rule it out right away.
"min_parallel_table_scan_size = 0" will make it happen, and that
happens in quite a few regression tests. Though that doesn't include
vacuum.sql.

The animals mandrill, mantid and lapwing are all
"force_parallel_mode=regress", according to their notes. That actually
isn't enough to force parallel CREATE INDEX, though I tend to think it
should be. I don't see anything interesting in their "extra_config",
but perhaps "min_parallel_table_scan_size = 0" got in some other way.
I don't know all that much about the buildfarm client code, and it's
late.
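
For reference, a sketch of how a parallel rebuild of such a tiny index
could be provoked locally (the exact settings needed are an assumption on
my part, not something any animal is known to use):

  SET min_parallel_table_scan_size = 0;      -- zero out the size threshold
  SET max_parallel_maintenance_workers = 2;  -- assumed: the default is already > 0
  VACUUM FULL pg_class;                      -- rebuilds pg_class_oid_index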

-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 25, 2018 at 4:07 PM, Andres Freund  wrote:
>> I don't immediately see it being responsible, but I wonder if there's a
>> chance it actually is: Note that it happens in a parallel group that
>> includes vacuum.sql, which does a VACUUM FULL pg_class - but I still
>> don't immediately see how it could apply.

> Anyway, "VACUUM FULL pg_class" should be expected to corrupt
> pg_class_oid_index when we happen to get a parallel build, since
> pg_class is a mapped relation, and I've identified that as a problem
> for parallel CREATE INDEX [2]. If that was the ultimate cause of the
> issue, it would explain why only REL_11_STABLE and master are
> involved.

Oooh ... but pg_class wouldn't be big enough to get a parallel
index rebuild during that test, would it?

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-08-08 Thread Peter Geoghegan
On Wed, Jul 25, 2018 at 4:07 PM, Andres Freund  wrote:
>> HEAD/REL_11_STABLE apparently solely being affected points elsewhere,
>> but I don't immediately know where.
>
> Hm, there was:
> http://archives.postgresql.org/message-id/20180628150209.n2qch5jtn3vt2xaa%40alap3.anarazel.de
>
>
> I don't immediately see it being responsible, but I wonder if there's a
> chance it actually is: Note that it happens in a parallel group that
> includes vacuum.sql, which does a VACUUM FULL pg_class - but I still
> don't immediately see how it could apply.

It's now pretty clear that it was not that particular bug, since I
pushed a fix, and yet the issue hasn't gone away on affected buildfarm
animals. There was a recurrence of the problem on lapwing, for example
[1].

Anyway, "VACUUM FULL pg_class" should be expected to corrupt
pg_class_oid_index when we happen to get a parallel build, since
pg_class is a mapped relation, and I've identified that as a problem
for parallel CREATE INDEX [2]. If that was the ultimate cause of the
issue, it would explain why only REL_11_STABLE and master are
involved.

My guess is that the metapage considers the root page to be at block 3
(block 3 is often the root page for small though not tiny B-Trees),
which for whatever reason is where we get a short read. I don't know
why there is a short read, but corrupting mapped catalog indexes at
random can be expected to cause all kinds of chaos, so that doesn't
mean much.
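
If anyone manages to catch an affected cluster before it gets cleaned up,
that guess is easy to check; a sketch using the pageinspect extension
(assuming the damaged index really is pg_class_oid_index):

  CREATE EXTENSION IF NOT EXISTS pageinspect;
  -- which block does the metapage think is the current root / fast root?
  SELECT root, level, fastroot, fastlevel
    FROM bt_metap('pg_class_oid_index');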

In any case, I'll probably push a fix for this other bug on Friday,
barring any objections. It's possible that that will make the problem
go away.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2018-08-04%2004%3A20%3A01
[2] 
https://www.postgresql.org/message-id/CAH2-Wzn=j0i8rxCAo6E=tbo9xuyxb8hbusnw7j_stkon8dd...@mail.gmail.com
-- 
Peter Geoghegan



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-20 13:24:50 -0700, Andres Freund wrote:
> On 2018-07-20 16:15:14 -0400, Tom Lane wrote:
> > We've seen several occurrences of $subject in the buildfarm in the past
> > month or so.  Scraping the logs, I find
> > 
> >  coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.505 CEST [5b22deb8.30e1:124] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.653 CEST [5b22deb8.6d4d:132] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  gull     | 2018-06-21 04:02:11 | HEAD          | Check | 2018-06-20 21:27:06.843 PDT [31138:3] FATAL:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.138 UTC [14353240:43] pg_regress/create_table_like ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.137 UTC [20710034:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.585 UTC [24022:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.591 UTC [23941:39] pg_regress/roleattributes ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [26072:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [25990:82] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  coypu    | 2018-07-17 01:27:39 | HEAD          | Check | 2018-07-17 03:47:26.264 CEST [5b4d4aa4.3ac4:71] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  coypu    | 2018-07-20 11:18:02 | HEAD          | Check | 2018-07-20 13:32:14.104 CEST [5b51c833.4884:131] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> > 
> > While I didn't go back indefinitely far, there are no other occurrences of
> > "could not read block" failures in the last three months.  This suggests
> > strongly that we broke something in early June.  Don't know what though.
> > Ideas?
> > 
> > In case anyone's wondering, 2662 is pg_class_oid_index.
> 
> While I do not immediately see how, and the affected versions don't
> really point that way, the timing does make me think of
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a54e1f1587793b5bf926630ec9ce142e4578d7a0
> 
> HEAD/REL_11_STABLE apparently solely being affected points elsewhere,
> but I don't immediately know where.

Hm, there was:
http://archives.postgresql.org/message-id/20180628150209.n2qch5jtn3vt2xaa%40alap3.anarazel.de

I don't immediately see it being responsible, but I wonder if there's a
chance it actually is: Note that it happens in a parallel group that
includes vacuum.sql, which does a VACUUM FULL pg_class - but I still
don't immediately see how it could apply.

But either way, we need to fix the issue mentioned in the above email.

Greetings,

Andres Freund



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-07-20 Thread Andres Freund
On 2018-07-20 16:15:14 -0400, Tom Lane wrote:
> We've seen several occurrences of $subject in the buildfarm in the past
> month or so.  Scraping the logs, I find
> 
>  coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.505 CEST [5b22deb8.30e1:124] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.653 CEST [5b22deb8.6d4d:132] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  gull     | 2018-06-21 04:02:11 | HEAD          | Check | 2018-06-20 21:27:06.843 PDT [31138:3] FATAL:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.138 UTC [14353240:43] pg_regress/create_table_like ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.137 UTC [20710034:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.585 UTC [24022:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.591 UTC [23941:39] pg_regress/roleattributes ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [26072:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [25990:82] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  coypu    | 2018-07-17 01:27:39 | HEAD          | Check | 2018-07-17 03:47:26.264 CEST [5b4d4aa4.3ac4:71] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
>  coypu    | 2018-07-20 11:18:02 | HEAD          | Check | 2018-07-20 13:32:14.104 CEST [5b51c833.4884:131] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> 
> While I didn't go back indefinitely far, there are no other occurrences of
> "could not read block" failures in the last three months.  This suggests
> strongly that we broke something in early June.  Don't know what though.
> Ideas?
> 
> In case anyone's wondering, 2662 is pg_class_oid_index.

While I do not immediately see how, and the affected versions don't
really point that way, the timing does make me think of
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a54e1f1587793b5bf926630ec9ce142e4578d7a0

HEAD/REL_11_STABLE apparently solely being affected points elsewhere,
but I don't immediately know where.

Greetings,

Andres Freund



buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-07-20 Thread Tom Lane
We've seen several occurrences of $subject in the buildfarm in the past
month or so.  Scraping the logs, I find

 coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.505 CEST [5b22deb8.30e1:124] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 coypu    | 2018-06-14 21:17:49 | HEAD          | Check | 2018-06-14 23:31:44.653 CEST [5b22deb8.6d4d:132] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 gull     | 2018-06-21 04:02:11 | HEAD          | Check | 2018-06-20 21:27:06.843 PDT [31138:3] FATAL:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.138 UTC [14353240:43] pg_regress/create_table_like ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 mandrill | 2018-06-22 16:14:16 | HEAD          | Check | 2018-06-22 16:46:24.137 UTC [20710034:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.585 UTC [24022:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 lapwing  | 2018-07-05 07:20:02 | HEAD          | Check | 2018-07-05 07:21:45.591 UTC [23941:39] pg_regress/roleattributes ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [26072:1] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 21:21:32.557 EDT [25990:82] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 coypu    | 2018-07-17 01:27:39 | HEAD          | Check | 2018-07-17 03:47:26.264 CEST [5b4d4aa4.3ac4:71] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
 coypu    | 2018-07-20 11:18:02 | HEAD          | Check | 2018-07-20 13:32:14.104 CEST [5b51c833.4884:131] ERROR:  could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

While I didn't go back indefinitely far, there are no other occurrences of
"could not read block" failures in the last three months.  This suggests
strongly that we broke something in early June.  Don't know what though.
Ideas?

In case anyone's wondering, 2662 is pg_class_oid_index.
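
For anyone who wants to do that lookup themselves, a sketch (run in the
affected database; this assumes the index hasn't been rewritten, so its
filenode still matches its OID):

  -- map the path "base/16384/2662" back to a relation; the first argument
  -- is the tablespace (0 = the database's default), the second the filenode
  SELECT pg_filenode_relation(0, 2662);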

regards, tom lane