Re: [HACKERS] Latent cache flush hazard in RelationInitIndexAccessInfo

2016-05-23 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Sep 11, 2015 at 11:29 AM, Robert Haas  wrote:
>> On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane  wrote:
>>> The specific reason there's a problem is that there's a disconnect between
>>> RelationClearRelation's test for whether a relcache entry has valid index
>>> info (it checks whether rd_indexcxt != NULL) and the order of operations
>>> in RelationInitIndexAccessInfo (creating the index context is not the
>>> first thing it does).  Given that if RelationClearRelation thinks the
>>> info is valid, what it does is call RelationReloadIndexInfo which needs
>>> rd_index to be valid, I'm thinking the best fix is to leave the order of
>>> operations in RelationInitIndexAccessInfo alone and instead make the
>>> "relcache entry has index info" check be "rd_index != NULL".  There might
>>> need to be some additional work to keep RelationReloadIndexInfo from
>>> setting rd_isvalid = true too soon.

>> Doesn't seem like a bad thing to clean up.

> Doesn't sound bad to me either to reinforce things on HEAD... It's
> been months since this has been posted, are you working on a patch?

This dropped off my radar screen somehow.  (I have a feeling I fixed it in
Salesforce's code and forgot to transfer the fix to community.)

Given that this is just latent for the community's purposes, it seems
probably inappropriate to fix it post-beta1.  I'm inclined to let it
slide till 9.7^H^H^H10 development opens.  Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Latent cache flush hazard in RelationInitIndexAccessInfo

2016-05-21 Thread Michael Paquier
On Fri, Sep 11, 2015 at 11:29 AM, Robert Haas  wrote:
> On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane  wrote:
>> Some stuff Salesforce has been doing turned up the fact that
>> RelationInitIndexAccessInfo is not safe against a relcache flush on
>> its target index.  Specifically, the failure goes like this:
>>
>> * RelationInitIndexAccessInfo loads up relation->rd_indextuple.
>>
>> * Within one of the subsequent catalog fetches, eg the pg_am read, a
>> relcache flush occurs (this can be forced with CLOBBER_CACHE_ALWAYS).
>>
>> * RelationClearRelation keeps the relcache entry, since it has
>> positive refcount, but doesn't see a reason to keep the index-related
>> fields.
>>
>> * RelationInitIndexAccessInfo dumps core when it tries to fetch
>> fields out of relation->rd_indextuple, which has been reset to NULL.
>>
>> Now, it turns out this scenario cannot happen in the standard Postgres
>> code as it stands today (if it could, we'd have seen it in the buildfarm's
>> CLOBBER_CACHE_ALWAYS testing).  The reason for that is that
>> RelationInitIndexAccessInfo is actually only called on newly-minted
>> Relation structs that are not yet installed into the relcache hash table;
>> hence RelationCacheInvalidate can't find them to zap them.  It can happen
>> in Salesforce's modified code though, and it's not hard to imagine that
>> future changes in the community version might expose the same hazard.
>> (For one reason, the current technique implies that an error halfway
>> through relcache load results in leaking the Relation struct and
>> subsidiary data; we might eventually decide that's not acceptable.)
>>
>> We could just ignore the problem until/unless it becomes real for us,
>> but now that I've figured it out I'm inclined to do something before
>> the knowledge disappears again.
>>
>> The specific reason there's a problem is that there's a disconnect between
>> RelationClearRelation's test for whether a relcache entry has valid index
>> info (it checks whether rd_indexcxt != NULL) and the order of operations
>> in RelationInitIndexAccessInfo (creating the index context is not the
>> first thing it does).  Given that if RelationClearRelation thinks the
>> info is valid, what it does is call RelationReloadIndexInfo which needs
>> rd_index to be valid, I'm thinking the best fix is to leave the order of
>> operations in RelationInitIndexAccessInfo alone and instead make the
>> "relcache entry has index info" check be "rd_index != NULL".  There might
>> need to be some additional work to keep RelationReloadIndexInfo from
>> setting rd_isvalid = true too soon.
>>
>> Thoughts?
>
> Doesn't seem like a bad thing to clean up.

Doesn't sound bad to me either to reinforce things on HEAD... It's
been months since this has been posted, are you working on a patch?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Latent cache flush hazard in RelationInitIndexAccessInfo

2015-09-11 Thread Robert Haas
On Wed, Sep 9, 2015 at 3:00 PM, Tom Lane  wrote:
> Some stuff Salesforce has been doing turned up the fact that
> RelationInitIndexAccessInfo is not safe against a relcache flush on
> its target index.  Specifically, the failure goes like this:
>
> * RelationInitIndexAccessInfo loads up relation->rd_indextuple.
>
> * Within one of the subsequent catalog fetches, eg the pg_am read, a
> relcache flush occurs (this can be forced with CLOBBER_CACHE_ALWAYS).
>
> * RelationClearRelation keeps the relcache entry, since it has
> positive refcount, but doesn't see a reason to keep the index-related
> fields.
>
> * RelationInitIndexAccessInfo dumps core when it tries to fetch
> fields out of relation->rd_indextuple, which has been reset to NULL.
>
> Now, it turns out this scenario cannot happen in the standard Postgres
> code as it stands today (if it could, we'd have seen it in the buildfarm's
> CLOBBER_CACHE_ALWAYS testing).  The reason for that is that
> RelationInitIndexAccessInfo is actually only called on newly-minted
> Relation structs that are not yet installed into the relcache hash table;
> hence RelationCacheInvalidate can't find them to zap them.  It can happen
> in Salesforce's modified code though, and it's not hard to imagine that
> future changes in the community version might expose the same hazard.
> (For one reason, the current technique implies that an error halfway
> through relcache load results in leaking the Relation struct and
> subsidiary data; we might eventually decide that's not acceptable.)
>
> We could just ignore the problem until/unless it becomes real for us,
> but now that I've figured it out I'm inclined to do something before
> the knowledge disappears again.
>
> The specific reason there's a problem is that there's a disconnect between
> RelationClearRelation's test for whether a relcache entry has valid index
> info (it checks whether rd_indexcxt != NULL) and the order of operations
> in RelationInitIndexAccessInfo (creating the index context is not the
> first thing it does).  Given that if RelationClearRelation thinks the
> info is valid, what it does is call RelationReloadIndexInfo which needs
> rd_index to be valid, I'm thinking the best fix is to leave the order of
> operations in RelationInitIndexAccessInfo alone and instead make the
> "relcache entry has index info" check be "rd_index != NULL".  There might
> need to be some additional work to keep RelationReloadIndexInfo from
> setting rd_isvalid = true too soon.
>
> Thoughts?

Doesn't seem like a bad thing to clean up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers