I was asked for a translation of my previous email, bonging the 4.2.1 RC0.

The problem in 4.2.0 was a shift in the set of WKS values. These are not just 
live data but also written to the cache in the object headers so if they change 
at all, it de facto invalidates the cache. The 4.2.0 crashes (TS-2564) are due 
to this, because various secondary bits of data get written inconsistently 
which in turn causes ATS to look up the wrong data for header fields. For 
instance, the VARY field would be written out along with a hint about where it 
was in the header. When read back in 4.2.0, ATS would use the stored WKS index 
to look up the hint location and get the wrong location (because VARY had 
shifted) and use that to find the wrong data for VARY (possibly null or 
unallocated memory).
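
As a rough illustration of the failure above, here is a minimal Python sketch. 
It is not ATS code: the WKS tables, the header layout, and the helper names 
(write_header, lookup) are all made up for the example, and the real hint 
encoding is more involved.

```python
# Hypothetical WKS tables; the real ATS tables are much larger.
WKS_OLD = ["Accept", "Age", "Vary"]            # table in use when the object was written
WKS_NEW = ["Accept", "Age", "Allow", "Vary"]   # a new entry shifts "Vary" from index 2 to 3

def write_header(fields, wks):
    """Store the fields plus hints mapping WKS index -> slot in the field list."""
    hints = {wks.index(name): slot for slot, (name, _) in enumerate(fields)}
    return {"fields": fields, "hints": hints}

def lookup(header, name, wks):
    """Find a field through the stored hints, using the reader's WKS table."""
    slot = header["hints"].get(wks.index(name))
    return None if slot is None else header["fields"][slot]

stored = write_header([("Age", "10"), ("Vary", "Accept-Encoding")], WKS_OLD)

same_table = lookup(stored, "Vary", WKS_OLD)      # reader and writer agree
shifted_table = lookup(stored, "Vary", WKS_NEW)   # stale hint: VARY is not found
wrong_field = lookup(stored, "Allow", WKS_NEW)    # stale hint: points at VARY's slot
```

With the shifted table, the lookup for VARY misses, and the lookup for a 
different field lands on VARY's data, which is the kind of mismatch described 
above.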

To fix this, 4.2.1 simply clears all the hints and rewrites them when the 
object is read from disk, if it was written by a cache version earlier than 
4.2.1. This ignores the stored values and uses only the current in-memory 
values.
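
A sketch of that fixup, again in illustrative Python rather than the actual 
ATS code. The version tuple, the WKS table, and the function names are 
assumptions made for the example.

```python
CACHE_VERSION_421 = (4, 2, 1)               # assumed way of encoding the version check
WKS = ["Accept", "Age", "Allow", "Vary"]    # hypothetical current in-memory table

def rebuild_hints(fields, wks):
    """Recompute hints from the fields themselves, using the current WKS table."""
    return {wks.index(name): slot for slot, (name, _) in enumerate(fields)}

def load_from_disk(stored, wks=WKS):
    """Unmarshal a stored object; discard and rebuild hints for older versions."""
    header = {"fields": list(stored["fields"]), "hints": dict(stored["hints"])}
    if stored["version"] < CACHE_VERSION_421:
        header["hints"].clear()                                   # ignore stored values
        header["hints"].update(rebuild_hints(header["fields"], wks))
    return header

old_object = {
    "version": (4, 2, 0),
    "fields": [("Age", "10"), ("Vary", "Accept-Encoding")],
    "hints": {1: 0, 2: 1},    # written with a table where "Vary" had index 2
}
fixed = load_from_disk(old_object)   # hints now agree with the current table
```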

However, it turns out that when the object is read from disk, it may be stored 
in the ram cache. If retrieved from ram cache later, it goes through the same 
logic as if it had been loaded from disk, which includes clearing and rewriting 
the hints. The ATS logic, though, doesn't lock the object for this because it 
is expected to be read-only once it has been read from disk. The TS-2564 logic 
violates this and thereby creates a race condition between two transactions 
accessing the same object. It is possible for one to check the valid hints for 
a field 
and then, while it is trying to retrieve the field, the other transaction can 
clear the hints causing the field to not be found. This leads to a crash 
because the logic assumes (reasonably) that if it's checked the hints and 
verified the field presence, the field is present and will be found. If the 
field is not found, you get a null pointer dereference.
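
The interleaving can be shown without real threads by running the steps of 
the two transactions by hand. This is a simplified Python sketch with 
hypothetical names, where hints are keyed by field name for brevity.

```python
# One object shared by two transactions via the ram cache (illustrative layout).
header = {"fields": [("Vary", "Accept-Encoding")], "hints": {"Vary": 0}}

def has_field(h, name):
    """Presence check that consults only the hints."""
    return name in h["hints"]

def get_field(h, name):
    """Retrieve the field through the hints; None if the hint is gone."""
    slot = h["hints"].get(name)
    return None if slot is None else h["fields"][slot]

def clear_hints(h):
    """First half of the fixup: drop all hints before rewriting them."""
    h["hints"].clear()

present = has_field(header, "Vary")   # transaction A: check says the field exists
clear_hints(header)                   # transaction B: fixup starts on the same object
field = get_field(header, "Vary")     # transaction A: retrieval now finds nothing
```

Transaction A sees the field as present and then gets nothing back. In C++ 
the caller would dereference that result without checking, hence the crash.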

The solution is to prevent the 4.2.0 fixup from being done on objects retrieved 
from the ram cache. There's no need as the fixup was done when it was read from 
disk and put in the ram cache. There is no race condition for disk reads 
because those are not shared until after the fixup.
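
In sketch form (illustrative Python with made-up names, not the actual 
patch), the idea is that only the disk read path runs the fixup, and ram 
cache hits hand back the already fixed object untouched.

```python
WKS = ["Accept", "Age", "Allow", "Vary"]   # hypothetical current table
ram_cache = {}
fixup_count = 0

def fixup_hints(header):
    """Rebuild hints from the current WKS table and count how often this runs."""
    global fixup_count
    header["hints"] = {WKS.index(n): s for s, (n, _) in enumerate(header["fields"])}
    fixup_count += 1

def read_object(key, disk):
    if key in ram_cache:
        return ram_cache[key]          # shared object: returned as-is, no fixup
    stored = disk[key]
    header = {"fields": list(stored["fields"]), "hints": dict(stored["hints"])}
    if stored["old_version"]:
        fixup_hints(header)            # safe: nobody else can see this object yet
    ram_cache[key] = header            # shared only after the fixup is done
    return header

disk = {"k": {"old_version": True,
              "fields": [("Age", "10"), ("Vary", "Accept-Encoding")],
              "hints": {1: 0, 2: 1}}}
first = read_object("k", disk)    # disk read, fixup applied once
second = read_object("k", disk)   # ram cache hit, hints left alone
```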
