[OpenAFS-devel] Re: [OpenAFS] Volume root corruptions - anybody seen those? Explained and corrected

Rainer Toebbicke Tue, 12 Aug 2008 07:42:23 -0700

In June I reported volumes corrupted by the salvager. Hartmut Reuterfixed the problem a while ago suspecting a big/little endian problemwhen setting the length of a salvaged directory (-salvagedir option).This wasn't the case, but the error was in that area and his patchfixes it!

For completeness, here's a complementary patch that fixes theSplitInt64 macro which is the culprit, on all 64-bit machines:


Here's a little refresh:

on a few (1/1000) volumes the length of the volume root directory wouldbe flagged as something astronomically big, to be corrected to 6144,8192 or other more modest multiples of 2k. And sure enough, doing thereal salvage the root would then be logically truncated (not to saycompletely messed up) and plenty of orphaned files would appear.
When you converted that "BIG" number to hexadecimal you got the peculiarpattern 0x180000001800, or 0x080000000800, i.e. always the "final"multiple of 2K ored into that same multiple shifted left by 32 bits!


And it happens like this:

On 64-bit machines SplitInt64 shifts the operand 32 bits to the rightin order to isolate the high-order 32-bit word. If the operand itselfis a 32-Bit integer (as is the case with "Length()" in vol-salvage.c),the behaviour is undefined according to the C standard! What gcc makesout of it is careless or at least unhelpful in my humble opinion, butwho am I to criticize: it shifts the operand by 32 bits, forgettingthat on Intel the shift counter is taken modulo 32 (in 32-bit mode),hence the shift is a no-op. On SPARC the same thing happens, but hereit is (odd enough) the Sun compiler who shifts by modulo 32. ProbablyI am alone in regretting that given that both got worried to the pointof issuing a warning, neither chose the solution closest to theunderlying mathematical principle, anything resembling 32 divisions by2...

Anyway, the corrupted vnode length field which repeats the length intwo 32-bit word positions is thus explained, and the (attached) simplesolution is to type-cast the operand to afs_int64 before the shift.(There are others: the construct ((i >> 16) >> 16) may fool somecompilers in doing the right thing... and surprise people looking atthe code).


Bcc'ed to openafs-bugs.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155

--- openafs/src/config/stds.h.o0        2005-05-08 08:01:12.000000000 +0200
+++ openafs/src/config/stds.h   2008-08-12 13:51:29.000000000 +0200
@@ -67,7 +67,7 @@
 #define NonZeroInt64(a)                (a)
 #define Int64ToInt32(a)    (a) & 0xFFFFFFFFL
 #define FillInt64(t,h,l) (t) = (h); (t) <<= 32; (t) |= (l);
-#define SplitInt64(t,h,l) (h) = (t) >> 32; (l) = (t) & 0xFFFFFFFF;
+#define SplitInt64(t,h,l) (h) = ((afs_int64)t) >> 32; (l) = (t) & 0xFFFFFFFF;
 #else /* AFS_64BIT_ENV */
 typedef long afs_int32;
 typedef unsigned long afs_uint32;

[OpenAFS-devel] Re: [OpenAFS] Volume root corruptions - anybody seen those? Explained and corrected

Reply via email to