In June I reported volumes corrupted by the salvager. Hartmut Reuter fixed the problem a while ago suspecting a big/little endian problem when setting the length of a salvaged directory (-salvagedir option). This wasn't the case, but the error was in that area and his patch fixes it!

For completeness, here's a complementary patch that fixes the SplitInt64 macro which is the culprit, on all 64-bit machines:

Here's a little refresh:


on a few (1/1000) volumes the length of the volume root directory would be flagged as something astronomically big, to be corrected to 6144, 8192 or other more modest multiples of 2k. And sure enough, doing the real salvage the root would then be logically truncated (not to say completely messed up) and plenty of orphaned files would appear.

When you converted that "BIG" number to hexadecimal you got the peculiar pattern 0x180000001800, or 0x080000000800, i.e. always the "final" multiple of 2K ored into that same multiple shifted left by 32 bits!


And it happens like this:

On 64-bit machines SplitInt64 shifts the operand 32 bits to the right in order to isolate the high-order 32-bit word. If the operand itself is a 32-Bit integer (as is the case with "Length()" in vol-salvage.c), the behaviour is undefined according to the C standard! What gcc makes out of it is careless or at least unhelpful in my humble opinion, but who am I to criticize: it shifts the operand by 32 bits, forgetting that on Intel the shift counter is taken modulo 32 (in 32-bit mode), hence the shift is a no-op. On SPARC the same thing happens, but here it is (odd enough) the Sun compiler who shifts by modulo 32. Probably I am alone in regretting that given that both got worried to the point of issuing a warning, neither chose the solution closest to the underlying mathematical principle, anything resembling 32 divisions by 2...

Anyway, the corrupted vnode length field which repeats the length in two 32-bit word positions is thus explained, and the (attached) simple solution is to type-cast the operand to afs_int64 before the shift. (There are others: the construct ((i >> 16) >> 16) may fool some compilers in doing the right thing... and surprise people looking at the code).

Bcc'ed to openafs-bugs.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155
--- openafs/src/config/stds.h.o0        2005-05-08 08:01:12.000000000 +0200
+++ openafs/src/config/stds.h   2008-08-12 13:51:29.000000000 +0200
@@ -67,7 +67,7 @@
 #define NonZeroInt64(a)                (a)
 #define Int64ToInt32(a)    (a) & 0xFFFFFFFFL
 #define FillInt64(t,h,l) (t) = (h); (t) <<= 32; (t) |= (l);
-#define SplitInt64(t,h,l) (h) = (t) >> 32; (l) = (t) & 0xFFFFFFFF;
+#define SplitInt64(t,h,l) (h) = ((afs_int64)t) >> 32; (l) = (t) & 0xFFFFFFFF;
 #else /* AFS_64BIT_ENV */
 typedef long afs_int32;
 typedef unsigned long afs_uint32;

Reply via email to