Re: [ccache] Corrupt objects from three colliding compiles

2010-05-09 Thread Joel Rosdahl
On 2010-05-06 19:14, Wilson Snyder wrote:

 [...]
 [1273161404.432174,2424 ] === 2010-05-06 11:56:44 ===
 [1273161404.432636,2424 ] Hostname: maserati
 [1273161404.432643,2424 ] Cwd: /nfs/compiling/dir02
 [1273161404.432646,2424 ] Source file: INV.c
 [1273161404.432648,2424 ] Object file: INV.o
 [1273161404.432657,2424 ] Trying direct lookup
 [1273161404.439730,2424 ] Looking for object file hash in 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 [1273161404.440126,2424 ] Did not find object file hash in manifest
 [1273161404.440134,2424 ] Running preprocessor
 [1273161404.469188,2084 ] === 2010-05-06 11:56:44 ===
 [1273161404.469515,2084 ] Hostname: bird07
 [1273161404.469519,2084 ] Cwd: /nfs/compiling/dir03
 [1273161404.469521,2084 ] Source file: INV.c
 [1273161404.469523,2084 ] Object file: INV.o
 [1273161404.469532,2084 ] Trying direct lookup
 [1273161404.471093,2084 ] Looking for object file hash in 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 [1273161404.475427,2084 ] Did not find object file hash in manifest
 [1273161404.475434,2084 ] Running preprocessor
 [1273161404.487509,2424 ] Got object file hash from preprocessor
 [1273161404.492494,2424 ] Object file 
 /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o not in cache

Job 2424 doesn't see the object file in the cache...

 [1273161404.492522,2424 ] Running real compiler

...and decides to create it.

 [1273161404.511123,21936] === 2010-05-06 11:56:44 ===
 [1273161404.511506,21936] Hostname: bat05
 [1273161404.511510,21936] Cwd: /nfs/compiling/dir04
 [1273161404.511513,21936] Source file: INV.c
 [1273161404.511515,21936] Object file: INV.o
 [1273161404.511523,21936] Trying direct lookup
 [1273161404.514058,21936] Looking for object file hash in 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 [1273161404.517648,21936] Did not find object file hash in manifest
 [1273161404.517654,21936] Running preprocessor
 [1273161404.523658,2084 ] Got object file hash from preprocessor
 [1273161404.542730,2084 ] Created INV.o from 
 /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o

But here, job 2084 sees the object file in the cache ~34 ms before the
object file was created...

 [1273161404.564688,21936] Got object file hash from preprocessor
 [1273161404.569955,2084 ] Added object file hash to 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 [1273161404.570387,2084 ] Succeded getting cached result
 [1273161404.576412,2424 ] Stored in cache: 
 /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o

...by job 2424 here. Maybe the clocks are more than 12 ms apart?

 [1273161404.588776,21936] Created INV.o from 
 /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o   ***BAD OBJECT***
 [1273161404.596098,2424 ] Created INV.o from 
 /nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
 [1273161404.630619,21936] Added object file hash to 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 [1273161404.634864,21936] Succeded getting cached result
 [1273161404.638443,2424 ] Added object file hash to 
 /nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
 
 It looks like the third job's read of the cache is stuck in
 between the first's write and the second's.  Note that times
 may be off by 8ms, this would put the bad object read right
 on top of the object update by the second compile when it
 hits the NFS server.

Since renames should be atomic on NFS, I don't see any problem here, but
I'm by no means an NFS expert.

-- Joel
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


[ccache] Corrupt objects from three colliding compiles

2010-05-06 Thread Wilson Snyder

I'm hitting a NFS issue causing bad caching.  (And yes, we have
no_subtree_check.)

This is a bug present in 2.4, but it is also present in
trunk b964c173cf493a1c0c0749a9c6a39bc07936507b which is used
for the below diagnosis.

The issue seems to occur only when there are three compiles
on different systems all compiling the same file at the same
time.  Here's the C files:

$ find . -name INV.c -exec ls -la --time-style=full-iso {} \;
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:38:39.0 -0400 ./dir02/INV.c
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:48:57.0 -0400 ./dir03/INV.c
-rw-rw-r-- 1 wsnyder users 405 2010-05-06 11:43:08.0 -0400 ./dir04/INV.c

They got compiled, but one got truncated

$ find . -name INV.o -exec ls -la --time-style=full-iso {} \;
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.0 -0400 ./dir02/INV.o
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.0 -0400 ./dir03/INV.o
-rw-rw-r-- 1 wsnyder users   0 2010-05-06 11:56:44.0 -0400 ./dir04/INV.o

Note the timestamps (unfortunately this isn't a file system
with sub-second resolution!)

The cache files:

$ ls -la --time-style=full-iso $CCACHE_DIR/2/b/2/eedf0591498af8e5f3a*
-rw-rw-r-- 1 wsnyder users 916 2010-05-06 11:56:44.0 -0400 
$CCACHE_DIR/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o

$ ls -la --time-style=full-iso $CCACHE_DIR/0/3/8/f19fe0a63fc088a9c57*
-rw-rw-r-- 1 wsnyder users 874 2010-05-06 11:56:44.0 -0400 
$CCACHE_DIR/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest

To debug this further I added hostname and timestamps to the
log files (earlier email sent this patch), otherwise I
couldn't know what the ordering was.  This gave three
reports:

[1273161404.432174,2424 ] === 2010-05-06 11:56:44 ===
[1273161404.432636,2424 ] Hostname: maserati
[1273161404.432643,2424 ] Cwd: /nfs/compiling/dir02
[1273161404.432646,2424 ] Source file: INV.c
[1273161404.432648,2424 ] Object file: INV.o
[1273161404.432657,2424 ] Trying direct lookup
[1273161404.439730,2424 ] Looking for object file hash in 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.440126,2424 ] Did not find object file hash in manifest
[1273161404.440134,2424 ] Running preprocessor
[1273161404.487509,2424 ] Got object file hash from preprocessor
[1273161404.492494,2424 ] Object file 
/nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o not in cache
[1273161404.492522,2424 ] Running real compiler
[1273161404.576412,2424 ] Stored in cache: 
/nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.596098,2424 ] Created INV.o from 
/nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.638443,2424 ] Added object file hash to 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest

[1273161404.469188,2084 ] === 2010-05-06 11:56:44 ===
[1273161404.469515,2084 ] Hostname: bird07
[1273161404.469519,2084 ] Cwd: /nfs/compiling/dir03
[1273161404.469521,2084 ] Source file: INV.c
[1273161404.469523,2084 ] Object file: INV.o
[1273161404.469532,2084 ] Trying direct lookup
[1273161404.471093,2084 ] Looking for object file hash in 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.475427,2084 ] Did not find object file hash in manifest
[1273161404.475434,2084 ] Running preprocessor
[1273161404.523658,2084 ] Got object file hash from preprocessor
[1273161404.542730,2084 ] Created INV.o from 
/nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o
[1273161404.569955,2084 ] Added object file hash to 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.570387,2084 ] Succeded getting cached result

[1273161404.511123,21936] === 2010-05-06 11:56:44 ===
[1273161404.511506,21936] Hostname: bat05
[1273161404.511510,21936] Cwd: /nfs/compiling/dir04
[1273161404.511513,21936] Source file: INV.c
[1273161404.511515,21936] Object file: INV.o
[1273161404.511523,21936] Trying direct lookup
[1273161404.514058,21936] Looking for object file hash in 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.517648,21936] Did not find object file hash in manifest
[1273161404.517654,21936] Running preprocessor
[1273161404.564688,21936] Got object file hash from preprocessor
[1273161404.588776,21936] Created INV.o from 
/nfs/ccache-dir/2/b/2/eedf0591498af8e5f3a27fcc89c4f-69975.o   ***BAD OBJECT***
[1273161404.630619,21936] Added object file hash to 
/nfs/ccache-dir/0/3/8/f19fe0a63fc088a9c570845a0a4eb-584.manifest
[1273161404.634864,21936] Succeded getting cached result

The next step assumes our times are close across systems; they should be within 
about 12ms.  Here's the sorted view

[1273161404.432174,2424 ] === 2010-05-06 11:56:44 ===
[1273161404.432636,2424 ] Hostname: maserati
[1273161404.432643,2424 ] Cwd: /nfs/compiling/dir02
[1273161404.432646,2424 ] Source file: INV.c
[1273161404.432648,2424 ] Object file: INV.o
[1273161404.432657,2424 ] Trying direct lookup
[1273161404.439730,2424 ] Looking