On Nov 29, 2010, at 1:48 AM, Jonathan Nieder wrote:
Results (on ext4) suggest that patches 1 and 4 matter and the rest are
within noise. Timings are rough; sometimes replicates vary by as much
as a second. Numbers are cold cache (i.e., after running sync and
echo 3.../drop_caches), best of 3, dpkg --install python2.7 and
python2.7-minimal.
before:
5.73user 1.62system 0:33.84elapsed 21%CPU (0avgtext+0avgdata 89968maxresident)k
0inputs+0outputs (0major+46962minor)pagefaults 0swaps
patch 1 (use SYNC_FILE_RANGE_WRITE):
5.64user 1.69system 0:10.47elapsed 69%CPU (0avgtext+0avgdata 9maxresident)k
0inputs+0outputs (0major+46948minor)pagefaults 0swaps
patch 1+2 (use SYNC_FILE_RANGE_WAIT_BEFORE):
5.48user 1.61system 0:10.43elapsed 70%CPU (0avgtext+0avgdata 9maxresident)k
0inputs+0outputs (0major+46958minor)pagefaults 0swaps
So Patch #2 wasn't quite what I talked about doing; patch #2 is adding
SYNC_FILE_RANGE_WAIT_BEFORE for each file immediately after writing the file.
So it's the equivalent of:
extract(a)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE)
extract(b)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE)
extract(c)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE)
What I was suggesting was to use a separate for loop in patch #2, like patch #3:
extract(a)
sync_file_range(SYNC_FILE_RANGE_WRITE)
extract(b)
sync_file_range(SYNC_FILE_RANGE_WRITE)
extract(c)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(a, SYNC_FILE_RANGE_WAIT_BEFORE)
sync_file_range(b, SYNC_FILE_RANGE_WAIT_BEFORE)
sync_file_range(c, SYNC_FILE_RANGE_WAIT_BEFORE)
As to why the voodoo, the idea is to make sure all of the delayed allocation,
for all of the files, is completely resolved before the first fsync(). The
reason why I suggested doing the WAIT_BEFORE as a separate pass was to allow
for parallelism in the case where /var/cache/apt/archives is on a different
disk spindle than /usr. By doing this:
extract(a)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE)
extract(b)
sync_file_range(SYNC_FILE_RANGE_WRITE)
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE)
we make the copying get done in lockstep; that is, we don't start
extracting file b until the data blocks for a are done being written. If /var
and /usr were mounted on different floppy disks (yeah, I know) you'd see first
one disk light up, and then the other disk light up, back and forth, and it
would be slow and horrible. With the mechanism I suggested, both lights would
be on at the same time, since SYNC_FILE_RANGE_WRITE initiates the writeback,
but does not block for it to complete. SYNC_FILE_RANGE_WAIT_BEFORE is what
actually blocks. Does that help to visualize what I was going for?
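For concreteness, here is a minimal C sketch of the multi-pass ordering described above. It is only an illustration under stated assumptions, not the dpkg patch itself: extract_file() and extract_all() are hypothetical names, the file descriptors are kept open between passes rather than reopened, and the sync_file_range() calls are treated as best-effort hints, so only the result of the final fdatasync() is reported.

```c
#define _GNU_SOURCE             /* for sync_file_range() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the unpack step: create the file and write
 * its contents.  Returns an open descriptor, or -1 on error. */
static int extract_file(const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 0 if every file was written and flushed, -1 otherwise. */
static int extract_all(const char *const paths[],
                       const char *const data[], size_t n)
{
    enum { MAX_FILES = 64 };    /* arbitrary limit for this sketch */
    int fds[MAX_FILES];
    int rc = 0;
    size_t i;

    assert(paths != NULL && data != NULL && n <= MAX_FILES);

    /* Pass 1: extract each file and start writeback without blocking. */
    for (i = 0; i < n; i++) {
        fds[i] = extract_file(paths[i], data[i]);
        if (fds[i] < 0) {
            rc = -1;
            continue;
        }
        (void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);
    }

    /* Pass 2: only now block until the writeback started above is done. */
    for (i = 0; i < n; i++)
        if (fds[i] >= 0)
            (void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);

    /* Pass 3: the flush that actually provides the durability guarantee. */
    for (i = 0; i < n; i++) {
        if (fds[i] < 0)
            continue;
        if (fdatasync(fds[i]) != 0)
            rc = -1;
        if (close(fds[i]) != 0)
            rc = -1;
    }
    return rc;
}
```

An offset and length of 0 ask sync_file_range() to cover the whole file. Because pass 1 never waits, the reads from the source and the writes to the destination can overlap when they live on different disks.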
BTW, if you had opened the file handle in subsequent passes using
O_RDONLY|O_NOATIME, the use of fdatasync() instead of fsync() might not have
been necessary. And as far as the comments in patch #4 were concerned, it
wasn't a matter of delaying the file modification time update that was my
concern; it was avoiding an update of the file access time caused by reopening
the file which concerned me. The reason why I did both in my test program was
because (a) I was paranoid, and (b) fdatasync() is standard, whereas O_NOATIME
is another Linux-specific thing.
Thinking about this some more, though, using O_NOATIME may actually save more
time, and may in the end be more important than the use of fdatasync() vs.
fsync(). (Although I like doing the least amount of work necessary, and in this
case we really don't need to use fsync(); fdatasync() will do.)
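As a rough illustration of the reopen-and-flush step, a sketch in C might look like the following. The function name flush_by_path() is hypothetical, and the fallback to a plain O_RDONLY open is my own assumption for the case where O_NOATIME is refused (it requires owning the file or having CAP_FOWNER).

```c
#define _GNU_SOURCE             /* for O_NOATIME */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reopen an already-written file without touching its access time and
 * flush its data blocks.  Returns 0 on success, -1 on error. */
static int flush_by_path(const char *path)
{
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0 && errno == EPERM)
        fd = open(path, O_RDONLY);  /* assumed fallback: not the owner */
    if (fd < 0)
        return -1;

    /* fdatasync() skips metadata that is not needed to read the data
     * back, such as timestamps. */
    rc = fdatasync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

On Linux, fdatasync() is accepted on a descriptor opened read-only, which is what makes the O_RDONLY reopen usable here.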
-- Ted