Do the file copying programs open their output files with
O_SEQUENTIAL? If so, there is information to exploit...
You can change them to do so...
I rather meant: if a program opens a file for writing with O_SEQUENTIAL
(which should be done when copying files), will reiser4 exploit the
information by flushing sooner, and in a more "streaming" fashion? Or
will it use the default algorithm, flushing under memory pressure?
The current behaviour of reiser4, flushing dirty pages late in case they
are modified again before being written, is excellent for many use cases
(random writes, small file writes, temporary files, copying files on a
single spindle, etc.); but it isn't optimal for writing large amounts of
data in sequential fashion, such as copying large files between disks.
Adapting this would be quite tricky, I guess...
Consider the following two scenarios:
- A database is doing an UPDATE query on a table. It will issue a lot of
reads and writes, probably in random order. In this case, if the working
set fits in RAM, it pays big time to flush as late as possible, ideally
when the query is finished, because some pages may be written to multiple
times. Also, delayed allocation will reduce fragmentation of the files.
The same applies to running make, unzipping an archive, copying many
small files, etc.
- A process acquires audio or video in realtime and streams it to disk,
or copies files from one disk to another. In this case it is better to
stream the data directly to disk, especially if the files are large.
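
To make the difference between the two scenarios concrete, here is a toy
model (not reiser4 code, just an illustration). It counts how many page
writes reach the disk when the dirty set is flushed every `flush_every`
write calls. The function name, the workloads and the page counts are all
made up for the example.

```python
def pages_written(writes, flush_every):
    """Count page writes that reach the disk when the set of dirty pages
    is flushed after every `flush_every` write calls, plus once at the end."""
    dirty = set()
    flushed = 0
    for i, page in enumerate(writes, start=1):
        dirty.add(page)              # rewriting a dirty page costs nothing extra
        if i % flush_every == 0:
            flushed += len(dirty)    # each dirty page is written once per flush
            dirty.clear()
    flushed += len(dirty)            # final flush of whatever is left
    return flushed


# Scenario 1 (hypothetical): an UPDATE touching 4 pages, 25 times each.
random_like = [i % 4 for i in range(100)]
# Scenario 2 (hypothetical): a stream writing 100 distinct pages once each.
sequential = list(range(100))

print(pages_written(random_like, 1))    # eager flush: 100
print(pages_written(random_like, 100))  # late flush:  4
print(pages_written(sequential, 1))     # eager flush: 100
print(pages_written(sequential, 100))   # late flush:  100
```

In this model late flushing saves 96 of 100 writes for the rewrite-heavy
workload and saves nothing for the stream, which is why the stream might
as well start going to disk immediately.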
Guessing is a pain in the ass. How can the application inform the
filesystem of its intentions?
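
For what it's worth, as far as I know O_SEQUENTIAL is not a standard
open(2) flag on Linux; the closest existing hint I am aware of is
posix_fadvise() with POSIX_FADV_SEQUENTIAL. It is purely advisory, mostly
affects readahead, and nothing here says reiser4 would change its flush
policy because of it. Until the filesystem can be told, a copying program
can approximate "streaming" itself by syncing every few megabytes. A
minimal sketch, where `copy_streaming`, the chunk size and the flush
interval are arbitrary choices for the example:

```python
import os


def copy_streaming(src, dst, chunk=1 << 20, flush_bytes=8 << 20):
    """Copy src to dst, hinting sequential reads and forcing the output to
    disk every `flush_bytes` bytes instead of waiting for memory pressure.
    Returns the number of bytes copied."""
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Advisory hint only; not available on every platform.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(in_fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        pending = 0
        while True:
            buf = os.read(in_fd, chunk)
            if not buf:
                break
            view = memoryview(buf)
            while len(view) > 0:      # os.write may write only part of the buffer
                written = os.write(out_fd, view)
                view = view[written:]
            total += len(buf)
            pending += len(buf)
            if pending >= flush_bytes:
                os.fsync(out_fd)      # push dirty pages out now, in order
                pending = 0
        os.fsync(out_fd)
        return total
    finally:
        os.close(in_fd)
        os.close(out_fd)
```

Usage would be something like `copy_streaming("/mnt/a/big.iso",
"/mnt/b/big.iso")` (paths are placeholders). The periodic fsync is a blunt
instrument, since it blocks the writer while the data goes out, which is
exactly why a real hint to the filesystem would be nicer.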