On Mon, Jul 14, 2014 at 04:51:03PM -0500, Kevin K wrote:
> On Jul 14, 2014, at 4:37 PM, Konstantin Olchanski <[email protected]> wrote:
> 
> > On Mon, Jul 14, 2014 at 04:33:03PM -0500, Kevin K wrote:
> >> I guess I don't understand the part about how files can be different sizes 
> >> on different filesystems.
> >> 
> >> They can obviously use up more or less disk space on different 
> >> filesystems.  For instance, a FAT disk with 32KB clusters will use up a 
> >> minimum of 32KB even for a 10 byte file.  While NTFS will probably put the 
> >> 10 bytes in the directory entry or use up a maximum of 4KB for 4KB 
> >> clusters.
> >> 
> >> But I don't see why rsync would care about the unused data.  It should 
> >> just sync the 10 bytes accessible.  I'm ignoring alternate streams here.
> > 
> > 
> > This is the usual confusion between the "st_size" and "st_blocks" entries 
> > in "struct stat" returned by lstat() and co.
> 
> Is what I was missing is complexities in files that, for example, may be 
> sparse?
> 
> I was thinking of the case that, when you do a ls -l, you normally get a byte 
> size value.  Depending on your options, you can also get block size, which du 
> would also return.
> 
> So, if I'm not going off the deep end, a quick determination of whether a 
> file is different probably has to check both values.  Since it may show 
> 1000000 bytes, but if sparse most of the file may be nulls and therefore no 
> on disk storage allocated to it.  If that changes, on even the same 
> filesystem, something may have changed and data may have to be synced.  And 
> with different cluster sizes, the normal case will be blocks used will be 
> different.


No, this will not work. You cannot rely on "st_blocks" to compare file
contents. It reports how much storage the filesystem allocated for the file,
not what the file contains.
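
To illustrate, here is a minimal Python sketch (Unix only, since "st_blocks"
is not available on every platform). It writes the same 1000000 zero bytes
twice, once as a hole and once as real data, then prints both stat fields.
The helper names and file names are made up for this example, and the block
counts you see depend entirely on the filesystem: some will show a large
difference, others may show none.

```python
import os
import tempfile


def size_and_blocks(path):
    """Return (st_size, st_blocks) for path without following symlinks."""
    st = os.lstat(path)
    # st_size is the logical length in bytes; st_blocks is allocated storage,
    # conventionally counted in 512-byte units.
    return st.st_size, st.st_blocks


def make_pair(directory, length=1_000_000):
    """Create two files with identical contents (all zero bytes).

    The first is extended with truncate(), which many filesystems store as a
    hole; the second has the zeros written out explicitly.
    """
    sparse = os.path.join(directory, "sparse.bin")  # hypothetical file name
    dense = os.path.join(directory, "dense.bin")    # hypothetical file name
    with open(sparse, "wb") as f:
        f.truncate(length)
    with open(dense, "wb") as f:
        f.write(b"\0" * length)
    return sparse, dense


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for path in make_pair(d):
            size, blocks = size_and_blocks(path)
            print(f"{os.path.basename(path)}: st_size={size} st_blocks={blocks}")
```

Both files read back byte for byte identical, so any difference in the
"st_blocks" column says something about allocation only.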

For example, some filesystems implement "tail packing", where the contents of
multiple files are packed into a single block. (I think ReiserFS was the first
to do this, and I have no idea what it returned as "st_blocks" for tail-packed
files.)

Anyhow, with tail packing, different versions of the same filesystem may use
different heuristics to decide whether, when, and how files are packed.

Not deterministic and not reliable.

Kind of like checking if this is the same person by counting the coins in their 
pockets.
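
If the goal is to decide whether two files hold the same data, a comparison
has to look at "st_size" and the bytes themselves and leave "st_blocks" out.
A rough Python sketch of that idea follows. It is not how rsync is
implemented; the function names and the choice of SHA-256 are assumptions
made for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """Compare two files by logical size, then by content digest.

    st_blocks is deliberately ignored: it describes allocation, which can
    differ between filesystems even when the contents are identical.
    """
    if os.stat(path_a).st_size != os.stat(path_b).st_size:
        return False  # different lengths cannot hold identical contents
    return file_digest(path_a) == file_digest(path_b)
```

The size check is only a cheap early exit; equal sizes prove nothing until
the contents have been compared.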


-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
