> On Nov 17, 2019, at 11:21 AM, HRISHIKESH GOYAL <hrishi.go...@gmail.com> wrote:
> 
> Questions:
> 1. As what I follow from the above stackoverflow answer and truncate man 
> page, even though `truncate` doesn't allocate space for file baz but 
> filesystem should still update the free space by reducing it to 
> 0.3G(otherwise filesystem metadata are not consistent with file metadata). 
> Could anyone please correct me?
> 
> 2. Does it mean that `truncate` only updates file vnode (i.e. size) attribute 
> and doesn't update super block (free_space) attribute?
> 
> 3. I checked first 100 bytes in both above files using c lang fread() 
> function, all are filled with NULL character ( '\0' ), how file bar 
> (previously fallocate'ed file) got initialised with NULLs(as per my 
> understanding since they are uninitialised, they should be some random 
> bytes.. and not all nulls right?).

I think what you are missing is that many file systems support sparse 
files.  Consider an application that does:

1- Create file "foo".
2- Write a single byte to offset 0.
3- Write a single byte to offset (4GiB-1).

That file will have a logical size of 4GiB; this size is recorded in the inode. 
 However, on FFS, it will only have 2 file system blocks allocated.  The direct 
and indirect block pointers for the whole middle range will not point to any 
physical space on disk[*], and when an application reads from that range, the 
file system will return zero-filled pages.

[*] ...a little bit of hand-waving some of the details here; some of the 
indirect block pointers will in fact be filled in, because they are needed to 
be able to find the block at the end of the file that's actually allocated, and 
at 4GiB, you're definitely into indirect block territory.
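
The three steps above can be sketched in C roughly as follows.  This is only 
an illustration: the helper name make_sparse() is made up, and how many blocks 
actually get allocated depends on the file system.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit Linux; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, write one byte at offset 0 and one
 * byte at offset size-1, then fetch the file's metadata with fstat().
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
int
make_sparse(const char *path, off_t size, struct stat *st)
{
    assert(size >= 2);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    int rv = -1;
    if (pwrite(fd, "x", 1, 0) == 1 &&
        pwrite(fd, "x", 1, size - 1) == 1 &&
        fstat(fd, st) == 0)
        rv = 0;

    close(fd);
    return rv;
}
```

After a successful call, st->st_size is the logical size recorded in the 
inode, while st->st_blocks (typically counted in 512-byte units) reflects what 
is actually allocated.  On a file system that supports sparse files, the 
second number should be far smaller than the first.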

This is similar to what happens when you call truncate() on a file with a size 
beyond the current EOF, only in that case you don't need to write a byte at 
the end to get the size to change; there's simply no block allocated at the end 
of the file.  Since no blocks are allocated, the file system's free space count 
doesn't change either.
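
To see this from a program, something like the sketch below extends a new file 
with ftruncate() alone and reads the start of it back.  The function name is 
invented for the example, and the 256-byte limit is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, extend it to `size` bytes using only
 * ftruncate(), and read back the first `n` bytes (n <= 256).
 * Returns 1 if they are all zero, 0 if any is non-zero, -1 on error.
 */
int
truncated_reads_zero(const char *path, off_t size, size_t n)
{
    unsigned char buf[256];

    assert(n <= sizeof(buf) && (off_t)n <= size);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    int rv = -1;
    if (ftruncate(fd, size) == 0 && pread(fd, buf, n, 0) == (ssize_t)n) {
        rv = 1;
        for (size_t i = 0; i < n; i++) {
            if (buf[i] != 0)
                rv = 0;
        }
    }

    close(fd);
    return rv;
}
```

The range was never written, so the zeros come from the file system filling in 
the hole at read time, not from data stored on disk.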

Now, what happens if you do a posix_fallocate(fd, 0, 4GiB) on "foo"?  The file 
system will have to allocate all of the necessary space, FILL IT WITH ZEROS, 
and fill in the direct and indirect block pointers in the inode.
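
For reference, the real call takes a file descriptor rather than a name, and 
it returns an error number directly instead of setting errno.  A minimal 
sketch, with a made-up wrapper name, and assuming a platform that provides 
posix_fallocate() at all (not every system does, and some file systems reject 
it):

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t on 32-bit Linux; harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: create `path`, preallocate the range [0, len),
 * and fetch the resulting metadata with fstat().
 * Returns 0 on success, otherwise an errno-style error number.
 */
int
preallocate(const char *path, off_t len, struct stat *st)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return errno;

    /* posix_fallocate() reports failure via its return value, not errno. */
    int error = posix_fallocate(fd, 0, len);
    if (error == 0 && fstat(fd, st) == -1)
        error = errno;

    close(fd);
    return error;
}
```

On success, st->st_size will be len, and unlike the truncate() case, 
st->st_blocks should now account for the whole range, so the free space 
reported for the file system drops accordingly.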

Now, a file system is allowed to make an optimization here.  The 
posix_fallocate() specification does state that if offset+len is beyond the 
current file size, the file size will be updated, i.e. it behaves like 
ftruncate() in that regard.  However, the file system is allowed to NOT zero 
out the space SO LONG AS it knows that the space is uninitialized and thus 
returns zero-filled pages when the space is read.  This allows the file system 
to avoid redundantly filling the space with zeros only to have those zeros 
overwritten with actual data later.  This is good for performance AND for 
reducing P/E (program/erase) cycles on flash storage.  This would require an 
additional size field in the inode to indicate the end of the initialized 
space.  That information would have to persist across unmounts, and 
essentially represents an incompatible format change in the case of FFS, since 
software that does not understand this extra field could not safely mount the 
file system.
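
As a toy model of that idea (in memory only, and not how FFS or any real file 
system is implemented), imagine an inode with a second size field.  All of the 
names here, including init_size, are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory "inode"; every field name here is hypothetical. */
struct toy_inode {
    size_t size;            /* logical file size */
    size_t init_size;       /* end of the initialized data */
    unsigned char *blocks;  /* preallocated space; stale beyond init_size */
};

/* Reads never expose stale bytes: anything past init_size reads as zero. */
size_t
toy_read(const struct toy_inode *ip, size_t off, void *buf, size_t len)
{
    unsigned char *out = buf;
    size_t valid = 0;

    if (off >= ip->size)
        return 0;
    if (len > ip->size - off)
        len = ip->size - off;

    if (off < ip->init_size) {
        valid = ip->init_size - off;
        if (valid > len)
            valid = len;
        memcpy(out, ip->blocks + off, valid);
    }
    memset(out + valid, 0, len - valid);
    return len;
}

/* Writes past init_size zero only the gap they skip over, then advance it. */
size_t
toy_write(struct toy_inode *ip, size_t off, const void *buf, size_t len)
{
    assert(off <= ip->size && len <= ip->size - off);

    if (off > ip->init_size)
        memset(ip->blocks + ip->init_size, 0, off - ip->init_size);
    memcpy(ip->blocks + off, buf, len);
    if (off + len > ip->init_size)
        ip->init_size = off + len;
    return len;
}
```

The zeroing cost is paid only for gaps that a write actually skips over, and 
sequential writes into preallocated space never zero anything.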

Technically, a file system is allowed to make that optimization for the 
"allocate to fill in a sparse hole" case as well, but it would require a bunch 
of extra metadata to track the valid ranges of the file, and so probably isn't 
worth it.

-- thorpej
