> On Nov 17, 2019, at 11:21 AM, HRISHIKESH GOYAL <hrishi.go...@gmail.com> wrote:
>
> Questions:
> 1. As what I follow from the above stackoverflow answer and truncate man
> page, even though `truncate` doesn't allocate space for file baz but
> filesystem should still update the free space by reducing it to
> 0.3G(otherwise filesystem metadata are not consistent with file metadata).
> Could anyone please correct me?
>
> 2. Does it mean that `truncate` only updates file vnode (i.e. size) attribute
> and doesn't update super block (free_space) attribute?
>
> 3. I checked first 100 bytes in both above files using c lang fread()
> function, all are filled with NULL character ( '\0' ), how file bar
> (previously fallocate'ed file) got initialised with NULLs(as per my
> understanding since they are uninitialised, they should be some random
> bytes.. and not all nulls right?).
I think what you are missing is that many file systems support sparse
files. Consider an application that does:
1- Create file "foo".
2- Write a single byte to offset 0.
3- Write a single byte to offset (4GiB-1).
That file will have a logical size of 4GiB; this size is recorded in the inode.
However, on FFS, it will only have 2 file system blocks allocated. The direct
and indirect block pointers for the whole middle range will not point to any
physical space on disk[*], and when an application reads from that range, the
file system will return zero-filled pages.
[*] ...a little bit of hand-waving some of the details here; some of the
indirect block pointers will in fact be filled in, because they are needed to
be able to find the block at the end of the file that's actually allocated, and
at 4GiB, you're definitely into indirect block territory.
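
As a rough illustration, the three steps above could look like this in C. This is only a sketch: it assumes a POSIX system with a 64-bit off_t, and make_sparse() is a name made up for this example, not an existing API.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 + XSI declarations (pwrite, st_blocks) */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, write one byte at offset 0 and one
 * byte at offset `last`, then report the file's stat data through `st`.
 * Returns 0 on success, -1 on any error.
 */
int make_sparse(const char *path, off_t last, struct stat *st)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd == -1)
        return -1;

    int ok = pwrite(fd, "x", 1, 0) == 1 &&      /* step 2: byte at offset 0 */
             pwrite(fd, "y", 1, last) == 1 &&   /* step 3: byte at the far end */
             fstat(fd, st) == 0;

    close(fd);
    return ok ? 0 : -1;
}
```

Calling make_sparse("foo", ((off_t)4 << 30) - 1, &st) should leave st.st_size at exactly 4GiB. On a file system that supports sparse files, st.st_blocks (traditionally counted in 512-byte units) should stay far below what 4GiB of real data would need.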
This is similar to what happens when you call truncate() on a file with a size
beyond the current EOF, only in that case, you didn't need to write a byte to
the end to get the size to change; there's simply no block allocated to the end
of the file.
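
The ftruncate() case can be checked the same way. The sketch below (extend_and_read() is a hypothetical helper, and a POSIX system is assumed) extends an empty file without writing any data and then reads from the hole. The bytes that come back should all be zero, which is the behaviour asked about in question 3.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 + XSI declarations (pread, ftruncate) */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, grow it to `size` bytes with
 * ftruncate() (no data is written), then read the first `n` bytes
 * into `buf`. Returns the number of bytes read, or -1 on error.
 */
ssize_t extend_and_read(const char *path, off_t size,
                        unsigned char *buf, size_t n)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd == -1)
        return -1;

    ssize_t got = -1;
    if (ftruncate(fd, size) == 0)       /* only the recorded size changes */
        got = pread(fd, buf, n, 0);     /* reads from the unwritten range */

    close(fd);
    return got;
}
```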
Now, what happens if you do a posix_fallocate(fd, 0, 4GiB) on "foo"? The file system
will have to allocate all of the necessary space, FILL IT WITH ZEROS, and fill
in the direct and indirect block pointers in the inode.
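
To see the difference from the sparse case, one could compare st_blocks before and after the call. A minimal sketch, assuming a POSIX system whose file system (or libc fallback) supports posix_fallocate(); preallocate_demo() is a made-up name:

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX.1-2008 + XSI declarations */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` as a hole of `len` bytes, record its
 * block count, preallocate the whole range, and record the block count again.
 * Returns 0 on success or an errno value on failure.
 */
int preallocate_demo(const char *path, off_t len,
                     blkcnt_t *before, blkcnt_t *after)
{
    struct stat st;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd == -1)
        return errno;

    int err = 0;
    if (ftruncate(fd, len) == -1 || fstat(fd, &st) == -1) {
        err = errno;
    } else {
        *before = st.st_blocks;             /* sparse: few or no blocks */
        err = posix_fallocate(fd, 0, len);  /* returns an errno value, not -1 */
        if (err == 0 && fstat(fd, &st) == -1)
            err = errno;
        if (err == 0)
            *after = st.st_blocks;          /* space is now reserved */
    }

    close(fd);
    return err;
}
```

On a file system that really allocates the space, `after` should be noticeably larger than `before`, and the free space reported by df should drop accordingly.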
Now, a file system is allowed to make an optimization here. The
posix_fallocate() specification does state that if offset+len is beyond the
current file size, the file size will be updated, i.e. it behaves like
ftruncate() in that regard. However, the file system is allowed to NOT zero
out the space SO LONG AS it knows that the space is uninitialized and thus
returns zero-filled pages when the space is read. This allows the file system
to avoid redundantly filling the space with zeros only to have those zeros
overwritten with actual data later. This is good for performance AND for
reducing P/E (program/erase) cycles on flash storage. This would require an
additional size field in the inode to indicate the end of the initialized
space (this
information would have to persist across unmounts, and essentially represents
an incompatible format change in the case of FFS since software that does not
understand this extra field could not safely mount the file system).
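
To make the idea concrete, here is a toy in-memory model of the read path with such a field. It is purely illustrative: struct toy_inode, init_size and toy_read() are invented for this sketch and do not correspond to the FFS on-disk format or to any real kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an inode; all names here are hypothetical. */
struct toy_inode {
    size_t size;                  /* logical file size (EOF) */
    size_t init_size;             /* end of initialized data, <= size */
    const unsigned char *blocks;  /* allocated space; may hold stale bytes
                                     between init_size and size */
};

/*
 * Read up to `n` bytes at offset `off`. Bytes below init_size come from
 * the allocated space; bytes between init_size and size are returned as
 * zeros without ever looking at the (possibly stale) storage.
 */
size_t toy_read(const struct toy_inode *ip, size_t off,
                unsigned char *buf, size_t n)
{
    if (off >= ip->size)
        return 0;                       /* at or past EOF */
    if (n > ip->size - off)
        n = ip->size - off;             /* clamp to EOF */

    size_t valid = 0;
    if (off < ip->init_size) {
        valid = ip->init_size - off;    /* initialized part of the request */
        if (valid > n)
            valid = n;
        memcpy(buf, ip->blocks + off, valid);
    }
    memset(buf + valid, 0, n - valid);  /* uninitialized part reads as zeros */
    return n;
}
```

A write path would have to advance init_size as data lands (zeroing any gap it skips over), which is why the field needs to be persistent.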
Technically, a file system is allowed to make that optimization for the
"allocate to fill in a sparse hole" case as well, but it would require a bunch
of extra metadata to track the valid ranges of the file, and so probably isn't
worth it.
-- thorpej