Hi,
On Thu, 26 Feb 2015 19:54:48 +, Benixon Dhas wrote:
> Hi All,
>
> We are trying to make NILFS work with an SMR device that adheres to
> the Zoned ATA Commands (ZAC) specification. One of the restrictions in
> the specification is that reading an unwritten part of a zone (a
> segment in NILFS) causes a read error.
>
> We observe that NILFS does not always write a complete physical
> segment (we use 256MB segments). After digging in the source for a
> while, we figured out that this is because NILFS requires a minimum
> number of blocks to construct a partial segment
> (NILFS_PSEG_MIN_BLOCKS), which is currently 2. So we see some
> segments where the last block (in our case a block is 4k) is not
> written.

For recovery and GC, NILFS needs to insert one or more header blocks
before writing payload blocks. So the minimum size of a partial
segment is inevitably 2 blocks: at least one header block plus at
least one payload block.
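
To illustrate why a single block can be left over at the end of a segment, here is a minimal sketch in Python. It is a simplified model, not the kernel code: the helper names are hypothetical, and it only assumes the minimum of 2 blocks and the 256MB/4k sizes mentioned in this thread.

```python
# Simplified model of the "dead tail" of a segment; not NILFS kernel code.
NILFS_PSEG_MIN_BLOCKS = 2  # minimum partial segment size quoted in this thread


def blocks_per_segment(segment_bytes, block_bytes):
    """Number of blocks in one physical segment."""
    return segment_bytes // block_bytes


def dead_tail_blocks(total_blocks, blocks_written):
    """Blocks at the end of a segment that can no longer be written.

    If fewer than NILFS_PSEG_MIN_BLOCKS blocks remain, no further partial
    segment fits, so the remainder stays unwritten.
    """
    if not 0 <= blocks_written <= total_blocks:
        raise ValueError("blocks_written out of range")
    remaining = total_blocks - blocks_written
    return remaining if remaining < NILFS_PSEG_MIN_BLOCKS else 0


total = blocks_per_segment(256 * 1024 * 1024, 4096)
print(total, dead_tail_blocks(total, total - 1))
```

In this model, a segment whose partial segments end one block short of the segment boundary keeps that last block unwritten, which matches the behaviour reported above.
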
> When some utilities, such as the garbage collector and dump segment
> (this may not be an exhaustive list), read a segment, they try to
> read the entire physical segment. This causes read errors in the
> kernel, and hence retries, for the last unwritten block in certain
> segments.

The recovery function of NILFS also needs to read an entire physical
segment. It never reads unwritten blocks if the file system was
cleanly unmounted; however, this is not the case after an unclean
shutdown or a panic.

Worse yet, if it gets an EIO from the underlying block layer, the
recovery will fail and the mount system call will abort.

> In an attempt to solve this problem, we were trying to figure out
> whether we can write some dummy data to the remaining unused blocks
> in the segment. But we are not sure what the best way to do this
> would be.
>
> Another solution we had in mind was to find all the places where
> segments are read and modify them so that they do not read
> unwritten blocks. But we feel this might be a more complex solution
> and might impact performance more.

It looks like the sufile can be used for this purpose. It records
how many blocks have been written to each segment. You can see this in
the NBLOCKS field of the output of the lssu command.

One restriction is that this metadata file (sufile) is unavailable
until the mount system call succeeds, so the recovery code cannot use
it.
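
As a rough sketch of how a reader could use that per-segment block count, the Python snippet below clamps a segment read to the written part. The function name is hypothetical, and the layout is simplified: it assumes segments are laid out back to back and ignores any reserved area at the start of the device.

```python
# Sketch only: limit a segment read to the blocks recorded as written.
def readable_block_range(segnum, blocks_per_segment, nblocks_written):
    """Return (first_block, end_block) covering only the written blocks.

    nblocks_written stands for the per-segment count that lssu shows as
    NBLOCKS. end_block is exclusive.
    """
    if segnum < 0:
        raise ValueError("segnum must be non-negative")
    if not 0 <= nblocks_written <= blocks_per_segment:
        raise ValueError("nblocks_written out of range")
    # Assumption: segment N starts at block N * blocks_per_segment.
    start = segnum * blocks_per_segment
    return start, start + nblocks_written


# Segment 3 of 65536-block segments, with the last block unwritten.
first, end = readable_block_range(3, 65536, 65535)
print(first, end)
```

A reader that stops at end_block never touches the unwritten tail, but as noted above this only helps code paths that run after the mount has succeeded.
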
> Please advise us on the best way to solve the problem. Also, what
> would be architecturally the best place to fix the problem?

Writing dummy data to the dead space for SMR devices looks better to
me because it is simpler and the performance penalty does not seem
very high.

But what will happen if an unexpected power failure hits the device?
Does that cause the file system to read unwritten blocks?

If so, it seems that we need a translation layer to hide these issues,
or a new error code or a new mechanism that lets file systems
detect and handle them.
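
For the dummy-data approach, a minimal sketch of the padding step in Python might look like the following. The function names and the write_block callback are hypothetical stand-ins for whatever submits a block write, and zero-filled blocks are just one possible choice of dummy data. It does not address the power-failure case raised above.

```python
# Sketch only: pad the unwritten tail of a segment with dummy blocks.
def padding_range(segment_start, blocks_per_segment, nblocks_written):
    """Block numbers in the segment that have not been written yet."""
    if not 0 <= nblocks_written <= blocks_per_segment:
        raise ValueError("nblocks_written out of range")
    first_unwritten = segment_start + nblocks_written
    return range(first_unwritten, segment_start + blocks_per_segment)


def pad_segment(write_block, segment_start, blocks_per_segment,
                nblocks_written, block_size=4096):
    """Write zero-filled blocks over the dead space, in ascending order.

    write_block(blocknr, data) is a caller-supplied function (hypothetical).
    Returns the number of blocks padded.
    """
    filler = bytes(block_size)
    count = 0
    for blocknr in padding_range(segment_start, blocks_per_segment,
                                 nblocks_written):
        write_block(blocknr, filler)
        count += 1
    return count


written = []
padded = pad_segment(lambda nr, data: written.append(nr), 65536, 65536, 65535)
print(padded, written)
```

The blocks are written in ascending order so that the padding stays sequential within the zone.
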
Regards,
Ryusuke Konishi