Re: Making Nilfs ZAC Compliant

2015-02-27 Thread Ryusuke Konishi
Hi,
On Thu, 26 Feb 2015 19:54:48 +, Benixon Dhas wrote:
 Hi All,
 
 We are trying to make Nilfs work with an SMR device that adheres to
 the Zoned ATA Commands (ZAC) specification.  One of the restrictions
 in the specification is that reading an unwritten part of a zone (a
 segment in Nilfs) causes a read error.
 
 We observe that Nilfs does not always write a complete physical
 segment (we use 256MB segments).  After digging through the source
 for a while, we figured out that this is because Nilfs requires a
 minimum number of blocks to construct a partial segment
 (NILFS_PSEG_MIN_BLOCKS), which is currently 2.  So we see some
 segments where the last block (in our case a block is 4k) is never
 written.

For recovery and GC, NILFS needs to insert one or more header blocks
before writing payload blocks.  The minimum size of a partial segment
is therefore inevitably 2 blocks.
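
To make the arithmetic concrete, here is a minimal sketch (not NILFS
code) of how the tail of a segment can end up unwritten.  PSEG_MIN_BLOCKS
mirrors the value of NILFS_PSEG_MIN_BLOCKS quoted above; the function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum partial-segment size in blocks, as quoted in this thread. */
#define PSEG_MIN_BLOCKS 2u

/* Number of blocks in a full segment, e.g. 256 MiB / 4 KiB. */
static uint64_t blocks_per_segment(uint64_t seg_bytes, uint32_t block_bytes)
{
    assert(block_bytes != 0);
    return seg_bytes / block_bytes;
}

/*
 * Blocks at the end of a segment that are too few to hold another
 * partial segment and therefore stay unwritten.  Returns 0 when the
 * segment is full or when one more partial segment still fits.
 */
static uint64_t dead_tail_blocks(uint64_t seg_blocks, uint64_t used_blocks)
{
    assert(used_blocks <= seg_blocks);
    uint64_t left = seg_blocks - used_blocks;
    return left < PSEG_MIN_BLOCKS ? left : 0;
}
```

With 256MB segments and 4k blocks a segment holds 65536 blocks, so a
segment with 65535 blocks used leaves exactly one block that cannot be
filled by a further partial segment.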

 When utilities such as the garbage collector and dump segment (this
 may not be an exhaustive list) read a segment, they try to read the
 entire physical segment.  This causes read errors in the kernel, and
 hence retries, for the last unwritten block in certain segments.

The recovery function of NILFS also needs to read the entire physical
segment.  It never reads unwritten blocks if the file system was
cleanly unmounted; however, this is not the case after an unclean
shutdown or panic.

Worse yet, if it gets an EIO from the underlying block layer, the
recovery will fail and the mount system call will abort.

 In an attempt to solve this problem we were trying to figure out if
 we can write some dummy data to the remaining unutilized blocks in
 the segment. But we are not sure what would be the best way to do
 this.
 
 Another solution we had in mind was to find all the places where
 segments are read and modify them so that they do not read
 unwritten blocks.  But we feel this might be a more complex solution
 and might impact performance more.

It looks like the sufile can be used for this purpose.  It keeps track
of how many blocks have been written in each segment.  You can see
this in the NBLOCKS field of the output of the lssu command.

One restriction is that this metadata file (sufile) is unavailable
until the mount system call succeeds, so the recovery code cannot use
it.
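
For the code paths that run after mount, the idea could be sketched as
follows: clamp every segment read to the written-block count that the
sufile reports.  This is only an illustration; the function name and
its parameters are hypothetical, and only the per-segment block count
(NBLOCKS) comes from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * How many blocks may safely be read from a segment, starting at block
 * offset `first` within the segment, when the caller wants `want`
 * blocks and only the first `nblocks` blocks have been written.
 */
static uint32_t readable_blocks(uint32_t seg_blocks, uint32_t nblocks,
                                uint32_t first, uint32_t want)
{
    assert(nblocks <= seg_blocks);

    if (first >= nblocks)
        return 0;               /* the read starts in unwritten space */

    uint32_t avail = nblocks - first;
    return want < avail ? want : avail;
}
```

A reader that asks for a whole 65536-block segment with NBLOCKS equal
to 65535 would then be limited to 65535 blocks and would never touch
the unwritten tail.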

 Please advise us on the best way to solve the problem.  Also, what
 would be the best place architecturally to fix it?

Writing dummy data to the dead space for SMR devices looks better to
me because it's simpler and the performance penalty does not seem very
high.
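
A user-space sketch of the padding idea, assuming the target (a device
or a test file standing in for it) accepts plain pwrite() calls issued
in ascending block order.  pad_segment_tail() and its parameters are
hypothetical; an actual fix would presumably be made inside the kernel
rather than like this.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* for pwrite() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Zero-fill blocks [used_blocks, seg_blocks) of the segment that starts
 * at block number seg_start_blk.  Returns the number of blocks written,
 * or -1 on error.
 */
static long long pad_segment_tail(int fd, uint64_t seg_start_blk,
                                  uint64_t seg_blocks, uint64_t used_blocks,
                                  uint32_t block_bytes)
{
    assert(block_bytes != 0 && used_blocks <= seg_blocks);

    void *zero = calloc(1, block_bytes);
    if (zero == NULL)
        return -1;

    long long padded = 0;
    for (uint64_t b = used_blocks; b < seg_blocks; b++) {
        off_t off = (off_t)((seg_start_blk + b) * block_bytes);

        if (pwrite(fd, zero, block_bytes, off) != (ssize_t)block_bytes) {
            free(zero);
            return -1;
        }
        padded++;
    }

    free(zero);
    return padded;
}
```

In the case reported here only one 4k block per affected segment would
be padded, which is why the cost looks small.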

But what will happen if an unexpected power failure hits the device?
Would that cause the file system to read unwritten blocks?

If so, it seems that we need a translation layer to hide these issues,
or a new error code or a new mechanism that lets file systems detect
and handle them.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line unsubscribe linux-nilfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Making Nilfs ZAC Compliant

2015-02-26 Thread Benixon Dhas
Hi All,

We are trying to make Nilfs work with an SMR device that adheres to the Zoned
ATA Commands (ZAC) specification.
One of the restrictions in the specification is that reading an unwritten part
of a zone (a segment in Nilfs) causes a read error.

We observe that Nilfs does not always write a complete physical segment (we use
256MB segments). After digging through the source for a while, we figured out
that this is because Nilfs requires a minimum number of blocks to construct a
partial segment (NILFS_PSEG_MIN_BLOCKS), which is currently 2.
So we see some segments where the last block (in our case a block is 4k) is
never written.

When utilities such as the garbage collector and dump segment (this may not be
an exhaustive list) read a segment, they try to read the entire physical
segment. This causes read errors in the kernel, and hence retries, for the last
unwritten block in certain segments.
In an attempt to solve this problem we were trying to figure out if we can
write some dummy data to the remaining unutilized blocks in the segment. But we
are not sure what would be the best way to do this.

Another solution we had in mind was to find all the places where segments are
read and modify them so that they do not read unwritten blocks. But we feel
this might be a more complex solution and might impact performance more.

Please advise us on the best way to solve the problem. Also, what would be the
best place architecturally to fix it?

Thanks,
Benixon