Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Chao Yu
On 2019/8/19 21:33, Chandan Rajendra wrote:
> On Sunday, August 18, 2019 7:15:42 PM IST Chao Yu wrote:
>> Hi Chandan,
>>
>> On 2019-8-16 14:18, Chandan Rajendra wrote:
>>> F2FS has a copy of "post read processing" code using which encrypted
>>> file data is decrypted. This commit replaces it to make use of the
>>> generic read_callbacks facility.
>>
>> I remember that previously Jaegeuk had mentioned f2fs will support 
>> compression
>> later, and it needs to reuse 'post read processing' fwk.
>>
>> There is very initial version of compression feature in below link:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/log/?h=compression
>>
>> So my concern is how can we uplift the most common parts of this fwk into 
>> vfs,
>> and meanwhile keeping the ability and flexibility when introducing private
>> feature/step in specified filesytem(now f2fs)?
>>
>> According to current f2fs compression's requirement, maybe we can expand to
>>
>> - support callback to let filesystem set the function for the flow in
>> decompression/verity/decryption step.
>> - support to use individual/common workqueue according the parameter.
>>
>> Any thoughts?
>>
> 
> Hi,
> 
> F2FS can be made to use fscrypt's queue for decryption and hence can reuse
> "read callbacks" code for decrypting data.
> 
> For decompression, we could have a STEP_MISC where we invoke a FS provided
> callback function for FS specific post read processing? 
> 
> Something like the following can be implemented in read_callbacks(),
> case STEP_MISC:
> if (ctx->enabled_steps & (1 << STEP_MISC)) {
> /*
>   ctx->fs_misc() must process bio in a workqueue
>   and later invoke read_callbacks() with
>   bio->bi_private's value as an argument.
> */
> ctx->fs_misc(ctx->bio);
> return;
> }
> ctx->cur_step++;
> 
> The fs_misc() callback can be passed in by the filesystem when invoking
> read_callbacks_setup_bio().

Hi,

Yes, something like this, can we just use STEP_DECOMPRESS and fs_decompress(),
not sure, I doubt this interface may has potential user which has compression
feature.

One more concern is that to avoid more context switch, maybe we can merge all
background works into one workqueue if there is no conflict when call wants to.

static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
 {
switch (++ctx->cur_step) {
case STEP_DECRYPT:
if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
...
if (ctx->separated_wq)
return;
}
ctx->cur_step++;
/* fall-through */
case STEP_DECOMPRESS:
...
default:
__read_end_io(ctx->bio);

> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: fix to avoid data corruption by forbidding SSR overwrite

2019-08-20 Thread Chao Yu
On 2019/8/20 9:00, Jaegeuk Kim wrote:
> On 08/19, Chao Yu wrote:
>> On 2019/8/16 11:03, Chao Yu wrote:
>>> There is one case can cause data corruption.
>>>
>>> - write 4k to fileA
>>> - fsync fileA, 4k data is writebacked to lbaA
>>> - write 4k to fileA
>>> - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
>>> - write 4k to fileB
>>> - kworker flush 4k to lbaA due to SSR
>>> - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
>>> data
>>>
>>> One solution is tracking all fsynced file's block history, and disallow
>>> SSR overwrite on newly invalidated block on that file.
>>>
>>> However, during recovery, no matter the dnode is flushed or fsynced, all
>>> previous dnodes until last fsynced one in node chain can be recovered,
>>> that means we need to record all block change in flushed dnode, which
>>> will cause heavy cost, so let's just use simple fix by forbidding SSR
>>> overwrite directly.
>>>
>>
>> Jaegeuk,
>>
>> Please help to add below missed tag to keep this patch being merged in stable
>> kernel.
>>
>> Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
> 
> Done.

Thanks! :)

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] fsck.f2fs: check only max extra_isize

2019-08-20 Thread Chao Yu
On 2019/8/20 9:16, Jaegeuk Kim wrote:
> On 08/18, Chao Yu wrote:
>> On 2019-8-17 9:03, Jaegeuk Kim wrote:
>>> If we use later kernel having larger extra_isize, old fsck will delete
>>> entire old files.
>>
>> Would it be better to construct the length based on existed features?
> 
> We can't judge the size for future fields.

Alright, it's okay to just check it with the maximum value, though I think it
needs to consider inline xattr space as well.

Reviewed-by: Chao Yu 

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [PATCH v2] f2fs: allocate memory in batch in build_sit_info()

2019-08-20 Thread Chao Yu
On 2019/8/20 4:20, Jaegeuk Kim wrote:
> On 07/26, Chao Yu wrote:
>> build_sit_info() allocate all bitmaps for each segment one by one,
>> it's quite low efficiency, this pach changes to allocate large
>> continuous memory at a time, and divide it and assign for each bitmaps
>> of segment. For large size image, it can expect improving its mount
>> speed.
> 
> Hmm, I hit a kernel panic when mounting a partition during fault injection 
> test:
> 
> 726 #ifdef CONFIG_F2FS_CHECK_FS
> 727 if (f2fs_test_bit(offset, sit_i->sit_bitmap) !=
> 728 f2fs_test_bit(offset, sit_i->sit_bitmap_mir))
> 729 f2fs_bug_on(sbi, 1);
> 730 #endif

We didn't change anything about sit_i->sit_bitmap{_mir,}, it's so wired we panic
here... :(

I double check the change, but find nothing suspicious, btw, my fault injection
testcase shows normal.

Jaegeuk, do you have any idea about this issue?

Thanks,

> 
> For your information, I'm testing without this patch.
> 
> Thanks,
> 
>>
>> Signed-off-by: Chen Gong 
>> Signed-off-by: Chao Yu 
>> ---
>> v2:
>> - fix warning triggered in kmalloc() if requested memory size exceeds 4MB.
>>  fs/f2fs/segment.c | 51 +--
>>  fs/f2fs/segment.h |  1 +
>>  2 files changed, 24 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index a661ac32e829..d720eacd9c57 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -3941,7 +3941,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>>  struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
>>  struct sit_info *sit_i;
>>  unsigned int sit_segs, start;
>> -char *src_bitmap;
>> +char *src_bitmap, *bitmap;
>>  unsigned int bitmap_size;
>>  
>>  /* allocate memory for SIT information */
>> @@ -3964,27 +3964,31 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>>  if (!sit_i->dirty_sentries_bitmap)
>>  return -ENOMEM;
>>  
>> +#ifdef CONFIG_F2FS_CHECK_FS
>> +bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4;
>> +#else
>> +bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3;
>> +#endif
>> +sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
>> +if (!sit_i->bitmap)
>> +return -ENOMEM;
>> +
>> +bitmap = sit_i->bitmap;
>> +
>>  for (start = 0; start < MAIN_SEGS(sbi); start++) {
>> -sit_i->sentries[start].cur_valid_map
>> -= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>> -sit_i->sentries[start].ckpt_valid_map
>> -= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>> -if (!sit_i->sentries[start].cur_valid_map ||
>> -!sit_i->sentries[start].ckpt_valid_map)
>> -return -ENOMEM;
>> +sit_i->sentries[start].cur_valid_map = bitmap;
>> +bitmap += SIT_VBLOCK_MAP_SIZE;
>> +
>> +sit_i->sentries[start].ckpt_valid_map = bitmap;
>> +bitmap += SIT_VBLOCK_MAP_SIZE;
>>  
>>  #ifdef CONFIG_F2FS_CHECK_FS
>> -sit_i->sentries[start].cur_valid_map_mir
>> -= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>> -if (!sit_i->sentries[start].cur_valid_map_mir)
>> -return -ENOMEM;
>> +sit_i->sentries[start].cur_valid_map_mir = bitmap;
>> +bitmap += SIT_VBLOCK_MAP_SIZE;
>>  #endif
>>  
>> -sit_i->sentries[start].discard_map
>> -= f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
>> -GFP_KERNEL);
>> -if (!sit_i->sentries[start].discard_map)
>> -return -ENOMEM;
>> +sit_i->sentries[start].discard_map = bitmap;
>> +bitmap += SIT_VBLOCK_MAP_SIZE;
>>  }
>>  
>>  sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>> @@ -4492,21 +4496,12 @@ static void destroy_free_segmap(struct f2fs_sb_info 
>> *sbi)
>>  static void destroy_sit_info(struct f2fs_sb_info *sbi)
>>  {
>>  struct sit_info *sit_i = SIT_I(sbi);
>> -unsigned int start;
>>  
>>  if (!sit_i)
>>  return;
>>  
>> -if (sit_i->sentries) {
>> -for (start = 0; start < MAIN_SEGS(sbi); start++) {
>> -kvfree(sit_i->sentries[start].cur_valid_map);
>> -#ifdef CONFIG_F2FS_CHECK_FS
>> -kvfree(sit_i->sentries[start].cur_valid_map_mir);
>> -#endif
>> -kvfree(sit_i->sentries[start].ckpt_valid_map);
>> -kvfree(sit_i->sentries[start].discard_map);
>> -}
>> -}
>> +if (sit_i->sentries)
>> +kvfree(sit_i->bitmap);
>>  kvfree(sit_i->tmp_map);
>>  
>>  kvfree(sit_i->sentries);
>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>> index b74602813a05..ec4d568fd58c 100644
>> --- a/fs/f2fs/segment.h
>> +++ b/fs/f2fs/segment.h
>> @@ -226,6 +226,7 @@ 

Re: [f2fs-dev] [PATCH v8 00/20] vfs: Add support for timestamp limits

2019-08-20 Thread Jeff Layton
On Sun, 2019-08-18 at 09:57 -0700, Deepa Dinamani wrote:
> The series is an update and a more complete version of the
> previously posted series at
> https://lore.kernel.org/linux-fsdevel/20180122020426.2988-1-deepa.ker...@gmail.com/
> 
> Thanks to Arnd Bergmann for doing a few preliminary reviews.
> They helped me fix a few issues I had overlooked.
> 
> The limits (sometimes granularity also) for the filesystems updated here are 
> according to the
> following table:
> 
> File system   Time type  Start year Expiration year 
> Granularity
> cramfsfixed  0  0
> romfs fixed  0  0
> pstoreascii seconds (27 digit ascii) S64_MINS64_MAX 1
> coda  INT64  S64_MINS64_MAX 1
> omfs  64-bit milliseconds0  U64_MAX/ 1000   
> NSEC_PER_MSEC
> befs  unsigned 48-bit seconds0  0x  
> alloc_super
> bfs   unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> efs   unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> ext2  signed 32-bit seconds  S32_MINS32_MAX 
> alloc_super
> ext3  signed 32-bit seconds  S32_MINS32_MAX 
> alloc_super
> ext4 (old)signed 32-bit seconds  S32_MINS32_MAX 
> alloc_super
> ext4 (extra)  34-bit seconds, 30-bit ns  S32_MIN0x37fff   1
> freevxfs  u32 secs/usecs 0  U32_MAX 
> alloc_super
> jffs2 unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> jfs   unsigned 32-bit seconds/ns 0  U32_MAX 1
> minix unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> orangefs  u64 seconds0  U64_MAX 
> alloc_super
> qnx4  unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> qnx6  unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> reiserfs  unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> squashfs  unsigned 32-bit seconds0  U32_MAX 
> alloc_super
> ufs1  signed 32-bit seconds  S32_MINS32_MAX 
> NSEC_PER_SEC
> ufs2  signed 64-bit seconds/u32 ns   S64_MINS64_MAX 1
> xfs   signed 32-bit seconds/ns   S32_MINS32_MAX 1
> ceph  unsigned 32-bit second/ns  0  U32_MAX 1000

Looks reasonable, overall.

Note that the granularity changed recently for cephfs. See commit
0f7cf80ae96c2a (ceph: initialize superblock s_time_gran to 1).

In any case, you can add my Acked-by

-- 
Jeff Layton 



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/4] fsck.f2fs: fix to check c.fix_on before repair

2019-08-20 Thread Chao Yu
Jaegeuk,

Could you merge [2,3,4/4] patch as well, I just hit one case with por_fsstress.

[ASSERT] (fsck_chk_inode_blk: 803)  --> [0x1ec84] wrong inline size:149820

Thanks,

On 2019/8/16 9:20, Chao Yu wrote:
> On 2019/8/16 9:06, Jaegeuk Kim wrote:
>> On 08/12, Chao Yu wrote:
>>> We should always set c.bug_on whenever found a bug, then fix them
>>> if c.fix_on is on, otherwise, some bugs won't be shown unless we
>>> enable debug log.
>>>
>>> Signed-off-by: Chao Yu 
>>> ---
>>>  fsck/fsck.c | 137 +++-
>>>  1 file changed, 83 insertions(+), 54 deletions(-)
>>>
>>> diff --git a/fsck/fsck.c b/fsck/fsck.c
>>> index 7eb599d..91ddd49 100644
>>> --- a/fsck/fsck.c
>>> +++ b/fsck/fsck.c
>>> @@ -712,12 +712,13 @@ void fsck_chk_inode_blk(struct f2fs_sb_info *sbi, u32 
>>> nid,
>>> fsck_reada_node_block(sbi, le32_to_cpu(node_blk->i.i_xattr_nid));
>>>  
>>> if (fsck_chk_xattr_blk(sbi, nid,
>>> -   le32_to_cpu(node_blk->i.i_xattr_nid), blk_cnt) &&
>>> -   c.fix_on) {
>>> -   node_blk->i.i_xattr_nid = 0;
>>> -   need_fix = 1;
>>> -   FIX_MSG("Remove xattr block: 0x%x, x_nid = 0x%x",
>>> -   nid, le32_to_cpu(node_blk->i.i_xattr_nid));
>>> +   le32_to_cpu(node_blk->i.i_xattr_nid), blk_cnt)) {
>>> +   if (c.fix_on) {
>>> +   node_blk->i.i_xattr_nid = 0;
>>> +   need_fix = 1;
>>> +   FIX_MSG("Remove xattr block: 0x%x, x_nid = 0x%x",
>>> +   nid, 
>>> le32_to_cpu(node_blk->i.i_xattr_nid));
>>> +   }
>>
>> Why do we need this change?
> 
> Actually, it's not necessary, but just cleanup to keep the same style. :P
> 
> Thanks,
> 
>>
>>> }
>>>  
>>> if (ftype == F2FS_FT_CHRDEV || ftype == F2FS_FT_BLKDEV ||
>>> @@ -730,24 +731,32 @@ void fsck_chk_inode_blk(struct f2fs_sb_info *sbi, u32 
>>> nid,
>>>  
>>> if (f2fs_has_extra_isize(_blk->i)) {
>>> if (c.feature & cpu_to_le32(F2FS_FEATURE_EXTRA_ATTR)) {
>>> -   if (node_blk->i.i_extra_isize >
>>> -   cpu_to_le16(F2FS_TOTAL_EXTRA_ATTR_SIZE)) {
>>> -   FIX_MSG("ino[0x%x] recover i_extra_isize "
>>> -   "from %u to %u",
>>> -   nid,
>>> -   le16_to_cpu(node_blk->i.i_extra_isize),
>>> -   calc_extra_isize());
>>> -   node_blk->i.i_extra_isize =
>>> -   cpu_to_le16(calc_extra_isize());
>>> -   need_fix = 1;
>>> +   unsigned int isize =
>>> +   le16_to_cpu(node_blk->i.i_extra_isize);
>>> +
>>> +   if (isize > F2FS_TOTAL_EXTRA_ATTR_SIZE) {
>>> +   ASSERT_MSG("[0x%x] wrong i_extra_isize=0x%x",
>>> +   nid, isize);
>>> +   if (c.fix_on) {
>>> +   FIX_MSG("ino[0x%x] recover 
>>> i_extra_isize "
>>> +   "from %u to %u",
>>> +   nid, isize,
>>> +   calc_extra_isize());
>>> +   node_blk->i.i_extra_isize =
>>> +   cpu_to_le16(calc_extra_isize());
>>> +   need_fix = 1;
>>> +   }
>>> }
>>> } else {
>>> -   FIX_MSG("ino[0x%x] remove F2FS_EXTRA_ATTR "
>>> -   "flag in i_inline:%u",
>>> -   nid, node_blk->i.i_inline);
>>> -   /* we don't support tuning F2FS_FEATURE_EXTRA_ATTR now 
>>> */
>>> -   node_blk->i.i_inline &= ~F2FS_EXTRA_ATTR;
>>> -   need_fix = 1;
>>> +   ASSERT_MSG("[0x%x] wrong extra_attr flag", nid);
>>> +   if (c.fix_on) {
>>> +   FIX_MSG("ino[0x%x] remove F2FS_EXTRA_ATTR "
>>> +   "flag in i_inline:%u",
>>> +   nid, node_blk->i.i_inline);
>>> +   /* we don't support tuning 
>>> F2FS_FEATURE_EXTRA_ATTR now */
>>> +   node_blk->i.i_inline &= ~F2FS_EXTRA_ATTR;
>>> +   need_fix = 1;
>>> +   }
>>> }
>>>  
>>> if ((c.feature &
>>> @@ -758,13 +767,17 @@ void fsck_chk_inode_blk(struct f2fs_sb_info *sbi, u32 
>>> nid,
>>>  
>>> if (!inline_size ||
>>> inline_size > MAX_INLINE_XATTR_SIZE) {
>>> -   FIX_MSG("ino[0x%x] recover inline xattr size "
>>> -   "from %u to %u",

Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Chandan Rajendra
On Tuesday, August 20, 2019 11:01 PM Jaegeuk Kim wrote:
> Hi Chandan,
> 
> On 08/20, Theodore Y. Ts'o wrote:
> > On Tue, Aug 20, 2019 at 10:35:29AM +0530, Chandan Rajendra wrote:
> > > Looks like F2FS requires a lot more flexiblity than what can be offered by
> > > read callbacks i.e.
> > > 
> > > 1. F2FS wants to make use of its own workqueue for decryption, verity and
> > >decompression.
> > > 2. F2FS' decompression code is not an FS independent entity like fscrypt 
> > > and
> > >fsverity. Hence they would need Filesystem specific callback functions 
> > > to
> > >be invoked from "read callbacks". 
> > > 
> > > Hence I would suggest that we should drop F2FS changes made in this
> > > patchset. Please let me know your thoughts on this.
> > 
> > That's probably the best way to go for now.  My one concern is that it
> > means that only ext4 will be using your framework.  I could imagine
> > that some people might argue that should just move the callback scheme
> > into ext4 code as opposed to leaving it in fscrypt --- at least until
> > we can find other file systems where we can show that it will be
> > useful for those other file systems.
> 
> I also have to raise a flag on this. Doesn't this patch series try to get rid
> of redundant work? What'd be the rationale, if it only supports ext4?

This patchset gets encryption working with subpage blocksize by making
relevant changes in the generic code (i.e. do_mpage_readpage() and
block_read_full_page()) and removing duplicate code from ext4
(i.e. ext4_readpage() and friends). Without these changes the only way to get
subpage blocksize support was to add more duplicate code into Ext4 i.e. import
a copy of block_read_full_page() into Ext4 and make necessary edits to support
encryption.

So this patchset actually does help in removing exiting duplicate code in Ext4
and also prevents addition of more such code.

> 
> How about generalizing the framework to support generic_post_read and per-fs
> post_read for fscrypt/fsverity/... selectively?

Quoting what I had said earlier,
> > > 1. F2FS wants to make use of its own workqueue for decryption, verity and
> > >decompression.
> > > 2. F2FS' decompression code is not an FS independent entity like fscrypt 
> > > and
> > >fsverity. Hence they would need Filesystem specific callback functions 
> > > to
> > >be invoked from "read callbacks". 

I am not sure if read callbacks can be made flexible enough to support the
above use cases. fscrypt and fsverity already provide workqueues and any new
post processing code added should follow the same convention. I see
that F2FS use case is special since,
1. It uses its own workqueues.
2. Decompression code inside F2FS isn't written as an FS independent subsystem
   like how fscrypt and fsverity are implemented.

To summarize, I believe the users of read callbacks should follow the
conventions set by fscrypt/fsverity and new post processing code that needs to
be plugged into read callbacks should provide APIs similar to
fscrypt/fsverity. Otherwise the state machine logic implemented by read
callbacks will get complex/convoluted.

> 
> Thanks,
> 
> > 
> > (Perhaps a useful experiment would be to have someone implement patches
> > to support fscrypt and fsverity in ext2 --- the patch might or might
> > not be accepted for upstream inclusion, but it would be useful to
> > demonstrate how easy it is to add fscrypt and fsverity.)
> > 
> > The other thing to consider is that there has been some discussion
> > about adding generalized support for I/O submission to the iomap
> > library.  It might be that if that work is accepted, support for
> > fscrypt and fsverity would be a requirement for ext4 to use that
> > portion of iomap's functionality.  So in that eventuality, it might be
> > that we'll want to move your read callbacks code into iomap, or we'll
> > need to rework the read callbacks code so it can work with iomap.
> > 
> > But this is all work for the future.  I'm a firm believe that the
> > perfect should not be the enemy of the good, and that none of this
> > should be a fundamental obstacle in having your code upstream.
> > 
> > Cheers,
> > 
> > - Ted
> > 
> 

-- 
chandan





___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 0/4] fsck: Check write pointers of zoned block devices

2019-08-20 Thread Shinichiro Kawasaki
On Tue, 2019-08-20 at 10:57 -0700, Jaegeuk Kim wrote:
> Hi Shinichiro,
> 
> Thank you so much for the contribution.
> BTW, I failed to compile them. Could you please take a look at them one more
> time? :)

Thank you for the care and sorry about the compile failure. I checked compile
pass with dev-test branch but did not check with dev branch. I took a look again
and found that I used MAX_BLKADDR() macro added only in the dev-test branch,
which causes the compile failure with dev branch. 

I tried IS_VALID_BLK_ADDR() in place of MAX_BLKADDR() and confirmed it is
working as expected. Will post v2.

--
Best Regards,
Shin'ichiro Kawasaki

___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 3/4] fsck.f2fs: Check write pointer consistency with current segments

2019-08-20 Thread Shin'ichiro Kawasaki
On sudden f2fs shutdown, zoned block device status and f2fs current
segment positions in meta data can be inconsistent. When f2fs shutdown
happens before write operations completes, write pointers of zoned block
devices can go further but f2fs meta data keeps current segments at
positions before the write operations. After remounting the f2fs, the
inconsistency causes write operations not at write pointers and
"Unaligned write command" error is reported. This error was observed when
xfstests test case generic/388 was run with f2fs on a zoned block device.

To avoid the error, have f2fs.fsck check consistency between each current
segment's position and the write pointer of the zone the current segment
points to. If the write pointer goes advance from the current segment,
fix the current segment position setting at same as the write pointer
position. In case the write pointer is behind the current segment, write
zero data at the write pointer position to make write pointer position at
same as the current segment.

When inconsistencies are found, turn on c.bug_on flag in fsck_verify() to
ask users to fix them or not. When inconsistencies get fixed, turn on
'force' flag in fsck_verify() to enforce fixes in following checks. This
position fix is done at the beginning of do_fsck() function so that other
checks reflect the current segment modification.

Signed-off-by: Shin'ichiro Kawasaki 
---
 fsck/fsck.c | 133 
 fsck/fsck.h |   3 ++
 fsck/main.c |   2 +
 3 files changed, 138 insertions(+)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index 8953ca1..21a06ac 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -2574,6 +2574,125 @@ out:
return cnt;
 }
 
+struct write_pointer_check_data {
+   struct f2fs_sb_info *sbi;
+   struct device_info *dev;
+};
+
+#define SECTOR_SHIFT 9
+
+static int fsck_chk_write_pointer(int i, struct blk_zone *blkz, void *opaque)
+{
+   struct write_pointer_check_data *wpd = opaque;
+   struct f2fs_sb_info *sbi = wpd->sbi;
+   struct device_info *dev = wpd->dev;
+   struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
+   block_t zone_block, wp_block, wp_blkoff, cs_block, b;
+   unsigned int zone_segno, wp_segno;
+   struct seg_entry *se;
+   struct curseg_info *cs;
+   int cs_index, ret;
+   int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+   unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
+   void *zero_blk;
+
+   if (blk_zone_conv(blkz))
+   return 0;
+
+   zone_block = dev->start_blkaddr
+   + (blk_zone_sector(blkz) >> log_sectors_per_block);
+   zone_segno = GET_SEGNO(sbi, zone_block);
+   wp_block = dev->start_blkaddr
+   + (blk_zone_wp_sector(blkz) >> log_sectors_per_block);
+   wp_segno = GET_SEGNO(sbi, wp_block);
+   wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
+
+   /* find the curseg which points to the zone */
+   for (cs_index = 0; cs_index < NO_CHECK_TYPE; cs_index++) {
+   cs = _I(sbi)->curseg_array[cs_index];
+   if (zone_segno <= cs->segno &&
+   cs->segno < zone_segno + segs_per_zone)
+   break;
+   }
+
+   if (cs_index >= NR_CURSEG_TYPE)
+   return 0;
+
+   /* check write pointer consistency with the curseg in the zone */
+   cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
+   if (wp_block == cs_block)
+   return 0;
+
+   if (!c.fix_on) {
+   MSG(0, "Inconsistent write pointer: "
+   "curseg %d[0x%x,0x%x] wp[0x%x,0x%x]\n",
+   cs_index, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
+   fsck->chk.wp_inconsistent_zones++;
+   return 0;
+   }
+
+   /*
+* If the curseg is in advance from the write pointer, write zero to
+* move the write pointer forward to the same position as the curseg.
+*/
+   if (wp_block < cs_block) {
+   ret = 0;
+   zero_blk = calloc(BLOCK_SZ, 1);
+   if (!zero_blk)
+   return -EINVAL;
+
+   FIX_MSG("Advance write pointer to match with curseg %d: "
+   "[0x%x,0x%x]->[0x%x,0x%x]",
+   cs_index, wp_segno, wp_blkoff,
+   cs->segno, cs->next_blkoff);
+   for (b = wp_block; b < cs_block && !ret; b++)
+   ret = dev_write_block(zero_blk, b);
+
+   fsck->chk.wp_fixed_zones++;
+   free(zero_blk);
+   return ret;
+   }
+
+   /*
+* If the write pointer is in advance from the curseg, modify the
+* curseg position to be same as the write pointer.
+*/
+   FIX_MSG("Advance curseg %d: [0x%x,0x%x]->[0x%x,0x%x]",
+   cs_index, cs->segno, cs->next_blkoff, wp_segno, wp_blkoff);
+   se = get_seg_entry(sbi, 

[f2fs-dev] [PATCH v2 1/4] libf2fs_zoned: Introduce f2fs_report_zones() helper function

2019-08-20 Thread Shin'ichiro Kawasaki
To prepare for write pointer consistency check by fsck, add
f2fs_report_zones() helper function which calls REPORT ZONE command to
get write pointer status. The function is added to lib/libf2fs_zoned
which gathers zoned block device related functions.

To check write pointer consistency with f2fs meta data, fsck needs to
refer both of reported zone information and f2fs super block structure
"f2fs_sb_info". However, libf2fs_zoned does not import f2fs_sb_info. To
keep f2fs_sb_info structure out of libf2fs_zoned, provide a callback
function in fsck to f2fs_report_zones() and call it for each zone.

Signed-off-by: Shin'ichiro Kawasaki 
---
 include/f2fs_fs.h   |  2 ++
 lib/libf2fs_zoned.c | 57 +
 2 files changed, 59 insertions(+)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 0d9a036..abadd1b 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -1279,6 +1279,8 @@ blk_zone_cond_str(struct blk_zone *blkz)
 
 extern int f2fs_get_zoned_model(int);
 extern int f2fs_get_zone_blocks(int);
+typedef int (report_zones_cb_t)(int i, struct blk_zone *blkz, void *opaque);
+extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
 extern int f2fs_check_zones(int);
 extern int f2fs_reset_zones(int);
 
diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
index af00b44..fc4974f 100644
--- a/lib/libf2fs_zoned.c
+++ b/lib/libf2fs_zoned.c
@@ -193,6 +193,57 @@ int f2fs_get_zone_blocks(int i)
 
 #define F2FS_REPORT_ZONES_BUFSZ524288
 
+int f2fs_report_zones(int j, report_zones_cb_t *report_zones_cb, void *opaque)
+{
+   struct device_info *dev = c.devices + j;
+   struct blk_zone_report *rep;
+   struct blk_zone *blkz;
+   unsigned int i, n = 0;
+   u_int64_t total_sectors = (dev->total_sectors * c.sector_size) >> 9;
+   u_int64_t sector = 0;
+   int ret = -1;
+
+   rep = malloc(F2FS_REPORT_ZONES_BUFSZ);
+   if (!rep) {
+   ERR_MSG("No memory for report zones\n");
+   return -ENOMEM;
+   }
+
+   while (sector < total_sectors) {
+
+   /* Get zone info */
+   rep->sector = sector;
+   rep->nr_zones = (F2FS_REPORT_ZONES_BUFSZ - sizeof(struct 
blk_zone_report))
+   / sizeof(struct blk_zone);
+
+   ret = ioctl(dev->fd, BLKREPORTZONE, rep);
+   if (ret != 0) {
+   ret = -errno;
+   ERR_MSG("ioctl BLKREPORTZONE failed\n");
+   goto out;
+   }
+
+   if (!rep->nr_zones) {
+   ret = -EIO;
+   ERR_MSG("Unexpected ioctl BLKREPORTZONE result\n");
+   goto out;
+   }
+
+   blkz = (struct blk_zone *)(rep + 1);
+   for (i = 0; i < rep->nr_zones && sector < total_sectors; i++) {
+   ret = report_zones_cb(n, blkz, opaque);
+   if (ret)
+   goto out;
+   sector = blk_zone_sector(blkz) + blk_zone_length(blkz);
+   n++;
+   blkz++;
+   }
+   }
+out:
+   free(rep);
+   return ret;
+}
+
 int f2fs_check_zones(int j)
 {
struct device_info *dev = c.devices + j;
@@ -372,6 +423,12 @@ out:
 
 #else
 
+int f2fs_report_zones(int j, report_zones_cb_t *report_zones_cb, void *opaque)
+{
+   ERR_MSG("%d: Zoned block devices are not supported\n", i);
+   return -1;
+}
+
 int f2fs_get_zoned_model(int i)
 {
struct device_info *dev = c.devices + i;
-- 
2.21.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 0/4] fsck: Check write pointers of zoned block devices

2019-08-20 Thread Shin'ichiro Kawasaki
On sudden f2fs shutdown, zoned block device status and f2fs meta data can be
inconsistent. When f2fs shutdown happens during write operations, write pointers
on the device go forward but the f2fs meta data does not reflect write pointer
progress. This inconsistency will eventually cause "Unaligned write command"
error when restarting write operation after the next mount. This error can be
observed with xfstests test case generic/388, which enforces sudden shutdown
during write operation and checks the file system recovery. Once the error
happens because of the inconsistency, the file system requires fix. However,
fsck.f2fs does not have a feature to check and fix it.

This patch series adds a new feature to fsck.f2fs to check and fix the
inconsistency. First and second patches add two functions which helps fsck to
call report zone and reset zone commands to zoned block devices. Third patch
checks write pointers of zones that current segments recorded in meta data point
to. This covers the failure symptom observed with xfstests. The last patch
checks write pointers of zones that current segments do not point to, which
covers a potential failure scenario.

This patch series depends on other patches for zoned block devices, then it
conflicts with the master branch in f2fs-tools.git as of Aug/19/2019. It can be
applied without conflict to dev and dev-test branch tips.

Changes from v1:
* Fixed build failure on dev branch

Shin'ichiro Kawasaki (4):
  libf2fs_zoned: Introduce f2fs_report_zones() helper function
  libf2fs_zoned: Introduce f2fs_reset_zone() function
  fsck.f2fs: Check write pointer consistency with current segments
  fsck.f2fs: Check write pointer consistency with valid blocks count

 fsck/fsck.c | 161 
 fsck/fsck.h |   3 +
 fsck/main.c |   2 +
 include/f2fs_fs.h   |   3 +
 lib/libf2fs_zoned.c |  81 ++
 5 files changed, 250 insertions(+)

-- 
2.21.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 4/4] fsck.f2fs: Check write pointer consistency with valid blocks count

2019-08-20 Thread Shin'ichiro Kawasaki
When sudden f2fs shutdown happens on zoned block devices, write
pointers can be inconsistent with valid blocks counts in meta data.
The failure scenario is as follows:

- Just before a sudden shutdown, a new segment in a new zone is selected
  for a current segment. Write commands were executed to the segment.
  and the zone has a write pointer not at zone start.
- Before the write commands complete, shutdown happens. Meta data is
  not updated and still keeps zero valid blocks count for the zone.
- After next mount of the file system, the zone is selected for the next
  write target because it has zero valid blocks count. However, it has
  the write pointer not at zone start. Then "Unaligned write command"
  error happens.

To avoid this potential error path, reset write pointers if the zone
does not have a current segment, the write pointer is not at the zone
start and the zone has no valid blocks.

Signed-off-by: Shin'ichiro Kawasaki 
---
 fsck/fsck.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index 21a06ac..cc9bbc0 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -2595,6 +2595,7 @@ static int fsck_chk_write_pointer(int i, struct blk_zone 
*blkz, void *opaque)
int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
void *zero_blk;
+   block_t zone_valid_blocks = 0;
 
if (blk_zone_conv(blkz))
return 0;
@@ -2615,8 +2616,35 @@ static int fsck_chk_write_pointer(int i, struct blk_zone 
*blkz, void *opaque)
break;
}
 
-   if (cs_index >= NR_CURSEG_TYPE)
+   if (cs_index >= NR_CURSEG_TYPE) {
+   for (b = zone_block; b < zone_block + c.zone_blocks &&
+IS_VALID_BLK_ADDR(sbi, b); b += c.blks_per_seg) {
+   se = get_seg_entry(sbi, GET_SEGNO(sbi, b));
+   zone_valid_blocks += se->valid_blocks;
+   }
+   if (wp_block == zone_block || zone_valid_blocks)
+   return 0;
+
+   /*
+* The write pointer is not at zone start but there is no valid
+* block in the zone. Segments in the zone can be selected for
+* next write. Need to reset the write pointer to avoid
+* unaligned write command error.
+*/
+   if (c.fix_on) {
+   FIX_MSG("Reset write pointer at segment 0x%x",
+   zone_segno);
+   ret = f2fs_reset_zone(dev, blkz);
+   if (ret)
+   return ret;
+   fsck->chk.wp_fixed_zones++;
+   } else {
+   MSG(0, "Inconsistent write pointer at segment 0x%x\n",
+   zone_segno);
+   fsck->chk.wp_inconsistent_zones++;
+   }
return 0;
+   }
 
/* check write pointer consistency with the curseg in the zone */
cs_block = START_BLOCK(sbi, cs->segno) + cs->next_blkoff;
-- 
2.21.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 2/4] libf2fs_zoned: Introduce f2fs_reset_zone() function

2019-08-20 Thread Shin'ichiro Kawasaki
To allow fsck to reset a zone with inconsistent write pointer, introduce
a helper function f2fs_reset_zone().

Signed-off-by: Shin'ichiro Kawasaki 
---
 include/f2fs_fs.h   |  1 +
 lib/libf2fs_zoned.c | 24 
 2 files changed, 25 insertions(+)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index abadd1b..34679d8 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -1282,6 +1282,7 @@ extern int f2fs_get_zone_blocks(int);
 typedef int (report_zones_cb_t)(int i, struct blk_zone *blkz, void *opaque);
 extern int f2fs_report_zones(int, report_zones_cb_t *, void *);
 extern int f2fs_check_zones(int);
+int f2fs_reset_zone(struct device_info *dev, struct blk_zone *blkz);
 extern int f2fs_reset_zones(int);
 
 #define SIZE_ALIGN(val, size)  ((val) + (size) - 1) / (size)
diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
index fc4974f..f56fa62 100644
--- a/lib/libf2fs_zoned.c
+++ b/lib/libf2fs_zoned.c
@@ -359,6 +359,24 @@ out:
return ret;
 }
 
+int f2fs_reset_zone(struct device_info *dev, struct blk_zone *blkz)
+{
+   struct blk_zone_range range;
+   int ret;
+
+   if (!blk_zone_seq(blkz) || blk_zone_empty(blkz))
+   return 0;
+
+   /* Non empty sequential zone: reset */
+   range.sector = blk_zone_sector(blkz);
+   range.nr_sectors = blk_zone_length(blkz);
+   ret = ioctl(dev->fd, BLKRESETZONE, );
+   if (ret != 0)
+   ERR_MSG("ioctl BLKRESETZONE failed\n");
+
+   return ret;
+}
+
 int f2fs_reset_zones(int j)
 {
struct device_info *dev = c.devices + j;
@@ -456,6 +474,12 @@ int f2fs_check_zones(int i)
return -1;
 }
 
+int f2fs_reset_zone(struct device_info *dev, struct blk_zone *blkz)
+{
+   ERR_MSG("%d: Zoned block devices are not supported\n", i);
+   return -1;
+}
+
 int f2fs_reset_zones(int i)
 {
ERR_MSG("%d: Zoned block devices are not supported\n", i);
-- 
2.21.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Jaegeuk Kim
Hi Chandan,

On 08/20, Theodore Y. Ts'o wrote:
> On Tue, Aug 20, 2019 at 10:35:29AM +0530, Chandan Rajendra wrote:
> > Looks like F2FS requires a lot more flexiblity than what can be offered by
> > read callbacks i.e.
> > 
> > 1. F2FS wants to make use of its own workqueue for decryption, verity and
> >decompression.
> > 2. F2FS' decompression code is not an FS independent entity like fscrypt and
> >fsverity. Hence they would need Filesystem specific callback functions to
> >be invoked from "read callbacks". 
> > 
> > Hence I would suggest that we should drop F2FS changes made in this
> > patchset. Please let me know your thoughts on this.
> 
> That's probably the best way to go for now.  My one concern is that it
> means that only ext4 will be using your framework.  I could imagine
> that some people might argue that should just move the callback scheme
> into ext4 code as opposed to leaving it in fscrypt --- at least until
> we can find other file systems where we can show that it will be
> useful for those other file systems.

I also have to raise a flag on this. Doesn't this patch series try to get rid
of redundant work? What'd be the rationale, if it only supports ext4?

How about generalizing the framework to support generic_post_read and per-fs
post_read for fscrypt/fsverity/... selectively?

Thanks,

> 
> (Perhaps a useful experiment would be to have someone implement patches
> to support fscrypt and fsverity in ext2 --- the patch might or might
> not be accepted for upstream inclusion, but it would be useful to
> demonstrate how easy it is to add fscrypt and fsverity.)
> 
> The other thing to consider is that there has been some discussion
> about adding generalized support for I/O submission to the iomap
> library.  It might be that if that work is accepted, support for
> fscrypt and fsverity would be a requirement for ext4 to use that
> portion of iomap's functionality.  So in that eventuality, it might be
> that we'll want to move your read callbacks code into iomap, or we'll
> need to rework the read callbacks code so it can work with iomap.
> 
> But this is all work for the future.  I'm a firm believe that the
> perfect should not be the enemy of the good, and that none of this
> should be a fundamental obstacle in having your code upstream.
> 
> Cheers,
> 
>   - Ted
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Gao Xiang via Linux-f2fs-devel
Hi Ted,

On Tue, Aug 20, 2019 at 12:25:10PM -0400, Theodore Y. Ts'o wrote:
> On Tue, Aug 20, 2019 at 01:12:36PM +0800, Gao Xiang wrote:
> > Add a word, I have some little concern about post read procession order
> > a bit as I mentioned before, because I'd like to move common EROFS
> > decompression code out in the future as well for other fses to use
> > after we think it's mature enough.
> > 
> > It seems the current code mainly addresses eliminating duplicated code,
> > therefore I have no idea about that...
> 
> Actually, we should chat.  I was actually thinking about "borrowing"
> code from erofs to provide ext4-specific compression.  I was really
> impressed with the efficiency goals in the erofs design[1] when I
> reviewed the Usenix ATC paper, and as the saying goes, the best
> artists know how to steal from the best.  :-)
> 
> [1] https://www.usenix.org/conference/atc19/presentation/gao

I also guessed it's you reviewed our work as well from some written words :)
(even though it's analymous...) and I personally think there are some
useful stuffs in our EROFS effort.

> 
> My original specific thinking was to do code reuse by copy and paste,
> simply because it was simpler, and I have limited time to work on it.
> But if you are interested in making the erofs pipeline reusable by
> other file systems, and have the time to do the necessary code
> refactoring, I'd love to work with you on that.

Yes, I have interest in making the erofs pipeline for generic fses.
Now I'm still investigating sequential read on very high speed NVME
(like SAMSUNG 970pro, one thread seq read >3GB/s), it seems it still
has some optimization space.

And then I will do that work for generic fses as well... (but the first
thing I want to do is getting erofs out of staging, as Greg said [1])

Metadata should be designed for each fs like ext4, but maybe not flexible
and compacted as EROFS, therefore it could be some extra metadata
overhead than EROFS.

[1] https://lore.kernel.org/lkml/20190618064523.ga6...@kroah.com/

> 
> It should be noted that the f2fs developers have been working on their
> own compression scheme that was going to be f2fs-specific, unlike the
> file system generic approach used with fsverity and fscrypt.
> 
> My expectation is that we will need to modify the read pipeling code
> to support compression.  That's true whether we are looking at the
> existing file system-specific code used by ext4 and f2fs or in some
> combined work such as what Chandan has proposed.

I think either form is fine with me. :) But it seems that is some minor
which tree we will work on (Maybe Chandan's work will be merged then).

The first thing I need to do is to tidy up the code, and making it more
general, and then it will be very easy for fses to integrate :)

Thanks,
Gao Xiang


> 
> Cheers,
> 
>   - Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 0/4] fsck: Check write pointers of zoned block devices

2019-08-20 Thread Jaegeuk Kim
Hi Shinichiro,

Thank you so much for the contribution.
BTW, I failed to compile them. Could you please take a look at them one more
time? :)

Thanks,

On 08/19, Shin'ichiro Kawasaki wrote:
> On sudden f2fs shutdown, zoned block device status and f2fs meta data can be
> inconsistent. When f2fs shutdown happens during write operations, write 
> pointers
> on the device go forward but the f2fs meta data does not reflect write pointer
> progress. This inconsistency will eventually cause "Unaligned write command"
> error when restarting write operation after the next mount. This error can be
> observed with xfstests test case generic/388, which enforces sudden shutdown
> during write operation and checks the file system recovery. Once the error
> happens because of the inconsistency, the file system requires fix. However,
> fsck.f2fs does not have a feature to check and fix it.
> 
> This patch series adds a new feature to fsck.f2fs to check and fix the
> inconsistency. First and second patches add two functions which helps fsck to
> call report zone and reset zone commands to zoned block devices. Third patch
> checks write pointers of zones that current segments recorded in meta data 
> point
> to. This covers the failure symptom observed with xfstests. The last patch
> checks write pointers of zones that current segments do not point to, which
> covers a potential failure scenario.
> 
> This patch series depends on other patches for zoned block devices, then it
> conflicts with the master branch in f2fs-tools.git as of Aug/19/2019. It can 
> be
> applied without conflict to dev and dev-test branch tips.
> 
> Shin'ichiro Kawasaki (4):
>   libf2fs_zoned: Introduce f2fs_report_zones() helper function
>   libf2fs_zoned: Introduce f2fs_reset_zone() function
>   fsck.f2fs: Check write pointer consistency with current segments
>   fsck.f2fs: Check write pointer consistency with valid blocks count
> 
>  fsck/fsck.c | 161 
>  fsck/fsck.h |   3 +
>  fsck/main.c |   2 +
>  include/f2fs_fs.h   |   3 +
>  lib/libf2fs_zoned.c |  81 ++
>  5 files changed, 250 insertions(+)
> 
> -- 
> 2.21.0


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] f2fs: allocate memory in batch in build_sit_info()

2019-08-20 Thread Jaegeuk Kim
On 08/20, Chao Yu wrote:
> On 2019/8/20 4:20, Jaegeuk Kim wrote:
> > On 07/26, Chao Yu wrote:
> >> build_sit_info() allocate all bitmaps for each segment one by one,
> >> it's quite low efficiency, this pach changes to allocate large
> >> continuous memory at a time, and divide it and assign for each bitmaps
> >> of segment. For large size image, it can expect improving its mount
> >> speed.
> > 
> > Hmm, I hit a kernel panic when mounting a partition during fault injection 
> > test:
> > 
> > 726 #ifdef CONFIG_F2FS_CHECK_FS
> > 727 if (f2fs_test_bit(offset, sit_i->sit_bitmap) !=
> > 728 f2fs_test_bit(offset, sit_i->sit_bitmap_mir))
> > 729 f2fs_bug_on(sbi, 1);
> > 730 #endif
> 
> We didn't change anything about sit_i->sit_bitmap{_mir,}, it's so wired we 
> panic
> here... :(
> 
> I double check the change, but find nothing suspicious, btw, my fault 
> injection
> testcase shows normal.
> 
> Jaegeuk, do you have any idea about this issue?

I'm bisecting. :P

> 
> Thanks,
> 
> > 
> > For your information, I'm testing without this patch.
> > 
> > Thanks,
> > 
> >>
> >> Signed-off-by: Chen Gong 
> >> Signed-off-by: Chao Yu 
> >> ---
> >> v2:
> >> - fix warning triggered in kmalloc() if requested memory size exceeds 4MB.
> >>  fs/f2fs/segment.c | 51 +--
> >>  fs/f2fs/segment.h |  1 +
> >>  2 files changed, 24 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >> index a661ac32e829..d720eacd9c57 100644
> >> --- a/fs/f2fs/segment.c
> >> +++ b/fs/f2fs/segment.c
> >> @@ -3941,7 +3941,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
> >>struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
> >>struct sit_info *sit_i;
> >>unsigned int sit_segs, start;
> >> -  char *src_bitmap;
> >> +  char *src_bitmap, *bitmap;
> >>unsigned int bitmap_size;
> >>  
> >>/* allocate memory for SIT information */
> >> @@ -3964,27 +3964,31 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
> >>if (!sit_i->dirty_sentries_bitmap)
> >>return -ENOMEM;
> >>  
> >> +#ifdef CONFIG_F2FS_CHECK_FS
> >> +  bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4;
> >> +#else
> >> +  bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3;
> >> +#endif
> >> +  sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
> >> +  if (!sit_i->bitmap)
> >> +  return -ENOMEM;
> >> +
> >> +  bitmap = sit_i->bitmap;
> >> +
> >>for (start = 0; start < MAIN_SEGS(sbi); start++) {
> >> -  sit_i->sentries[start].cur_valid_map
> >> -  = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> >> -  sit_i->sentries[start].ckpt_valid_map
> >> -  = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> >> -  if (!sit_i->sentries[start].cur_valid_map ||
> >> -  !sit_i->sentries[start].ckpt_valid_map)
> >> -  return -ENOMEM;
> >> +  sit_i->sentries[start].cur_valid_map = bitmap;
> >> +  bitmap += SIT_VBLOCK_MAP_SIZE;
> >> +
> >> +  sit_i->sentries[start].ckpt_valid_map = bitmap;
> >> +  bitmap += SIT_VBLOCK_MAP_SIZE;
> >>  
> >>  #ifdef CONFIG_F2FS_CHECK_FS
> >> -  sit_i->sentries[start].cur_valid_map_mir
> >> -  = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> >> -  if (!sit_i->sentries[start].cur_valid_map_mir)
> >> -  return -ENOMEM;
> >> +  sit_i->sentries[start].cur_valid_map_mir = bitmap;
> >> +  bitmap += SIT_VBLOCK_MAP_SIZE;
> >>  #endif
> >>  
> >> -  sit_i->sentries[start].discard_map
> >> -  = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
> >> -  GFP_KERNEL);
> >> -  if (!sit_i->sentries[start].discard_map)
> >> -  return -ENOMEM;
> >> +  sit_i->sentries[start].discard_map = bitmap;
> >> +  bitmap += SIT_VBLOCK_MAP_SIZE;
> >>}
> >>  
> >>sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> >> @@ -4492,21 +4496,12 @@ static void destroy_free_segmap(struct 
> >> f2fs_sb_info *sbi)
> >>  static void destroy_sit_info(struct f2fs_sb_info *sbi)
> >>  {
> >>struct sit_info *sit_i = SIT_I(sbi);
> >> -  unsigned int start;
> >>  
> >>if (!sit_i)
> >>return;
> >>  
> >> -  if (sit_i->sentries) {
> >> -  for (start = 0; start < MAIN_SEGS(sbi); start++) {
> >> -  kvfree(sit_i->sentries[start].cur_valid_map);
> >> -#ifdef CONFIG_F2FS_CHECK_FS
> >> -  kvfree(sit_i->sentries[start].cur_valid_map_mir);
> >> -#endif
> >> -  kvfree(sit_i->sentries[start].ckpt_valid_map);
> >> -  kvfree(sit_i->sentries[start].discard_map);
> >> -  }
> >> -  }
> >> +  if (sit_i->sentries)
> >> +  kvfree(sit_i->bitmap);
> >>kvfree(sit_i->tmp_map);
> >>  
> >>

[f2fs-dev] [PATCH v7] Add flags option to get xattr method paired to __vfs_getxattr

2019-08-20 Thread Mark Salyzyn via Linux-f2fs-devel
Replace arguments for get and set xattr methods, and __vfs_getxattr
and __vfs_setaxtr functions with a reference to the following now
common argument structure:

struct xattr_gs_args {
struct dentry *dentry;
struct inode *inode;
const char *name;
union {
void *buffer;
const void *value;
};
size_t size;
int flags;
};

Which in effect adds a flags option to the get method and
__vfs_getxattr function.

Add a flag option to get xattr method that has bit flag of
XATTR_NOSECURITY passed to it.  XATTR_NOSECURITY is generally then
set in the __vfs_getxattr path when called by security
infrastructure.

This handles the case of a union filesystem driver that is being
requested by the security layer to report back the xattr data.

For the use case where access is to be blocked by the security layer.

The path then could be security(dentry) ->
__vfs_getxattr({dentry...XATTR_NOSECURITY}) ->
handler->get({dentry...XATTR_NOSECURITY}) ->
__vfs_getxattr({lower_dentry...XATTR_NOSECURITY}) ->
lower_handler->get({lower_dentry...XATTR_NOSECURITY})
which would report back through the chain data and success as
expected, the logging security layer at the top would have the
data to determine the access permissions and report back the target
context that was blocked.

Without the get handler flag, the path on a union filesystem would be
the errant security(dentry) -> __vfs_getxattr(dentry) ->
handler->get(dentry) -> vfs_getxattr(lower_dentry) -> nested ->
security(lower_dentry, log off) -> lower_handler->get(lower_dentry)
which would report back through the chain no data, and -EACCES.

For selinux for both cases, this would translate to a correctly
determined blocked access. In the first case with this change a correct avc
log would be reported, in the second legacy case an incorrect avc log
would be reported against an uninitialized u:object_r:unlabeled:s0
context making the logs cosmetically useless for audit2allow.

This patch series is inert and is the wide-spread addition of the
flags option for xattr functions, and a replacement of __vfs_getxattr
with __vfs_getxattr({...XATTR_NOSECURITY}).

Signed-off-by: Mark Salyzyn 
Cc: Stephen Smalley 
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
Cc: linux-security-mod...@vger.kernel.org
Cc: sta...@vger.kernel.org # 4.4, 4.9, 4.14 & 4.19
---
v7:
- missed spots in fs/9p/acl.c, fs/afs/xattr.c, fs/ecryptfs/crypto.c,
  fs/ubifs/xattr.c, fs/xfs/libxfs/xfs_attr.c,
  security/integrity/evm/evm_main.c and security/smack/smack_lsm.c.

v6:
- kernfs missed a spot

v5:
- introduce struct xattr_gs_args for get and set methods,
  __vfs_getxattr and __vfs_setxattr functions.
- cover a missing spot in ext2.
- switch from snprintf to scnprintf for correctness.

v4:
- ifdef __KERNEL__ around XATTR_NOSECURITY to
  keep it colocated in uapi headers.

v3:
- poor aim on ubifs not ubifs_xattr_get, but static xattr_get

v2:
- Missed a spot: ubifs, erofs and afs.

v1:
- Removed from an overlayfs patch set, and made independent.
  Expect this to be the basis of some security improvements.

a

a
---
 Documentation/filesystems/Locking |  10 ++-
 drivers/staging/erofs/xattr.c |   8 +--
 fs/9p/acl.c   |  51 +++---
 fs/9p/xattr.c |  19 +++--
 fs/afs/xattr.c| 112 +-
 fs/btrfs/xattr.c  |  36 +-
 fs/ceph/xattr.c   |  40 +--
 fs/cifs/xattr.c   |  72 +--
 fs/ecryptfs/crypto.c  |  20 +++---
 fs/ecryptfs/inode.c   |  36 ++
 fs/ecryptfs/mmap.c|  39 ++-
 fs/ext2/xattr_security.c  |  16 ++---
 fs/ext2/xattr_trusted.c   |  15 ++--
 fs/ext2/xattr_user.c  |  19 +++--
 fs/ext4/xattr_security.c  |  15 ++--
 fs/ext4/xattr_trusted.c   |  15 ++--
 fs/ext4/xattr_user.c  |  19 +++--
 fs/f2fs/xattr.c   |  42 +--
 fs/fuse/xattr.c   |  23 +++---
 fs/gfs2/xattr.c   |  18 ++---
 fs/hfs/attr.c |  15 ++--
 fs/hfsplus/xattr.c|  17 +++--
 fs/hfsplus/xattr_security.c   |  13 ++--
 fs/hfsplus/xattr_trusted.c|  13 ++--
 fs/hfsplus/xattr_user.c   |  13 ++--
 fs/jffs2/security.c   |  16 ++---
 fs/jffs2/xattr_trusted.c  |  16 ++---
 fs/jffs2/xattr_user.c |  16 ++---
 fs/jfs/xattr.c|  33 -
 fs/kernfs/inode.c |  23 +++---
 fs/nfs/nfs4proc.c |  28 
 fs/ocfs2/xattr.c  |  52 ++
 fs/orangefs/xattr.c   |  19 ++---
 fs/overlayfs/inode.c  |  43 ++--
 fs/overlayfs/overlayfs.h  |   6 +-
 fs/overlayfs/super.c  |  53 ++
 fs/posix_acl.c|  23 +++---
 

[PATCH] f2fs: fix to avoid corruption during inline conversion

2019-08-20 Thread Chao Yu
From: Chao Yu 

- f2fs_setattr
 - truncate_setsize (expand i_size)
  - f2fs_convert_inline_inode
   - f2fs_convert_inline_page
- f2fs_reserve_block
- f2fs_get_node_info failed

Once we fail in above path, inline flag will remain, however
- we've reserved one block at inode.i_addr[0]
- i_size has expanded

Fix error path to avoid inode corruption.

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c   | 8 ++--
 fs/f2fs/inline.c | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2284ec706a40..05d60082da3a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -812,7 +812,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
}
 
if (attr->ia_valid & ATTR_SIZE) {
-   bool to_smaller = (attr->ia_size <= i_size_read(inode));
+   loff_t old_size = i_size_read(inode);
+   bool to_smaller = (attr->ia_size <= old_size);
 
down_write(_I(inode)->i_gc_rwsem[WRITE]);
down_write(_I(inode)->i_mmap_sem);
@@ -835,8 +836,11 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
/* should convert inline inode here */
if (!f2fs_may_inline_data(inode)) {
err = f2fs_convert_inline_inode(inode);
-   if (err)
+   if (err) {
+   /* recover old i_size */
+   i_size_write(inode, old_size);
return err;
+   }
}
inode->i_mtime = inode->i_ctime = current_time(inode);
}
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 78d6ebe165cd..16ebdd4d1f2c 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -131,6 +131,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, 
struct page *page)
 
err = f2fs_get_node_info(fio.sbi, dn->nid, );
if (err) {
+   f2fs_truncate_data_blocks_range(dn, 1);
f2fs_put_dnode(dn);
return err;
}
-- 
2.22.0



Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Theodore Y. Ts'o
On Tue, Aug 20, 2019 at 10:35:29AM +0530, Chandan Rajendra wrote:
> Looks like F2FS requires a lot more flexiblity than what can be offered by
> read callbacks i.e.
> 
> 1. F2FS wants to make use of its own workqueue for decryption, verity and
>decompression.
> 2. F2FS' decompression code is not an FS independent entity like fscrypt and
>fsverity. Hence they would need Filesystem specific callback functions to
>be invoked from "read callbacks". 
> 
> Hence I would suggest that we should drop F2FS changes made in this
> patchset. Please let me know your thoughts on this.

That's probably the best way to go for now.  My one concern is that it
means that only ext4 will be using your framework.  I could imagine
that some people might argue that should just move the callback scheme
into ext4 code as opposed to leaving it in fscrypt --- at least until
we can find other file systems where we can show that it will be
useful for those other file systems.

(Perhaps a useful experiment would be to have someone implement patches
to support fscrypt and fsverity in ext2 --- the patch might or might
not be accepted for upstream inclusion, but it would be useful to
demonstrate how easy it is to add fscrypt and fsverity.)

The other thing to consider is that there has been some discussion
about adding generalized support for I/O submission to the iomap
library.  It might be that if that work is accepted, support for
fscrypt and fsverity would be a requirement for ext4 to use that
portion of iomap's functionality.  So in that eventuality, it might be
that we'll want to move your read callbacks code into iomap, or we'll
need to rework the read callbacks code so it can work with iomap.

But this is all work for the future.  I'm a firm believe that the
perfect should not be the enemy of the good, and that none of this
should be a fundamental obstacle in having your code upstream.

Cheers,

- Ted




___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH V4 5/8] f2fs: Use read_callbacks for decrypting file data

2019-08-20 Thread Theodore Y. Ts'o
On Tue, Aug 20, 2019 at 01:12:36PM +0800, Gao Xiang wrote:
> Add a word, I have some little concern about post read procession order
> a bit as I mentioned before, because I'd like to move common EROFS
> decompression code out in the future as well for other fses to use
> after we think it's mature enough.
> 
> It seems the current code mainly addresses eliminating duplicated code,
> therefore I have no idea about that...

Actually, we should chat.  I was actually thinking about "borrowing"
code from erofs to provide ext4-specific compression.  I was really
impressed with the efficiency goals in the erofs design[1] when I
reviewed the Usenix ATC paper, and as the saying goes, the best
artists know how to steal from the best.  :-)

[1] https://www.usenix.org/conference/atc19/presentation/gao

My original specific thinking was to do code reuse by copy and paste,
simply because it was simpler, and I have limited time to work on it.
But if you are interested in making the erofs pipeline reusable by
other file systems, and have the time to do the necessary code
refactoring, I'd love to work with you on that.

It should be noted that the f2fs developers have been working on their
own compression scheme that was going to be f2fs-specific, unlike the
file system generic approach used with fsverity and fscrypt.

My expectation is that we will need to modify the read pipeling code
to support compression.  That's true whether we are looking at the
existing file system-specific code used by ext4 and f2fs or in some
combined work such as what Chandan has proposed.

Cheers,

- Ted


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v6] Add flags option to get xattr method paired to __vfs_getxattr

2019-08-20 Thread Mark Salyzyn via Linux-f2fs-devel
Replace arguments for get and set xattr methods, and __vfs_getxattr
and __vfs_setaxtr functions with a reference to the following now
common argument structure:

struct xattr_gs_args {
struct dentry *dentry;
struct inode *inode;
const char *name;
union {
void *buffer;
const void *value;
};
size_t size;
int flags;
};

Which in effect adds a flags option to the get method and
__vfs_getxattr function.

Add a flag option to get xattr method that has bit flag of
XATTR_NOSECURITY passed to it.  XATTR_NOSECURITY is generally then
set in the __vfs_getxattr path when called by security
infrastructure.

This handles the case of a union filesystem driver that is being
requested by the security layer to report back the xattr data.

For the use case where access is to be blocked by the security layer.

The path then could be security(dentry) ->
__vfs_getxattr({dentry...XATTR_NOSECURITY}) ->
handler->get({dentry...XATTR_NOSECURITY}) ->
__vfs_getxattr({lower_dentry...XATTR_NOSECURITY}) ->
lower_handler->get({lower_dentry...XATTR_NOSECURITY})
which would report back through the chain data and success as
expected, the logging security layer at the top would have the
data to determine the access permissions and report back the target
context that was blocked.

Without the get handler flag, the path on a union filesystem would be
the errant security(dentry) -> __vfs_getxattr(dentry) ->
handler->get(dentry) -> vfs_getxattr(lower_dentry) -> nested ->
security(lower_dentry, log off) -> lower_handler->get(lower_dentry)
which would report back through the chain no data, and -EACCES.

For selinux for both cases, this would translate to a correctly
determined blocked access. In the first case with this change a correct avc
log would be reported, in the second legacy case an incorrect avc log
would be reported against an uninitialized u:object_r:unlabeled:s0
context making the logs cosmetically useless for audit2allow.

This patch series is inert and is the wide-spread addition of the
flags option for xattr functions, and a replacement of __vfs_getxattr
with __vfs_getxattr({...XATTR_NOSECURITY}).

Signed-off-by: Mark Salyzyn 
Cc: Stephen Smalley 
Cc: linux-ker...@vger.kernel.org
Cc: kernel-t...@android.com
Cc: linux-security-mod...@vger.kernel.org
Cc: sta...@vger.kernel.org # 4.4, 4.9, 4.14 & 4.19
---
v6:
- kernfs missed a spot

v5:
- introduce struct xattr_gs_args for get and set methods,
  __vfs_getxattr and __vfs_setxattr functions.
- cover a missing spot in ext2.
- switch from snprintf to scnprintf for correctness.

v4:
- ifdef __KERNEL__ around XATTR_NOSECURITY to
  keep it colocated in uapi headers.

v3:
- poor aim on ubifs not ubifs_xattr_get, but static xattr_get

v2:
- Missed a spot: ubifs, erofs and afs.

v1:
- Removed from an overlayfs patch set, and made independent.
  Expect this to be the basis of some security improvements.

a
---
 Documentation/filesystems/Locking |  10 ++-
 drivers/staging/erofs/xattr.c |   8 +--
 fs/9p/acl.c   |  37 +-
 fs/9p/xattr.c |  19 +++--
 fs/afs/xattr.c| 110 +
 fs/btrfs/xattr.c  |  36 +-
 fs/ceph/xattr.c   |  40 +--
 fs/cifs/xattr.c   |  72 +--
 fs/ecryptfs/crypto.c  |  20 +++---
 fs/ecryptfs/inode.c   |  36 ++
 fs/ecryptfs/mmap.c|  39 ++-
 fs/ext2/xattr_security.c  |  16 ++---
 fs/ext2/xattr_trusted.c   |  15 ++--
 fs/ext2/xattr_user.c  |  19 +++--
 fs/ext4/xattr_security.c  |  15 ++--
 fs/ext4/xattr_trusted.c   |  15 ++--
 fs/ext4/xattr_user.c  |  19 +++--
 fs/f2fs/xattr.c   |  42 +--
 fs/fuse/xattr.c   |  23 +++---
 fs/gfs2/xattr.c   |  18 ++---
 fs/hfs/attr.c |  15 ++--
 fs/hfsplus/xattr.c|  17 +++--
 fs/hfsplus/xattr_security.c   |  13 ++--
 fs/hfsplus/xattr_trusted.c|  13 ++--
 fs/hfsplus/xattr_user.c   |  13 ++--
 fs/jffs2/security.c   |  16 ++---
 fs/jffs2/xattr_trusted.c  |  16 ++---
 fs/jffs2/xattr_user.c |  16 ++---
 fs/jfs/xattr.c|  33 -
 fs/kernfs/inode.c |  23 +++---
 fs/nfs/nfs4proc.c |  28 
 fs/ocfs2/xattr.c  |  52 ++
 fs/orangefs/xattr.c   |  19 ++---
 fs/overlayfs/inode.c  |  43 ++--
 fs/overlayfs/overlayfs.h  |   6 +-
 fs/overlayfs/super.c  |  53 ++
 fs/posix_acl.c|  23 +++---
 fs/reiserfs/xattr.c   |   2 +-
 fs/reiserfs/xattr_security.c  |  22 +++---
 fs/reiserfs/xattr_trusted.c   |  22 +++---
 fs/reiserfs/xattr_user.c  |  22 +++---