Re: [PATCH] f2fs: fix to cover allocate_segment() with lock
On 2021/4/20 0:57, Jaegeuk Kim wrote:
On 04/14, Chao Yu wrote:
As we did for other cases, in fix_curseg_write_pointer(), let's change as below:
- use callback function s_ops->allocate_segment() instead of raw function allocate_segment_by_default();
- cover allocate_segment() with curseg_lock and sentry_lock.

Signed-off-by: Chao Yu
---
 fs/f2fs/segment.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b2ee6b7791b0..daf9531ec58f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) f2fs_notice(sbi, "Assign new section to curseg[%d]: " "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); - allocate_segment_by_default(sbi, type, true); + + down_read(&SM_I(sbi)->curseg_lock); + down_write(&SIT_I(sbi)->sentry_lock); + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); + up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock);

Seems f2fs_allocate_new_section()?

f2fs_allocate_new_section() will allocate a new section only when the current section has been initialized and has a valid block/ckpt_block. It looks like fix_curseg_write_pointer() wants to force migrating the current segment to a new section whenever the write pointer and curseg->next_blkoff are inconsistent. So how about adding a parameter to force f2fs_allocate_new_section() to allocate a new section?

Thanks,

/* check consistency of the zone curseg pointed to */ if (check_zone_write_pointer(sbi, zbd, &zone)) -- 2.29.2
Re: [f2fs-dev] [PATCH] fs: f2fs: Remove unnecessary struct declaration
On 2021/4/19 10:20, Wan Jiabing wrote: struct dnode_of_data is defined at line 897. The declaration here is unnecessary. Remove it. Signed-off-by: Wan Jiabing Reviewed-by: Chao Yu Thanks,
[RFC PATCH] f2fs: reduce expensive checkpoint trigger frequency
We may trigger highly frequent checkpoints for the case below:

1. mkdir /mnt/dir1; set dir1 encrypted
2. touch /mnt/file1; fsync /mnt/file1
3. mkdir /mnt/dir2; set dir2 encrypted
4. touch /mnt/file2; fsync /mnt/file2
...

Although the newly created dirs and files are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger a checkpoint whenever fsync() comes after a new encrypted dir is created. In order to avoid such a condition, let's record an entry including the directory's ino into a global cache when we initialize the encryption policy in a checkpointed directory, and then trigger checkpoint() only when the target file's parent has a non-persisted encryption policy; for the case where its parent is not checkpointed, need_do_checkpoint() has covered that by verifying it with f2fs_is_checkpointed_node().

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 2 ++ fs/f2fs/file.c | 3 +++ fs/f2fs/xattr.c | 6 -- include/trace/events/f2fs.h | 3 ++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 87d734f5589d..34487e527d12 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -246,6 +246,7 @@ enum { APPEND_INO, /* for append ino list */ UPDATE_INO, /* for update ino list */ TRANS_DIR_INO, /* for trasactions dir ino list */ + ENC_DIR_INO,/* for encrypted dir ino list */ FLUSH_INO, /* for multiple device flushing */ MAX_INO_ENTRY, /* max.
list */ }; @@ -1090,6 +1091,7 @@ enum cp_reason_type { CP_FASTBOOT_MODE, CP_SPEC_LOG_NUM, CP_RECOVER_DIR, + CP_ENC_DIR, }; enum iostat_type { diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 6284b2f4a60b..a6c38d8b1ec3 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -218,6 +218,9 @@ static inline enum cp_reason_type need_do_checkpoint(struct inode *inode) f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, TRANS_DIR_INO)) cp_reason = CP_RECOVER_DIR; + else if (f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, + ENC_DIR_INO)) + cp_reason = CP_ENC_DIR; return cp_reason; } diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c index c8f34decbf8e..38796d488d15 100644 --- a/fs/f2fs/xattr.c +++ b/fs/f2fs/xattr.c @@ -630,6 +630,7 @@ static int __f2fs_setxattr(struct inode *inode, int index, const char *name, const void *value, size_t size, struct page *ipage, int flags) { + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_xattr_entry *here, *last; void *base_addr, *last_base_addr; int found, newsize; @@ -745,8 +746,9 @@ static int __f2fs_setxattr(struct inode *inode, int index, !strcmp(name, F2FS_XATTR_NAME_ENCRYPTION_CONTEXT)) f2fs_set_encrypted_inode(inode); f2fs_mark_inode_dirty_sync(inode, true); - if (!error && S_ISDIR(inode->i_mode)) - set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP); + if (!error && S_ISDIR(inode->i_mode) && + f2fs_is_checkpointed_node(sbi, inode->i_ino)) + f2fs_add_ino_entry(sbi, inode->i_ino, ENC_DIR_INO); same: if (is_inode_flag_set(inode, FI_ACL_MODE)) { diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 56b113e3cd6a..ca0cf12226e9 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -145,7 +145,8 @@ TRACE_DEFINE_ENUM(CP_RESIZE); { CP_NODE_NEED_CP, "node needs cp" }, \ { CP_FASTBOOT_MODE, "fastboot mode" }, \ { CP_SPEC_LOG_NUM, "log type is 2" }, \ - { CP_RECOVER_DIR, "dir needs recovery" }) + { CP_RECOVER_DIR, "dir needs recovery" }, \ + { CP_ENC_DIR, "persist encryption policy" }) #define 
show_shutdown_mode(type) \ __print_symbolic(type, \ -- 2.29.2
Re: [PATCH] direct-io: use read lock for DIO_LOCKING flag
On 2021/4/16 8:43, Al Viro wrote:
On Thu, Apr 15, 2021 at 12:24:13PM +0200, Jan Kara wrote:
On Thu 15-04-21 17:43:32, Chao Yu wrote:
9902af79c01a ("parallel lookups: actual switch to rwsem") changes inode lock from mutex to rwsem, however, we forgot to adjust lock for DIO_LOCKING flag in do_blockdev_direct_IO(),

The change in question had nothing to do with the use of ->i_mutex for regular files data access.

so let's change to hold read lock to mitigate performance regression in the case of read DIO vs read DIO, meanwhile it still keeps original functionality of avoiding buffered access vs direct access. Signed-off-by: Chao Yu

Thanks for the patch but this is not safe. Originally we had exclusive lock (with i_mutex), switching to rwsem doesn't change that requirement. It may be OK for some filesystems to actually use shared acquisition of rwsem for DIO reads but it is not clear that is fine for all filesystems (and I suspect those filesystems that actually do care already don't use DIO_LOCKING flag or were already converted to iomap_dio_rw()). So unless you do audit of all filesystems using do_blockdev_direct_IO() with DIO_LOCKING flag and make sure they are all fine with inode lock in shared mode, this is a no-go.

Aye. Frankly, I would expect that anyone bothering with that kind of analysis for given filesystem (and there are fairly unpleasant ones in the list) would just use the fruits of those efforts to convert it over to iomap.

Actually, I was misguided by the DIO_LOCKING comment more or less; it looks like it was introduced only to avoid races between buffered IO and DIO.

/* need locking between buffered and direct access */
DIO_LOCKING = 0x01,

I don't think it is easy for me to analyse the usage scenarios/restrictions of all DIO_LOCKING users and get their developers' acks for this change. Converting filesystems to use iomap_dio_rw() looks like a better option to me, thanks, Jan and Al.
:) Thanks, "Read DIO" does not mean that accesses to private in-core data structures used by given filesystem can be safely done in parallel. So blanket patch like that is not safe at all. .
[PATCH] direct-io: use read lock for DIO_LOCKING flag
9902af79c01a ("parallel lookups: actual switch to rwsem") changed the inode lock from a mutex to an rwsem; however, we forgot to adjust the lock for the DIO_LOCKING flag in do_blockdev_direct_IO(). Let's change it to hold the read lock, to mitigate the performance regression in the read DIO vs read DIO case, while still keeping the original functionality of avoiding buffered access vs direct access races.

Signed-off-by: Chao Yu
---
 fs/direct-io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index b2e86e739d7a..93ff912f2749 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -1166,7 +1166,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, dio->flags = flags; if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) { /* will be released by direct_io_worker */ - inode_lock(inode); + inode_lock_shared(inode); } /* Once we sampled i_size check for reads beyond EOF */ @@ -1316,7 +1316,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, * of protecting us from looking up uninitialized blocks. */ if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING)) - inode_unlock(dio->inode); + inode_unlock_shared(dio->inode); /* * The only time we want to leave bios in flight is when a successful @@ -1341,7 +1341,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, fail_dio: if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) - inode_unlock(inode); + inode_unlock_shared(inode); kmem_cache_free(dio_cache, dio); return retval; -- 2.29.2
[PATCH] f2fs: fix to cover allocate_segment() with lock
As we did for other cases, in fix_curseg_write_pointer(), let's change as below: - use callback function s_ops->allocate_segment() instead of raw function allocate_segment_by_default(); - cover allocate_segment() with curseg_lock and sentry_lock. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b2ee6b7791b0..daf9531ec58f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) f2fs_notice(sbi, "Assign new section to curseg[%d]: " "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); - allocate_segment_by_default(sbi, type, true); + + down_read(&SM_I(sbi)->curseg_lock); + down_write(&SIT_I(sbi)->sentry_lock); + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); + up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); /* check consistency of the zone curseg pointed to */ if (check_zone_write_pointer(sbi, zbd, &zone)) -- 2.29.2
[PATCH] f2fs: document: add description about compressed space handling
Users or developers may still be confused about why f2fs doesn't expose compressed space to userspace; add a description of the compressed space handling policy to the f2fs documentation.

Signed-off-by: Chao Yu
---
 Documentation/filesystems/f2fs.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 63c0c49b726d..992bf91eeec8 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -819,6 +819,14 @@ Compression implementation * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext + * mount w/ -o compress_extension=*; touch any_file + +- At this point, compression feature doesn't expose compressed space to user + directly in order to guarantee potential data updates later to the space. + Instead, the main goal is to reduce data writes to flash disk as much as + possible, resulting in extending disk life time as well as relaxing IO + congestion. Alternatively, we've added ioctl interface to reclaim compressed + space and show it to user after putting the immutable bit. Compress metadata layout:: -- 2.29.2
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: On 04/11, Chao Yu wrote: Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. I didn't review this tho, this gives an error in xfstests/083. From 59c2bd34fb0c77bcede2af7e5d308c014c544a1e Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied and modified from __next_free_blkoff(), they are almost the same, clean up to reuse common code as much as possible. Signed-off-by: Chao Yu --- - fix to assign .next_blkoff in change_curseg() fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index d6c6c13feb43..6e740ecf0814 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2638,22 +2638,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2665,26 +2663,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == SSR) - __next_free_blkoff(sbi, 
seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2712,7 +2700,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + curseg->next_blkoff = __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.29.2
[PATCH] f2fs: fix to avoid NULL pointer dereference
From: Yi Chen

Unable to handle kernel NULL pointer dereference at virtual address
pc : f2fs_put_page+0x1c/0x26c
lr : __revoke_inmem_pages+0x544/0x75c
f2fs_put_page+0x1c/0x26c
__revoke_inmem_pages+0x544/0x75c
__f2fs_commit_inmem_pages+0x364/0x3c0
f2fs_commit_inmem_pages+0xc8/0x1a0
f2fs_ioc_commit_atomic_write+0xa4/0x15c
f2fs_ioctl+0x5b0/0x1574
file_ioctl+0x154/0x320
do_vfs_ioctl+0x164/0x740
__arm64_sys_ioctl+0x78/0xa4
el0_svc_common+0xbc/0x1d0
el0_svc_handler+0x74/0x98
el0_svc+0x8/0xc

In f2fs_put_page, we access page->mapping, which is NULL. The root cause is: in some cases, the page refcount and the ATOMIC_WRITTEN_PAGE flag are missed to set because the page-private flag has already been set. We add f2fs_bug_on like this:

f2fs_register_inmem_page()
{
...
f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE);
f2fs_bug_on(F2FS_I_SB(inode), !IS_ATOMIC_WRITTEN_PAGE(page));
...
}

The bug-on stack follows like this:
PC is at f2fs_register_inmem_page+0x238/0x2b4
LR is at f2fs_register_inmem_page+0x2a8/0x2b4
f2fs_register_inmem_page+0x238/0x2b4
f2fs_set_data_page_dirty+0x104/0x164
set_page_dirty+0x78/0xc8
f2fs_write_end+0x1b4/0x444
generic_perform_write+0x144/0x1cc
__generic_file_write_iter+0xc4/0x174
f2fs_file_write_iter+0x2c0/0x350
__vfs_write+0x104/0x134
vfs_write+0xe8/0x19c
SyS_pwrite64+0x78/0xb8

To fix this issue, let's add the page refcount and page-private flag. The case where the page-private flag is not cleared needs further analysis.
Signed-off-by: Chao Yu Signed-off-by: Ge Qiu Signed-off-by: Dehe Gu Signed-off-by: Yi Chen --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..d6c6c13feb43 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -186,7 +186,10 @@ void f2fs_register_inmem_page(struct inode *inode, struct page *page) { struct inmem_pages *new; - f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE); + if (PagePrivate(page)) + set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE); + else + f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE); new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS); -- 2.29.2
Re: [PATCH v3] f2fs: fix to keep isolation of atomic write
On 2021/4/13 11:27, Jaegeuk Kim wrote:
On 04/12, Chao Yu wrote:
As Yi Chen reported, there is a potential race case described as below:

Thread A				Thread B
- f2fs_ioc_start_atomic_write
					- mkwrite
					 - set_page_dirty
					  - f2fs_set_page_private(page, 0)
 - set_inode_flag(FI_ATOMIC_FILE)
					- mkwrite same page
					 - set_page_dirty
					  - f2fs_register_inmem_page
					   - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
					     failed due to PagePrivate flag has been set
					   - list_add_tail
					- truncate_inode_pages
					 - f2fs_invalidate_page
					  - clear page private but w/o removing it from inmem_list
					  - set page->mapping to NULL
- f2fs_ioc_commit_atomic_write
 - __f2fs_commit_inmem_pages
  - __revoke_inmem_pages
   - f2fs_put_page
     panic as page->mapping is NULL

The root cause is that we missed to keep isolation of atomic write in the case of start_atomic_write vs mkwrite; let start_atomic_write hold the i_mmap_sem lock to avoid this issue.

My only concern is performance regression. Could you please verify the numbers?

Do you have a specific test script? IIRC, the scenario you mean is multi-threads write/mmap the same db, right?
Thanks, Reported-by: Yi Chen Signed-off-by: Chao Yu --- v3: - rebase to last dev branch - update commit message because this patch fixes a different racing issue of atomic write fs/f2fs/file.c| 3 +++ fs/f2fs/segment.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d697c8900fa7..6284b2f4a60b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); /* * Should wait end_io to count F2FS_WB_CP_DATA correctly by @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); if (ret) { + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) /* add inode in inmem_list first and set atomic_file */ set_inode_flag(inode, FI_ATOMIC_FILE); clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..78c8342f52fd 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode) struct f2fs_inode_info *fi = F2FS_I(inode); do { + down_write(&F2FS_I(inode)->i_mmap_sem); mutex_lock(&fi->inmem_lock); if (list_empty(&fi->inmem_pages)) { fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode) spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); break; } __revoke_inmem_pages(inode, &fi->inmem_pages, true, false, true); mutex_unlock(&fi->inmem_lock); + 
up_write(&F2FS_I(inode)->i_mmap_sem); } while (1); } @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode) f2fs_balance_fs(sbi, true); down_write(&fi->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); f2fs_lock_op(sbi); set_inode_flag(inode, FI_ATOMIC_COMMIT); @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode) clear_inode_flag(inode, FI_ATOMIC_COMMIT); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&fi->i_gc_rwsem[WRITE]); return err; -- 2.29.2 .
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0);

I forgot to assign curseg->next_blkoff here. I checked generic/083 and it passed; let me run all the testcases.

Thanks,

sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) {
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: On 04/11, Chao Yu wrote: Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. I didn't review this tho, this gives an error in xfstests/083. My bad, I hit this issue too, let me check this. Thanks, From 5a342a8f332a1b3281ec0e2b4d41b5287689c8ed Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied from __next_free_blkoff(), the main implementation of them is almost the same, clean up them to reuse common code as much as possible. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b33273aa5c22..bd9056165d62 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2627,22 +2627,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2654,26 +2652,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == 
SSR) - __next_free_blkoff(sbi, seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.22.1 ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
[PATCH v3] f2fs: fix to keep isolation of atomic write
As Yi Chen reported, there is a potential race case described as below:

Thread A				Thread B
- f2fs_ioc_start_atomic_write
					- mkwrite
					 - set_page_dirty
					  - f2fs_set_page_private(page, 0)
 - set_inode_flag(FI_ATOMIC_FILE)
					- mkwrite same page
					 - set_page_dirty
					  - f2fs_register_inmem_page
					   - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
					     failed due to PagePrivate flag has been set
					   - list_add_tail
					- truncate_inode_pages
					 - f2fs_invalidate_page
					  - clear page private but w/o removing it from inmem_list
					  - set page->mapping to NULL
- f2fs_ioc_commit_atomic_write
 - __f2fs_commit_inmem_pages
  - __revoke_inmem_pages
   - f2fs_put_page
     panic as page->mapping is NULL

The root cause is that we missed to keep isolation of atomic write in the case of start_atomic_write vs mkwrite; let start_atomic_write hold the i_mmap_sem lock to avoid this issue.

Reported-by: Yi Chen
Signed-off-by: Chao Yu
---
v3:
- rebase to last dev branch
- update commit message because this patch fixes a different racing issue of atomic write

 fs/f2fs/file.c| 3 +++ fs/f2fs/segment.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d697c8900fa7..6284b2f4a60b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); /* * Should wait end_io to count F2FS_WB_CP_DATA correctly by @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); if (ret) { + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) /* add inode in inmem_list first and set atomic_file */ set_inode_flag(inode, FI_ATOMIC_FILE); clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + up_write(&F2FS_I(inode)->i_mmap_sem);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..78c8342f52fd 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode) struct f2fs_inode_info *fi = F2FS_I(inode); do { + down_write(&F2FS_I(inode)->i_mmap_sem); mutex_lock(&fi->inmem_lock); if (list_empty(&fi->inmem_pages)) { fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode) spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); break; } __revoke_inmem_pages(inode, &fi->inmem_pages, true, false, true); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); } while (1); } @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode) f2fs_balance_fs(sbi, true); down_write(&fi->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); f2fs_lock_op(sbi); set_inode_flag(inode, FI_ATOMIC_COMMIT); @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode) clear_inode_flag(inode, FI_ATOMIC_COMMIT); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&fi->i_gc_rwsem[WRITE]); return err; -- 2.29.2
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. From 5a342a8f332a1b3281ec0e2b4d41b5287689c8ed Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied from __next_free_blkoff(), the main implementation of them is almost the same, clean up them to reuse common code as much as possible. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b33273aa5c22..bd9056165d62 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2627,22 +2627,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2654,26 +2652,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == SSR) - __next_free_blkoff(sbi, seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool 
f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.22.1
Re: [PATCH v2] f2fs: fix the periodic wakeups of discard thread
On 2021/4/6 17:09, Sahitya Tummala wrote: Fix the unnecessary periodic wakeups of discard thread that happens under below two conditions - 1. When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. 2. In the issue_discard_thread() path when there are no pending discard commands, it fails to reset the wait_ms to max timeout value. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
Re: [PATCH v2 00/10] erofs: add big pcluster compression support
Repo for big pcluster: https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental-bigpcluster-compact

Thanks for your time on reading this!

Nice job!

Acked-by: Chao Yu

Thanks,

Thanks, Gao Xiang

changes since v1:
- add a missing vunmap in erofs_pcpubuf_exit();
- refine comments and commit messages.

(btw, I'll apply this patchset for -next first for further integration test, which will be aimed to 5.13-rc1.)

Gao Xiang (10): erofs: reserve physical_clusterbits[] erofs: introduce multipage per-CPU buffers erofs: introduce physical cluster slab pools erofs: fix up inplace I/O pointer for big pcluster erofs: add big physical cluster definition erofs: adjust per-CPU buffers according to max_pclusterblks erofs: support parsing big pcluster compress indexes erofs: support parsing big pcluster compact indexes erofs: support decompress big pcluster for lz4 backend erofs: enable big pcluster feature

 fs/erofs/Kconfig| 14 --- fs/erofs/Makefile | 2 +- fs/erofs/decompressor.c | 216 +--- fs/erofs/erofs_fs.h | 31 -- fs/erofs/internal.h | 31 ++ fs/erofs/pcpubuf.c | 134 + fs/erofs/super.c| 1 + fs/erofs/utils.c| 12 --- fs/erofs/zdata.c| 193 ++- fs/erofs/zdata.h| 14 +-- fs/erofs/zmap.c | 155 ++-- 11 files changed, 560 insertions(+), 243 deletions(-) create mode 100644 fs/erofs/pcpubuf.c
[PATCH v2] f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
Callers may pass a fio parameter with NULL value to f2fs_allocate_data_block(),
so we should make sure to access fio's fields only after fio's validation
check.

Fixes: f608c38c59c6 ("f2fs: clean up parameter of f2fs_allocate_data_block()")
Signed-off-by: Chao Yu
---
v2:
- relocate fio->retry assignment to avoid race condition.

 fs/f2fs/segment.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c517e689a9a3..44897cfecb1e 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3417,12 +3417,12 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		f2fs_inode_chksum_set(sbi, page);
 	}
 
-	if (F2FS_IO_ALIGNED(sbi))
-		fio->retry = false;
-
 	if (fio) {
 		struct f2fs_bio_info *io;
 
+		if (F2FS_IO_ALIGNED(sbi))
+			fio->retry = false;
+
 		INIT_LIST_HEAD(&fio->list);
 		fio->in_list = true;
 		io = sbi->write_io[fio->type] + fio->temp;
-- 
2.29.2
Re: [f2fs-dev] [PATCH] f2fs: set checkpoint_merge by default
On 2021/4/2 8:42, Jaegeuk Kim wrote:
> Once we introduced checkpoint_merge, we've seen some contention w/o the
> option. In order to avoid it, let's set it by default.
>
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
[PATCH] f2fs: fix to avoid GC/mmap race with f2fs_truncate()
We missed holding i_gc_rwsem and i_mmap_sem around f2fs_truncate() in
f2fs_file_write_iter() to avoid racing with background GC and mmap; fix it.

Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index dc79694e512c..f3ca63b55843 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4443,8 +4443,13 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		clear_inode_flag(inode, FI_NO_PREALLOC);
 
 		/* if we couldn't write data, we should deallocate blocks. */
-		if (preallocated && i_size_read(inode) < target_size)
+		if (preallocated && i_size_read(inode) < target_size) {
+			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+			down_write(&F2FS_I(inode)->i_mmap_sem);
 			f2fs_truncate(inode);
+			up_write(&F2FS_I(inode)->i_mmap_sem);
+			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		}
 
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
-- 
2.29.2
[PATCH] f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
Callers may pass a fio parameter with NULL value to f2fs_allocate_data_block(),
so we should make sure to access fio's fields only after fio's validation
check.

Fixes: f608c38c59c6 ("f2fs: clean up parameter of f2fs_allocate_data_block()")
Signed-off-by: Chao Yu
---
 fs/f2fs/segment.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c517e689a9a3..d076ae03b5dc 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3417,9 +3417,6 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		f2fs_inode_chksum_set(sbi, page);
 	}
 
-	if (F2FS_IO_ALIGNED(sbi))
-		fio->retry = false;
-
 	if (fio) {
 		struct f2fs_bio_info *io;
 
@@ -3429,6 +3426,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		spin_lock(&io->io_lock);
 		list_add_tail(&fio->list, &io->io_list);
 		spin_unlock(&io->io_lock);
+
+		if (F2FS_IO_ALIGNED(sbi))
+			fio->retry = false;
 	}
 
 	mutex_unlock(&curseg->curseg_mutex);
-- 
2.29.2
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/31 9:57, Jaegeuk Kim wrote: On 03/27, Chao Yu wrote: On 2021/3/27 9:52, Chao Yu wrote: On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. 
Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. 
:-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Something like this, maybe: --- fs/f2fs/super.c | 17 + 1 file changed, 13 insertions(+), 4 de
[PATCH] f2fs: fix to restrict mount condition on readonly block device
When we mount an unclean f2fs image on a readonly block device, let's make
mount() succeed only when there is no recoverable data in that image;
otherwise, after mount(), fsynced files won't be recovered as the user
expects.

Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
Signed-off-by: Chao Yu
---
 fs/f2fs/super.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 954b1fe97d67..14239e2b7ae7 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3966,10 +3966,18 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	 * previous checkpoint was not done by clean system shutdown.
 	 */
 	if (f2fs_hw_is_readonly(sbi)) {
-		if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
-			f2fs_err(sbi, "Need to recover fsync data, but write access unavailable");
-		else
-			f2fs_info(sbi, "write access unavailable, skipping recovery");
+		if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
+			err = f2fs_recover_fsync_data(sbi, true);
+			if (err > 0) {
+				err = -EROFS;
+				f2fs_err(sbi, "Need to recover fsync data, but "
+					"write access unavailable, please try "
+					"mount w/ disable_roll_forward or norecovery");
+			}
+			if (err < 0)
+				goto free_meta;
+		}
+		f2fs_info(sbi, "write access unavailable, skipping recovery");
 		goto reset_checkpoint;
 	}
-- 
2.29.2
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
On 2021/3/29 14:55, Gao Xiang wrote:
> I found a reference here,
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v5.11#n99
> also vaguely remembered some threads from Linus, but it's hard to find
> them now :-(

Sure, we can follow this rule for erofs code style. :)

Thanks,
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
On 2021/3/29 14:36, Gao Xiang wrote: Hi Chao, On Mon, Mar 29, 2021 at 02:26:05PM +0800, Chao Yu wrote: On 2021/3/29 9:23, Gao Xiang wrote: From: Gao Xiang Add a bitmap for available compression algorithms and a variable-sized on-disk table for compression options in preparation for upcoming big pcluster and LZMA algorithm, which follows the end of super block. To parse the compression options, the bitmap is scanned one by one. For each available algorithm, there is data followed by 2-byte `length' correspondingly (it's enough for most cases, or entire fs blocks should be used.) With such available algorithm bitmap, kernel itself can also refuse to mount such filesystem if any unsupported compression algorithm exists. Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER. Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 2 +- fs/erofs/erofs_fs.h | 14 ++-- fs/erofs/internal.h | 5 +- fs/erofs/super.c| 143 +++- 4 files changed, 157 insertions(+), 7 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 97538ff24a19..27aa6a99b371 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -41,7 +41,7 @@ int z_erofs_load_lz4_config(struct super_block *sb, } distance = le16_to_cpu(lz4->max_distance); } else { - distance = le16_to_cpu(dsb->lz4_max_distance); + distance = le16_to_cpu(dsb->u1.lz4_max_distance); } EROFS_SB(sb)->lz4.max_distance_pages = distance ? diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index e0f3c0db1f82..5a126493d4d9 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -18,15 +18,16 @@ * be incompatible with this kernel version. 
*/ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x0001 +#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x0002 #define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING -/* 128-byte erofs on-disk super block */ +/* erofs on-disk super block (currently 128 bytes) */ struct erofs_super_block { __le32 magic; /* file system magic number */ __le32 checksum;/* crc32c(super_block) */ __le32 feature_compat; __u8 blkszbits; /* support block_size == PAGE_SIZE only */ - __u8 reserved; + __u8 sb_extslots; /* superblock size = 128 + sb_extslots * 16 */ __le16 root_nid;/* nid of root directory */ __le64 inos;/* total valid ino # (== f_files - f_favail) */ @@ -39,8 +40,12 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - /* customized lz4 sliding window size instead of 64k by default */ - __le16 lz4_max_distance; + union { + /* bitmap for available compression algorithms */ + __le16 available_compr_algs; + /* customized sliding window size instead of 64k by default */ + __le16 lz4_max_distance; + } __packed u1; __u8 reserved2[42]; }; @@ -196,6 +201,7 @@ enum { Z_EROFS_COMPRESSION_LZ4 = 0, Z_EROFS_COMPRESSION_MAX }; +#define Z_EROFS_ALL_COMPR_ALGS (1 << (Z_EROFS_COMPRESSION_MAX - 1)) /* 14 bytes (+ length field = 16 bytes) */ struct z_erofs_lz4_cfgs { diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 46b977f348eb..f3fa895d809f 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -75,6 +75,7 @@ struct erofs_sb_info { struct xarray managed_pslots; unsigned int shrinker_run_no; + u16 available_compr_algs; /* pseudo inode to manage cached pages */ struct inode *managed_cache; @@ -90,6 +91,7 @@ struct erofs_sb_info { /* inode slot unit size in bit shift */ unsigned char islotbits; + u32 sb_size;/* total superblock size */ u32 build_time_nsec; u64 build_time; @@ -233,6 +235,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \ } 
EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING) +EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS) EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM) /* atomic flag definitions */ @@ -454,7 +457,7 @@ static inline int z_erofs_load_lz4_config(struct super_block *sb, struct erofs_super_block *dsb, struct z_erofs_lz4_cfgs *lz4, int len) { - if (lz4 || dsb->lz4_max_distance) { + if (lz4 || dsb->u1.lz4_max_distance) { erofs_err(sb, "lz4 algorithm isn't enabled"); return -EINVAL; } diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 1ca8da3f2125..628c751634fe 100644 --- a/
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
return true; } +#ifdef CONFIG_EROFS_FS_ZIP +/* read variable-sized metadata, offset will be aligned by 4-byte */ +static void *erofs_read_metadata(struct super_block *sb, struct page **pagep, +erofs_off_t *offset, int *lengthp) +{ + struct page *page = *pagep; + u8 *buffer, *ptr; + int len, i, cnt; + erofs_blk_t blk; + + *offset = round_up(*offset, 4); + blk = erofs_blknr(*offset); + + if (!page || page->index != blk) { + if (page) { + unlock_page(page); + put_page(page); + } + page = erofs_get_meta_page(sb, blk); + if (IS_ERR(page)) + goto err_nullpage; + } + + ptr = kmap(page); + len = le16_to_cpu(*(__le16 *)&ptr[erofs_blkoff(*offset)]); + if (!len) + len = U16_MAX + 1; + buffer = kmalloc(len, GFP_KERNEL); + if (!buffer) { + buffer = ERR_PTR(-ENOMEM); + goto out; + } + *offset += sizeof(__le16); + *lengthp = len; + + for (i = 0; i < len; i += cnt) { + cnt = min(EROFS_BLKSIZ - (int)erofs_blkoff(*offset), len - i); + blk = erofs_blknr(*offset); + + if (!page || page->index != blk) { + if (page) { + kunmap(page); + unlock_page(page); + put_page(page); + } + page = erofs_get_meta_page(sb, blk); + if (IS_ERR(page)) { + kfree(buffer); + goto err_nullpage; + } + ptr = kmap(page); + } + memcpy(buffer + i, ptr + erofs_blkoff(*offset), cnt); + *offset += cnt; + } +out: + kunmap(page); + *pagep = page; + return buffer; +err_nullpage: + *pagep = NULL; + return page; +} + +static int erofs_load_compr_cfgs(struct super_block *sb, +struct erofs_super_block *dsb) +{ + struct erofs_sb_info *sbi; + struct page *page; + unsigned int algs, alg; + erofs_off_t offset; + int size, ret; + + sbi = EROFS_SB(sb); + sbi->available_compr_algs = le16_to_cpu(dsb->u1.available_compr_algs); + + if (sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS) { + erofs_err(sb, +"try to load compressed image with unsupported algorithms %x", Minor style issue: "try to load compressed image with unsupported " "algorithms %x", sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS); + sbi->available_compr_algs 
& ~Z_EROFS_ALL_COMPR_ALGS); + return -EINVAL; + } + + offset = EROFS_SUPER_OFFSET + sbi->sb_size; + page = NULL; + alg = 0; + ret = 0; + + for (algs = sbi->available_compr_algs; algs; algs >>= 1, ++alg) { + void *data; + + if (!(algs & 1)) + continue; + + data = erofs_read_metadata(sb, &page, &offset, &size); + if (IS_ERR(data)) { + ret = PTR_ERR(data); + goto err; + } + + switch (alg) { + case Z_EROFS_COMPRESSION_LZ4: + ret = z_erofs_load_lz4_config(sb, dsb, data, size); + break; + default: + DBG_BUGON(1); + ret = -EFAULT; + } + kfree(data); + if (ret) + goto err; + } +err: + if (page) { + unlock_page(page); + put_page(page); + } + return ret; +} +#else +static int erofs_load_compr_cfgs(struct super_block *sb, +struct erofs_super_block *dsb) +{ + if (dsb->u1.available_compr_algs) { + erofs_err(sb, +"try to load compressed image when compression is disabled"); Ditto, erofs_err(sb, "try to load compressed image when " "compression is disabled"); + return -EINVAL; + } + return 0; +} +#endif + static int erofs_read_superblock(struct super_block *sb) { struct erofs_sb_info *sbi; @@ -166,6 +298,12 @@ static int erofs_read_superblock(struct super_block *sb) if (!check_layout_compatibility(sb, dsb)) goto out; + sbi->sb_size = 128 + dsb->sb_extslots * 16; sbi->sb_size = sizeof(struct erofs_super_block) + dsb->sb_e
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/27 9:52, Chao Yu wrote: On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. 
Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. 
:-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Something like this, maybe: --- fs/f2fs/super.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 954
[PATCH] f2fs: introduce gc_merge mount option
In this patch, we will add two new mount options: "gc_merge" and "nogc_merge", when background_gc is on, "gc_merge" option can be set to let background GC thread to handle foreground GC requests, it can eliminate the sluggish issue caused by slow foreground GC operation when GC is triggered from a process with limited I/O and CPU resources. Original idea is from Xiang. Signed-off-by: Gao Xiang Signed-off-by: Chao Yu --- Documentation/filesystems/f2fs.rst | 6 ++ fs/f2fs/f2fs.h | 1 + fs/f2fs/gc.c | 26 ++ fs/f2fs/gc.h | 6 ++ fs/f2fs/segment.c | 15 +-- fs/f2fs/super.c| 19 +-- 6 files changed, 65 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 35ed01a5fbc9..63c0c49b726d 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -110,6 +110,12 @@ background_gc=%sTurn on/off cleaning operations, namely garbage on synchronous garbage collection running in background. Default value for this option is on. So garbage collection is on by default. +gc_mergeWhen background_gc is on, this option can be enabled to +let background GC thread to handle foreground GC requests, +it can eliminate the sluggish issue caused by slow foreground +GC operation when GC is triggered from a process with limited +I/O and CPU resources. +nogc_merge Disable GC merge feature. 
disable_roll_forwardDisable the roll-forward recovery routine norecovery Disable the roll-forward recovery routine, mounted read- only (i.e., -o ro,disable_roll_forward) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index fe380bcf8d4d..87d734f5589d 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -97,6 +97,7 @@ extern const char *f2fs_fault_name[FAULT_MAX]; #define F2FS_MOUNT_NORECOVERY 0x0400 #define F2FS_MOUNT_ATGC0x0800 #define F2FS_MOUNT_MERGE_CHECKPOINT0x1000 +#defineF2FS_MOUNT_GC_MERGE 0x2000 #define F2FS_OPTION(sbi) ((sbi)->mount_opt) #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index a2ca483f9855..5c48825fd12d 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -31,19 +31,24 @@ static int gc_thread_func(void *data) struct f2fs_sb_info *sbi = data; struct f2fs_gc_kthread *gc_th = sbi->gc_thread; wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head; + wait_queue_head_t *fggc_wq = &sbi->gc_thread->fggc_wq; unsigned int wait_ms; wait_ms = gc_th->min_sleep_time; set_freezable(); do { - bool sync_mode; + bool sync_mode, foreground = false; wait_event_interruptible_timeout(*wq, kthread_should_stop() || freezing(current) || + waitqueue_active(fggc_wq) || gc_th->gc_wake, msecs_to_jiffies(wait_ms)); + if (test_opt(sbi, GC_MERGE) && waitqueue_active(fggc_wq)) + foreground = true; + /* give it a try one time */ if (gc_th->gc_wake) gc_th->gc_wake = 0; @@ -90,7 +95,10 @@ static int gc_thread_func(void *data) goto do_gc; } - if (!down_write_trylock(&sbi->gc_lock)) { + if (foreground) { + down_write(&sbi->gc_lock); + goto do_gc; + } else if (!down_write_trylock(&sbi->gc_lock)) { stat_other_skip_bggc_count(sbi); goto next; } @@ -107,14 +115,22 @@ static int gc_thread_func(void *data) else increase_sleep_time(gc_th, &wait_ms); do_gc: - stat_inc_bggc_count(sbi->stat_info); + if (!foreground) + stat_inc_bggc_count(sbi->stat_info); sync_mode = F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_SYNC; + /* 
foreground GC was been triggered via f2fs_balance_fs() */ + if (foreground) + sync_mode = false; + /* if return value is not zero, no victim was selected */ - if (f2fs_gc(sbi, sync_mode, true, false, NULL_SEGNO)) + if (f2fs_gc(sbi, sync_mode, !foreground,
Re: [PATCH 4/4] erofs: add on-disk compression configurations
On 2021/3/27 11:49, Gao Xiang wrote: From: Gao Xiang Add a bitmap for available compression algorithms and a variable-sized on-disk table for compression options in preparation for upcoming big pcluster and LZMA algorithm, which follows the end of super block. To parse the compression options, the bitmap is scanned one by one. For each available algorithm, there is data followed by 2-byte `length' correspondingly (it's enough for most cases, or entire fs blocks should be used.) With such available algorithm bitmap, kernel itself can also refuse to mount such filesystem if any unsupported compression algorithm exists. Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 2 +- fs/erofs/erofs_fs.h | 16 +++-- fs/erofs/internal.h | 5 +- fs/erofs/super.c| 145 +++- 4 files changed, 161 insertions(+), 7 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 97538ff24a19..27aa6a99b371 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -41,7 +41,7 @@ int z_erofs_load_lz4_config(struct super_block *sb, } distance = le16_to_cpu(lz4->max_distance); } else { - distance = le16_to_cpu(dsb->lz4_max_distance); + distance = le16_to_cpu(dsb->u1.lz4_max_distance); } EROFS_SB(sb)->lz4.max_distance_pages = distance ? diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index 1322ae63944b..ef3f8a99aa5f 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -18,15 +18,18 @@ * be incompatible with this kernel version. 
*/ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x0001 -#define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING +#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x0002 +#define EROFS_ALL_FEATURE_INCOMPAT \ + (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \ +EROFS_FEATURE_INCOMPAT_COMPR_CFGS) -/* 128-byte erofs on-disk super block */ +/* erofs on-disk super block (currently 128 bytes) */ struct erofs_super_block { __le32 magic; /* file system magic number */ __le32 checksum;/* crc32c(super_block) */ __le32 feature_compat; __u8 blkszbits; /* support block_size == PAGE_SIZE only */ - __u8 reserved; + __u8 sb_extslots; /* superblock size = 128 + sb_extslots * 16 */ __le16 root_nid; /* nid of root directory */ __le64 inos;/* total valid ino # (== f_files - f_favail) */ @@ -39,7 +42,11 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - __le16 lz4_max_distance; + union { + /* bitmap for available compression algorithms */ + __le16 available_compr_algs; + __le16 lz4_max_distance; + } __packed u1; __u8 reserved2[42]; }; @@ -195,6 +202,7 @@ enum { Z_EROFS_COMPRESSION_LZ4 = 0, Z_EROFS_COMPRESSION_MAX }; +#define Z_EROFS_ALL_COMPR_ALGS (1 << (Z_EROFS_COMPRESSION_MAX - 1)) /* 14 bytes (+ length field = 16 bytes) */ struct z_erofs_lz4_cfgs { diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 46b977f348eb..f3fa895d809f 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -75,6 +75,7 @@ struct erofs_sb_info { struct xarray managed_pslots; unsigned int shrinker_run_no; + u16 available_compr_algs; /* pseudo inode to manage cached pages */ struct inode *managed_cache; @@ -90,6 +91,7 @@ struct erofs_sb_info { /* inode slot unit size in bit shift */ unsigned char islotbits; + u32 sb_size; /* total superblock size */ u32 build_time_nsec; u64 build_time; @@ -233,6 +235,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \ } EROFS_FEATURE_FUNCS(lz4_0padding, 
incompat, INCOMPAT_LZ4_0PADDING) +EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS) EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM) /* atomic flag definitions */ @@ -454,7 +457,7 @@ static inline int z_erofs_load_lz4_config(struct super_block *sb, struct erofs_super_block *dsb, struct z_erofs_lz4_cfgs *lz4, int len) { - if (lz4 || dsb->lz4_max_distance) { + if (lz4 || dsb->u1.lz4_max_distance) { erofs_err(sb, "lz4 algorithm isn't enabled"); return -EINVAL; } diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 1ca8da3f2125..c5e3039f51bf 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -122,6 +122,138 @@ static bool check_layout_compatibility(struct super_block *sb, return true; } +#ifdef CONFIG_EROFS_FS_ZIP +/* read variable-sized metadata, offset will be aligned by 4-byte */ +static
Re: [PATCH 3/4] erofs: introduce on-disk lz4 fs configurations
On 2021/3/27 11:49, Gao Xiang wrote:
> From: Gao Xiang
>
> Introduce z_erofs_lz4_cfgs to store all lz4 configurations. Currently
> it's only max_distance, but it will be used for new features later.
>
> Signed-off-by: Gao Xiang

Reviewed-by: Chao Yu

Thanks,
Re: [PATCH 2/4] erofs: support adjust lz4 history window size
On 2021/3/27 11:49, Gao Xiang wrote: From: Huang Jianan lz4 uses LZ4_DISTANCE_MAX to record history preservation. When using rolling decompression, a block with a higher compression ratio will cause a larger memory allocation (up to 64k). It may cause a large resource burden in extreme cases on devices with small memory and a large number of concurrent IOs. So appropriately reducing this value can improve performance. Decreasing this value will reduce the compression ratio (except when input_size Signed-off-by: Guo Weichao [ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ] Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 21 + fs/erofs/erofs_fs.h | 3 ++- fs/erofs/internal.h | 19 +++ fs/erofs/super.c| 4 +++- 4 files changed, 41 insertions(+), 6 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 80e8871aef71..93411e9df9b6 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -28,6 +28,17 @@ struct z_erofs_decompressor { char *name; }; +int z_erofs_load_lz4_config(struct super_block *sb, + struct erofs_super_block *dsb) +{ + u16 distance = le16_to_cpu(dsb->lz4_max_distance); + + EROFS_SB(sb)->lz4.max_distance_pages = distance ? 
+ DIV_ROUND_UP(distance, PAGE_SIZE) + 1 : + LZ4_MAX_DISTANCE_PAGES; + return 0; +} + static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct list_head *pagepool) { @@ -36,6 +47,8 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct page *availables[LZ4_MAX_DISTANCE_PAGES] = { NULL }; unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES, BITS_PER_LONG)] = { 0 }; + unsigned int lz4_max_distance_pages = + EROFS_SB(rq->sb)->lz4.max_distance_pages; void *kaddr = NULL; unsigned int i, j, top; @@ -44,14 +57,14 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct page *const page = rq->out[i]; struct page *victim; - if (j >= LZ4_MAX_DISTANCE_PAGES) + if (j >= lz4_max_distance_pages) j = 0; /* 'valid' bounced can only be tested after a complete round */ if (test_bit(j, bounced)) { - DBG_BUGON(i < LZ4_MAX_DISTANCE_PAGES); - DBG_BUGON(top >= LZ4_MAX_DISTANCE_PAGES); - availables[top++] = rq->out[i - LZ4_MAX_DISTANCE_PAGES]; + DBG_BUGON(i < lz4_max_distance_pages); + DBG_BUGON(top >= lz4_max_distance_pages); + availables[top++] = rq->out[i - lz4_max_distance_pages]; } if (page) { diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index 9ad1615f4474..b27d0e4e4ab5 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -39,7 +39,8 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - __u8 reserved2[44]; + __le16 lz4_max_distance; It missed to add comments, otherwise it looks good to me. 
Reviewed-by: Chao Yu Thanks, + __u8 reserved2[42]; }; /* diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index d29fc0c56032..1de60992c3dd 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -59,6 +59,12 @@ struct erofs_fs_context { unsigned int mount_opt; }; +/* all filesystem-wide lz4 configurations */ +struct erofs_sb_lz4_info { + /* # of pages needed for EROFS lz4 rolling decompression */ + u16 max_distance_pages; +}; + struct erofs_sb_info { #ifdef CONFIG_EROFS_FS_ZIP /* list for all registered superblocks, mainly for shrinker */ @@ -72,6 +78,8 @@ struct erofs_sb_info { /* pseudo inode to manage cached pages */ struct inode *managed_cache; + + struct erofs_sb_lz4_info lz4; #endif/* CONFIG_EROFS_FS_ZIP */ u32 blocks; u32 meta_blkaddr; @@ -432,6 +440,8 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi, struct erofs_workgroup *egrp); int erofs_try_to_free_cached_page(struct address_space *mapping, struct page *page); +int z_erofs_load_lz4_config(struct super_block *sb, + struct erofs_super_block *dsb); #else static inline void erofs_shrinker_register(struct super_block *sb) {} static inline void erofs_shrinker_unregister(struct super_block *sb) {} @@ -439,6 +449,15 @@ static inline int erofs_init_
Re: [PATCH 1/4] erofs: introduce erofs_sb_has_xxx() helpers
On 2021/3/27 11:49, Gao Xiang wrote: From: Gao Xiang Introduce erofs_sb_has_xxx() to make long checks short, especially for later big pcluster & LZMA features. Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. 
So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. :-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. 
Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Thanks, due to device is readonly, once inode cache is reclaimed by vm, user will see old inode when reloading it, or even see corrupted fs if partial meta inode's cache is expired. Thoughts? Thanks, and in the
[PATCH] f2fs: delete empty compress.h
From: Chao Yu Commit 75e91c888989 ("f2fs: compress: fix compression chksum") wrongly introduced an empty compress.h; delete it. Signed-off-by: Chao Yu --- fs/f2fs/compress.h | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 fs/f2fs/compress.h diff --git a/fs/f2fs/compress.h b/fs/f2fs/compress.h deleted file mode 100644 index e69de29bb2d1.. -- 2.22.1
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/3/25 7:49, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select victim: 1. LFS is set to find source section during GC, the victim should have no checkpointed data, since after GC, section could not be set free for reuse. Previously, we only check valid ckpt blocks in current segment rather than section, fix it. 2. SSR | AT_SSR are set to find target segment for writes which can be fully filled by checkpointed and newly written blocks, we should never select such segment, otherwise it can cause panic or data corruption during allocation, potential case is described as below: a) target segment has 128 ckpt valid blocks b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is still in dirty list) I missed changing 128 to 'n', so the comments should be updated as below: a) target segment has 'n' (n < 512) ckpt valid blocks b) GC migrates 'n' valid blocks to other segment (segment is still in dirty list) Thanks, c) GC migrates '512 - n' blocks to target segment (segment has 'n' cp_vblocks and '512 - n' vblocks) d) If GC selects target segment via {AT,}SSR allocator, however there is no free space in target segment.
[PATCH] f2fs: fix to cover __allocate_new_section() with curseg_lock
In order to avoid race with f2fs_do_replace_block(). Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e533192545b2..f1b0e832e7c2 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2957,19 +2957,23 @@ static void __allocate_new_section(struct f2fs_sb_info *sbi, int type) void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type) { + down_read(&SM_I(sbi)->curseg_lock); down_write(&SIT_I(sbi)->sentry_lock); __allocate_new_section(sbi, type); up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); } void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi) { int i; + down_read(&SM_I(sbi)->curseg_lock); down_write(&SIT_I(sbi)->sentry_lock); for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) __allocate_new_segment(sbi, i, false); up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); } static const struct segment_allocation default_salloc_ops = { -- 2.29.2
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? 
I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: >>>>>>> Since device is readonly now, all write to the device will fail, checkpoint can >>>>>>> not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail due to device is readonly, once inode cache is reclaimed by vm, user will see old inode when reloading it, or even see corrupted fs if partial meta inode's cache is expired. Thoughts? Thanks, and in the next time, it allows mount(rw|ro) with device(rw) to recover the data seamlessly. 
Thanks, Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_
Re: [f2fs-dev] [PATCH -next] f2fs: fix a typo in inode.c
On 2021/3/25 14:38, Ruiqi Gong wrote: Do a trivial typo fix. s/runing/running Reported-by: Hulk Robot Signed-off-by: Ruiqi Gong Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. 
I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Thanks, Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... 
Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct supe
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/3/25 7:49, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select victim: 1. LFS is set to find source section during GC, the victim should have no checkpointed data, since after GC, section could not be set free for reuse. Previously, we only check valid ckpt blocks in current segment rather than section, fix it. 2. SSR | AT_SSR are set to find target segment for writes which can be fully filled by checkpointed and newly written blocks, we should never select such segment, otherwise it can cause panic or data corruption during allocation, potential case is described as below: a) target segment has 128 ckpt valid blocks b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is still in dirty list) c) GC migrates '512 - n' blocks to target segment (segment has 'n' cp_vblocks and '512 - n' vblocks) d) If GC selects target segment via {AT,}SSR allocator, however there is no free space in target segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- v2: - fix to check checkpointed data in section rather than segment for LFS mode. - update commit title and message.
fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 28 fs/f2fs/segment.c | 39 --- fs/f2fs/segment.h | 14 +- 4 files changed, 58 insertions(+), 24 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index eb154d9cb063..29e634d08a27 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index d96acc6531f2..4d9616373a4a 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -728,11 +724,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (sec_usage_check(sbi, secno)) goto next; + /* Don't touch checkpointed data */ - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) - goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + if (p.alloc_mode == LFS) { + /* +* LFS is set to find source section during GC. +* The victim should have no checkpointed data. +*/ + if (get_ckpt_valid_blocks(sbi, segno, true)) + goto next; + } else { + /* +* SSR | AT_SSR are set to find target segment +* for writes which can be full by checkpointed +* and newly written blocks. 
+*/ + if (!segment_has_free_slot(sbi, segno)) + goto next; + } + } + if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 6e1a5f5657bf..f6a30856ceda 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISA
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. 
filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown. */ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS;
Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator
On 2021/3/24 6:59, Jaegeuk Kim wrote: On 03/19, Chao Yu wrote: On 2021/3/19 1:17, Jaegeuk Kim wrote: On 02/20, Chao Yu wrote: In cp disabling mode, there could be a condition - target segment has 128 ckpt valid blocks - GC migrates 128 valid blocks to other segment (segment is still in dirty list) - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks and 384 vblocks) - If GC selects target segment via {AT,}SSR allocator, however there is no free space in targe segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 17 + fs/f2fs/segment.c | 20 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ed7807103c8e..9c753eff0814 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 86ba8ed0b8a7..a1d8062cdace 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; + if 
(unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + /* +* to avoid selecting candidate which has below valid +* block distribution: +* partial blocks are valid and all left ones are valid +* in previous checkpoint. +*/ + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) { + if (!segment_has_free_slot(sbi, segno)) + goto next; Do we need to change this to check free_slot instead of get_ckpt_valid_blocks()? Jaegeuk, LFS was assigned only for GC case, in this case we are trying to select source section, rather than target segment for SSR/AT_SSR case, so we don't need to check free_slot. - f2fs_gc - __get_victim - get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS, 0); 732 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && 733 get_ckpt_valid_blocks(sbi, segno) && 734 p.alloc_mode == LFS)) BTW, in LFS mode, GC wants to find source section rather than segment, so we should change to check valid ckpt blocks in every segment of targe section here? Alright. I refactored a bit on this patch with new one. Could you please take a look? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=00152bd7cabd69b4615ebead823ff23887b0e0f7 I see, newly added comment looks good to me. 
One more concern is that the commit title and message are out of date; I've revised them in v2: https://lore.kernel.org/linux-f2fs-devel/20210324031828.67133-1-yuch...@huawei.com/T/#u Thanks, + } + } + if (is_atgc) { add_victim_entry(sbi, &p, segno); goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2d5a82c4ca15..deaf57e13125 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, seg->next_blkoff++; } +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) +{ + struct sit_info *sit = SIT_I(sbi); + struct seg_entry *se = get_seg_entry(sbi, segno); + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); + unsigned long *target_map = SIT_I(sbi)->tmp_map; + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; + unsigned long *cur_map = (unsigned
[PATCH] f2fs: fix to update last i_size if fallocate partially succeeds
In the case of expanding pinned file, map.m_lblk and map.m_len will update in each round of section allocation, so in error path, last i_size will be calculated with wrong m_lblk and m_len, fix it. Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu --- fs/f2fs/file.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index bd5a77091d23..dc79694e512c 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1619,9 +1619,10 @@ static int expand_inode_data(struct inode *inode, loff_t offset, struct f2fs_map_blocks map = { .m_next_pgofs = NULL, .m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE, .m_may_create = true }; - pgoff_t pg_end; + pgoff_t pg_start, pg_end; loff_t new_size = i_size_read(inode); loff_t off_end; + block_t expanded = 0; int err; err = inode_newsize_ok(inode, (len + offset)); @@ -1634,11 +1635,12 @@ static int expand_inode_data(struct inode *inode, loff_t offset, f2fs_balance_fs(sbi, true); + pg_start = ((unsigned long long)offset) >> PAGE_SHIFT; pg_end = ((unsigned long long)offset + len) >> PAGE_SHIFT; off_end = (offset + len) & (PAGE_SIZE - 1); - map.m_lblk = ((unsigned long long)offset) >> PAGE_SHIFT; - map.m_len = pg_end - map.m_lblk; + map.m_lblk = pg_start; + map.m_len = pg_end - pg_start; if (off_end) map.m_len++; @@ -1648,7 +1650,6 @@ static int expand_inode_data(struct inode *inode, loff_t offset, if (f2fs_is_pinned_file(inode)) { block_t sec_blks = BLKS_PER_SEC(sbi); block_t sec_len = roundup(map.m_len, sec_blks); - block_t done = 0; map.m_len = sec_blks; next_alloc: @@ -1656,10 +1657,8 @@ static int expand_inode_data(struct inode *inode, loff_t offset, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { down_write(&sbi->gc_lock); err = f2fs_gc(sbi, true, false, false, NULL_SEGNO); - if (err && err != -ENODATA && err != -EAGAIN) { - map.m_len = done; + if (err && err != -ENODATA && err != -EAGAIN) goto out_err; - } } down_write(&sbi->pin_sem); @@ -1673,24 
+1672,25 @@ static int expand_inode_data(struct inode *inode, loff_t offset, up_write(&sbi->pin_sem); - done += map.m_len; + expanded += map.m_len; sec_len -= map.m_len; map.m_lblk += map.m_len; if (!err && sec_len) goto next_alloc; - map.m_len = done; + map.m_len = expanded; } else { err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO); + expanded = map.m_len; } out_err: if (err) { pgoff_t last_off; - if (!map.m_len) + if (!expanded) return err; - last_off = map.m_lblk + map.m_len - 1; + last_off = pg_start + expanded - 1; /* update new size to the failed position */ new_size = (last_off == pg_end) ? offset + len : -- 2.29.2
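The bookkeeping this patch introduces can be exercised outside the kernel. Below is a minimal userspace C model of the allocation loop, with a hypothetical expand_blocks() helper standing in for the f2fs_map_blocks() rounds and a fixed 512-block section assumed. It shows why the error path must derive the last offset from pg_start plus the accumulated 'expanded' count, rather than from the per-round m_lblk/m_len, which mutate every iteration.

```c
#include <assert.h>

/* Hypothetical model of the expand_inode_data() loop for a pinned file:
 * allocate one section (sec_blks blocks) per round, with an injected
 * failure after 'fail_round' successful rounds. On failure, report the
 * last successfully allocated offset via *last_off using the patched
 * formula, or leave it untouched when nothing was allocated. */
static int expand_blocks(unsigned pg_start, unsigned total_len,
                         unsigned sec_blks, int fail_round,
                         unsigned *last_off)
{
    unsigned expanded = 0;        /* blocks allocated so far */
    unsigned sec_len = total_len; /* blocks still to allocate */
    int round = 0;

    while (sec_len) {
        unsigned m_len = sec_len < sec_blks ? sec_len : sec_blks;

        if (round++ == fail_round) {            /* allocation failure */
            if (!expanded)
                return -1;                      /* nothing to report */
            *last_off = pg_start + expanded - 1; /* patched formula */
            return -1;
        }
        expanded += m_len;
        sec_len -= m_len;
    }
    *last_off = pg_start + expanded - 1;
    return 0;
}
```

With pg_start = 10 and two successful 512-block rounds before a failure, the last valid offset is 10 + 1024 - 1 = 1033 — the value the buggy code, using the already-advanced m_lblk, would have miscomputed.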
[PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select a victim:

1. LFS is set to find a source section during GC; the victim should have no checkpointed data, since after GC the section cannot be set free for reuse. Previously, we only checked valid ckpt blocks in the current segment rather than the whole section; fix it.

2. SSR | AT_SSR are set to find a target segment for writes, which can be fully filled by checkpointed and newly written blocks; we should never select such a segment, otherwise it can cause a panic or data corruption during allocation. A potential case is described below:
a) target segment has 128 ckpt valid blocks
b) GC migrates 'n' (n < 512) valid blocks to another segment (the segment is still in the dirty list)
c) GC migrates '512 - n' blocks to the target segment (the segment now has 'n' cp_vblocks and '512 - n' vblocks)
d) GC selects the target segment via the {AT,}SSR allocator, but there is no free space in the target segment.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu
---
v2:
- fix to check checkpointed data in section rather than segment for LFS mode.
- update commit title and message.
fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 28 fs/f2fs/segment.c | 39 --- fs/f2fs/segment.h | 14 +- 4 files changed, 58 insertions(+), 24 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index eb154d9cb063..29e634d08a27 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index d96acc6531f2..4d9616373a4a 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -728,11 +724,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (sec_usage_check(sbi, secno)) goto next; + /* Don't touch checkpointed data */ - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) - goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + if (p.alloc_mode == LFS) { + /* +* LFS is set to find source section during GC. +* The victim should have no checkpointed data. +*/ + if (get_ckpt_valid_blocks(sbi, segno, true)) + goto next; + } else { + /* +* SSR | AT_SSR are set to find target segment +* for writes which can be full by checkpointed +* and newly written blocks. 
+*/ + if (!segment_has_free_slot(sbi, segno)) + goto next; + } + } + if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 6e1a5f5657bf..f6a30856ceda 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) || ckpt_valid_blocks == usable_blocks)) { @@
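segment_has_free_slot(), declared in the hunk above but truncated here, reduces to a bitmap test: an SSR/AT_SSR target slot exists only if some block is free in both the checkpoint validity map and the current validity map. A userspace sketch of just that bitmap logic (512 blocks per segment assumed; the kernel version additionally takes sentry_lock and materializes the OR into tmp_map):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_SEG 512
#define MAP_WORDS (BLOCKS_PER_SEG / (sizeof(unsigned long) * CHAR_BIT))

/* Model of segment_has_free_slot(): a block can be allocated only if it
 * is free in BOTH the checkpointed map and the current map, i.e. the OR
 * of the two maps still contains a zero bit somewhere. */
static bool segment_has_free_slot(const unsigned long *ckpt_map,
                                  const unsigned long *cur_map)
{
    for (size_t i = 0; i < MAP_WORDS; i++)
        if (~(ckpt_map[i] | cur_map[i]))
            return true;   /* some bit is 0 in both maps */
    return false;
}
```

This is exactly the bug scenario in the commit message: a segment whose blocks are all covered by the union of cp_vblocks and vblocks ORs to all-ones and must never be picked as an SSR target.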
Re: [PATCH 2/2] f2fs: fix error path of f2fs_remount()
Ping, On 2021/3/17 17:56, Chao Yu wrote: In error path of f2fs_remount(), it missed to restart/stop kernel thread or enable/disable checkpoint, then mount option status may not be consistent with real condition of filesystem, so let's reorder remount flow a bit as below and do recovery correctly in error path: 1) handle gc thread 2) handle ckpt thread 3) handle flush thread 4) handle checkpoint disabling Signed-off-by: Chao Yu --- fs/f2fs/super.c | 47 ++- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 6716af216dca..fa60f08c30bb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1942,8 +1942,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) struct f2fs_mount_info org_mount_opt; unsigned long old_sb_flags; int err; - bool need_restart_gc = false; - bool need_stop_gc = false; + bool need_restart_gc = false, need_stop_gc = false; + bool need_restart_ckpt = false, need_stop_ckpt = false; + bool need_restart_flush = false, need_stop_flush = false; bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE); bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT); bool no_io_align = !F2FS_IO_ALIGNED(sbi); @@ -2081,19 +2082,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) clear_sbi_flag(sbi, SBI_IS_CLOSE); } - if (checkpoint_changed) { - if (test_opt(sbi, DISABLE_CHECKPOINT)) { - err = f2fs_disable_checkpoint(sbi); - if (err) - goto restore_gc; - } else { - f2fs_enable_checkpoint(sbi); - } - } - if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || !test_opt(sbi, MERGE_CHECKPOINT)) { f2fs_stop_ckpt_thread(sbi); + need_restart_ckpt = true; } else { err = f2fs_start_ckpt_thread(sbi); if (err) { @@ -2102,6 +2094,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } + need_stop_ckpt = true; } /* @@ -2111,11 +2104,24 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) if ((*flags & 
SB_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) { clear_opt(sbi, FLUSH_MERGE); f2fs_destroy_flush_cmd_control(sbi, false); + need_restart_flush = true; } else { err = f2fs_create_flush_cmd_control(sbi); if (err) - goto restore_gc; + goto restore_ckpt; + need_stop_flush = true; } + + if (checkpoint_changed) { + if (test_opt(sbi, DISABLE_CHECKPOINT)) { + err = f2fs_disable_checkpoint(sbi); + if (err) + goto restore_flush; + } else { + f2fs_enable_checkpoint(sbi); + } + } + skip: #ifdef CONFIG_QUOTA /* Release old quota file names */ @@ -2130,6 +2136,21 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) adjust_unusable_cap_perc(sbi); *flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME); return 0; +restore_flush: + if (need_restart_flush) { + if (f2fs_create_flush_cmd_control(sbi)) + f2fs_warn(sbi, "background flush thread has stopped"); + } else if (need_stop_flush) { + clear_opt(sbi, FLUSH_MERGE); + f2fs_destroy_flush_cmd_control(sbi, false); + } +restore_ckpt: + if (need_restart_ckpt) { + if (f2fs_start_ckpt_thread(sbi)) + f2fs_warn(sbi, "background ckpt thread has stopped"); + } else if (need_stop_ckpt) { + f2fs_stop_ckpt_thread(sbi); + } restore_gc: if (need_restart_gc) { if (f2fs_start_gc_thread(sbi))
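The ordering this patch establishes — tear down in the reverse order of setup, undoing only what this remount call itself changed, tracked by restart/stop flag pairs — is the classic goto-ladder unwind pattern. A userspace model with stubbed start/stop functions (all names hypothetical, three "threads" standing in for gc/ckpt/flush):

```c
#include <assert.h>
#include <stdbool.h>

static int started[3];  /* gc, ckpt, flush thread states */

static int start_thread(int i, bool fail)
{
    if (fail)
        return -1;
    started[i] = 1;
    return 0;
}

static void stop_thread(int i)
{
    started[i] = 0;
}

/* Model of the remount flow: start gc, ckpt, flush in order; on any
 * failure, unwind only what this call actually started, in reverse. */
static int remount(bool fail_flush)
{
    bool need_stop_gc = false, need_stop_ckpt = false;
    int err;

    if ((err = start_thread(0, false)))
        goto out;
    need_stop_gc = true;

    if ((err = start_thread(1, false)))
        goto restore_gc;
    need_stop_ckpt = true;

    if ((err = start_thread(2, fail_flush)))
        goto restore_ckpt;
    return 0;

restore_ckpt:
    if (need_stop_ckpt)
        stop_thread(1);
restore_gc:
    if (need_stop_gc)
        stop_thread(0);
out:
    return err;
}
```

The point of the flags is the case the original code missed: a resource that was already running before remount (so this call didn't start it) must not be stopped on the error path — hence the need_restart/need_stop split in the real patch.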
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789, because that commit fails the generic/050 testcase, which expects a failure when mounting a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition. Well, I'm not sure we can change that testcase, since it constrains the behavior of all generic filesystems. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, a valid checkpoint can allow the user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since the device is readonly now, all writes to the device will fail; the checkpoint cannot persist the recovered data, and after the page cache expires, the user can see stale data. Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown.
*/ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS; f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - else - f2fs_info(sbi, "write access unavailable, skipping recovery"); + goto free_meta; + } + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } -- 2.29.2 .
Re: [f2fs-dev] [PATCH -next] f2fs: fix wrong comment of nat_tree_lock
On 2021/3/23 19:41, qiulaibin wrote: Do trivial comment fix of nat_tree_lock. Signed-off-by: qiulaibin Reviewed-by: Chao Yu Thanks,
Re: [PATCH] f2fs: fix to align to section for fallocate() on pinned file
On 2021/3/24 2:32, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: On 2021/3/5 17:56, Chao Yu wrote: Now, fallocate() on a pinned file only allocates blocks which aligns to segment rather than section, so GC may try to migrate pinned file's block, and after several times of failure, pinned file's block could be migrated to other place, however user won't be aware of such condition, and then old obsolete block address may be readed/written incorrectly. To avoid such condition, let's try to allocate pinned file's blocks with section alignment. Signed-off-by: Chao Yu Jaegeuk, Could you please check and apply below diff into original patch? --- fs/f2fs/file.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 236f3f69681a..24fa68fdcaa0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1648,13 +1648,13 @@ static int expand_inode_data(struct inode *inode, loff_t offset, return 0; if (f2fs_is_pinned_file(inode)) { - block_t len = (map.m_len >> sbi->log_blocks_per_seg) << - sbi->log_blocks_per_seg; + block_t sec_blks = BLKS_PER_SEC(sbi); + block_t len = rounddown(map.m_len, sec_blks); len is declared above, so let me rephrase this as well. Oh, right. - if (map.m_len % sbi->blocks_per_seg) - len += sbi->blocks_per_seg; + if (map.m_len % sec_blks) + len += sec_blks; is this roundup()? More clean. Could you check this? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=e1175f02291141bbd924fc578299305fcde35855 Looks good to me. :) Thanks, - map.m_len = sbi->blocks_per_seg; + map.m_len = sec_blks; next_alloc: if (has_not_enough_free_secs(sbi, 0, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { -- 2.22.1 .
Re: [PATCH] f2fs: fix to align to section for fallocate() on pinned file
On 2021/3/5 17:56, Chao Yu wrote: Now, fallocate() on a pinned file only allocates blocks which align to the segment size rather than the section size, so GC may try to migrate a pinned file's blocks, and after several failed attempts the pinned file's blocks could be migrated elsewhere; however, the user won't be aware of such a condition, and then an old obsolete block address may be read/written incorrectly. To avoid such a condition, let's try to allocate a pinned file's blocks with section alignment. Signed-off-by: Chao Yu Jaegeuk, Could you please check and apply the below diff into the original patch? --- fs/f2fs/file.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 236f3f69681a..24fa68fdcaa0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1648,13 +1648,13 @@ static int expand_inode_data(struct inode *inode, loff_t offset, return 0; if (f2fs_is_pinned_file(inode)) { - block_t len = (map.m_len >> sbi->log_blocks_per_seg) << - sbi->log_blocks_per_seg; + block_t sec_blks = BLKS_PER_SEC(sbi); + block_t len = rounddown(map.m_len, sec_blks); - if (map.m_len % sbi->blocks_per_seg) - len += sbi->blocks_per_seg; + if (map.m_len % sec_blks) + len += sec_blks; - map.m_len = sbi->blocks_per_seg; + map.m_len = sec_blks; next_alloc: if (has_not_enough_free_secs(sbi, 0, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { -- 2.22.1
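Jaegeuk's roundup() remark in the sibling thread can be checked directly: rounddown plus a remainder bump is exactly roundup. A small sketch of the section-alignment arithmetic, using plain division in place of the kernel's rounddown()/roundup() macros:

```c
#include <assert.h>

/* Align a length up to the next section boundary, the two-step way the
 * diff above writes it: rounddown, then bump by a section if a partial
 * section remains. */
static unsigned align_two_step(unsigned len, unsigned sec_blks)
{
    unsigned out = (len / sec_blks) * sec_blks;  /* rounddown(len, sec_blks) */

    if (len % sec_blks)                          /* partial section left over */
        out += sec_blks;
    return out;
}

/* The same thing as a single roundup() expression. */
static unsigned align_roundup(unsigned len, unsigned sec_blks)
{
    return ((len + sec_blks - 1) / sec_blks) * sec_blks;
}
```

The two forms agree for every length, which is why the review settled on the single roundup() call as the cleaner spelling.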
Re: [LTP] [f2fs] 02eb84b96b: ltp.swapon03.fail
On 2021/3/11 4:49, Jaegeuk Kim wrote: On 03/10, Huang Jianan wrote: Hi Richard, On 2021/3/9 12:01, Matthew Wilcox wrote: On Tue, Mar 09, 2021 at 10:23:35AM +0800, Weichao Guo wrote: Hi Richard, On 2021/3/8 19:53, Richard Palethorpe wrote: Hello, kern :err : [ 187.461914] F2FS-fs (sda1): Swapfile does not align to section commit 02eb84b96bc1b382dd138bf60724edbefe77b025 Author: huangjia...@oppo.com Date: Mon Mar 1 12:58:44 2021 +0800 f2fs: check if swapfile is section-alligned If the swapfile isn't created by pin and fallocate, it can't be guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchronized to swap_extent, so swap will write to wrong address, which will cause data corruption. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim The test uses fallocate to preallocate the swap file and writes zeros to it. I'm not sure what pin refers to? 'pin' refers to pinned file feature in F2FS, the LBA(Logical Block Address) of a file is fixed after pinned. Without this operation before fallocate, the LBA may not align with section(F2FS GC unit), some LBA of the file may be changed by F2FS GC in some extreme cases. For this test case, how about pin the swap file before fallocate for F2FS as following: ioctl(fd, F2FS_IOC_SET_PIN_FILE, true); No special ioctl should be needed. f2fs_swap_activate() should pin the file, just like it converts inline inodes and disables compression. Now f2fs_swap_activate() will pin the file. The problem is that when f2fs_swap_activate() is executed, the file has been created and may not be section-aligned. So I think it would be better to consider aligning the swapfile during f2fs_swap_activate()? 
Does it make sense to reallocate blocks like this in f2fs_swap_activate()? set_inode_flag(inode, FI_PIN_FILE); truncate_pagecache(inode, 0); f2fs_truncate_blocks(inode, 0, true); expand_inode_data(); It will corrupt the swap header info while relocating the whole swapfile...
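The "Swapfile does not align to section" check this thread is discussing can be modelled as a test over the file's physical extents: every extent must start and span whole sections, otherwise GC may relocate part of it and the LBAs cached in the swap extents go stale. A userspace sketch (hypothetical extent struct; the section size is passed in blocks):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical physical extent of a file, in block units. */
struct extent {
    unsigned start_blk;
    unsigned len;
};

/* A swapfile is GC-safe only if each physical extent covers whole
 * sections: the swap layer bypasses the filesystem, so any block that
 * F2FS GC is allowed to move would be written at a stale address. */
static bool swapfile_is_section_aligned(const struct extent *ext,
                                        int nr, unsigned blks_per_sec)
{
    for (int i = 0; i < nr; i++)
        if (ext[i].start_blk % blks_per_sec ||
            ext[i].len % blks_per_sec)
            return false;
    return true;
}
```

Pinning before fallocate (the F2FS_IOC_SET_PIN_FILE ioctl mentioned above) is what makes the allocator hand out extents that pass this kind of check in the first place.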
Re: [PATCH] f2fs: fix to avoid out-of-bounds memory access
Hi butt3rflyh4ck, On 2021/3/23 13:48, butt3rflyh4ck wrote: Hi, I have tested the patch on 5.12.0-rc4+, it seems to fix the problem. Thanks for helping to test this patch. Thanks, Regards, butt3rflyh4ck. On Mon, Mar 22, 2021 at 7:47 PM Chao Yu wrote: butt3rflyh4ck reported a bug found by syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]: dump_stack+0xfa/0x151 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] get_next_nat_page fs/f2fs/node.c:123 [inline] __flush_nat_entry_set fs/f2fs/node.c:2888 [inline] f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991 f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640 f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807 f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem fs/sync.c:67 [inline] sync_filesystem+0x1b5/0x260 fs/sync.c:48 generic_shutdown_super+0x70/0x370 fs/super.c:448 kill_block_super+0x97/0xf0 fs/super.c:1394 The root cause is, if nat entry in checkpoint journal area is corrupted, e.g. nid of journalled nat entry exceeds max nid value, during checkpoint, once it tries to flush nat journal to NAT area, get_next_nat_page() may access out-of-bounds memory on nat_bitmap due to it uses wrong nid value as bitmap offset. 
[1] https://lore.kernel.org/lkml/cafco6xomwdr8pobek6en6-fs58kg9dorfadgjj-fnf-1x43...@mail.gmail.com/T/#u Reported-by: butt3rflyh4ck Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index caf43970510e..8311b2367c7c 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2790,6 +2790,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi) struct f2fs_nat_entry raw_ne; nid_t nid = le32_to_cpu(nid_in_journal(journal, i)); + if (f2fs_check_nid_range(sbi, nid)) + continue; + raw_ne = nat_in_journal(journal, i); ne = __lookup_nat_cache(nm_i, nid); -- 2.29.2 .
[PATCH] Revert "f2fs: give a warning only for readonly partition"
This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789, because that commit fails the generic/050 testcase, which expects a failure when mounting a recoverable readonly partition. Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown. */ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS; f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - else - f2fs_info(sbi, "write access unavailable, skipping recovery"); + goto free_meta; + } + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } -- 2.29.2
[PATCH] f2fs: fix to avoid out-of-bounds memory access
butt3rflyh4ck reported a bug found by syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]: dump_stack+0xfa/0x151 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] get_next_nat_page fs/f2fs/node.c:123 [inline] __flush_nat_entry_set fs/f2fs/node.c:2888 [inline] f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991 f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640 f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807 f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem fs/sync.c:67 [inline] sync_filesystem+0x1b5/0x260 fs/sync.c:48 generic_shutdown_super+0x70/0x370 fs/super.c:448 kill_block_super+0x97/0xf0 fs/super.c:1394 The root cause is, if nat entry in checkpoint journal area is corrupted, e.g. nid of journalled nat entry exceeds max nid value, during checkpoint, once it tries to flush nat journal to NAT area, get_next_nat_page() may access out-of-bounds memory on nat_bitmap due to it uses wrong nid value as bitmap offset. [1] https://lore.kernel.org/lkml/cafco6xomwdr8pobek6en6-fs58kg9dorfadgjj-fnf-1x43...@mail.gmail.com/T/#u Reported-by: butt3rflyh4ck Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index caf43970510e..8311b2367c7c 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2790,6 +2790,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi) struct f2fs_nat_entry raw_ne; nid_t nid = le32_to_cpu(nid_in_journal(journal, i)); + if (f2fs_check_nid_range(sbi, nid)) + continue; + raw_ne = nat_in_journal(journal, i); ne = __lookup_nat_cache(nm_i, nid); -- 2.29.2
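The shape of the fix — range-check an on-disk index before using it as a bitmap offset — can be modelled in userspace. Below, a hypothetical journal of nids is flushed into a bitmap, skipping any nid that fails the bound check, mirroring the added f2fs_check_nid_range() call:

```c
#include <assert.h>

#define MAX_NID   1024          /* valid nids are [0, MAX_NID) */
#define MAP_BYTES (MAX_NID / 8)

static unsigned char nat_bitmap[MAP_BYTES];

/* Model of the fix: skip any journalled nid that fails the range check
 * instead of using it as a raw bitmap offset (the out-of-bounds write
 * KASAN caught). Returns how many entries were actually flushed. */
static int flush_journal_nids(const unsigned *nids, int nr)
{
    int flushed = 0;

    for (int i = 0; i < nr; i++) {
        unsigned nid = nids[i];

        if (nid >= MAX_NID)     /* corrupted entry: out of range */
            continue;           /* mirrors f2fs_check_nid_range() */
        nat_bitmap[nid / 8] |= 1u << (nid % 8);
        flushed++;
    }
    return flushed;
}
```

Without the bound check, a fuzzed nid of 2000 would index 126 bytes past the end of the 128-byte bitmap here — the same class of access the syzkaller report shows against nat_bitmap.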
[PATCH] f2fs: fix to check ckpt blocks of section in get_victim_by_default()
In image which enables large section, if we disable checkpoint, during background GC, it needs to check whether there is checkpointed data in selected victim section, rather than segment, fix this. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Signed-off-by: Chao Yu --- fs/f2fs/gc.c | 4 ++-- fs/f2fs/segment.c | 18 -- fs/f2fs/segment.h | 15 ++- 3 files changed, 20 insertions(+), 17 deletions(-) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 318840c69b74..a73db53deb05 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -726,8 +726,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, goto next; /* Don't touch checkpointed data */ if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) + get_ckpt_valid_blocks(sbi, segno, true) && + p.alloc_mode == LFS)) goto next; if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 81fe335b860e..788aa0f2250e 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) || ckpt_valid_blocks == usable_blocks)) { @@ -950,7 +950,7 @@ static unsigned int get_free_segment(struct f2fs_sb_info *sbi) for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) { if (get_valid_blocks(sbi, segno, false)) continue; - if (get_ckpt_valid_blocks(sbi, segno)) + if (get_ckpt_valid_blocks(sbi, segno, false)) continue; mutex_unlock(&dirty_i->seglist_lock); return segno; @@ -2933,18 +2933,8 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type, get_valid_blocks(sbi, curseg->segno, new_sec)) goto alloc; - if (new_sec) 
{ - unsigned int segno = START_SEGNO(curseg->segno); - int i; - - for (i = 0; i < sbi->segs_per_sec; i++, segno++) { - if (get_ckpt_valid_blocks(sbi, segno)) - goto alloc; - } - } else { - if (!get_ckpt_valid_blocks(sbi, curseg->segno)) - return; - } + if (!get_ckpt_valid_blocks(sbi, curseg->segno, new_sec)) + return; alloc: old_segno = curseg->segno; diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 144980b62f9e..0b4b3796faaa 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -359,8 +359,21 @@ static inline unsigned int get_valid_blocks(struct f2fs_sb_info *sbi, } static inline unsigned int get_ckpt_valid_blocks(struct f2fs_sb_info *sbi, - unsigned int segno) + unsigned int segno, bool use_section) { + if (use_section && __is_large_section(sbi)) { + unsigned int start_segno = START_SEGNO(segno); + unsigned int blocks = 0; + int i; + + for (i = 0; i < sbi->segs_per_sec; i++, start_segno++) { + struct seg_entry *se = get_seg_entry(sbi, start_segno); + + blocks += se->ckpt_valid_blocks; + } + return blocks; + } + return get_seg_entry(sbi, segno)->ckpt_valid_blocks; } -- 2.29.2
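The new use_section flag simply sums the per-segment ckpt_valid_blocks over the section containing segno, since with the large-section feature GC frees whole sections, not individual segments. A userspace model of that accumulation (4 segments per section and a tiny hard-coded per-segment table, both illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

#define SEGS_PER_SEC 4

/* Illustrative ckpt valid-block count per segment, indexed by segno:
 * section 0 holds segments 0..3, section 1 holds segments 4..7. */
static const unsigned ckpt_valid_blocks[] = { 0, 0, 7, 0, 0, 0, 0, 0 };

/* Model of get_ckpt_valid_blocks(sbi, segno, use_section): with
 * use_section set, count over every segment of the section containing
 * segno; otherwise look at the single segment only. */
static unsigned get_ckpt_valid(unsigned segno, bool use_section)
{
    if (use_section) {
        unsigned start = (segno / SEGS_PER_SEC) * SEGS_PER_SEC;
        unsigned blocks = 0;

        for (unsigned i = 0; i < SEGS_PER_SEC; i++)
            blocks += ckpt_valid_blocks[start + i];
        return blocks;
    }
    return ckpt_valid_blocks[segno];
}
```

This captures the bug being fixed: segment 0 alone reports zero ckpt blocks, but its section still holds checkpointed data in segment 2, so the section must not be treated as reclaimable.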
Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator
On 2021/3/19 1:17, Jaegeuk Kim wrote: On 02/20, Chao Yu wrote: In cp disabling mode, there could be a condition - target segment has 128 ckpt valid blocks - GC migrates 128 valid blocks to other segment (segment is still in dirty list) - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks and 384 vblocks) - If GC selects target segment via {AT,}SSR allocator, however there is no free space in targe segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 17 + fs/f2fs/segment.c | 20 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ed7807103c8e..9c753eff0814 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 86ba8ed0b8a7..a1d8062cdace 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + /* +* to avoid selecting 
candidate which has below valid +* block distribution: +* partial blocks are valid and all left ones are valid +* in previous checkpoint. +*/ + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) { + if (!segment_has_free_slot(sbi, segno)) + goto next; Do we need to change this to check free_slot instead of get_ckpt_valid_blocks()? Jaegeuk, LFS was assigned only for GC case, in this case we are trying to select source section, rather than target segment for SSR/AT_SSR case, so we don't need to check free_slot. - f2fs_gc - __get_victim - get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS, 0); 732 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && 733 get_ckpt_valid_blocks(sbi, segno) && 734 p.alloc_mode == LFS)) BTW, in LFS mode, GC wants to find source section rather than segment, so we should change to check valid ckpt blocks in every segment of targe section here? Thanks, + } + } + if (is_atgc) { add_victim_entry(sbi, &p, segno); goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2d5a82c4ca15..deaf57e13125 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, seg->next_blkoff++; } +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) +{ + struct sit_info *sit = SIT_I(sbi); + struct seg_entry *se = get_seg_entry(sbi, segno); + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); + unsigned long *target_map = SIT_I(sbi)->tmp_map; + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; + unsigned long *cur_map = (unsigned long *)se->cur_valid_map; + int i, pos; + + down_write(&sit->sentry_lock); + for (i = 0; i < entries; i++) + target_map[i] = ckpt_map[i] | cur_map[i]; + + pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); + up_write(&sit->sentry_lock); + + return pos < sbi->blocks_per_seg; +} + /* * This function always allocates a used segment(from dirty seglist) by SSR * manner, so it should recover the existing 
segment information of valid blocks -- 2.29.2 .
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang --- change since v1: - since bio->bi_max_vecs doesn't record extent blocks anymore, introduce a remaining extent block to avoid extent excess. fs/erofs/data.c | 28 +++- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/erofs/data.c b/fs/erofs/data.c index f88851c5c250..1249e74b3bf0 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -129,6 +129,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, struct page *page, erofs_off_t *last_block, unsigned int nblocks, + unsigned int *eblks, bool ra) { struct inode *const inode = mapping->host; @@ -145,8 +146,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* note that for readpage case, bio also equals to NULL */ if (bio && - /* not continuous */ - *last_block + 1 != current_block) { + (*last_block + 1 != current_block || !*eblks)) { Xiang, I found the below function while checking bi_max_vecs usage in f2fs: /** * bio_full - check if the bio is full * @bio:bio to check * @len:length of one segment to be added * * Return true if @bio is full and one segment with @len bytes can't be * added to the bio, otherwise return false */ static inline bool bio_full(struct bio *bio, unsigned len) { if (bio->bi_vcnt >= bio->bi_max_vecs) return true; if (bio->bi_iter.bi_size > UINT_MAX - len) return true; return false; } Could you please check whether it would be better to use bio_full() rather than the left-space-in-bio counter maintained by erofs itself?
something like: if (bio && (bio_full(bio, PAGE_SIZE) || /* not continuous */ (*last_block + 1 != current_block))) I'm thinking we need to decouple bio implementation details as much as possible, to avoid regressions whenever the bio used/max size definition is updated, though I've no idea how to fix the f2fs case. Let me know if you have other concerns. Thanks, submit_bio_retry: submit_bio(bio); bio = NULL; @@ -216,7 +216,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE)) nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE); - bio = bio_alloc(GFP_NOIO, bio_max_segs(nblocks)); + *eblks = bio_max_segs(nblocks); + bio = bio_alloc(GFP_NOIO, *eblks); bio->bi_end_io = erofs_readendio; bio_set_dev(bio, sb->s_bdev); @@ -229,16 +230,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* out of the extent or bio is full */ if (err < PAGE_SIZE) goto submit_bio_retry; - + --*eblks; *last_block = current_block; - - /* shift in advance in case of it followed by too many gaps */ - if (bio->bi_iter.bi_size >= bio->bi_max_vecs * PAGE_SIZE) { - /* err should reassign to 0 after submitting */ - err = 0; - goto submit_bio_out; - } - return bio; err_out: @@ -252,7 +245,6 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* if updated manually, continuous pages has a gap */ if (bio) -submit_bio_out: submit_bio(bio); return err ? 
ERR_PTR(err) : NULL; } @@ -264,23 +256,26 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, static int erofs_raw_access_readpage(struct file *file, struct page *page) { erofs_off_t last_block; + unsigned int eblks; struct bio *bio; trace_erofs_readpage(page, true); bio = erofs_read_raw_page(NULL, page->mapping, - page, &last_block, 1, false); + page, &last_block, 1, &eblks, false); if (IS_ERR(bio)) return PTR_ERR(bio); - DBG_BUGON(bio); /* since we have only one bio -- must be NULL */ + if (bio) + submit_bio(bio); return 0; } static void erofs_raw_access_readahead(struct readahead_control *rac) { erofs_off_t last_block; + unsigned int eblks; struct bio *bio = NULL; struct page *page;
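The thread above weighs two ways to know when a bio has no room left: the erofs patch keeps an external "extent blocks remaining" counter (`*eblks`), while Chao suggests asking the bio itself via `bio_full()`. Here is a minimal userspace sketch (assumed names, not the kernel API) showing the two bookkeeping schemes staying in agreement:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u

/* Toy stand-in for struct bio: only the fields the comparison needs. */
struct toy_bio {
	unsigned int vcnt;	/* vectors currently used */
	unsigned int max_vecs;	/* vectors actually allocated */
};

/* Modeled after the block layer's bio_full(): full when no vector
 * slot remains (the size-overflow check is omitted in this model). */
static bool toy_bio_full(const struct toy_bio *bio, unsigned int len)
{
	(void)len;
	return bio->vcnt >= bio->max_vecs;
}

/* erofs-style scheme: the caller also decrements its own counter.
 * Returns false when the bio had no room for another page. */
static bool toy_bio_add_page(struct toy_bio *bio, unsigned int *eblks)
{
	if (toy_bio_full(bio, TOY_PAGE_SIZE))
		return false;
	bio->vcnt++;
	(*eblks)--;	/* external bookkeeping stays in sync */
	return true;
}
```

With both checks in place, `!*eblks` and `toy_bio_full()` flip at the same point, which is the equivalence the thread is discussing; the erofs-side counter just avoids depending on `bi_max_vecs` internals.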
[PATCH 1/2] f2fs: don't start checkpoint thread in readonly mountpoint
In a readonly mountpoint, there should be no write IOs, including checkpoint IO, so there is no need to create the kernel checkpoint thread. Signed-off-by: Chao Yu --- fs/f2fs/super.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index ce88db0170fa..6716af216dca 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2091,8 +2091,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) } } - if (!test_opt(sbi, DISABLE_CHECKPOINT) && - test_opt(sbi, MERGE_CHECKPOINT)) { + if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || + !test_opt(sbi, MERGE_CHECKPOINT)) { + f2fs_stop_ckpt_thread(sbi); + } else { err = f2fs_start_ckpt_thread(sbi); if (err) { f2fs_err(sbi, @@ -2100,8 +2102,6 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } - } else { - f2fs_stop_ckpt_thread(sbi); } /* @@ -3857,7 +3857,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) /* setup checkpoint request control and start checkpoint issue thread */ f2fs_init_ckpt_req_control(sbi); - if (!test_opt(sbi, DISABLE_CHECKPOINT) && + if (!f2fs_readonly(sb) && !test_opt(sbi, DISABLE_CHECKPOINT) && test_opt(sbi, MERGE_CHECKPOINT)) { err = f2fs_start_ckpt_thread(sbi); if (err) { -- 2.29.2
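The patch's new remount condition boils down to a single predicate: run the checkpoint thread only on a writable mount with checkpointing enabled and checkpoint merging requested. A hedged userspace distillation (the function name is illustrative, not an f2fs helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Distills the condition the patch installs in f2fs_remount() /
 * f2fs_fill_super(): any of readonly, checkpoint-disabled, or
 * no-merge means the checkpoint thread should not run. */
static bool ckpt_thread_wanted(bool rdonly, bool disable_checkpoint,
			       bool merge_checkpoint)
{
	return !rdonly && !disable_checkpoint && merge_checkpoint;
}
```

Writing the condition in its negated form, as the patch does, lets the stop path (`f2fs_stop_ckpt_thread()`) and start path share one if/else instead of duplicating the stop call.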
[PATCH 2/2] f2fs: fix error path of f2fs_remount()
In the error path of f2fs_remount(), we miss restarting/stopping the kernel threads and enabling/disabling checkpoint, so the mount option status may become inconsistent with the real state of the filesystem. Let's reorder the remount flow a bit as below and do recovery correctly in the error path: 1) handle gc thread 2) handle ckpt thread 3) handle flush thread 4) handle checkpoint disabling Signed-off-by: Chao Yu --- fs/f2fs/super.c | 47 ++- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 6716af216dca..fa60f08c30bb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1942,8 +1942,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) struct f2fs_mount_info org_mount_opt; unsigned long old_sb_flags; int err; - bool need_restart_gc = false; - bool need_stop_gc = false; + bool need_restart_gc = false, need_stop_gc = false; + bool need_restart_ckpt = false, need_stop_ckpt = false; + bool need_restart_flush = false, need_stop_flush = false; bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE); bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT); bool no_io_align = !F2FS_IO_ALIGNED(sbi); @@ -2081,19 +2082,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) clear_sbi_flag(sbi, SBI_IS_CLOSE); } - if (checkpoint_changed) { - if (test_opt(sbi, DISABLE_CHECKPOINT)) { - err = f2fs_disable_checkpoint(sbi); - if (err) - goto restore_gc; - } else { - f2fs_enable_checkpoint(sbi); - } - } - if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || !test_opt(sbi, MERGE_CHECKPOINT)) { f2fs_stop_ckpt_thread(sbi); + need_restart_ckpt = true; } else { err = f2fs_start_ckpt_thread(sbi); if (err) { @@ -2102,6 +2094,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } + need_stop_ckpt = true; } /* @@ -2111,11 +2104,24 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) if ((*flags & SB_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) { 
clear_opt(sbi, FLUSH_MERGE); f2fs_destroy_flush_cmd_control(sbi, false); + need_restart_flush = true; } else { err = f2fs_create_flush_cmd_control(sbi); if (err) - goto restore_gc; + goto restore_ckpt; + need_stop_flush = true; } + + if (checkpoint_changed) { + if (test_opt(sbi, DISABLE_CHECKPOINT)) { + err = f2fs_disable_checkpoint(sbi); + if (err) + goto restore_flush; + } else { + f2fs_enable_checkpoint(sbi); + } + } + skip: #ifdef CONFIG_QUOTA /* Release old quota file names */ @@ -2130,6 +2136,21 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) adjust_unusable_cap_perc(sbi); *flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME); return 0; +restore_flush: + if (need_restart_flush) { + if (f2fs_create_flush_cmd_control(sbi)) + f2fs_warn(sbi, "background flush thread has stopped"); + } else if (need_stop_flush) { + clear_opt(sbi, FLUSH_MERGE); + f2fs_destroy_flush_cmd_control(sbi, false); + } +restore_ckpt: + if (need_restart_ckpt) { + if (f2fs_start_ckpt_thread(sbi)) + f2fs_warn(sbi, "background ckpt thread has stopped"); + } else if (need_stop_ckpt) { + f2fs_stop_ckpt_thread(sbi); + } restore_gc: if (need_restart_gc) { if (f2fs_start_gc_thread(sbi)) -- 2.29.2
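The reordered remount flow above follows the kernel's standard goto-based unwinding pattern: each successfully completed stage records which rollback it needs, and a failure jumps to the label that undoes only the already-completed stages, in reverse order. A minimal userspace sketch of that pattern (the stage names are illustrative, not the real f2fs helpers):

```c
#include <assert.h>
#include <stdbool.h>

/* Flags standing in for need_restart_gc / need_stop_ckpt etc. */
static int started_gc, started_ckpt, started_flush;

static int start_stage(int *flag, bool fail)
{
	if (fail)
		return -1;
	*flag = 1;
	return 0;
}

/* Each goto target undoes one stage, then falls through to the
 * earlier targets, so only completed stages are rolled back. */
static int toy_remount(bool fail_gc, bool fail_ckpt, bool fail_flush)
{
	int err;

	err = start_stage(&started_gc, fail_gc);
	if (err)
		goto out;
	err = start_stage(&started_ckpt, fail_ckpt);
	if (err)
		goto restore_gc;
	err = start_stage(&started_flush, fail_flush);
	if (err)
		goto restore_ckpt;
	return 0;

restore_ckpt:
	started_ckpt = 0;	/* undo stage 2 */
restore_gc:
	started_gc = 0;		/* undo stage 1 */
out:
	return err;
}
```

The key property, which the patch restores for the ckpt and flush threads, is that after any failure the system is back in its pre-remount state.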
Re: [PATCH 2/2] erofs: use sync decompression for atomic contexts only
On 2021/3/17 11:54, Huang Jianan via Linux-erofs wrote: Sync decompression was introduced to get rid of additional kworker scheduling overhead. But there is no such overhead in non-atomic contexts. Therefore, it should be better to turn off sync decompression to avoid the current thread waiting in z_erofs_runqueue. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH 1/2] erofs: use workqueue decompression for atomic contexts only
On 2021/3/17 11:54, Huang Jianan via Linux-erofs wrote: z_erofs_decompressqueue_endio may not be executed in the atomic context, for example, when dm-verity is turned on. In this scenario, data can be decompressed directly to get rid of additional kworker scheduling overhead. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH] zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
On 2021/3/17 7:30, Damien Le Moal wrote: On 2021/03/16 21:30, Chao Yu wrote: In zonefs_open_zone(), if opened zone count is larger than .s_max_open_zones threshold, we missed to recover .i_wr_refcnt, fix this. Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close") Signed-off-by: Chao Yu --- fs/zonefs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 0fe76f376dee..be6b99f7de74 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -966,8 +966,7 @@ static int zonefs_open_zone(struct inode *inode) mutex_lock(&zi->i_truncate_mutex); - zi->i_wr_refcnt++; - if (zi->i_wr_refcnt == 1) { + if (zi->i_wr_refcnt == 0) { Nit: if (!zi->i_wr_refcnt) ? I can change that when applying. More clean, thanks. :) Thanks, if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) { atomic_dec(&sbi->s_open_zones); @@ -978,7 +977,6 @@ static int zonefs_open_zone(struct inode *inode) if (i_size_read(inode) < zi->i_max_size) { ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); if (ret) { - zi->i_wr_refcnt--; atomic_dec(&sbi->s_open_zones); goto unlock; } @@ -986,6 +984,8 @@ static int zonefs_open_zone(struct inode *inode) } } + zi->i_wr_refcnt++; + unlock: mutex_unlock(&zi->i_truncate_mutex); Good catch ! Will apply this and check zonefs test suite as this bug went undetected. Thanks.
[PATCH] zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
In zonefs_open_zone(), if opened zone count is larger than .s_max_open_zones threshold, we missed to recover .i_wr_refcnt, fix this. Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close") Signed-off-by: Chao Yu --- fs/zonefs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 0fe76f376dee..be6b99f7de74 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -966,8 +966,7 @@ static int zonefs_open_zone(struct inode *inode) mutex_lock(&zi->i_truncate_mutex); - zi->i_wr_refcnt++; - if (zi->i_wr_refcnt == 1) { + if (zi->i_wr_refcnt == 0) { if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) { atomic_dec(&sbi->s_open_zones); @@ -978,7 +977,6 @@ static int zonefs_open_zone(struct inode *inode) if (i_size_read(inode) < zi->i_max_size) { ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); if (ret) { - zi->i_wr_refcnt--; atomic_dec(&sbi->s_open_zones); goto unlock; } @@ -986,6 +984,8 @@ static int zonefs_open_zone(struct inode *inode) } } + zi->i_wr_refcnt++; + unlock: mutex_unlock(&zi->i_truncate_mutex); -- 2.29.2
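The zonefs fix above is an instance of a general pattern: instead of incrementing a refcount first and decrementing it on every failure path, increment it only after the last failure point, so the error paths never have to remember the undo. A userspace model (illustrative names, not the zonefs API; `zone_mgmt_err` simulates the REQ_OP_ZONE_OPEN outcome):

```c
#include <assert.h>

struct toy_zone {
	int wr_refcnt;		/* models zi->i_wr_refcnt */
	int open_zones;		/* models sbi->s_open_zones */
	int max_open_zones;	/* models sbi->s_max_open_zones */
};

static int toy_open_zone(struct toy_zone *z, int zone_mgmt_err)
{
	if (z->wr_refcnt == 0) {
		if (++z->open_zones > z->max_open_zones) {
			z->open_zones--;
			return -1;	/* too many open zones */
		}
		if (zone_mgmt_err) {	/* explicit zone open failed */
			z->open_zones--;
			return zone_mgmt_err;
		}
	}
	z->wr_refcnt++;	/* only reached once all checks passed */
	return 0;
}
```

Because the increment is last, the buggy pre-fix behavior (a leaked `i_wr_refcnt` when `s_max_open_zones` is exceeded) simply cannot occur.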
Re: [PATCH v3] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/16 17:29, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents the discard thread from issuing the discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
Re: [f2fs] ab2dbddfd0: BUG:kernel_NULL_pointer_dereference,address
Hi Sahitya, Node manager was initialized after segment manager's initialization, so f2fs_available_free_memory() called from issue_discard_thread() may access invalid nm_i pointer, could you please check and fix this case? On 2021/3/16 12:58, kernel test robot wrote: Greeting, FYI, we noticed the following commit (built with gcc-9): commit: ab2dbddfd064f2078a7099e4d65cce54f5ef5e71 ("[PATCH v2] f2fs: allow to change discard policy based on cached discard cmds") url: https://github.com/0day-ci/linux/commits/Sahitya-Tummala/f2fs-allow-to-change-discard-policy-based-on-cached-discard-cmds/20210311-170257 in testcase: ltp version: ltp-x86_64-14c1f76-1_20210315 with following parameters: disk: 1HDD fs: f2fs test: io ucode: 0x21 test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features. test-url: http://linux-test-project.github.io/ on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): If you fix the issue, kindly add following tag Reported-by: kernel test robot [ 38.378402] BUG: kernel NULL pointer dereference, address: 0010 [ 38.378526] #PF: supervisor read access in kernel mode [ 38.378610] #PF: error_code(0x) - not-present page [ 38.378694] PGD 0 P4D 0 [ 38.378739] Oops: [#1] SMP PTI [ 38.378799] CPU: 2 PID: 2436 Comm: f2fs_discard-8: Not tainted 5.12.0-rc2-1-gab2dbddfd064 #1 [ 38.378940] Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 [ 38.379057] RIP: 0010:f2fs_available_free_memory (kbuild/src/consumer/fs/f2fs/node.c:96) f2fs [ 38.379237] Code: 04 00 00 48 0f af d6 48 be c3 f5 28 5c 8f c2 f5 28 48 c1 ea 02 48 89 d0 48 f7 e6 48 c1 ea 03 48 39 ca 0f 97 c0 e9 af fe ff ff <41> 8b 54 24 10 49 63 8d 94 20 00 00 48 0f af d6 48 be c3 f5 28 5c All code 0: 04 00 add$0x0,%al 2: 00 48 0fadd%cl,0xf(%rax) 5: af scas %es:(%rdi),%eax 6: d6 (bad) 7: 48 be c3 f5 28 5c 8fmovabs 
$0x28f5c28f5c28f5c3,%rsi e: c2 f5 28 11: 48 c1 ea 02 shr$0x2,%rdx 15: 48 89 d0mov%rdx,%rax 18: 48 f7 e6mul%rsi 1b: 48 c1 ea 03 shr$0x3,%rdx 1f: 48 39 cacmp%rcx,%rdx 22: 0f 97 c0seta %al 25: e9 af fe ff ff jmpq 0xfed9 2a:* 41 8b 54 24 10 mov0x10(%r12),%edx <-- trapping instruction 2f: 49 63 8d 94 20 00 00movslq 0x2094(%r13),%rcx 36: 48 0f af d6 imul %rsi,%rdx 3a: 48 rex.W 3b: be c3 f5 28 5c mov$0x5c28f5c3,%esi Code starting with the faulting instruction === 0: 41 8b 54 24 10 mov0x10(%r12),%edx 5: 49 63 8d 94 20 00 00movslq 0x2094(%r13),%rcx c: 48 0f af d6 imul %rsi,%rdx 10: 48 rex.W 11: be c3 f5 28 5c mov$0x5c28f5c3,%esi [ 38.379531] RSP: 0018:c96f3dd8 EFLAGS: 00010246 [ 38.379617] RAX: 0106 RBX: 888213317000 RCX: 001e9c8c [ 38.379731] RDX: 88810c84b430 RSI: 001e9c8c RDI: 88810c84b540 [ 38.379844] RBP: 0006 R08: 0106 R09: 88821fb2bc58 [ 38.379958] R10: 032e R11: 88821fb2a144 R12: [ 38.380071] R13: 88820b7e4000 R14: ea60 R15: [ 38.380185] FS: () GS:88821fb0() knlGS: [ 38.380315] CS: 0010 DS: ES: CR0: 80050033 [ 38.380408] CR2: 0010 CR3: 00021e00a003 CR4: 001706e0 [ 38.380522] Call Trace: [ 38.380619] ? del_timer_sync (kbuild/src/consumer/kernel/time/timer.c:1394) [ 38.380686] ? prepare_to_wait_event (kbuild/src/consumer/kernel/sched/wait.c:323 (discriminator 15)) [ 38.380762] ? __next_timer_interrupt (kbuild/src/consumer/kernel/time/timer.c:1816) [ 38.380841] issue_discard_thread (kbuild/src/consumer/fs/f2fs/segment.c:1759 (discriminator 1)) f2fs [ 38.380937] ? finish_wait (kbuild/src/consumer/kernel/sched/wait.c:403) [ 38.380997] ? __issue_discard_cmd (kbuild/src/consumer/fs/f2fs/segment.c:1722) f2fs [ 38.381094] kthread (kbuild/src/consumer/kernel/kthread.c:292) [ 38.381151] ? 
kthread_park (kbuild/src/consumer/kernel/kthread.c:245) [ 38.381213] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:300) [ 38.381276] Modules linked in: dm_mod f2fs netconsole btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common i915 x86_pkg_temp_thermal intel_powerclamp coretemp intel_gtt crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c
Re: [PATCH v6 2/2] erofs: decompress in endio if possible
Hi Jianan, On 2021/3/16 11:15, Huang Jianan via Linux-erofs wrote: z_erofs_decompressqueue_endio may not be executed in the atomic context, for example, when dm-verity is turned on. In this scenario, data can be decompressed directly to get rid of additional kworker scheduling overhead. Also, it makes no sense to apply synchronous decompression for such case. It looks like this patch does more than one thing: - combine dm-verity and erofs workqueue - change policy of decompression in context of thread Normally, we do one thing in one patch; this way, we benefit when backporting patches and bisecting a problematic patch with minimum granularity, and it also helps reviewers focus on reviewing a single code logic by following the patch's goal. So IMO, it would be better to separate this patch into two. One more thing: could you explain a little bit more about why we need to change the policy of decompression in thread context? Is it for better performance? BTW, the code looks clean to me. 
:) Thanks, Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang --- fs/erofs/internal.h | 2 ++ fs/erofs/super.c| 1 + fs/erofs/zdata.c| 15 +-- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 67a7ec945686..fbc4040715be 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -50,6 +50,8 @@ struct erofs_fs_context { #ifdef CONFIG_EROFS_FS_ZIP /* current strategy of how to use managed cache */ unsigned char cache_strategy; + /* strategy of sync decompression (false - auto, true - force on) */ + bool readahead_sync_decompress; /* threshold for decompression synchronously */ unsigned int max_sync_decompress_pages; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index d5a6b9b888a5..0445d09b6331 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -200,6 +200,7 @@ static void erofs_default_options(struct erofs_fs_context *ctx) #ifdef CONFIG_EROFS_FS_ZIP ctx->cache_strategy = EROFS_ZIP_CACHE_READAROUND; ctx->max_sync_decompress_pages = 3; + ctx->readahead_sync_decompress = false; #endif #ifdef CONFIG_EROFS_FS_XATTR set_opt(ctx, XATTR_USER); diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 6cb356c4217b..25a0c4890d0a 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -706,9 +706,12 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, goto out; } +static void z_erofs_decompressqueue_work(struct work_struct *work); static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io, bool sync, int bios) { + struct erofs_sb_info *const sbi = EROFS_SB(io->sb); + /* wake up the caller thread for sync decompression */ if (sync) { unsigned long flags; @@ -720,8 +723,15 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io, return; } - if (!atomic_add_return(bios, &io->pending_bios)) + if (atomic_add_return(bios, &io->pending_bios)) + return; + /* Use workqueue and sync decompression for atomic contexts only */ + if 
(in_atomic() || irqs_disabled()) { queue_work(z_erofs_workqueue, &io->u.work); + sbi->ctx.readahead_sync_decompress = true; + return; + } + z_erofs_decompressqueue_work(&io->u.work); } static bool z_erofs_page_is_invalidated(struct page *page) @@ -1333,7 +1343,8 @@ static void z_erofs_readahead(struct readahead_control *rac) struct erofs_sb_info *const sbi = EROFS_I_SB(inode); unsigned int nr_pages = readahead_count(rac); - bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages); + bool sync = (sbi->ctx.readahead_sync_decompress && + nr_pages <= sbi->ctx.max_sync_decompress_pages); struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode); struct page *page, *head = NULL; LIST_HEAD(pagepool);
Re: [PATCH v6 1/2] erofs: avoid memory allocation failure during rolling decompression
On 2021/3/16 11:15, Huang Jianan via Linux-erofs wrote: Currently, err would be treated as io error. Therefore, it'd be better to ensure memory allocation during rolling decompression to avoid such io error. In the long term, we might consider adding another !Uptodate case for such case. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH v5 1/2] erofs: avoid memory allocation failure during rolling decompression
On 2021/3/5 17:58, Huang Jianan via Linux-erofs wrote: Currently, err would be treated as io error. Therefore, it'd be better to ensure memory allocation during rolling decompression to avoid such io error. In the long term, we might consider adding another !Uptodate case for such case. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao --- fs/erofs/decompressor.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 1cb1ffd10569..3d276a8aad86 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -73,7 +73,8 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, victim = availables[--top]; get_page(victim); } else { - victim = erofs_allocpage(pagepool, GFP_KERNEL); + victim = erofs_allocpage(pagepool, +GFP_KERNEL | __GFP_NOFAIL); if (!victim) return -ENOMEM; It's a little bit weird that we still need to check the return value of erofs_allocpage() after we pass the __GFP_NOFAIL flag. Thanks, set_page_private(victim, Z_EROFS_SHORTLIVED_PAGE);
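Chao's point rests on the `__GFP_NOFAIL` contract: an allocation with that flag loops inside the allocator until it succeeds, so it never returns NULL and the caller's NULL check is dead code. A userspace analogue of that contract (this is a model with a retry loop, not the kernel page allocator):

```c
#include <stdlib.h>

/* Analogue of a __GFP_NOFAIL allocation: retry internally until the
 * allocation succeeds, so the caller never sees NULL. A real kernel
 * nofail allocation would block and trigger reclaim between tries. */
static void *alloc_nofail(size_t size)
{
	void *p;

	do {
		p = malloc(size);
	} while (p == NULL);
	return p;
}
```

Given this contract, the `if (!victim) return -ENOMEM;` branch in the quoted diff can never fire, which is exactly the redundancy being pointed out in the review.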
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Hi Sahitya, On 2021/3/15 17:45, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 04:10:22PM +0800, Chao Yu wrote: Hi Sahitya, On 2021/3/15 15:46, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 02:12:44PM +0800, Chao Yu wrote: Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 I don't think it will help because we are changing the max timeout of the dpolicy itself in __init_discard_policy() when util > 80% as below - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; And issue_discard_thread() uses this value as wait_ms, when there are no more pending discard commands to be issued. } else { wait_ms = dpolicy.max_interval; } The new patch posted above is not changing anything related to the max_interval. Hence, I think it won't help the unnecessary wakeup problem I am trying to solve for this condition - util > 80% and when there are no pending discards. Please let me know if I am missing something. Copied, thanks for the explanation. But there is another case which can cause this issue in the case of disk util < 80%. wait_ms = DEF_MIN_DISCARD_ISSUE_TIME; do { wait_event_interruptible_timeout(, wait_ms); ... if (!atomic_read(&dcc->discard_cmd_cnt)) [1] new statement continue; } while(); Then the loop will wake up on every 50ms timeout. Yes, only for a short period of time, i.e., until the first discard command is issued. 
Once a discard is issued, it will use wait_ms = dpolicy.max_interval; So, to avoid this case, shouldn't we reset wait_ms to dpolicy.max_interval at [1]? Yes, we can add that to cover the above case. Meanwhile, how about relocating the discard_cmd_cnt check after __init_discard_policy(DPOLICY_FORCE)? and only set .max_interval to DEF_MAX_DISCARD_ISSUE_TIME if there is no discard command, and keep .granularity at 1? There is no need to change .granularity, right? I think so. It will be controlled as per utilization as it is done today. Only max_interval and wait_ms need to be updated. Does this look good? diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index d7076796..958ad1e 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1772,13 +1772,16 @@ static int issue_discard_thread(void *data) wait_ms = dpolicy.max_interval; continue; } - if (!atomic_read(&dcc->discard_cmd_cnt)) - continue; - if (sbi->gc_mode == GC_URGENT_HIGH || !f2fs_available_free_memory(sbi, DISCARD_CACHE)) __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1); + if (!atomic_read(&dcc->discard_cmd_cnt)) { + dpolicy.max_interval = DEF_MAX_DISCARD_ISSUE_TIME; + wait_ms = dpolicy.max_interval; + continue; + } Hmm.. how about cleaning up to configure discard policy in __init_discard_policy()? 
Something like: --- fs/f2fs/segment.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 592927ccffa7..684463a70eb9 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1118,7 +1118,9 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->ordered = true; if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { dpolicy->granularity = 1; - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; + if (atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt)) + dpolicy->max_interval = + DEF_MIN_DISCARD_ISSUE_TIME; } } else if (discard_type == DPOLICY_FORCE) { dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; @@ -1734,8 +1736,15 @@ static int issue_discard_thread(void *data) set_freezable(); do { - __init_discard_policy(sbi, &dpolicy, DPOLICY_BG, - dcc->discard_granularity); + if (sbi->gc_mode == GC_URGENT_HIGH || + !f2fs_available_free_memor
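The policy the thread converges on can be stated as one small decision: only drop to the short 50ms wakeup when utilization is high AND there are cached discard commands to issue; otherwise sleep for the long interval. A hedged sketch of that decision as a pure function (the constants mirror f2fs defaults; the function itself is illustrative, not f2fs code):

```c
#include <assert.h>

#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* ms */
#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* ms */
#define DEF_DISCARD_URGENT_UTIL		80	/* percent */

/* Pick the discard thread's sleep interval: wake frequently only
 * when the disk is heavily utilized and there is actual work
 * (pending discard commands) to do. */
static int pick_wait_ms(int utilization, int pending_discard_cmds)
{
	if (utilization > DEF_DISCARD_URGENT_UTIL && pending_discard_cmds)
		return DEF_MIN_DISCARD_ISSUE_TIME;
	return DEF_MAX_DISCARD_ISSUE_TIME;
}
```

This captures both fixes under discussion: the `discard_cmd_cnt` guard added to `__init_discard_policy()` and the reset of `wait_ms` back to `max_interval` when the command list is empty.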
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Hi Sahitya, On 2021/3/15 15:46, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 02:12:44PM +0800, Chao Yu wrote: Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 I don't think it will help because we are changing the max timeout of the dpolicy itself in __init_discard_policy() when util > 80% as below - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; And issue_discard_thread() uses this value as wait_ms, when there are no more pending discard commands to be issued. } else { wait_ms = dpolicy.max_interval; } The new patch posted above is not changing anything related to the max_interval. Hence, I think it won't help the unnecessary wakeup problem I am trying to solve for this condition - util > 80% and when there are no pending discards. Please let me know if I am missing something. Copied, thanks for the explanation. But there is another case which can cause this issue in the case of disk util < 80%. wait_ms = DEF_MIN_DISCARD_ISSUE_TIME; do { wait_event_interruptible_timeout(, wait_ms); ... if (!atomic_read(&dcc->discard_cmd_cnt)) [1] new statement continue; } while(); Then the loop will wake up on every 50ms timeout. So, to avoid this case, shouldn't we reset wait_ms to dpolicy.max_interval at [1]? Meanwhile, how about relocating the discard_cmd_cnt check after __init_discard_policy(DPOLICY_FORCE)? 
and only set .max_interval to DEF_MAX_DISCARD_ISSUE_TIME if there is no discard command, and keep .granularity at 1? Thanks, Thanks, Sahitya. Thanks, Signed-off-by: Sahitya Tummala --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index dced46c..df30220 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1112,6 +1112,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, struct discard_policy *dpolicy, int discard_type, unsigned int granularity) { + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; + /* common policy */ dpolicy->type = discard_type; dpolicy->sync = true; @@ -1129,7 +1131,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->io_aware = true; dpolicy->sync = false; dpolicy->ordered = true; - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { + if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL && + atomic_read(&dcc->discard_cmd_cnt)) { dpolicy->granularity = 1; dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; }
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 Thanks, Signed-off-by: Sahitya Tummala --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index dced46c..df30220 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1112,6 +1112,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, struct discard_policy *dpolicy, int discard_type, unsigned int granularity) { + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; + /* common policy */ dpolicy->type = discard_type; dpolicy->sync = true; @@ -1129,7 +1131,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->io_aware = true; dpolicy->sync = false; dpolicy->ordered = true; - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { + if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL && + atomic_read(&dcc->discard_cmd_cnt)) { dpolicy->granularity = 1; dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; }
Re: [f2fs-dev] [PATCH v3] f2fs: add sysfs nodes to get runtime compression stat
On 2021/3/11 10:32, Daeho Jeong wrote: From: Daeho Jeong I've added new sysfs nodes to show runtime compression stat since mount. compr_written_block - show the block count written after compression compr_saved_block - show the saved block count with compression compr_new_inode - show the count of inode newly enabled for compression Signed-off-by: Daeho Jeong --- v2: thanks to kernel test robot , fixed compile issue related to kernel config. v3: changed sysfs nodes' names and made them runtime stat, not persistent on disk --- Documentation/ABI/testing/sysfs-fs-f2fs | 20 + fs/f2fs/compress.c | 1 + fs/f2fs/f2fs.h | 19 fs/f2fs/super.c | 7 +++ fs/f2fs/sysfs.c | 58 + 5 files changed, 105 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index cbeac1bebe2f..f2981eb319cb 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -409,3 +409,23 @@ Description: Give a way to change checkpoint merge daemon's io priority. I/O priority "3". We can select the class between "rt" and "be", and set the I/O priority within valid range of it. "," delimiter is necessary in between I/O class and priority number. + +What: /sys/fs/f2fs//compr_written_block +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the block count written after compression since mount. + If you write "0" here, you can initialize compr_written_block and + compr_saved_block to "0". + +What: /sys/fs/f2fs//compr_saved_block +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the saved block count with compression since mount. + If you write "0" here, you can initialize compr_written_block and + compr_saved_block to "0". + +What: /sys/fs/f2fs//compr_new_inode +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the count of inode newly enabled for compression since mount. + If you write "0" here, you can initialize compr_new_inode to "0". 
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 77fa342de38f..3c9d797dbdd6 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -1353,6 +1353,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc, if (fio.compr_blocks) f2fs_i_compr_blocks_update(inode, fio.compr_blocks - 1, false); f2fs_i_compr_blocks_update(inode, cc->nr_cpages, true); + add_compr_block_stat(inode, cc->nr_cpages); If compressed cluster was overwritten as normal cluster, compr_saved_block value won't be decreased, is it fine? set_inode_flag(cc->inode, FI_APPEND_WRITE); if (cc->cluster_idx == 0) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index e2d302ae3a46..2c989f8caf05 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1623,6 +1623,11 @@ struct f2fs_sb_info { #ifdef CONFIG_F2FS_FS_COMPRESSION struct kmem_cache *page_array_slab; /* page array entry */ unsigned int page_array_slab_size; /* default page array slab size */ + + /* For runtime compression statistics */ + atomic64_t compr_written_block; + atomic64_t compr_saved_block; + atomic_t compr_new_inode; #endif }; @@ -3955,6 +3960,18 @@ int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi); void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi); int __init f2fs_init_compress_cache(void); void f2fs_destroy_compress_cache(void); +#define inc_compr_inode_stat(inode)\ + do {\ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);\ + atomic_inc(&sbi->compr_new_inode); \ + } while (0) +#define add_compr_block_stat(inode, blocks)\ + do {\ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);\ + int diff = F2FS_I(inode)->i_cluster_size - blocks; \ + atomic64_add(blocks, &sbi->compr_written_block); \ + atomic64_add(diff, &sbi->compr_saved_block); \ + } while (0) #else static inline bool f2fs_is_compressed_page(struct page *page) { return false; } static inline bool f2fs_is_compress_backend_ready(struct inode *inode) @@ -3983,6 +4000,7 @@ static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return static 
inline void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi) { } static inline int __init f2fs_init_compress_cache(void) { return 0; } static inline void f2fs_destroy_compress_cache(v
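The stat macros in the hunk above boil down to two atomic adds per compressed cluster: the written counter grows by the number of compressed pages, and the saved counter by the difference from the cluster size. For readers outside the kernel tree, a standalone userspace sketch of that accounting (struct and helper names are illustrative, not f2fs's):

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of add_compr_block_stat(): a cluster of
 * cluster_size pages that compressed down to `blocks` pages adds
 * `blocks` to the written counter and the remainder to the saved
 * counter.  Names are illustrative only. */
struct compr_stat {
	atomic_long written_block;	/* blocks written after compression */
	atomic_long saved_block;	/* blocks saved by compression */
};

static void add_compr_block_stat(struct compr_stat *st,
				 int cluster_size, int blocks)
{
	atomic_fetch_add(&st->written_block, blocks);
	atomic_fetch_add(&st->saved_block, cluster_size - blocks);
}
```

As Chao's review comment notes, an accounting like this only ever adds: if a compressed cluster is later overwritten as a normal cluster, nothing subtracts from the saved counter.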
Re: [PATCH v2] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/11 16:59, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents it from issuing discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Signed-off-by: Sahitya Tummala --- v2: - by mistake the last else condition was modified, fix it now. Oh, yes, Reviewed-by: Chao Yu Thanks,
Re: [PATCH] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/11 9:47, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents it from issuing discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Agreed. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
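The switch described in the commit message amounts to a threshold check: once the memory held by cached discard commands crosses the configured limit, the io-aware background policy gives way to DPOLICY_FORCE so the backlog gets issued and freed. A hedged sketch of that decision (the helper and its arguments are illustrative; f2fs's actual check lives in the discard thread and is derived from nm_i->ram_thresh):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative policy pick, not f2fs's exact code. */
enum dpolicy_type { DPOLICY_BG, DPOLICY_FORCE };

static enum dpolicy_type select_discard_policy(unsigned long cached_cmd_bytes,
					       unsigned long mem_threshold)
{
	/* under memory pressure, stop being io-aware and force issuing */
	return cached_cmd_bytes > mem_threshold ? DPOLICY_FORCE : DPOLICY_BG;
}
```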
Re: [PATCH] configfs: Fix use-after-free issue in __configfs_open_file
Hi Joel, Christoph, Al Does any one have time to review this patch, ten days past... Thanks, On 2021/3/5 11:29, Chao Yu wrote: +Cc fsdevel Ping, Any comments one this patch? On 2021/3/1 14:10, Chao Yu wrote: From: Daiyue Zhang Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root0 2021-04-01 11:55 /config drwxr-xr-x 2 root root0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt .. 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: GW 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 
kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals") Cc: Al Viro Cc: Joel Becker Cc: Christoph Hellwig Signed-off-by: Daiyue Zhang Signed-off-by: Yi Chen Signed-off-by: Ge Qiu Reviewed-by: Chao Yu --- fs/configfs/file.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/configfs/file.c b/fs/configfs/file.c index 1f0270229d7b..da8351d1e455 100644 --- a/fs/configfs/file.c +++ b/fs/configfs/file.c @@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type attr = to_attr(dentry); if (!attr) - goto out_put_item; + goto out_free_buffer; if (type & CONFIGFS_ITEM_BIN_ATTR) { buffer->bin_attr = to_bin_attr(dentry); @@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type /* Grab the module reference for this attribute if we have one */ error = -ENODEV; if (!try_module_get(buffer->owner)) - goto out_put_item; + goto out_free_buffer; error = -EACCES; if (!buffer->item->ci_type) @@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type out_put_module: module_put(buffer->owner); -out_put_item: - config_item_put(buffer->item); out_free_buffer: up_read(&frag->frag_sem); kfree(buffer); .
Re: [f2fs-dev] [PATCH v2] f2fs: add sysfs nodes to get accumulated compression info
On 2021/3/9 21:00, Daeho Jeong wrote: On Tue, Mar 9, 2021 at 6:22 PM, Chao Yu wrote: On 2021/3/5 10:24, Daeho Jeong wrote: From: Daeho Jeong Added acc_compr_inodes to show accumulated compressed inode count and acc_compr_blocks to show accumulated secured block count with I noticed that these stat numbers are recorded in the extra reserved area in the hot node curseg journal, and the journal will be persisted only for umount or fastboot checkpoints, so the numbers are not so accurate... does this satisfy your requirement? Yes, we are satisfied with just getting a rough number of them. But, it Alright, would be better if you could suggest a more accurate way. :) I think this is the cheapest way to store a rough number, otherwise it needs to change the f2fs_checkpoint structure layout or add a new inner inode to persist these stat numbers if we want a more accurate one. compression in sysfs. These can be re-initialized to "0" by writing a "0" value to either one. Why do we allow resetting the stat numbers? Actually, I want to have a way to clear any stale number of them, but I agree we don't need this. Why not cover all the code with the macro CONFIG_F2FS_FS_COMPRESSION, since these numbers are only updated when we enable compression. I wanted to keep the info even in a kernel that doesn't support per-file compression, if those had been written once. What do you think? Sure, if so it's fine to me. :) Thanks, Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: expose # of overprovision segments
On 2021/3/4 3:28, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. Signed-off-by: Jaegeuk Kim Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: add sysfs nodes to get accumulated compression info
On 2021/3/5 10:24, Daeho Jeong wrote: From: Daeho Jeong Added acc_compr_inodes to show accumulated compressed inode count and acc_compr_blocks to show accumulated secured block count with I noticed that these stat numbers are recorded in the extra reserved area in the hot node curseg journal, and the journal will be persisted only for umount or fastboot checkpoints, so the numbers are not so accurate... does this satisfy your requirement? compression in sysfs. These can be re-initialized to "0" by writing a "0" value to either one. Why do we allow resetting the stat numbers? Why not cover all the code with the macro CONFIG_F2FS_FS_COMPRESSION, since these numbers are only updated when we enable compression. Thanks,
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/9 8:07, Jaegeuk Kim wrote: On 03/05, Chao Yu wrote: On 2021/3/5 1:50, Jaegeuk Kim wrote: On 03/04, Chao Yu wrote: On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments I meant that too. Why is it like stat, since it's a geometry? Hmm.. I feel a little bit weird treating ovp_segments as the 'stat' class; one reason is that ovp_segments is readonly, matching the readonly attribute of a stat. It seems I don't fully understand what you suggest here. I don't want to add the # of ovp_segments in /stat, since it is not part of status, but put it in / to sync with the other # of free/dirty segments. If you can't read it out easily, I suggest creating symlinks to organize all the current mess. Alright. Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place. I know, but this will scatter that info into different places. I don't have a big concern about finding a specific entry with this though; how about making symlinks to create a dir structure for your easy access?
Or, using a script would be alternative way. Yes, there should be some alternative ways to help to access f2fs sysfs interface, but from a point view of user, I'm not sure he can figure out those ways. For those fs meta stat, why not adding a single entry to include all info you need rather than adding them one by one? e.g. You can add that in /proc as well, which requires to parse back when retrieving specific values. Copied. Thanks, /proc/fs/f2fs//super_block /proc/fs/f2fs//checkpoint /proc/fs/f2fs//nat_table /proc/fs/f2fs//sit_table ... Thanks, Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . . .
Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blocks
On 2021/3/9 8:01, Jaegeuk Kim wrote: On 03/05, Chao Yu wrote: On 2021/3/5 4:20, Jaegeuk Kim wrote: On 02/27, Jaegeuk Kim wrote: On 02/04, Chao Yu wrote: Jaegeuk, On 2021/2/2 16:00, Chao Yu wrote: - for (i = 0; i < dic->nr_cpages; i++) { + for (i = 0; i < cc->nr_cpages; i++) { struct page *page = dic->cpages[i]; por_fsstress still hang in this line? I'm stuck on testing the patches, since the latest kernel is panicking somehow. Let me update later, once I can test a bit. :( It seems this works without error. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=4e6e1364dccba80ed44925870b97fbcf989b96c9 Ah, good news. Thanks for helping to test the patch. :) Hmm, I hit this again. Let me check w/o compress_cache back. :( Oops :(
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/8 10:36, Gao Xiang wrote: Hi Chao, On Mon, Mar 08, 2021 at 09:29:30AM +0800, Chao Yu wrote: On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang Looks good to me, btw, it needs to Cc stable mailing list? Reviewed-by: Chao Yu Thanks for your review. <= 5.11 kernels are not impacted. For now, this only impacts 5.12-rc due to a bio behavior change (see commit baa2c7c97153). So personally I think just leave as it is is fine. Okay, so that's fine if you send pull request before 5.12 formal release. ;) Thanks, Thanks, Gao Xiang Thanks, .
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang Looks good to me, btw, it needs to Cc stable mailing list? Reviewed-by: Chao Yu Thanks,
[PATCH] f2fs: fix to align to section for fallocate() on pinned file
Now, fallocate() on a pinned file only allocates blocks which aligns to segment rather than section, so GC may try to migrate pinned file's block, and after several times of failure, pinned file's block could be migrated to other place, however user won't be aware of such condition, and then old obsolete block address may be readed/written incorrectly. To avoid such condition, let's try to allocate pinned file's blocks with section alignment. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 2 +- fs/f2fs/file.c| 2 +- fs/f2fs/segment.c | 34 ++ 3 files changed, 28 insertions(+), 10 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 700ef46e4147..ff6c212bf2c5 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3399,7 +3399,7 @@ void f2fs_get_new_segment(struct f2fs_sb_info *sbi, unsigned int *newseg, bool new_sec, int dir); void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type, unsigned int start, unsigned int end); -void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type); +void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type); void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi); int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range); bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi, diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 1863944f4073..1b70b56434e4 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1666,7 +1666,7 @@ static int expand_inode_data(struct inode *inode, loff_t offset, down_write(&sbi->pin_sem); f2fs_lock_op(sbi); - f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA_PINNED); + f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED); f2fs_unlock_op(sbi); map.m_seg_type = CURSEG_COLD_DATA_PINNED; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b26217910b85..ff50e79d2bb7 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2918,7 +2918,8 @@ void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type, up_read(&SM_I(sbi)->curseg_lock); } -static 
void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) +static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type, + bool new_sec) { struct curseg_info *curseg = CURSEG_I(sbi, type); unsigned int old_segno; @@ -2926,10 +2927,22 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) if (!curseg->inited) goto alloc; - if (!curseg->next_blkoff && - !get_valid_blocks(sbi, curseg->segno, false) && - !get_ckpt_valid_blocks(sbi, curseg->segno)) - return; + if (curseg->next_blkoff || + get_valid_blocks(sbi, curseg->segno, new_sec)) + goto alloc; + + if (new_sec) { + unsigned int segno = START_SEGNO(curseg->segno); + int i; + + for (i = 0; i < sbi->segs_per_sec; i++, segno++) { + if (get_ckpt_valid_blocks(sbi, segno)) + goto alloc; + } + } else { + if (!get_ckpt_valid_blocks(sbi, curseg->segno)) + return; + } alloc: old_segno = curseg->segno; @@ -2937,10 +2950,15 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) locate_dirty_segment(sbi, old_segno); } -void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type) +static void __allocate_new_section(struct f2fs_sb_info *sbi, int type) +{ + __allocate_new_segment(sbi, type, true); +} + +void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type) { down_write(&SIT_I(sbi)->sentry_lock); - __allocate_new_segment(sbi, type); + __allocate_new_section(sbi, type); up_write(&SIT_I(sbi)->sentry_lock); } @@ -2950,7 +2968,7 @@ void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi) down_write(&SIT_I(sbi)->sentry_lock); for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) - __allocate_new_segment(sbi, i); + __allocate_new_segment(sbi, i, false); up_write(&SIT_I(sbi)->sentry_lock); } -- 2.29.2
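The new_sec branch added above only treats the current section as reusable when no segment inside it still holds checkpointed valid blocks. Stripped of the sbi plumbing, the scan looks like this standalone sketch (SEGS_PER_SEC and the ckpt_valid array stand in for sbi->segs_per_sec and get_ckpt_valid_blocks()):

```c
#include <assert.h>
#include <stdbool.h>

#define SEGS_PER_SEC 4	/* stand-in for sbi->segs_per_sec */
#define START_SEGNO(segno) (((segno) / SEGS_PER_SEC) * SEGS_PER_SEC)

/* true if any segment in segno's section still has checkpointed
 * valid blocks, i.e. the whole section must not be handed out anew */
static bool section_has_ckpt_blocks(const int *ckpt_valid, int segno)
{
	int start = START_SEGNO(segno);
	int i;

	for (i = 0; i < SEGS_PER_SEC; i++)
		if (ckpt_valid[start + i])
			return true;
	return false;
}
```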
Re: [PATCH] configfs: Fix use-after-free issue in __configfs_open_file
+Cc fsdevel Ping, Any comments one this patch? On 2021/3/1 14:10, Chao Yu wrote: From: Daiyue Zhang Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root0 2021-04-01 11:55 /config drwxr-xr-x 2 root root0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt .. 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: GW 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 
configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals") Cc: Al Viro Cc: Joel Becker Cc: Christoph Hellwig Signed-off-by: Daiyue Zhang Signed-off-by: Yi Chen Signed-off-by: Ge Qiu Reviewed-by: Chao Yu --- fs/configfs/file.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/configfs/file.c b/fs/configfs/file.c index 1f0270229d7b..da8351d1e455 100644 --- a/fs/configfs/file.c +++ b/fs/configfs/file.c @@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type attr = to_attr(dentry); if (!attr) - goto out_put_item; + goto out_free_buffer; if (type & CONFIGFS_ITEM_BIN_ATTR) { buffer->bin_attr = to_bin_attr(dentry); @@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type /* Grab the module reference for this attribute if we have one */ error = -ENODEV; if (!try_module_get(buffer->owner)) - goto out_put_item; + goto out_free_buffer; error = -EACCES; if (!buffer->item->ci_type) @@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type out_put_module: module_put(buffer->owner); -out_put_item: - config_item_put(buffer->item); out_free_buffer: up_read(&frag->frag_sem); kfree(buffer);
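The imbalance is easy to model outside configfs: once the open path stopped taking its own reference (configfs_get_config_item was removed), the config_item_put() on the error path drops a reference the function never owned, freeing the item under its real owner. A minimal sketch with plain ints instead of kref (purely illustrative):

```c
#include <assert.h>

/* Minimal refcount model of the bug: a put on an error path without
 * a matching get frees the object while another user still holds it. */
struct item {
	int refcount;
	int freed;	/* set where the real code would kfree() */
};

static void item_get(struct item *it)
{
	it->refcount++;
}

static void item_put(struct item *it)
{
	if (--it->refcount == 0)
		it->freed = 1;
}
```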
Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blocks
On 2021/3/5 4:20, Jaegeuk Kim wrote: On 02/27, Jaegeuk Kim wrote: On 02/04, Chao Yu wrote: Jaegeuk, On 2021/2/2 16:00, Chao Yu wrote: - for (i = 0; i < dic->nr_cpages; i++) { + for (i = 0; i < cc->nr_cpages; i++) { struct page *page = dic->cpages[i]; por_fsstress still hang in this line? I'm stuck on testing the patches, since the latest kernel is panicking somehow. Let me update later, once I can test a bit. :( It seems this works without error. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=4e6e1364dccba80ed44925870b97fbcf989b96c9 Ah, good news. Thanks for helping to test the patch. :) Thanks, Thanks, block_t blkaddr; struct bio_post_read_ctx *ctx; @@ -2201,6 +2207,14 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret, blkaddr = data_blkaddr(dn.inode, dn.node_page, dn.ofs_in_node + i + 1); + f2fs_wait_on_block_writeback(inode, blkaddr); + + if (f2fs_load_compressed_page(sbi, page, blkaddr)) { + if (atomic_dec_and_test(&dic->remaining_pages)) + f2fs_decompress_cluster(dic); + continue; + } + ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/5 1:50, Jaegeuk Kim wrote: On 03/04, Chao Yu wrote: On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments I meant that too. Why is it like stat, since it's a geometry? Hmm.. I feel a little bit weird treating ovp_segments as the 'stat' class; one reason is that ovp_segments is readonly, matching the readonly attribute of a stat. Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place. I know, but this will scatter that info into different places. I don't have a big concern about finding a specific entry with this though; how about making symlinks to create a dir structure for your easy access? Or, using a script would be an alternative way. Yes, there should be some alternative ways to help access the f2fs sysfs interface, but from a user's point of view, I'm not sure they can figure out those ways. For those fs meta stats, why not add a single entry to include all the info you need rather than adding them one by one? e.g.
/proc/fs/f2fs//super_block /proc/fs/f2fs//checkpoint /proc/fs/f2fs//nat_table /proc/fs/f2fs//sit_table ... Thanks, Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . .
Re: [f2fs-dev] [PATCH] f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
On 2021/3/4 17:21, Colin King wrote: From: Colin Ian King The uninitialized variable dn.node_changed does not get set when a call to f2fs_get_node_page fails. This uninitialized value gets used in the call to f2fs_balance_fs() that may or not may not balances dirty node and dentry pages depending on the uninitialized state of the variable. Fix this by only calling f2fs_balance_fs if err is not set. Thanks to Jaegeuk Kim for suggesting an appropriate fix. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 2a3407607028 ("f2fs: call f2fs_balance_fs only when node was changed") Signed-off-by: Colin Ian King Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place.
Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
Re: [PATCH] f2fs: fix compile warning
On 2021/3/2 16:44, Chao Yu wrote: node.c:2750:13: note: in expansion of macro ‘min’ nrpages = min(last_offset - i, BIO_MAX_PAGES); Oh, after I rebased to the latest dev branch, the compile warning is gone, as we changed to use bio_max_segs() rather than min(). Please ignore this patch. Thanks,
[PATCH] f2fs: remove unused file_clear_encrypt()
- file_clear_encrypt() was never be used, remove it. - In addition, relocating macros for cleanup. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 4fc343aa0a08..ddd54f5cf932 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -637,21 +637,26 @@ enum { #define FADVISE_MODIFIABLE_BITS(FADVISE_COLD_BIT | FADVISE_HOT_BIT) #define file_is_cold(inode)is_file(inode, FADVISE_COLD_BIT) -#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT) #define file_set_cold(inode) set_file(inode, FADVISE_COLD_BIT) -#define file_lost_pino(inode) set_file(inode, FADVISE_LOST_PINO_BIT) #define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT) + +#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT) +#define file_lost_pino(inode) set_file(inode, FADVISE_LOST_PINO_BIT) #define file_got_pino(inode) clear_file(inode, FADVISE_LOST_PINO_BIT) + #define file_is_encrypt(inode) is_file(inode, FADVISE_ENCRYPT_BIT) #define file_set_encrypt(inode)set_file(inode, FADVISE_ENCRYPT_BIT) -#define file_clear_encrypt(inode) clear_file(inode, FADVISE_ENCRYPT_BIT) + #define file_enc_name(inode) is_file(inode, FADVISE_ENC_NAME_BIT) #define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT) + #define file_keep_isize(inode) is_file(inode, FADVISE_KEEP_SIZE_BIT) #define file_set_keep_isize(inode) set_file(inode, FADVISE_KEEP_SIZE_BIT) + #define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT) #define file_set_hot(inode)set_file(inode, FADVISE_HOT_BIT) #define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT) + #define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT) #define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT) -- 2.29.2
[PATCH] f2fs: fix compile warning
node.c: In function ‘f2fs_restore_node_summary’: ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’ (__typecheck(x, y) && __no_side_effects(x, y)) ^ ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’ __builtin_choose_expr(__safe_cmp(x, y), \ ^ ./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’ #define min(x, y) __careful_cmp(x, y, <) ^ node.c:2750:13: note: in expansion of macro ‘min’ nrpages = min(last_offset - i, BIO_MAX_PAGES); Use min_t() rather than min() to do type cast before comparing. Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index a8a0fb890e8d..77f9ffaf9b8e 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2747,7 +2747,8 @@ int f2fs_restore_node_summary(struct f2fs_sb_info *sbi, sum_entry = &sum->entries[0]; for (i = 0; i < last_offset; i += nrpages, addr += nrpages) { - nrpages = min(last_offset - i, BIO_MAX_PAGES); + nrpages = min_t(unsigned long, last_offset - i, + BIO_MAX_PAGES); /* readahead node pages */ f2fs_ra_meta_pages(sbi, addr, nrpages, META_POR, true); -- 2.29.2
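For context, the warning comes from min()'s type check: `last_offset - i` is an unsigned long while BIO_MAX_PAGES is an int, so the kernel macro's typeof comparison trips. min_t() casts both sides to the named type before comparing, which is exactly why the patch switches to it. A userspace approximation of that macro:

```c
#include <assert.h>

/* Userspace approximation of the kernel's min_t(): cast both
 * operands to the requested type first, sidestepping min()'s
 * distinct-type warning.  Illustrative only; the kernel version
 * also avoids double evaluation. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
```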
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments
On 2021/3/2 13:42, Jaegeuk Kim wrote:

This is useful when checking conditions during checkpoint=disable in Android.

This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/?

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/sysfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index e38a7f6921dd..254b6fa17406 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
 		(unsigned long long)(free_segments(sbi)));
 }
 
+static ssize_t ovp_segments_show(struct f2fs_attr *a,
+		struct f2fs_sb_info *sbi, char *buf)
+{
+	return sprintf(buf, "%llu\n",
+		(unsigned long long)(overprovision_segments(sbi)));
+}
+
 static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
@@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag);
 F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio);
 F2FS_GENERAL_RO_ATTR(dirty_segments);
 F2FS_GENERAL_RO_ATTR(free_segments);
+F2FS_GENERAL_RO_ATTR(ovp_segments);

Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs?

Thanks,

 F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
 F2FS_GENERAL_RO_ATTR(features);
 F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
Re: [f2fs-dev] [PATCH v2 3/3] f2fs: check if swapfile is section-alligned
On 2021/3/1 12:58, Huang Jianan via Linux-f2fs-devel wrote:

If the swapfile isn't created by pin and fallocate, it can't be guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchronized to swap_extent, so swap will write to wrong address, which will cause data corruption.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao

Reviewed-by: Chao Yu

Thanks,
[PATCH] configfs: Fix use-after-free issue in __configfs_open_file
From: Daiyue Zhang

Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no longer bothering with an extra refcount on config_item when opening a file. configfs_get_config_item() was removed from __configfs_open_file(), but the matching config_item_put() was not. So the refcount on config_item will lose its balance, causing use-after-free issues in some occasions like this:

Test:
1. Mount configfs on /config with read-only items:
drwxrwx--- 289 root root    0 2021-04-01 11:55 /config
drwxr-xr-x   2 root root    0 2021-04-01 11:54 /config/a
--w--w--w-   1 root root 4096 2021-04-01 11:53 /config/a/1.txt
..

2. Then run:
for file in /config
do
echo $file
grep -R 'key' $file
done

3. __configfs_open_file will be called in parallel, the first one got called will do:
if (file->f_mode & FMODE_READ) {
	if (!(inode->i_mode & S_IRUGO))
		goto out_put_module;
config_item_put(buffer->item);
	kref_put()
		package_details_release()
			kfree()

the other one will run into use-after-free issues like this:
BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0
Read of size 8 at addr fff155f02480 by task grep/13096
CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1
TGID: 13096 Comm: grep
Call trace:
dump_stack+0x118/0x160
kasan_report+0x22c/0x294
__asan_load8+0x80/0x88
__configfs_open_file+0x1bc/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48

Allocated by task 2138:
kasan_kmalloc+0xe0/0x1ac
kmem_cache_alloc_trace+0x334/0x394
packages_make_item+0x4c/0x180
configfs_mkdir+0x358/0x740
vfs_mkdir2+0x1bc/0x2e8
SyS_mkdirat+0x154/0x23c
el0_svc_naked+0x34/0x38

Freed by task 13096:
kasan_slab_free+0xb8/0x194
kfree+0x13c/0x910
package_details_release+0x524/0x56c
kref_put+0xc4/0x104
config_item_put+0x24/0x34
__configfs_open_file+0x35c/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
el0_svc_naked+0x34/0x38

To fix this issue, remove the config_item_put() in __configfs_open_file() to balance the refcount of config_item.

Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals")
Cc: Al Viro
Cc: Joel Becker
Cc: Christoph Hellwig
Signed-off-by: Daiyue Zhang
Signed-off-by: Yi Chen
Signed-off-by: Ge Qiu
Reviewed-by: Chao Yu
---
 fs/configfs/file.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/configfs/file.c b/fs/configfs/file.c
index 1f0270229d7b..da8351d1e455 100644
--- a/fs/configfs/file.c
+++ b/fs/configfs/file.c
@@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 	attr = to_attr(dentry);
 	if (!attr)
-		goto out_put_item;
+		goto out_free_buffer;
 
 	if (type & CONFIGFS_ITEM_BIN_ATTR) {
 		buffer->bin_attr = to_bin_attr(dentry);
@@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 	/* Grab the module reference for this attribute if we have one */
 	error = -ENODEV;
 	if (!try_module_get(buffer->owner))
-		goto out_put_item;
+		goto out_free_buffer;
 
 	error = -EACCES;
 	if (!buffer->item->ci_type)
@@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 
 out_put_module:
 	module_put(buffer->owner);
-out_put_item:
-	config_item_put(buffer->item);
 out_free_buffer:
 	up_read(&frag->frag_sem);
 	kfree(buffer);
-- 
2.29.2
Re: [f2fs-dev] [PATCH][next] f2fs: Replace one-element array with flexible-array member
On 2021/2/28 13:00, Jaegeuk Kim wrote:

On 02/25, Chao Yu wrote:

Hello, Gustavo,

On 2021/2/25 3:03, Gustavo A. R. Silva wrote:

There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].

I proposed doing a similar cleanup, and I've no objection on doing this:

https://lore.kernel.org/patchwork/patch/869440/

Let's ask for Jaegeuk's opinion.

Merged, thanks.

This looks like a better reason than code readability. :)

Agreed.

Reviewed-by: Chao Yu

Thanks,

Refactor the code according to the use of a flexible-array member in struct f2fs_checkpoint, instead of a one-element array. Notice that a temporary pointer to void '*tmp_ptr' was used in order to fix the following errors when using a flexible array instead of a one-element array in struct f2fs_checkpoint:

  CC [M]  fs/f2fs/dir.o
In file included from fs/f2fs/dir.c:13:
fs/f2fs/f2fs.h: In function ‘__bitmap_ptr’:
fs/f2fs/f2fs.h:2227:40: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                        ^
fs/f2fs/f2fs.h:2227:49: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                                 ^
fs/f2fs/f2fs.h:2238:40: error: invalid use of flexible array member
 2238 |   return &ckpt->sit_nat_version_bitmap + offset;
      |                                        ^
make[2]: *** [scripts/Makefile.build:287: fs/f2fs/dir.o] Error 1
make[1]: *** [scripts/Makefile.build:530: fs/f2fs] Error 2
make: *** [Makefile:1819: fs] Error 2

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Build-tested-by: kernel test robot
Link: https://lore.kernel.org/lkml/603647e4.deefbl4eqljuwaue%25...@intel.com/
Signed-off-by: Gustavo A. R. Silva
---
 fs/f2fs/f2fs.h          | 5 +++--
 include/linux/f2fs_fs.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e2d302ae3a46..3f5cb097c30f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2215,6 +2215,7 @@ static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
 static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+	void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
 	int offset;
 
 	if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
@@ -2224,7 +2225,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 		 * if large_nat_bitmap feature is enabled, leave checksum
 		 * protection for all nat/sit bitmaps.
 		 */
-		return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
+		return tmp_ptr + offset + sizeof(__le32);
 	}
 
 	if (__cp_payload(sbi) > 0) {
@@ -2235,7 +2236,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 	} else {
 		offset = (flag == NAT_BITMAP) ?
 			le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
-		return &ckpt->sit_nat_version_bitmap + offset;
+		return tmp_ptr + offset;
 	}
 }
 
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index c6cc0a566ef5..5487a80617a3 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -168,7 +168,7 @@ struct f2fs_checkpoint {
 	unsigned char alloc_type[MAX_ACTIVE_LOGS];
 
 	/* SIT and NAT version bitmap */
-	unsigned char sit_nat_version_bitmap[1];
+	unsigned char sit_nat_version_bitmap[];
 } __packed;
 
 #define CP_CHKSUM_OFFSET	4092	/* default chksum offset in checkpoint */
Re: [f2fs-dev] [PATCH 3/3] f2fs: check if swapfile is section-alligned
Hi Jianan,

On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

If the swapfile isn't created by pin and fallocate, it cann't be

Typo: can't

guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchroniz to swap_extent, so swap will write to wrong

synchronized

address, which will cause data corruption.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao
---
 fs/f2fs/data.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 4dbc1cafc55d..3e523d6e4643 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3781,11 +3781,63 @@ int f2fs_migrate_page(struct address_space *mapping,
 #endif
 
 #ifdef CONFIG_SWAP
+static int f2fs_check_file_aligned(struct inode *inode)

f2fs_check_file_alignment() or f2fs_is_file_aligned()?

+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	block_t main_blkaddr = SM_I(sbi)->main_blkaddr;
+	block_t cur_lblock;
+	block_t last_lblock;
+	block_t pblock;
+	unsigned long len;
+	unsigned long nr_pblocks;
+	unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;

	unsigned int blocks_per_sec = BLKS_PER_SEC(sbi);

+	int ret;
+
+	cur_lblock = 0;
+	last_lblock = bytes_to_blks(inode, i_size_read(inode));
+	len = i_size_read(inode);
+
+	while (cur_lblock < last_lblock) {
+		struct f2fs_map_blocks map;
+		pgoff_t next_pgofs;
+
+		memset(&map, 0, sizeof(map));
+		map.m_lblk = cur_lblock;
+		map.m_len = bytes_to_blks(inode, len) - cur_lblock;

	map.m_len = last_lblock - cur_lblock;

+		map.m_next_pgofs = &next_pgofs;

	map.m_next_pgofs = NULL;
	map.m_next_extent = NULL;

+		map.m_seg_type = NO_CHECK_TYPE;

	map.m_may_create = false;

+
+		ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
+

Unneeded blank line.

+		if (ret)
+			goto err_out;
+
+		/* hole */
+		if (!(map.m_flags & F2FS_MAP_FLAGS))

	ret = -ENOENT;

+			goto err_out;
+
+		pblock = map.m_pblk;
+		nr_pblocks = map.m_len;
+
+		if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
+			nr_pblocks & (blocks_per_sec - 1))

	ret = -EINVAL;

+			goto err_out;
+
+		cur_lblock += nr_pblocks;
+	}
+
+	return 0;
+err_out:
+	pr_err("swapon: swapfile isn't section-aligned\n");

We should show the above message only after we fail in the check condition:

	if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
		nr_pblocks & (blocks_per_sec - 1)) {
		f2fs_err(sbi, "Swapfile does not align to section");
		goto err_out;
	}

And please use the f2fs_{err,warn,info..} macros rather than pr_{err,warn,info..}.

Could you please fix the above related issues in check_swap_activate_fast() as well.

+	return -EINVAL;

	return ret;

+}
+
 static int check_swap_activate_fast(struct swap_info_struct *sis,
 				struct file *swap_file, sector_t *span)
 {
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	sector_t cur_lblock;
 	sector_t last_lblock;
 	sector_t pblock;
@@ -3793,6 +3845,7 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 	sector_t highest_pblock = 0;
 	int nr_extents = 0;
 	unsigned long nr_pblocks;
+	unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;

Ditto,

 	u64 len;
 	int ret;
 
@@ -3827,6 +3880,13 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 		pblock = map.m_pblk;
 		nr_pblocks = map.m_len;
 
+		if ((pblock - SM_I(sbi)->main_blkaddr) & (blocks_per_sec - 1) ||
+			nr_pblocks & (blocks_per_sec - 1)) {
+			pr_err("swapon: swapfile isn't section-aligned\n");

Ditto,

+			ret = -EINVAL;
+			goto out;
+		}
+
 		if (cur_lblock + nr_pblocks >= sis->max)
 			nr_pblocks = sis->max - cur_lblock;
 
@@ -3878,6 +3938,9 @@ static int check_swap_activate(struct swap_info_struct *sis,
 	if (PAGE_SIZE == F2FS_BLKSIZE)
 		return check_swap_activate_fast(sis, swap_file, span);
 
+	if (f2fs_check_file_aligned(inode))

	ret = f2fs_check_file_aligned();
	if (ret)
		return ret;

Thanks,

+		return -EINVAL;
+
 	blocks_per_page = bytes_to_blks(inode, PAGE_SIZE);
 
 	/*
Re: [f2fs-dev] [PATCH 2/3] f2fs: fix last_lblock check in check_swap_activate_fast
On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

Because page_no < sis->max guarantees that the while loop breaks out normally, the wrong check condition here doesn't cause a problem.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao
---
 fs/f2fs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a1498a1a345c..4dbc1cafc55d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3804,7 +3804,7 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 	last_lblock = bytes_to_blks(inode, i_size_read(inode));
 	len = i_size_read(inode);
 
-	while (cur_lblock <= last_lblock && cur_lblock < sis->max) {
+	while (cur_lblock + 1 <= last_lblock && cur_lblock < sis->max) {

	while (cur_lblock < last_lblock && cur_lblock < sis->max) {

It's a nitpick; if necessary, I guess Jaegeuk could help to change this while merging.

Reviewed-by: Chao Yu

Thanks,

 		struct f2fs_map_blocks map;
 		pgoff_t next_pgofs;
Re: [f2fs-dev] [PATCH 1/3] f2fs: remove unnecessary IS_SWAPFILE check
On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

Now swapfile in f2fs directly submits IO to the blockdev according to the swapfile extents reported by f2fs at swapon time, therefore there is no need to check IS_SWAPFILE when executing filesystem operations.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v3] f2fs: compress: Allow modular (de)compression algorithms
On 2021/2/26 23:51, Geert Uytterhoeven wrote:

If F2FS_FS is modular, enabling the compression options F2FS_FS_{LZ4,LZ4HZ,LZO,LZORLE,ZSTD} will make the (de)compression algorithms {LZ4,LZ4HC,LZO,ZSTD}_{,DE}COMPRESS builtin instead of modular, as the former depend on an intermediate boolean F2FS_FS_COMPRESSION, which in turn depends on tristate F2FS_FS.

Indeed, if a boolean symbol A depends directly on a tristate symbol B and selects another tristate symbol C:

    tristate B

    tristate C

    bool A
    	depends on B
    	select C

and B is modular, then C will also be modular. However, if there is an intermediate boolean D in the dependency chain between A and B:

    tristate B

    tristate C

    bool D
    	depends on B

    bool A
    	depends on D
    	select C

then the modular state won't propagate from B to C, and C will be builtin instead of modular.

As modular dependency propagation through intermediate symbols is obscure, fix this in a robust way by moving the selection of tristate (de)compression algorithms from the boolean compression options to the tristate main F2FS_FS option.

Signed-off-by: Geert Uytterhoeven

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs: compress: Allow modular (de)compression algorithms
Hi Geert,

On 2021/2/23 15:42, Geert Uytterhoeven wrote:

I checked the code in menu_finalize(), and this seems to work like this. I discussed the oddity of the select behavior before (https://lore.kernel.org/linux-kbuild/e1a6228d-1341-6264-d97a-e2bd52a65...@infradead.org/), but I was not confident about what the right direction was. Anyway, the behavior is obscure from the current code. If you want to make this more robust, you can write as follows:

config F2FS_FS
	tristate "F2FS filesystem support"
	depends on BLOCK
	select NLS
	select CRYPTO
	select CRYPTO_CRC32
	select F2FS_FS_XATTR if FS_ENCRYPTION
	select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
	select LZO_COMPRESS if F2FS_FS_LZO
	select LZO_DECOMPRESS if F2FS_FS_LZO
	select LZ4_COMPRESS if F2FS_FS_LZ4
	select LZ4_DECOMPRESS if F2FS_FS_LZ4
	select LZ4HC_COMPRESS if F2FS_FS_LZ4HC
	select ZSTD_COMPRESS if F2FS_FS_ZSTD
	select ZSTD_DECOMPRESS if F2FS_FS_ZSTD

The code is a bit clumsy, but it is clear that the module (F2FS_FS) is selecting the compress/decompress libraries.

Actually the above is what I tried first ;-) Works fine. Then I started to look for similar cases in other file systems (e.g. EROFS_FS_ZIP), and discovered the issue doesn't happen there, which sparked my investigation. So I settled on the direct dependency, because it keeps all compression-related logic together.

It looks like the above way is more explicit, how about using your previous implementation?

Thanks,

Gr{oetje,eeting}s,
Re: [f2fs-dev] [PATCH][next] f2fs: Replace one-element array with flexible-array member
Hello, Gustavo,

On 2021/2/25 3:03, Gustavo A. R. Silva wrote:

There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].

I proposed doing a similar cleanup, and I've no objection on doing this:

https://lore.kernel.org/patchwork/patch/869440/

Let's ask for Jaegeuk's opinion.

Refactor the code according to the use of a flexible-array member in struct f2fs_checkpoint, instead of a one-element array. Notice that a temporary pointer to void '*tmp_ptr' was used in order to fix the following errors when using a flexible array instead of a one-element array in struct f2fs_checkpoint:

  CC [M]  fs/f2fs/dir.o
In file included from fs/f2fs/dir.c:13:
fs/f2fs/f2fs.h: In function ‘__bitmap_ptr’:
fs/f2fs/f2fs.h:2227:40: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                        ^
fs/f2fs/f2fs.h:2227:49: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                                 ^
fs/f2fs/f2fs.h:2238:40: error: invalid use of flexible array member
 2238 |   return &ckpt->sit_nat_version_bitmap + offset;
      |                                        ^
make[2]: *** [scripts/Makefile.build:287: fs/f2fs/dir.o] Error 1
make[1]: *** [scripts/Makefile.build:530: fs/f2fs] Error 2
make: *** [Makefile:1819: fs] Error 2

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Build-tested-by: kernel test robot
Link: https://lore.kernel.org/lkml/603647e4.deefbl4eqljuwaue%25...@intel.com/
Signed-off-by: Gustavo A. R. Silva
---
 fs/f2fs/f2fs.h          | 5 +++--
 include/linux/f2fs_fs.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e2d302ae3a46..3f5cb097c30f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2215,6 +2215,7 @@ static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
 static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+	void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
 	int offset;
 
 	if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
@@ -2224,7 +2225,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 		 * if large_nat_bitmap feature is enabled, leave checksum
 		 * protection for all nat/sit bitmaps.
 		 */
-		return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
+		return tmp_ptr + offset + sizeof(__le32);
 	}
 
 	if (__cp_payload(sbi) > 0) {
@@ -2235,7 +2236,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 	} else {
 		offset = (flag == NAT_BITMAP) ?
 			le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
-		return &ckpt->sit_nat_version_bitmap + offset;
+		return tmp_ptr + offset;
 	}
 }
 
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index c6cc0a566ef5..5487a80617a3 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -168,7 +168,7 @@ struct f2fs_checkpoint {
 	unsigned char alloc_type[MAX_ACTIVE_LOGS];
 
 	/* SIT and NAT version bitmap */
-	unsigned char sit_nat_version_bitmap[1];
+	unsigned char sit_nat_version_bitmap[];
 } __packed;
 
 #define CP_CHKSUM_OFFSET	4092	/* default chksum offset in checkpoint */