from:"Jaegeuk Kim"

Re: [f2fs-dev] [PATCH] f2fs: fix to avoid out-of-bounds memory access

2021-04-20 Thread Jaegeuk Kim

Hi,

On 04/20, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Tue, Mar 23, 2021 at 02:43:29PM +0800, Chao Yu wrote:
> > Hi butt3rflyh4ck,
> > 
> > On 2021/3/23 13:48, butt3rflyh4ck wrote:
> > > Hi, I have tested the patch on 5.12.0-rc4+, it seems to fix the problem.
> > 
> > Thanks for helping to test this patch.
> 
> Was this patch applied? I do not see it in mainline (unless
> miss-checked).

Not yet. It's queued for the next merge window.

https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=b862676e371715456c9dade7990c8004996d0d9e

> 
> Regards,
> Salvatore

Re: [PATCH] f2fs: fix to cover allocate_segment() with lock

2021-04-20 Thread Jaegeuk Kim

On 04/20, Chao Yu wrote:
> On 2021/4/20 0:57, Jaegeuk Kim wrote:
> > On 04/14, Chao Yu wrote:
> > > As we did for other cases, in fix_curseg_write_pointer(), let's
> > > change as below:
> > > - use callback function s_ops->allocate_segment() instead of
> > > raw function allocate_segment_by_default();
> > > - cover allocate_segment() with curseg_lock and sentry_lock.
> > > 
> > > Signed-off-by: Chao Yu 
> > > ---
> > >   fs/f2fs/segment.c | 7 ++-
> > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index b2ee6b7791b0..daf9531ec58f 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct 
> > > f2fs_sb_info *sbi, int type)
> > >   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
> > >   "curseg[0x%x,0x%x]", type, cs->segno, 
> > > cs->next_blkoff);
> > > - allocate_segment_by_default(sbi, type, true);
> > > +
> > > + down_read(_I(sbi)->curseg_lock);
> > > + down_write(_I(sbi)->sentry_lock);
> > > + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true);
> > > + up_write(_I(sbi)->sentry_lock);
> > > + up_read(_I(sbi)->curseg_lock);
> > 
> > Seems f2fs_allocate_new_section()?
> 
> f2fs_allocate_new_section() will allocate new section only when current
> section has been initialized and has valid block/ckpt_block.
> 
> It looks fix_curseg_write_pointer() wants to force migrating current segment
> to new section whenever write pointer and curseg->next_blkoff is inconsistent.
> 
> So how about adding a parameter to force f2fs_allocate_new_section() to
> allocate new section?

I think that's doable. I'd like to avoid native calls as much as possible.

> 
> Thanks,
> 
> > 
> > >   /* check consistency of the zone curseg pointed to */
> > >   if (check_zone_write_pointer(sbi, zbd, ))
> > > -- 
> > > 2.29.2
> > .
> >
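
To make the proposal above concrete, here is a minimal userspace sketch of what a "force" parameter could look like. Everything in it is a hypothetical stand-in (struct demo_curseg, demo_allocate_new_section(), and its fields); it only models the decision described in this thread, not the kernel's f2fs_allocate_new_section(), and it leaves out the curseg_lock/sentry_lock pair that the patch takes around the allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the current-segment state; not the f2fs struct. */
struct demo_curseg {
	bool inited;                     /* section has been initialized */
	unsigned int valid_blocks;       /* blocks valid in the current section */
	unsigned int ckpt_valid_blocks;  /* blocks valid as of the last checkpoint */
	unsigned int sections_allocated; /* times we moved to a new section */
};

/*
 * Move to a new section when the current one is initialized and holds
 * valid or checkpointed blocks, or unconditionally when force is set.
 * In the kernel this would run under curseg_lock (read) and
 * sentry_lock (write); locking is omitted in this sketch.
 * Returns true if a new section was allocated.
 */
bool demo_allocate_new_section(struct demo_curseg *cs, bool force)
{
	assert(cs != NULL);

	if (!force &&
	    !(cs->inited && (cs->valid_blocks || cs->ckpt_valid_blocks)))
		return false;

	cs->sections_allocated++;
	cs->inited = true;
	cs->valid_blocks = 0;
	cs->ckpt_valid_blocks = 0;
	return true;
}
```

With force set, a caller such as fix_curseg_write_pointer() would always get a fresh section, while other callers keep the existing check.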

Re: [PATCH] f2fs: fix to cover allocate_segment() with lock

2021-04-19 Thread Jaegeuk Kim

On 04/14, Chao Yu wrote:
> As we did for other cases, in fix_curseg_write_pointer(), let's
> change as below:
> - use callback function s_ops->allocate_segment() instead of
> raw function allocate_segment_by_default();
> - cover allocate_segment() with curseg_lock and sentry_lock.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/segment.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index b2ee6b7791b0..daf9531ec58f 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct 
> f2fs_sb_info *sbi, int type)
>  
>   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
>   "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> - allocate_segment_by_default(sbi, type, true);
> +
> + down_read(&SM_I(sbi)->curseg_lock);
> + down_write(&SIT_I(sbi)->sentry_lock);
> + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true);
> + up_write(&SIT_I(sbi)->sentry_lock);
> + up_read(&SM_I(sbi)->curseg_lock);

Seems f2fs_allocate_new_section()?

>  
>   /* check consistency of the zone curseg pointed to */
>   if (check_zone_write_pointer(sbi, zbd, &zone))
> -- 
> 2.29.2

[PATCH RESEND] dm verity: fix not aligned logical block size of RS roots IO

2021-04-14 Thread Jaegeuk Kim

From: Jaegeuk Kim 

commit df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
made dm_bufio->block_size 1024, if f->roots is 2. But, that gives the below EIO
if the logical block size of the device is 4096, given 
v->data_dev_block_bits=12.

E sd 0: 0:0:0: [sda] tag#30 request not aligned to the logical block size
E blk_update_request: I/O error, dev sda, sector 10368424 op 0x0:(READ) flags 
0x0 phys_seg 1 prio class 0
E device-mapper: verity-fec: 254:8: FEC 9244672: parity read failed (block 
18056): -5

Let's use f->roots for dm_bufio iff it's aligned to v->data_dev_block_bits.

Fixes: df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
Cc: sta...@vger.kernel.org
Signed-off-by: Jaegeuk Kim 
---
 drivers/md/dm-verity-fec.c | 11 ---
 drivers/md/dm-verity-fec.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 66f4c6398f67..cea2b3789736 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -65,7 +65,7 @@ static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int 
index,
u8 *res;
 
position = (index + rsb) * v->fec->roots;
-   block = div64_u64_rem(position, v->fec->roots << SECTOR_SHIFT, &rem);
+   block = div64_u64_rem(position, v->fec->io_size, &rem);
*offset = (unsigned)rem;
 
res = dm_bufio_read(v->fec->bufio, block, buf);
@@ -154,7 +154,7 @@ static int fec_decode_bufs(struct dm_verity *v, struct 
dm_verity_fec_io *fio,
 
/* read the next block when we run out of parity bytes */
offset += v->fec->roots;
-   if (offset >= v->fec->roots << SECTOR_SHIFT) {
+   if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
 
par = fec_read_parity(v, rsb, block_offset, &offset, &buf);
@@ -742,8 +742,13 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
 
+   if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
+   f->io_size = 1 << v->data_dev_block_bits;
+   else
+   f->io_size = v->fec->roots << SECTOR_SHIFT;
+
f->bufio = dm_bufio_client_create(f->dev->bdev,
- f->roots << SECTOR_SHIFT,
+ f->io_size,
  1, 0, NULL, NULL);
if (IS_ERR(f->bufio)) {
ti->error = "Cannot initialize FEC bufio client";
diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h
index 42fbd3a7fc9f..3c46c8d61883 100644
--- a/drivers/md/dm-verity-fec.h
+++ b/drivers/md/dm-verity-fec.h
@@ -36,6 +36,7 @@ struct dm_verity_fec {
struct dm_dev *dev; /* parity data device */
struct dm_bufio_client *data_bufio; /* for data dev access */
struct dm_bufio_client *bufio;  /* for parity data access */
+   size_t io_size; /* IO size for roots */
sector_t start; /* parity data start in blocks */
sector_t blocks;/* number of blocks covered */
sector_t rounds;/* number of interleaving rounds */
-- 
2.31.1.295.g9ea45b61b8-goog
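
The arithmetic behind the verity_fec_ctr() hunk can be checked in userspace. The sketch below is only an illustration under stated assumptions: fec_io_size() and fec_parity_block() are hypothetical helper names, not functions in dm-verity-fec.c, and SECTOR_SHIFT is redefined locally as 9. It mirrors the alignment test from the patch and the block/offset split that fec_read_parity() performs with the chosen size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Pick the bufio block size the way the patch does: use
 * roots << SECTOR_SHIFT only when it is a multiple of the data
 * device block size, otherwise fall back to the block size itself.
 */
size_t fec_io_size(unsigned int roots, unsigned int data_dev_block_bits)
{
	size_t roots_bytes = (size_t)roots << SECTOR_SHIFT;
	size_t block_size = (size_t)1 << data_dev_block_bits;

	if (roots_bytes & (block_size - 1))
		return block_size;
	return roots_bytes;
}

/* Split a parity byte position into a block index and an in-block offset. */
uint64_t fec_parity_block(uint64_t position, size_t io_size, uint64_t *offset)
{
	assert(io_size != 0 && offset != NULL);

	*offset = position % io_size;
	return position / io_size;
}
```

For the case in the commit message, roots = 2 gives 2 << 9 = 1024 bytes, which is not a multiple of 1 << 12 = 4096, so the sketch returns 4096; with roots = 16 it returns 8192.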

Re: [PATCH] dm verity: fix not aligned logical block size of RS roots IO

2021-04-13 Thread Jaegeuk Kim

On 04/12, Jaegeuk Kim wrote:
> From: Jaegeuk Kim 
> 
> commit df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block 
> size")
> made dm_bufio->block_size 1024, if f->roots is 2. But, that gives the below 
> EIO
> if the logical block size of the device is 4096, given 
> v->data_dev_block_bits=12.
> 
> E sd 0: 0:0:0: [sda] tag#30 request not aligned to the logical block size
> E blk_update_request: I/O error, dev sda, sector 10368424 op 0x0:(READ) flags 
> 0x0 phys_seg 1 prio class 0
> E device-mapper: verity-fec: 254:8: FEC 9244672: parity read failed (block 
> 18056): -5
> 
> Let's use f->roots for dm_bufio iff it's aligned to v->data_dev_block_bits.
> 
> Fixes: df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block 
> size")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jaegeuk Kim 
> ---
>  drivers/md/dm-verity-fec.c | 11 ---
>  drivers/md/dm-verity-fec.h |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
> index 66f4c6398f67..cea2b3789736 100644
> --- a/drivers/md/dm-verity-fec.c
> +++ b/drivers/md/dm-verity-fec.c
> @@ -65,7 +65,7 @@ static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, 
> int index,
>   u8 *res;
>  
>   position = (index + rsb) * v->fec->roots;
> - block = div64_u64_rem(position, v->fec->roots << SECTOR_SHIFT, &rem);
> + block = div64_u64_rem(position, v->fec->io_size, &rem);
>   *offset = (unsigned)rem;
>  
>   res = dm_bufio_read(v->fec->bufio, block, buf);
> @@ -154,7 +154,7 @@ static int fec_decode_bufs(struct dm_verity *v, struct 
> dm_verity_fec_io *fio,
>  
>   /* read the next block when we run out of parity bytes */
>   offset += v->fec->roots;
> - if (offset >= v->fec->roots << SECTOR_SHIFT) {
> + if (offset >= v->fec->io_size) {
>   dm_bufio_release(buf);
>  
>   par = fec_read_parity(v, rsb, block_offset, &offset, &buf);
> @@ -742,8 +742,13 @@ int verity_fec_ctr(struct dm_verity *v)
>   return -E2BIG;
>   }
>  
> + if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
> + f->io_size = 1 << v->data_dev_block_bits;
> + else
> + f->io_size = v->fec->roots << SECTOR_SHIFT;
> +
>   f->bufio = dm_bufio_client_create(f->dev->bdev,
> -   f->roots << SECTOR_SHIFT,
> +   f->io_size,
> 1, 0, NULL, NULL);
>   if (IS_ERR(f->bufio)) {
>   ti->error = "Cannot initialize FEC bufio client";
> diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h
> index 42fbd3a7fc9f..3c46c8d61883 100644
> --- a/drivers/md/dm-verity-fec.h
> +++ b/drivers/md/dm-verity-fec.h
> @@ -36,6 +36,7 @@ struct dm_verity_fec {
>   struct dm_dev *dev; /* parity data device */
>   struct dm_bufio_client *data_bufio; /* for data dev access */
>   struct dm_bufio_client *bufio;  /* for parity data access */
> + size_t io_size; /* IO size for roots */
>   sector_t start; /* parity data start in blocks */
>   sector_t blocks;/* number of blocks covered */
>   sector_t rounds;/* number of interleaving rounds */
> -- 
> 2.31.1.295.g9ea45b61b8-goog

Re: [PATCH v3] f2fs: fix to keep isolation of atomic write

2021-04-13 Thread Jaegeuk Kim

On 04/13, Chao Yu wrote:
> On 2021/4/13 11:27, Jaegeuk Kim wrote:
> > On 04/12, Chao Yu wrote:
> > > As Yi Chen reported, there is a potential race case described as below:
> > > 
> > > > Thread A                                Thread B
> > > > - f2fs_ioc_start_atomic_write
> > > >                                         - mkwrite
> > > >                                          - set_page_dirty
> > > >                                           - f2fs_set_page_private(page, 0)
> > > >  - set_inode_flag(FI_ATOMIC_FILE)
> > > >                                         - mkwrite same page
> > > >                                          - set_page_dirty
> > > >                                           - f2fs_register_inmem_page
> > > >                                            - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
> > > >                                              failed due to PagePrivate flag has been set
> > > >                                            - list_add_tail
> > > >                                         - truncate_inode_pages
> > > >                                          - f2fs_invalidate_page
> > > >                                           - clear page private but w/o remove it from
> > > >                                             inmem_list
> > > >                                          - set page->mapping to NULL
> > > > - f2fs_ioc_commit_atomic_write
> > > >  - __f2fs_commit_inmem_pages
> > > >    - __revoke_inmem_pages
> > > >     - f2fs_put_page panic as page->mapping is NULL
> > > 
> > > > The root cause is that we failed to keep atomic writes isolated in the
> > > > start_atomic_write vs. mkwrite case. Let start_atomic_write hold the
> > > > i_mmap_sem lock to avoid this issue.
> > 
> > My only concern is performance regression. Could you please verify the 
> > numbers?
> 
> Do you have specific test script?
> 
> IIRC, the scenario you mean is multi-threads write/mmap the same db, right?

I suggest running SQLite transaction/check operations in parallel on Android
devices.

> 
> Thanks,
> 
> > 
> > > 
> > > Reported-by: Yi Chen 
> > > Signed-off-by: Chao Yu 
> > > ---
> > > v3:
> > > - rebase to last dev branch
> > > - update commit message because this patch fixes a different racing issue
> > > of atomic write
> > >   fs/f2fs/file.c| 3 +++
> > >   fs/f2fs/segment.c | 6 ++
> > >   2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index d697c8900fa7..6284b2f4a60b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   goto out;
> > > >   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > >   /*
> > >* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > > @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > > inode->i_ino, get_dirty_pages(inode));
> > >   ret = filemap_write_and_wait_range(inode->i_mapping, 0, 
> > > LLONG_MAX);
> > >   if (ret) {
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >   goto out;
> > >   }
> > > @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   /* add inode in inmem_list first and set atomic_file */
> > >   set_inode_flag(inode, FI_ATOMIC_FILE);
> > >   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index 0cb1ca88d4aa..78c8342f52fd 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > >   struct f2fs_inode_info *fi = F2FS_I(inode);
> > >   do {
> > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   mutex_lock(&fi->inmem_lock);
> > > >   if (list_empty(&fi->inmem_pages)) {
> > >   fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > > @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > >

Re: [PATCH v3] f2fs: fix to keep isolation of atomic write

2021-04-12 Thread Jaegeuk Kim

On 04/12, Chao Yu wrote:
> As Yi Chen reported, there is a potential race case described as below:
> 
> Thread A                                Thread B
> - f2fs_ioc_start_atomic_write
>                                         - mkwrite
>                                          - set_page_dirty
>                                           - f2fs_set_page_private(page, 0)
>  - set_inode_flag(FI_ATOMIC_FILE)
>                                         - mkwrite same page
>                                          - set_page_dirty
>                                           - f2fs_register_inmem_page
>                                            - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
>                                              failed due to PagePrivate flag has been set
>                                            - list_add_tail
>                                         - truncate_inode_pages
>                                          - f2fs_invalidate_page
>                                           - clear page private but w/o remove it from
>                                             inmem_list
>                                          - set page->mapping to NULL
> - f2fs_ioc_commit_atomic_write
>  - __f2fs_commit_inmem_pages
>    - __revoke_inmem_pages
>     - f2fs_put_page panic as page->mapping is NULL
> 
> The root cause is that we failed to keep atomic writes isolated in the
> start_atomic_write vs. mkwrite case. Let start_atomic_write hold the
> i_mmap_sem lock to avoid this issue.

My only concern is performance regression. Could you please verify the numbers?

> 
> Reported-by: Yi Chen 
> Signed-off-by: Chao Yu 
> ---
> v3:
> - rebase to last dev branch
> - update commit message because this patch fixes a different racing issue
> of atomic write
>  fs/f2fs/file.c| 3 +++
>  fs/f2fs/segment.c | 6 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index d697c8900fa7..6284b2f4a60b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   goto out;
>  
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   /*
>* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
> inode->i_ino, get_dirty_pages(inode));
>   ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
>   if (ret) {
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   goto out;
>   }
> @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   /* add inode in inmem_list first and set atomic_file */
>   set_inode_flag(inode, FI_ATOMIC_FILE);
>   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  
>   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 0cb1ca88d4aa..78c8342f52fd 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>  
>   do {
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>   mutex_lock(&fi->inmem_lock);
>   if (list_empty(&fi->inmem_pages)) {
>   fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>  
>   mutex_unlock(&fi->inmem_lock);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   break;
>   }
>   __revoke_inmem_pages(inode, &fi->inmem_pages,
>   true, false, true);
>   mutex_unlock(&fi->inmem_lock);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   } while (1);
>  }
>  
> @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   f2fs_balance_fs(sbi, true);
>  
>   down_write(&fi->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   f2fs_lock_op(sbi);
>   set_inode_flag(inode, FI_ATOMIC_COMMIT);
> @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   clear_inode_flag(inode, FI_ATOMIC_COMMIT);
>  
>   f2fs_unlock_op(sbi);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&fi->i_gc_rwsem[WRITE]);
>  
>   return err;
> -- 
> 2.29.2

Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()

2021-04-12 Thread Jaegeuk Kim

On 04/11, Chao Yu wrote:
> Hi Jaegeuk,
> 
> Could you please help to merge below cleanup diff into original patch?
> or merge this separately if it is too late since it is near rc7.

I haven't reviewed this yet, but it gives an error in xfstests/083.

> 
> From 5a342a8f332a1b3281ec0e2b4d41b5287689c8ed Mon Sep 17 00:00:00 2001
> From: Chao Yu 
> Date: Sun, 11 Apr 2021 14:29:34 +0800
> Subject: [PATCH] f2fs: avoid duplicated codes for cleanup
> 
> f2fs_segment_has_free_slot() was copied from __next_free_blkoff(), and
> their implementations are almost the same. Clean them up to reuse
> common code as much as possible.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/segment.c | 32 ++--
>  1 file changed, 10 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index b33273aa5c22..bd9056165d62 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2627,22 +2627,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int 
> type, bool new_sec)
>   curseg->alloc_type = LFS;
>  }
> 
> -static void __next_free_blkoff(struct f2fs_sb_info *sbi,
> - struct curseg_info *seg, block_t start)
> +static int __next_free_blkoff(struct f2fs_sb_info *sbi,
> + int segno, block_t start)
>  {
> - struct seg_entry *se = get_seg_entry(sbi, seg->segno);
> + struct seg_entry *se = get_seg_entry(sbi, segno);
>   int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
>   unsigned long *target_map = SIT_I(sbi)->tmp_map;
>   unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
>   unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
> - int i, pos;
> + int i;
> 
>   for (i = 0; i < entries; i++)
>   target_map[i] = ckpt_map[i] | cur_map[i];
> 
> - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start);
> -
> - seg->next_blkoff = pos;
> + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start);
>  }
> 
>  /*
> @@ -2654,26 +2652,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info 
> *sbi,
>   struct curseg_info *seg)
>  {
>   if (seg->alloc_type == SSR)
> - __next_free_blkoff(sbi, seg, seg->next_blkoff + 1);
> + seg->next_blkoff =
> + __next_free_blkoff(sbi, seg->segno,
> + seg->next_blkoff + 1);
>   else
>   seg->next_blkoff++;
>  }
> 
>  bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
>  {
> - struct seg_entry *se = get_seg_entry(sbi, segno);
> - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
> - unsigned long *target_map = SIT_I(sbi)->tmp_map;
> - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
> - unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
> - int i, pos;
> -
> - for (i = 0; i < entries; i++)
> - target_map[i] = ckpt_map[i] | cur_map[i];
> -
> - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0);
> -
> - return pos < sbi->blocks_per_seg;
> + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg;
>  }
> 
>  /*
> @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int 
> type, bool flush)
> 
>   reset_curseg(sbi, type, 1);
>   curseg->alloc_type = SSR;
> - __next_free_blkoff(sbi, curseg, 0);
> + __next_free_blkoff(sbi, curseg->segno, 0);
> 
>   sum_page = f2fs_get_sum_page(sbi, new_segno);
>   if (IS_ERR(sum_page)) {
> -- 
> 2.22.1
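
The shared logic that the cleanup factors out can be sketched in userspace as below. This is an illustration under assumptions, not the f2fs code: demo_next_free_blkoff() and demo_segment_has_free_slot() are hypothetical names, and the bit order is plain LSB-first, whereas the kernel helper uses __find_rev_next_zero_bit(). The idea is the same: OR the checkpointed and current validity maps, then return the first clear bit at or after start.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the first offset >= start that is free in both maps,
 * or nbits when no such offset exists.
 */
unsigned int demo_next_free_blkoff(const unsigned long *ckpt_map,
				   const unsigned long *cur_map,
				   unsigned int nbits, unsigned int start)
{
	assert(ckpt_map && cur_map);

	for (unsigned int pos = start; pos < nbits; pos++) {
		unsigned long used = ckpt_map[pos / BITS_PER_WORD] |
				     cur_map[pos / BITS_PER_WORD];

		if (!(used & (1UL << (pos % BITS_PER_WORD))))
			return pos;
	}
	return nbits;
}

/* A segment has a free slot when the search from 0 stays below nbits. */
bool demo_segment_has_free_slot(const unsigned long *ckpt_map,
				const unsigned long *cur_map,
				unsigned int nbits)
{
	return demo_next_free_blkoff(ckpt_map, cur_map, nbits, 0) < nbits;
}
```

Because the helper now returns the offset instead of storing it, every caller has to assign the result itself. In the diff above, the change_curseg() hunk calls __next_free_blkoff() without using its return value, which may be worth checking.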

[PATCH] dm verity: fix not aligned logical block size of RS roots IO

2021-04-12 Thread Jaegeuk Kim

From: Jaegeuk Kim 

commit df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
made dm_bufio->block_size 1024, if f->roots is 2. But, that gives the below EIO
if the logical block size of the device is 4096, given 
v->data_dev_block_bits=12.

E sd 0: 0:0:0: [sda] tag#30 request not aligned to the logical block size
E blk_update_request: I/O error, dev sda, sector 10368424 op 0x0:(READ) flags 
0x0 phys_seg 1 prio class 0
E device-mapper: verity-fec: 254:8: FEC 9244672: parity read failed (block 
18056): -5

Let's use f->roots for dm_bufio iff it's aligned to v->data_dev_block_bits.

Fixes: df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
Cc: sta...@vger.kernel.org
Signed-off-by: Jaegeuk Kim 
---
 drivers/md/dm-verity-fec.c | 11 ---
 drivers/md/dm-verity-fec.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 66f4c6398f67..cea2b3789736 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -65,7 +65,7 @@ static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int 
index,
u8 *res;
 
position = (index + rsb) * v->fec->roots;
-   block = div64_u64_rem(position, v->fec->roots << SECTOR_SHIFT, &rem);
+   block = div64_u64_rem(position, v->fec->io_size, &rem);
*offset = (unsigned)rem;
 
res = dm_bufio_read(v->fec->bufio, block, buf);
@@ -154,7 +154,7 @@ static int fec_decode_bufs(struct dm_verity *v, struct 
dm_verity_fec_io *fio,
 
/* read the next block when we run out of parity bytes */
offset += v->fec->roots;
-   if (offset >= v->fec->roots << SECTOR_SHIFT) {
+   if (offset >= v->fec->io_size) {
dm_bufio_release(buf);
 
par = fec_read_parity(v, rsb, block_offset, &offset, &buf);
@@ -742,8 +742,13 @@ int verity_fec_ctr(struct dm_verity *v)
return -E2BIG;
}
 
+   if ((f->roots << SECTOR_SHIFT) & ((1 << v->data_dev_block_bits) - 1))
+   f->io_size = 1 << v->data_dev_block_bits;
+   else
+   f->io_size = v->fec->roots << SECTOR_SHIFT;
+
f->bufio = dm_bufio_client_create(f->dev->bdev,
- f->roots << SECTOR_SHIFT,
+ f->io_size,
  1, 0, NULL, NULL);
if (IS_ERR(f->bufio)) {
ti->error = "Cannot initialize FEC bufio client";
diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h
index 42fbd3a7fc9f..3c46c8d61883 100644
--- a/drivers/md/dm-verity-fec.h
+++ b/drivers/md/dm-verity-fec.h
@@ -36,6 +36,7 @@ struct dm_verity_fec {
struct dm_dev *dev; /* parity data device */
struct dm_bufio_client *data_bufio; /* for data dev access */
struct dm_bufio_client *bufio;  /* for parity data access */
+   size_t io_size; /* IO size for roots */
sector_t start; /* parity data start in blocks */
sector_t blocks;/* number of blocks covered */
sector_t rounds;/* number of interleaving rounds */
-- 
2.31.1.295.g9ea45b61b8-goog

Re: [PATCH] dm verity: fix unaligned block size

2021-04-10 Thread Jaegeuk Kim

Sorry, this patch is totally wrong. Let me dig into it more.

On 04/10, Jaegeuk Kim wrote:
> From: Jaegeuk Kim 
> 
> When f->roots is 2 and the block size is 4096, it gives an unaligned block size
> length in the SCSI command, as shown below. Let's allocate dm_bufio so that the
> block size length matches the IO chunk size.
> 
> E sd 0: 0:0:0: [sda] tag#30 request not aligned to the logical block size
> E blk_update_request: I/O error, dev sda, sector 10368424 op 0x0:(READ) flags 
> 0x0 phys_seg 1 prio class 0
> E device-mapper: verity-fec: 254:8: FEC 9244672: parity read failed (block 
> 18056): -5
> 
> Fixes: ce1cca17381f ("dm verity: fix FEC for RS roots unaligned to block 
> size")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jaegeuk Kim 
> ---
>  drivers/md/dm-verity-fec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
> index 66f4c6398f67..656238131dd7 100644
> --- a/drivers/md/dm-verity-fec.c
> +++ b/drivers/md/dm-verity-fec.c
> @@ -743,7 +743,7 @@ int verity_fec_ctr(struct dm_verity *v)
>   }
>  
>   f->bufio = dm_bufio_client_create(f->dev->bdev,
> -   f->roots << SECTOR_SHIFT,
> +   1 << v->data_dev_block_bits,
> 1, 0, NULL, NULL);
>   if (IS_ERR(f->bufio)) {
>   ti->error = "Cannot initialize FEC bufio client";
> -- 
> 2.31.1.295.g9ea45b61b8-goog

[PATCH] dm verity: fix unaligned block size

2021-04-10 Thread Jaegeuk Kim

From: Jaegeuk Kim 

When f->roots is 2 and the block size is 4096, it gives an unaligned block size
length in the SCSI command, as shown below. Let's allocate dm_bufio so that the
block size length matches the IO chunk size.

E sd 0: 0:0:0: [sda] tag#30 request not aligned to the logical block size
E blk_update_request: I/O error, dev sda, sector 10368424 op 0x0:(READ) flags 
0x0 phys_seg 1 prio class 0
E device-mapper: verity-fec: 254:8: FEC 9244672: parity read failed (block 
18056): -5

Fixes: ce1cca17381f ("dm verity: fix FEC for RS roots unaligned to block size")
Cc: sta...@vger.kernel.org
Signed-off-by: Jaegeuk Kim 
---
 drivers/md/dm-verity-fec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 66f4c6398f67..656238131dd7 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -743,7 +743,7 @@ int verity_fec_ctr(struct dm_verity *v)
}
 
f->bufio = dm_bufio_client_create(f->dev->bdev,
- f->roots << SECTOR_SHIFT,
+ 1 << v->data_dev_block_bits,
  1, 0, NULL, NULL);
if (IS_ERR(f->bufio)) {
ti->error = "Cannot initialize FEC bufio client";
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH] f2fs: set checkpoint_merge by default

2021-04-01 Thread Jaegeuk Kim

Once we introduced checkpoint_merge, we've seen some contention w/o the option.
In order to avoid it, let's set it by default.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 14239e2b7ae7..c15800c3cdb1 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1839,6 +1839,7 @@ static void default_options(struct f2fs_sb_info *sbi)
set_opt(sbi, EXTENT_CACHE);
set_opt(sbi, NOHEAP);
clear_opt(sbi, DISABLE_CHECKPOINT);
+   set_opt(sbi, MERGE_CHECKPOINT);
F2FS_OPTION(sbi).unusable_cap = 0;
sbi->sb->s_flags |= SB_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
-- 
2.31.0.208.g409f899ff0-goog

Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-30 Thread Jaegeuk Kim

On 03/27, Chao Yu wrote:
> On 2021/3/27 9:52, Chao Yu wrote:
> > On 2021/3/27 1:30, Jaegeuk Kim wrote:
> > > On 03/26, Chao Yu wrote:
> > > > On 2021/3/26 9:19, Jaegeuk Kim wrote:
> > > > > On 03/26, Chao Yu wrote:
> > > > > > On 2021/3/25 9:59, Chao Yu wrote:
> > > > > > > On 2021/3/25 6:44, Jaegeuk Kim wrote:
> > > > > > > > On 03/24, Chao Yu wrote:
> > > > > > > > > On 2021/3/24 12:22, Jaegeuk Kim wrote:
> > > > > > > > > > On 03/24, Chao Yu wrote:
> > > > > > > > > > > On 2021/3/24 2:39, Jaegeuk Kim wrote:
> > > > > > > > > > > > On 03/23, Chao Yu wrote:
> > > > > > > > > > > > > This reverts commit 
> > > > > > > > > > > > > 938a184265d75ea474f1c6fe1da96a5196163789.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Because that commit fails generic/050 testcase which 
> > > > > > > > > > > > > expect failure
> > > > > > > > > > > > > during mount a recoverable readonly partition.
> > > > > > > > > > > > 
> > > > > > > > > > > > I think we need to change generic/050, since f2fs can 
> > > > > > > > > > > > recover this partition,
> > > > > > > > > > > 
> > > > > > > > > > > Well, not sure we can change that testcase, since it 
> > > > > > > > > > > restricts all generic
> > > > > > > > > > > filesystems behavior. At least, ext4's behavior makes 
> > > > > > > > > > > sense to me:
> > > > > > > > > > > 
> > > > > > > > > > >   journal_dev_ro = bdev_read_only(journal->j_dev);
> > > > > > > > > > >   really_read_only = bdev_read_only(sb->s_bdev) | 
> > > > > > > > > > > journal_dev_ro;
> > > > > > > > > > > 
> > > > > > > > > > >   if (journal_dev_ro && !sb_rdonly(sb)) {
> > > > > > > > > > >   ext4_msg(sb, KERN_ERR,
> > > > > > > > > > >"journal device read-only, try 
> > > > > > > > > > > mounting with '-o ro'");
> > > > > > > > > > >   err = -EROFS;
> > > > > > > > > > >   goto err_out;
> > > > > > > > > > >   }
> > > > > > > > > > > 
> > > > > > > > > > >   if (ext4_has_feature_journal_needs_recovery(sb)) {
> > > > > > > > > > >   if (sb_rdonly(sb)) {
> > > > > > > > > > >   ext4_msg(sb, KERN_INFO, "INFO: recovery 
> > > > > > > > > > > "
> > > > > > > > > > >   "required on readonly 
> > > > > > > > > > > filesystem");
> > > > > > > > > > >   if (really_read_only) {
> > > > > > > > > > >   ext4_msg(sb, KERN_ERR, "write 
> > > > > > > > > > > access "
> > > > > > > > > > >   "unavailable, cannot 
> > > > > > > > > > > proceed "
> > > > > > > > > > >   "(try mounting with 
> > > > > > > > > > > noload)");
> > > > > > > > > > >   err = -EROFS;
> > > > > > > > > > >   goto err_out;
> > > > > > > > > > >   }
> > > > > > > > > > >   ext4_msg(sb, KERN_INFO, "write access 
> > > > > > > > > > > will "
> > > > > > > > > > >  "be enabled during recovery");
> > > > > > > > > > >   }
> > &

Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-26 Thread Jaegeuk Kim

On 03/26, Chao Yu wrote:
> On 2021/3/26 9:19, Jaegeuk Kim wrote:
> > On 03/26, Chao Yu wrote:
> > > On 2021/3/25 9:59, Chao Yu wrote:
> > > > On 2021/3/25 6:44, Jaegeuk Kim wrote:
> > > > > On 03/24, Chao Yu wrote:
> > > > > > On 2021/3/24 12:22, Jaegeuk Kim wrote:
> > > > > > > On 03/24, Chao Yu wrote:
> > > > > > > > On 2021/3/24 2:39, Jaegeuk Kim wrote:
> > > > > > > > > On 03/23, Chao Yu wrote:
> > > > > > > > > > This reverts commit 
> > > > > > > > > > 938a184265d75ea474f1c6fe1da96a5196163789.
> > > > > > > > > > 
> > > > > > > > > > Because that commit fails generic/050 testcase which expect 
> > > > > > > > > > failure
> > > > > > > > > > during mount a recoverable readonly partition.
> > > > > > > > > 
> > > > > > > > > I think we need to change generic/050, since f2fs can recover 
> > > > > > > > > this partition,
> > > > > > > > 
> > > > > > > > Well, not sure we can change that testcase, since it restricts 
> > > > > > > > all generic
> > > > > > > > filesystems behavior. At least, ext4's behavior makes sense to 
> > > > > > > > me:
> > > > > > > > 
> > > > > > > > journal_dev_ro = bdev_read_only(journal->j_dev);
> > > > > > > > really_read_only = bdev_read_only(sb->s_bdev) | 
> > > > > > > > journal_dev_ro;
> > > > > > > > 
> > > > > > > > if (journal_dev_ro && !sb_rdonly(sb)) {
> > > > > > > > ext4_msg(sb, KERN_ERR,
> > > > > > > >  "journal device read-only, try 
> > > > > > > > mounting with '-o ro'");
> > > > > > > > err = -EROFS;
> > > > > > > > goto err_out;
> > > > > > > > }
> > > > > > > > 
> > > > > > > > if (ext4_has_feature_journal_needs_recovery(sb)) {
> > > > > > > > if (sb_rdonly(sb)) {
> > > > > > > > ext4_msg(sb, KERN_INFO, "INFO: recovery 
> > > > > > > > "
> > > > > > > > "required on readonly 
> > > > > > > > filesystem");
> > > > > > > > if (really_read_only) {
> > > > > > > > ext4_msg(sb, KERN_ERR, "write 
> > > > > > > > access "
> > > > > > > > "unavailable, cannot 
> > > > > > > > proceed "
> > > > > > > > "(try mounting with 
> > > > > > > > noload)");
> > > > > > > > err = -EROFS;
> > > > > > > > goto err_out;
> > > > > > > > }
> > > > > > > > ext4_msg(sb, KERN_INFO, "write access 
> > > > > > > > will "
> > > > > > > >"be enabled during recovery");
> > > > > > > > }
> > > > > > > > }
> > > > > > > > 
> > > > > > > > > even though using it as readonly. And, valid checkpoint can 
> > > > > > > > > allow for user to
> > > > > > > > > read all the data without problem.
> > > > > > > > 
> > > > > > > > > > if (f2fs_hw_is_readonly(sbi)) {
> > > > > > > > 
> > > > > > > > Since device is readonly now, all write to the device will 
> > > > > > > > fail, checkpoint can
> > > > > > > > not persist recovered data, after page cache is expired, user 
> > > > > > > > can see stale data.
> >

Re: [PATCH v3] f2fs: allow to change discard policy based on cached discard cmds

2021-03-25 Thread Jaegeuk Kim

On 03/26, Sahitya Tummala wrote:
> Hi Jaegeuk,
> 
> This latest v3 patch needs to be updated in f2fs tree.
> The f2fs tree currently points to older version of patch.
> 
> Please make a note of it.

Ha, need more coffee. Thanks for pointing it out. :)

> 
> Thanks,
> Sahitya.
> 
> On Tue, Mar 16, 2021 at 07:08:58PM +0800, Chao Yu wrote:
> > On 2021/3/16 17:29, Sahitya Tummala wrote:
> > >With the default DPOLICY_BG discard thread is ioaware, which prevents
> > >the discard thread from issuing the discard commands. On low RAM setups,
> > >it is observed that these discard commands in the cache are consuming
> > >high memory. This patch aims to relax the memory pressure on the system
> > >due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE
> > >based on the nm_i->ram_thresh configured.
> > >
> > >Signed-off-by: Sahitya Tummala 
> > 
> > Reviewed-by: Chao Yu 
> > 
> > Thanks,
> 
> -- 
> --
> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-25 Thread Jaegeuk Kim

On 03/26, Chao Yu wrote:
> On 2021/3/25 9:59, Chao Yu wrote:
> > On 2021/3/25 6:44, Jaegeuk Kim wrote:
> > > On 03/24, Chao Yu wrote:
> > > > On 2021/3/24 12:22, Jaegeuk Kim wrote:
> > > > > On 03/24, Chao Yu wrote:
> > > > > > On 2021/3/24 2:39, Jaegeuk Kim wrote:
> > > > > > > On 03/23, Chao Yu wrote:
> > > > > > > > This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789.
> > > > > > > > 
> > > > > > > > Because that commit fails generic/050 testcase which expect 
> > > > > > > > failure
> > > > > > > > during mount a recoverable readonly partition.
> > > > > > > 
> > > > > > > I think we need to change generic/050, since f2fs can recover 
> > > > > > > this partition,
> > > > > > 
> > > > > > Well, not sure we can change that testcase, since it restricts all 
> > > > > > generic
> > > > > > filesystems behavior. At least, ext4's behavior makes sense to me:
> > > > > > 
> > > > > > journal_dev_ro = bdev_read_only(journal->j_dev);
> > > > > > really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro;
> > > > > > 
> > > > > > if (journal_dev_ro && !sb_rdonly(sb)) {
> > > > > > ext4_msg(sb, KERN_ERR,
> > > > > >  "journal device read-only, try mounting with 
> > > > > > '-o ro'");
> > > > > > err = -EROFS;
> > > > > > goto err_out;
> > > > > > }
> > > > > > 
> > > > > > if (ext4_has_feature_journal_needs_recovery(sb)) {
> > > > > > if (sb_rdonly(sb)) {
> > > > > > ext4_msg(sb, KERN_INFO, "INFO: recovery "
> > > > > > "required on readonly 
> > > > > > filesystem");
> > > > > > if (really_read_only) {
> > > > > > ext4_msg(sb, KERN_ERR, "write access "
> > > > > > "unavailable, cannot proceed "
> > > > > > "(try mounting with noload)");
> > > > > > err = -EROFS;
> > > > > > goto err_out;
> > > > > > }
> > > > > > ext4_msg(sb, KERN_INFO, "write access will "
> > > > > >"be enabled during recovery");
> > > > > > }
> > > > > > }
> > > > > > 
> > > > > > > even though using it as readonly. And, valid checkpoint can allow 
> > > > > > > for user to
> > > > > > > read all the data without problem.
> > > > > > 
> > > > > > > > if (f2fs_hw_is_readonly(sbi)) {
> > > > > > 
> > > > > > Since device is readonly now, all write to the device will fail, 
> > > > > > checkpoint can
> > > > > > not persist recovered data, after page cache is expired, user can 
> > > > > > see stale data.
> > > > > 
> > > > > My point is, after mount with ro, there'll be no data write which 
> > > > > preserves the
> > > > > current status. So, in the next time, we can recover fsync'ed data 
> > > > > later, if
> > > > > user succeeds to mount as rw. Another point is, with the current 
> > > > > checkpoint, we
> > > > > should not have any corrupted metadata. So, why not giving a chance 
> > > > > to show what
> > > > > data remained to user? I think this can be doable only with CoW 
> > > > > filesystems.
> > > > 
> > > > I guess we're talking about the different things...
> > > > 
> > > > Let me declare two different readonly status:
> > > > 
> > > > 1. filesystem readonly: file system is mount with ro mount option, and
> > > > app from userspace can not modify any thing of filesystem, but 
> > > > filesyst

Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()

2021-03-24 Thread Jaegeuk Kim

On 03/25, Chao Yu wrote:
> On 2021/3/25 7:49, Jaegeuk Kim wrote:
> > On 03/24, Chao Yu wrote:
> > > In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
> > > mode to select victim:
> > > 
> > > 1. LFS is set to find source section during GC, the victim should have
> > > no checkpointed data, since after GC, section could not be set free for
> > > reuse.
> > > 
> > > Previously, we only check valid chpt blocks in current segment rather
> > > than section, fix it.
> > > 
> > > 2. SSR | AT_SSR are set to find target segment for writes which can be
> > > fully filled by checkpointed and newly written blocks, we should never
> > > select such segment, otherwise it can cause panic or data corruption
> > > during allocation, potential case is described as below:
> > > 
> > >   a) target segment has 128 ckpt valid blocks
> > >   b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is
> > >  still in dirty list)
> > >   c) GC migrates '512 - n' blocks to target segment (segment has 'n'
> > >  cp_vblocks and '512 - n' vblocks)
> > >   d) If GC selects target segment via {AT,}SSR allocator, however there
> > >  is no free space in targe segment.
> > > 
> > > Fixes: 4354994f097d ("f2fs: checkpoint disabling")
> > > Fixes: 093749e296e2 ("f2fs: support age threshold based garbage 
> > > collection")
> > > Signed-off-by: Chao Yu 
> > > ---
> > > v2:
> > > - fix to check checkpointed data in section rather than segment for
> > > LFS mode.
> > > - update commit title and message.
> > >   fs/f2fs/f2fs.h|  1 +
> > >   fs/f2fs/gc.c  | 28 
> > >   fs/f2fs/segment.c | 39 ---
> > >   fs/f2fs/segment.h | 14 +-
> > >   4 files changed, 58 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index eb154d9cb063..29e634d08a27 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct 
> > > f2fs_sb_info *sbi);
> > >   int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
> > >   void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
> > >   int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool 
> > > for_ra);
> > > +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
> > >   void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
> > >   void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
> > >   void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index d96acc6531f2..4d9616373a4a 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info 
> > > *sbi,
> > >   if (p->gc_mode == GC_AT &&
> > >   get_valid_blocks(sbi, segno, true) == 0)
> > >   return;
> > > -
> > > - if (p->alloc_mode == AT_SSR &&
> > > - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
> > > - return;
> > >   }
> > >   for (i = 0; i < sbi->segs_per_sec; i++)
> > > @@ -728,11 +724,27 @@ static int get_victim_by_default(struct 
> > > f2fs_sb_info *sbi,
> > >   if (sec_usage_check(sbi, secno))
> > >   goto next;
> > > +
> > >   /* Don't touch checkpointed data */
> > > - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
> > > - get_ckpt_valid_blocks(sbi, segno) &&
> > > - p.alloc_mode == LFS))
> > > - goto next;
> > > + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
> > > + if (p.alloc_mode == LFS) {
> > > + /*
> > > +  * LFS is set to find source section during GC.
> > > +  * The victim should have no checkpointed data.
> > > +  */
> > > + if (get_ckpt_valid_blocks(sbi, segno, true))
> > > +

Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()

2021-03-24 Thread Jaegeuk Kim

On 03/24, Chao Yu wrote:
> In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR
> mode to select victim:
> 
> 1. LFS is set to find source section during GC, the victim should have
> no checkpointed data, since after GC, section could not be set free for
> reuse.
> 
> Previously, we only checked valid ckpt blocks in the current segment rather
> than the section; fix it.
> 
> 2. SSR | AT_SSR are set to find target segment for writes which can be
> fully filled by checkpointed and newly written blocks, we should never
> select such segment, otherwise it can cause panic or data corruption
> during allocation, potential case is described as below:
> 
>  a) target segment has 128 ckpt valid blocks
>  b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is
> still in dirty list)
>  c) GC migrates '512 - n' blocks to target segment (segment has 'n'
> cp_vblocks and '512 - n' vblocks)
>  d) GC then selects the target segment via the {AT,}SSR allocator, although
> there is no free space in the target segment.
> 
> Fixes: 4354994f097d ("f2fs: checkpoint disabling")
> Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
> Signed-off-by: Chao Yu 
> ---
> v2:
> - fix to check checkpointed data in section rather than segment for
> LFS mode.
> - update commit title and message.
>  fs/f2fs/f2fs.h|  1 +
>  fs/f2fs/gc.c  | 28 
>  fs/f2fs/segment.c | 39 ---
>  fs/f2fs/segment.h | 14 +-
>  4 files changed, 58 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index eb154d9cb063..29e634d08a27 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi);
>  int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
>  void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
>  int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
> +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
>  void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
>  void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
>  void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index d96acc6531f2..4d9616373a4a 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi,
>   if (p->gc_mode == GC_AT &&
>   get_valid_blocks(sbi, segno, true) == 0)
>   return;
> -
> - if (p->alloc_mode == AT_SSR &&
> - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
> - return;
>   }
>  
>   for (i = 0; i < sbi->segs_per_sec; i++)
> @@ -728,11 +724,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>  
>   if (sec_usage_check(sbi, secno))
>   goto next;
> +
>   /* Don't touch checkpointed data */
> - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
> - get_ckpt_valid_blocks(sbi, segno) &&
> - p.alloc_mode == LFS))
> - goto next;
> + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
> + if (p.alloc_mode == LFS) {
> + /*
> +  * LFS is set to find source section during GC.
> +  * The victim should have no checkpointed data.
> +  */
> + if (get_ckpt_valid_blocks(sbi, segno, true))
> + goto next;
> + } else {
> + /*
> +  * SSR | AT_SSR are set to find target segment
> +  * for writes which can be full by checkpointed
> +  * and newly written blocks.
> +  */
> + if (!segment_has_free_slot(sbi, segno))
> + goto next;
> + }
> + }
> +
>   if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
>   goto next;
>  
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 6e1a5f5657bf..f6a30856ceda 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
>   mutex_lock(&dirty_i->seglist_lock);
>  
>   valid_blocks = get_valid_blocks(sbi, segno, false);
> - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno);
> + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false);
>  
>   if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) ||
>   ckpt_valid_blocks ==

Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-24 Thread Jaegeuk Kim

On 03/24, Chao Yu wrote:
> On 2021/3/24 12:22, Jaegeuk Kim wrote:
> > On 03/24, Chao Yu wrote:
> > > On 2021/3/24 2:39, Jaegeuk Kim wrote:
> > > > On 03/23, Chao Yu wrote:
> > > > > This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789.
> > > > > 
> > > > > Because that commit fails generic/050 testcase which expect failure
> > > > > during mount a recoverable readonly partition.
> > > > 
> > > > I think we need to change generic/050, since f2fs can recover this 
> > > > partition,
> > > 
> > > Well, not sure we can change that testcase, since it restricts all generic
> > > filesystems behavior. At least, ext4's behavior makes sense to me:
> > > 
> > >   journal_dev_ro = bdev_read_only(journal->j_dev);
> > >   really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro;
> > > 
> > >   if (journal_dev_ro && !sb_rdonly(sb)) {
> > >   ext4_msg(sb, KERN_ERR,
> > >"journal device read-only, try mounting with '-o ro'");
> > >   err = -EROFS;
> > >   goto err_out;
> > >   }
> > > 
> > >   if (ext4_has_feature_journal_needs_recovery(sb)) {
> > >   if (sb_rdonly(sb)) {
> > >   ext4_msg(sb, KERN_INFO, "INFO: recovery "
> > >   "required on readonly filesystem");
> > >   if (really_read_only) {
> > >   ext4_msg(sb, KERN_ERR, "write access "
> > >   "unavailable, cannot proceed "
> > >   "(try mounting with noload)");
> > >   err = -EROFS;
> > >   goto err_out;
> > >   }
> > >   ext4_msg(sb, KERN_INFO, "write access will "
> > >  "be enabled during recovery");
> > >   }
> > >   }
> > > 
> > > > even though using it as readonly. And, valid checkpoint can allow for 
> > > > user to
> > > > read all the data without problem.
> > > 
> > > > >   if (f2fs_hw_is_readonly(sbi)) {
> > > 
> > > Since device is readonly now, all write to the device will fail, 
> > > checkpoint can
> > > not persist recovered data, after page cache is expired, user can see 
> > > stale data.
> > 
> > My point is, after mount with ro, there'll be no data write which preserves 
> > the
> > current status. So, in the next time, we can recover fsync'ed data later, if
> > user succeeds to mount as rw. Another point is, with the current 
> > checkpoint, we
> > should not have any corrupted metadata. So, why not giving a chance to show 
> > what
> > data remained to user? I think this can be doable only with CoW filesystems.
> 
> I guess we're talking about the different things...
> 
> Let me declare two different readonly status:
> 
> 1. filesystem readonly: file system is mount with ro mount option, and
> app from userspace can not modify any thing of filesystem, but filesystem
> itself can modify data on device since device may be writable.
> 
> 2. device readonly: device is set to readonly status via 'blockdev --setro'
> command, and then filesystem should never issue any write IO to the device.
> 
> So, what I mean is, *when device is readonly*, rather than f2fs mountpoint
> is readonly (f2fs_hw_is_readonly() returns true as below code, instead of
> f2fs_readonly() returns true), in this condition, we should not issue any
> write IO to device anyway, because, AFAIK, write IO will fail due to
> bio_check_ro() check.

In that case, mount(2) will try readonly, no?

# blockdev --setro /dev/vdb
# mount -t f2fs /dev/vdb /mnt/test/
mount: /mnt/test: WARNING: source write-protected, mounted read-only.

> 
>   if (f2fs_hw_is_readonly(sbi)) {
> - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> - err = -EROFS;
> + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
>   f2fs_err(sbi, "Need to recover fsync data, but 
> write access unavailable");
> - goto free_meta;
> - }
> - f2fs_info(sbi, "write access unavailable, skipping 
> recovery");
> + else
> + f2fs_info
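
The device-level read-only state discussed in this thread (set with
'blockdev --setro') can also be queried from user space. The following is a
minimal sketch, assuming a Linux system where the BLKROGET ioctl from
<linux/fs.h> is available; the helper name bdev_is_readonly() is hypothetical
and is not part of f2fs or of the patch under discussion.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKROGET */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the block device at 'path' is
 * read-only, 0 if it is writable, and -1 if the path cannot be opened
 * or does not accept the BLKROGET ioctl (for example, not a block device).
 */
static int bdev_is_readonly(const char *path)
{
	int ro = 0;
	int fd;
	int ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* BLKROGET stores a non-zero value in 'ro' for a read-only device. */
	ret = ioctl(fd, BLKROGET, &ro);
	close(fd);
	if (ret < 0)
		return -1;

	return ro ? 1 : 0;
}
```

Calling bdev_is_readonly("/dev/vdb") after 'blockdev --setro /dev/vdb' would be
expected to return 1. This reflects only the device state; whether the
filesystem itself is mounted ro is a separate question.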

Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-23 Thread Jaegeuk Kim

On 03/24, Chao Yu wrote:
> On 2021/3/24 2:39, Jaegeuk Kim wrote:
> > On 03/23, Chao Yu wrote:
> > > This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789.
> > > 
> > > Because that commit fails generic/050 testcase which expect failure
> > > during mount a recoverable readonly partition.
> > 
> > I think we need to change generic/050, since f2fs can recover this 
> > partition,
> 
> Well, not sure we can change that testcase, since it restricts all generic
> filesystems behavior. At least, ext4's behavior makes sense to me:
> 
>   journal_dev_ro = bdev_read_only(journal->j_dev);
>   really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro;
> 
>   if (journal_dev_ro && !sb_rdonly(sb)) {
>   ext4_msg(sb, KERN_ERR,
>"journal device read-only, try mounting with '-o ro'");
>   err = -EROFS;
>   goto err_out;
>   }
> 
>   if (ext4_has_feature_journal_needs_recovery(sb)) {
>   if (sb_rdonly(sb)) {
>   ext4_msg(sb, KERN_INFO, "INFO: recovery "
>   "required on readonly filesystem");
>   if (really_read_only) {
>   ext4_msg(sb, KERN_ERR, "write access "
>   "unavailable, cannot proceed "
>   "(try mounting with noload)");
>   err = -EROFS;
>   goto err_out;
>   }
>   ext4_msg(sb, KERN_INFO, "write access will "
>  "be enabled during recovery");
>   }
>   }
> 
> > even though using it as readonly. And, valid checkpoint can allow for user 
> > to
> > read all the data without problem.
> 
> >>if (f2fs_hw_is_readonly(sbi)) {
> 
> Since device is readonly now, all write to the device will fail, checkpoint 
> can
> not persist recovered data, after page cache is expired, user can see stale 
> data.

My point is that after mounting with ro there will be no data writes, which
preserves the current on-disk state. So we can still recover the fsync'ed data
later, if the user succeeds in mounting as rw. Another point is that, with the
current checkpoint, we should not have any corrupted metadata. So why not give
the user a chance to see what data remains? I think this is doable only with
CoW filesystems.

> 
> Am I missing something?
> 
> Thanks,
> 
> > 
> > > 
> > > Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
> > > Signed-off-by: Chao Yu 
> > > ---
> > >   fs/f2fs/super.c | 8 +---
> > >   1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > index b48281642e98..2b78ee11f093 100644
> > > --- a/fs/f2fs/super.c
> > > +++ b/fs/f2fs/super.c
> > > @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block 
> > > *sb, void *data, int silent)
> > >* previous checkpoint was not done by clean system 
> > > shutdown.
> > >*/
> > >   if (f2fs_hw_is_readonly(sbi)) {
> > > - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
> > > + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> > > + err = -EROFS;
> > >   f2fs_err(sbi, "Need to recover fsync 
> > > data, but write access unavailable");
> > > - else
> > > - f2fs_info(sbi, "write access unavailable, 
> > > skipping recovery");
> > > + goto free_meta;
> > > + }
> > > + f2fs_info(sbi, "write access unavailable, skipping 
> > > recovery");
> > >   goto reset_checkpoint;
> > >   }
> > > -- 
> > > 2.29.2
> > .
> >

Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator

2021-03-23 Thread Jaegeuk Kim

On 03/19, Chao Yu wrote:
> On 2021/3/19 1:17, Jaegeuk Kim wrote:
> > On 02/20, Chao Yu wrote:
> > > In cp disabling mode, there could be a condition
> > > - target segment has 128 ckpt valid blocks
> > > - GC migrates 128 valid blocks to other segment (segment is still in
> > > dirty list)
> > > - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks
> > > and 384 vblocks)
> > > - If GC selects target segment via {AT,}SSR allocator, however there is
> > > no free space in targe segment.
> > > 
> > > Fixes: 4354994f097d ("f2fs: checkpoint disabling")
> > > Fixes: 093749e296e2 ("f2fs: support age threshold based garbage 
> > > collection")
> > > Signed-off-by: Chao Yu 
> > > ---
> > >   fs/f2fs/f2fs.h|  1 +
> > >   fs/f2fs/gc.c  | 17 +
> > >   fs/f2fs/segment.c | 20 
> > >   3 files changed, 34 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index ed7807103c8e..9c753eff0814 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct 
> > > f2fs_sb_info *sbi);
> > >   int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
> > >   void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
> > >   int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool 
> > > for_ra);
> > > +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
> > >   void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
> > >   void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
> > >   void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index 86ba8ed0b8a7..a1d8062cdace 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info 
> > > *sbi,
> > >   if (p->gc_mode == GC_AT &&
> > >   get_valid_blocks(sbi, segno, true) == 0)
> > >   return;
> > > -
> > > - if (p->alloc_mode == AT_SSR &&
> > > - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
> > > - return;
> > >   }
> > >   for (i = 0; i < sbi->segs_per_sec; i++)
> > > @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info 
> > > *sbi,
> > >   if (gc_type == BG_GC && test_bit(secno, 
> > > dirty_i->victim_secmap))
> > >   goto next;
> > > + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
> > > + /*
> > > +  * to avoid selecting candidate which has below valid
> > > +  * block distribution:
> > > +  * partial blocks are valid and all left ones are valid
> > > +  * in previous checkpoint.
> > > +  */
> > > + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) {
> > > + if (!segment_has_free_slot(sbi, segno))
> > > + goto next;
> > 
> > Do we need to change this to check free_slot instead of 
> > get_ckpt_valid_blocks()?
> 
> Jaegeuk,
> 
> LFS was assigned only for GC case, in this case we are trying to select source
> section, rather than target segment for SSR/AT_SSR case, so we don't need to
> check free_slot.
> 
> - f2fs_gc
>  - __get_victim
>   - get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS, 0);
> 
> > 
> >   732 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
> >   733 get_ckpt_valid_blocks(sbi, 
> > segno) &&
> >   734 p.alloc_mode == LFS))
> 
> BTW, in LFS mode, GC wants to find source section rather than segment, so we
> > should change to check valid ckpt blocks in every segment of target section
> > here?

Alright. I refactored this patch a bit into a new one. Could you please take
a look?

https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=00152bd7cabd69b4615ebead823ff23887b0e0f7

Thanks,

> 
> Thanks,
> 
> > 
> > 
> > > + }
> > > + }
> > > +

Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"

2021-03-23 Thread Jaegeuk Kim

On 03/23, Chao Yu wrote:
> This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789.
> 
> Because that commit fails generic/050 testcase which expect failure
> during mount a recoverable readonly partition.

I think we need to change generic/050, since f2fs can recover this partition
even when it is used as readonly. And a valid checkpoint allows the user to
read all the data without problems.

> 
> Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/super.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b48281642e98..2b78ee11f093 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>    * previous checkpoint was not done by clean system shutdown.
>    */
>   if (f2fs_hw_is_readonly(sbi)) {
> - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
> + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> + err = -EROFS;
>   f2fs_err(sbi, "Need to recover fsync data, but write access unavailable");
> - else
> - f2fs_info(sbi, "write access unavailable, skipping recovery");
> + goto free_meta;
> + }
> + f2fs_info(sbi, "write access unavailable, skipping recovery");
>   goto reset_checkpoint;
>   }
>  
> -- 
> 2.29.2
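
The two behaviours being debated in this thread can be compared side by side
in a small user-space model. This is only a sketch of the hunk above under
stated assumptions: the enum, the struct and decide_recovery() are
hypothetical names, 'clean_umount' stands for CP_UMOUNT_FLAG being set, and
'fail_on_dirty' selects between the reverted behaviour (fail with -EROFS) and
the warning-only behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mount_step {
	STEP_RECOVER,		/* device writable: run fsync-data recovery */
	STEP_SKIP_RECOVERY,	/* continue at reset_checkpoint, no recovery */
	STEP_FAIL		/* abort the mount */
};

struct mount_decision {
	enum mount_step step;
	int err;
};

static struct mount_decision
decide_recovery(bool hw_readonly, bool clean_umount, bool fail_on_dirty)
{
	struct mount_decision d = { STEP_RECOVER, 0 };

	/* A writable device can always replay fsync'ed data. */
	if (!hw_readonly)
		return d;

	/* Read-only device with data to recover: the disputed case. */
	if (!clean_umount && fail_on_dirty) {
		d.step = STEP_FAIL;
		d.err = -EROFS;
		return d;
	}

	/* Either nothing to recover, or warn only and keep mounting. */
	d.step = STEP_SKIP_RECOVERY;
	return d;
}
```

The only input combination on which the two policies differ is a read-only
device without a clean unmount, which is the situation generic/050 exercises
according to the commit message.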

Re: [PATCH] f2fs: fix to align to section for fallocate() on pinned file

2021-03-23 Thread Jaegeuk Kim

On 03/23, Chao Yu wrote:
> On 2021/3/5 17:56, Chao Yu wrote:
> > Now, fallocate() on a pinned file only allocates blocks which aligns
> > to segment rather than section, so GC may try to migrate pinned file's
> > block, and after several times of failure, pinned file's block could
> > be migrated to other place, however user won't be aware of such
> > condition, and then old obsolete block address may be read/written
> > incorrectly.
> > 
> > To avoid such condition, let's try to allocate pinned file's blocks
> > with section alignment.
> > 
> > Signed-off-by: Chao Yu 
> 
> Jaegeuk,
> 
> Could you please check and apply below diff into original patch?
> 
> ---
>  fs/f2fs/file.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 236f3f69681a..24fa68fdcaa0 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1648,13 +1648,13 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
>   return 0;
> 
>   if (f2fs_is_pinned_file(inode)) {
> - block_t len = (map.m_len >> sbi->log_blocks_per_seg) <<
> - sbi->log_blocks_per_seg;
> + block_t sec_blks = BLKS_PER_SEC(sbi);
> + block_t len = rounddown(map.m_len, sec_blks);

len is declared above, so let me rephrase this as well.

> 
> - if (map.m_len % sbi->blocks_per_seg)
> - len += sbi->blocks_per_seg;
> + if (map.m_len % sec_blks)
> + len += sec_blks;

is this roundup()?

Could you check this?
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=e1175f02291141bbd924fc578299305fcde35855

> 
> - map.m_len = sbi->blocks_per_seg;
> + map.m_len = sec_blks;
>  next_alloc:
>   if (has_not_enough_free_secs(sbi, 0,
>   GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)))) {
> -- 
> 2.22.1
>
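
On the "is this roundup()?" question above: rounding down and then adding one
unit when there is a remainder gives the same result as rounding up. The
following is a minimal user-space check of that equivalence. The helpers
round_down_to(), round_up_to() and align_two_step() are hypothetical
stand-ins written for this sketch, not the kernel macros themselves.

```c
#include <assert.h>

/* Hypothetical user-space equivalents of rounddown() and roundup(). */
static unsigned int round_down_to(unsigned int x, unsigned int y)
{
	assert(y != 0);
	return x - (x % y);
}

static unsigned int round_up_to(unsigned int x, unsigned int y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* The two-step form from the diff: round down, then add one section
 * worth of blocks if m_len was not already section-aligned. */
static unsigned int align_two_step(unsigned int m_len, unsigned int sec_blks)
{
	unsigned int len = round_down_to(m_len, sec_blks);

	if (m_len % sec_blks)
		len += sec_blks;

	return len;
}
```

For any block count that does not overflow, align_two_step(m_len, sec_blks)
and round_up_to(m_len, sec_blks) agree, so the two statements in the diff can
be collapsed into a single rounding-up step.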

Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator

2021-03-18 Thread Jaegeuk Kim

On 02/20, Chao Yu wrote:
> In cp disabling mode, there could be a condition
> - target segment has 128 ckpt valid blocks
> - GC migrates 128 valid blocks to other segment (segment is still in
> dirty list)
> - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks
> and 384 vblocks)
> - GC then selects the target segment via the {AT,}SSR allocator, although
> there is no free space in the target segment.
> 
> Fixes: 4354994f097d ("f2fs: checkpoint disabling")
> Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/f2fs.h|  1 +
>  fs/f2fs/gc.c  | 17 +
>  fs/f2fs/segment.c | 20 
>  3 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index ed7807103c8e..9c753eff0814 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi);
>  int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
>  void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
>  int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
> +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
>  void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
>  void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
>  void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 86ba8ed0b8a7..a1d8062cdace 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi,
>   if (p->gc_mode == GC_AT &&
>   get_valid_blocks(sbi, segno, true) == 0)
>   return;
> -
> - if (p->alloc_mode == AT_SSR &&
> - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
> - return;
>   }
>  
>   for (i = 0; i < sbi->segs_per_sec; i++)
> @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
>   goto next;
>  
> + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
> + /*
> +  * to avoid selecting candidate which has below valid
> +  * block distribution:
> +  * partial blocks are valid and all left ones are valid
> +  * in previous checkpoint.
> +  */
> + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) {
> + if (!segment_has_free_slot(sbi, segno))
> + goto next;

Do we need to change this to check free_slot instead of get_ckpt_valid_blocks()?

	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
			get_ckpt_valid_blocks(sbi, segno) &&
			p.alloc_mode == LFS))


> + }
> + }
> +
>   if (is_atgc) {
>   add_victim_entry(sbi, &p, segno);
>   goto next;
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 2d5a82c4ca15..deaf57e13125 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi,
>   seg->next_blkoff++;
>  }
>  
> +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
> +{
> + struct sit_info *sit = SIT_I(sbi);
> + struct seg_entry *se = get_seg_entry(sbi, segno);
> + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
> + unsigned long *target_map = SIT_I(sbi)->tmp_map;
> + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
> + unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
> + int i, pos;
> +
> + down_write(&sit->sentry_lock);
> + for (i = 0; i < entries; i++)
> + target_map[i] = ckpt_map[i] | cur_map[i];
> +
> + pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0);
> + up_write(&sit->sentry_lock);
> +
> + return pos < sbi->blocks_per_seg;
> +}
> +
>  /*
>   * This function always allocates a used segment(from dirty seglist) by SSR
>   * manner, so it should recover the existing segment information of valid blocks
> -- 
> 2.29.2
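
The check that segment_has_free_slot() performs in the patch above can be
modelled in user space as follows. This is a minimal sketch under stated
assumptions: 512 blocks per segment as in the commit message, plain
unsigned long bitmaps, and hypothetical helper names set_block() and
has_free_slot(). It ignores locking and the bit ordering used by the in-kernel
bitmap helpers, neither of which affects whether a zero bit exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: 512 blocks per segment, as in the example. */
#define BLOCKS_PER_SEG 512
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS      (BLOCKS_PER_SEG / BITS_PER_WORD)

/* Mark block 'blk' as valid in a bitmap. */
static void set_block(unsigned long *map, unsigned int blk)
{
	assert(blk < BLOCKS_PER_SEG);
	map[blk / BITS_PER_WORD] |= 1UL << (blk % BITS_PER_WORD);
}

/*
 * A block can be written only if it is valid in neither the checkpointed
 * map nor the current map, so a free slot exists exactly when the OR of
 * the two maps still contains a zero bit.
 */
static bool has_free_slot(const unsigned long *ckpt_map,
			  const unsigned long *cur_map)
{
	size_t i;

	for (i = 0; i < MAP_WORDS; i++)
		if ((ckpt_map[i] | cur_map[i]) != ULONG_MAX)
			return true;

	return false;
}
```

In the scenario from the commit message, 128 blocks are valid only in the
checkpointed map and 384 other blocks are valid in the current map. Together
they cover all 512 blocks, so has_free_slot() returns false and the segment
must not be chosen as an SSR target.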

Re: [PATCH v11 1/2] scsi: ufs: Enable power management for wlun

2021-03-12 Thread Jaegeuk Kim

On 03/11, Asutosh Das wrote:
> During runtime-suspend of ufs host, the scsi devices are
> already suspended and so are the queues associated with them.
> But the ufs host sends SSU to wlun during its runtime-suspend.
> During the process blk_queue_enter checks if the queue is not in
> suspended state. If so, it waits for the queue to resume, and never
> comes out of it.
> The commit
> (d55d15a33: scsi: block: Do not accept any requests while suspended)
> adds the check if the queue is in suspended state in blk_queue_enter().
> 
> Call trace:
>  __switch_to+0x174/0x2c4
>  __schedule+0x478/0x764
>  schedule+0x9c/0xe0
>  blk_queue_enter+0x158/0x228
>  blk_mq_alloc_request+0x40/0xa4
>  blk_get_request+0x2c/0x70
>  __scsi_execute+0x60/0x1c4
>  ufshcd_set_dev_pwr_mode+0x124/0x1e4
>  ufshcd_suspend+0x208/0x83c
>  ufshcd_runtime_suspend+0x40/0x154
>  ufshcd_pltfrm_runtime_suspend+0x14/0x20
>  pm_generic_runtime_suspend+0x28/0x3c
>  __rpm_callback+0x80/0x2a4
>  rpm_suspend+0x308/0x614
>  rpm_idle+0x158/0x228
>  pm_runtime_work+0x84/0xac
>  process_one_work+0x1f0/0x470
>  worker_thread+0x26c/0x4c8
>  kthread+0x13c/0x320
>  ret_from_fork+0x10/0x18
> 
> Fix this by registering ufs device wlun as a scsi driver and
> registering it for block runtime-pm. Also make this as a
> supplier for all other luns. That way, this device wlun
> suspends after all the consumers and resumes after
> hba resumes.
> 
> Co-developed-by: Can Guo 
> Signed-off-by: Can Guo 
> Signed-off-by: Asutosh Das 
> ---
>  drivers/scsi/ufs/cdns-pltfrm.c |   2 +
>  drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
>  drivers/scsi/ufs/ufs-debugfs.c |   5 +
>  drivers/scsi/ufs/ufs-debugfs.h |   2 +
>  drivers/scsi/ufs/ufs-exynos.c  |   2 +
>  drivers/scsi/ufs/ufs-hisi.c|   2 +
>  drivers/scsi/ufs/ufs-mediatek.c|   2 +
>  drivers/scsi/ufs/ufs-qcom.c|   2 +
>  drivers/scsi/ufs/ufs_bsg.c |   6 +-
>  drivers/scsi/ufs/ufshcd-pci.c  |  36 +--
>  drivers/scsi/ufs/ufshcd.c  | 616 
> ++---
>  drivers/scsi/ufs/ufshcd.h  |   7 +
>  include/trace/events/ufs.h |  20 ++
>  13 files changed, 498 insertions(+), 206 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/cdns-pltfrm.c b/drivers/scsi/ufs/cdns-pltfrm.c
> index 149391f..3e70c23 100644
> --- a/drivers/scsi/ufs/cdns-pltfrm.c
> +++ b/drivers/scsi/ufs/cdns-pltfrm.c
> @@ -319,6 +319,8 @@ static const struct dev_pm_ops cdns_ufs_dev_pm_ops = {
>   .runtime_suspend = ufshcd_pltfrm_runtime_suspend,
>   .runtime_resume  = ufshcd_pltfrm_runtime_resume,
>   .runtime_idle= ufshcd_pltfrm_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static struct platform_driver cdns_ufs_pltfrm_driver = {
> diff --git a/drivers/scsi/ufs/tc-dwc-g210-pci.c 
> b/drivers/scsi/ufs/tc-dwc-g210-pci.c
> index 67a6a61..b01db12 100644
> --- a/drivers/scsi/ufs/tc-dwc-g210-pci.c
> +++ b/drivers/scsi/ufs/tc-dwc-g210-pci.c
> @@ -148,6 +148,8 @@ static const struct dev_pm_ops tc_dwc_g210_pci_pm_ops = {
>   .runtime_suspend = tc_dwc_g210_pci_runtime_suspend,
>   .runtime_resume  = tc_dwc_g210_pci_runtime_resume,
>   .runtime_idle= tc_dwc_g210_pci_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static const struct pci_device_id tc_dwc_g210_pci_tbl[] = {
> diff --git a/drivers/scsi/ufs/ufs-debugfs.c b/drivers/scsi/ufs/ufs-debugfs.c
> index dee98dc..f8ce2eb 100644
> --- a/drivers/scsi/ufs/ufs-debugfs.c
> +++ b/drivers/scsi/ufs/ufs-debugfs.c
> @@ -54,3 +54,8 @@ void ufs_debugfs_hba_exit(struct ufs_hba *hba)
>  {
>   debugfs_remove_recursive(hba->debugfs_root);
>  }
> +
> +void ufs_debugfs_eh_exit(void)
> +{
> + debugfs_remove_recursive(ufs_debugfs_root);
> +}
> diff --git a/drivers/scsi/ufs/ufs-debugfs.h b/drivers/scsi/ufs/ufs-debugfs.h
> index f35b39c..3fce5a0 100644
> --- a/drivers/scsi/ufs/ufs-debugfs.h
> +++ b/drivers/scsi/ufs/ufs-debugfs.h
> @@ -12,11 +12,13 @@ void __init ufs_debugfs_init(void);
>  void __exit ufs_debugfs_exit(void);
>  void ufs_debugfs_hba_init(struct ufs_hba *hba);
>  void ufs_debugfs_hba_exit(struct ufs_hba *hba);
> +void ufs_debugfs_eh_exit(void);
>  #else
>  static inline void ufs_debugfs_init(void) {}
>  static inline void ufs_debugfs_exit(void) {}
>  static inline void ufs_debugfs_hba_init(struct ufs_hba *hba) {}
>  static inline void ufs_debugfs_hba_exit(struct ufs_hba *hba) {}
> +static inline void ufs_debugfs_eh_exit(void) {}
>  #endif
>  
>  #endif
> diff --git a/drivers/scsi/ufs/ufs-exynos.c b/drivers/scsi/ufs/ufs-exynos.c
> index 267943a1..45c0b02 100644
> --- a/drivers/scsi/ufs/ufs-exynos.c
> +++ b/drivers/scsi/ufs/ufs-exynos.c
> @@ -1268,6 +1268,8 @@ static const struct dev_pm_ops exynos_ufs_pm_ops = {
>   .runtime_suspend = ufshcd_pltfrm_runtime_suspend,
>   .runtime_resume  = ufshcd_pltfrm_runtime_resume,
>

Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blockst

2021-03-10 Thread Jaegeuk Kim

On 03/09, Chao Yu wrote:
> On 2021/3/9 8:01, Jaegeuk Kim wrote:
> > On 03/05, Chao Yu wrote:
> > > On 2021/3/5 4:20, Jaegeuk Kim wrote:
> > > > On 02/27, Jaegeuk Kim wrote:
> > > > > On 02/04, Chao Yu wrote:
> > > > > > Jaegeuk,
> > > > > > 
> > > > > > On 2021/2/2 16:00, Chao Yu wrote:
> > > > > > > - for (i = 0; i < dic->nr_cpages; i++) {
> > > > > > > + for (i = 0; i < cc->nr_cpages; i++) {
> > > > > > >   struct page *page = dic->cpages[i];
> > > > > > 
> > > > > > por_fsstress still hang in this line?
> > > > > 
> > > > > I'm stuck on testing the patches, since the latest kernel is 
> > > > > panicking somehow.
> > > > > Let me update later, once I can test a bit. :(
> > > > 
> > > > It seems this works without error.
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev=4e6e1364dccba80ed44925870b97fbcf989b96c9
> > > 
> > > Ah, good news.
> > > 
> > > Thanks for helping to test the patch. :)
> > 
> > Hmm, I hit this again. Let me check w/o compress_cache back. :(
> 
> Oops :(

OK, apparently that panic is caused by compress_cache. The test has been running
for over 24 hours without it.

Re: [LTP] [f2fs] 02eb84b96b: ltp.swapon03.fail

2021-03-10 Thread Jaegeuk Kim

On 03/10, Huang Jianan wrote:
> Hi Richard,
> 
> On 2021/3/9 12:01, Matthew Wilcox wrote:
> > On Tue, Mar 09, 2021 at 10:23:35AM +0800, Weichao Guo wrote:
> > > Hi Richard,
> > > 
> > > On 2021/3/8 19:53, Richard Palethorpe wrote:
> > > > Hello,
> > > > 
> > > > > kern  :err   : [  187.461914] F2FS-fs (sda1): Swapfile does not align 
> > > > > to section
> > > > > commit 02eb84b96bc1b382dd138bf60724edbefe77b025
> > > > > Author: huangjia...@oppo.com 
> > > > > Date:   Mon Mar 1 12:58:44 2021 +0800
> > > > >   f2fs: check if swapfile is section-alligned
> > > > >   If the swapfile isn't created by pin and fallocate, it can't be
> > > > >   guaranteed section-aligned, so it may be selected by f2fs gc. 
> > > > > When
> > > > >   gc_pin_file_threshold is reached, the address of swapfile may 
> > > > > change,
> > > > >   but won't be synchronized to swap_extent, so swap will write to 
> > > > > wrong
> > > > >   address, which will cause data corruption.
> > > > >   Signed-off-by: Huang Jianan 
> > > > >   Signed-off-by: Guo Weichao 
> > > > >   Reviewed-by: Chao Yu 
> > > > >   Signed-off-by: Jaegeuk Kim 
> > > > The test uses fallocate to preallocate the swap file and writes zeros to
> > > > it. I'm not sure what pin refers to?
> > > 'pin' refers to pinned file feature in F2FS, the LBA(Logical Block 
> > > Address)
> > > of a file is fixed after pinned. Without this operation before fallocate,
> > > the LBA may not align with section(F2FS GC unit), some LBA of the file may
> > > be changed by F2FS GC in some extreme cases.
> > > 
> > > For this test case, how about pin the swap file before fallocate for F2FS 
> > > as
> > > following:
> > > 
> > > ioctl(fd, F2FS_IOC_SET_PIN_FILE, true);
> > No special ioctl should be needed.  f2fs_swap_activate() should pin the
> > file, just like it converts inline inodes and disables compression.
> 
> Now f2fs_swap_activate() will pin the file. The problem is that when
> f2fs_swap_activate()
> 
> is executed, the file has been created and may not be section-aligned.
> 
> So I think it would be better to consider aligning the swapfile during
> f2fs_swap_activate()?

Does it make sense to reallocate the blocks in f2fs_swap_activate(), something
like this?

	set_inode_flag(inode, FI_PIN_FILE);
	truncate_pagecache(inode, 0);
	f2fs_truncate_blocks(inode, 0, true);
	expand_inode_data();
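
The alignment condition discussed in this thread can be sketched as below. This is a hedged illustration only: extent_is_section_aligned() and round_up_to_section() are hypothetical helpers, block numbers are assumed to be counted from the start of the area that sections divide, and the real check in f2fs may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the extent [start_blk, start_blk + nr_blks) begins on a
 * section boundary and covers a whole number of sections.
 */
bool extent_is_section_aligned(uint64_t start_blk, uint64_t nr_blks,
			       uint32_t blks_per_sec)
{
	assert(blks_per_sec != 0);
	return nr_blks != 0 &&
	       start_blk % blks_per_sec == 0 &&
	       nr_blks % blks_per_sec == 0;
}

/* Round a block count up to the next multiple of the section size. */
uint64_t round_up_to_section(uint64_t nr_blks, uint32_t blks_per_sec)
{
	assert(blks_per_sec != 0);
	return (nr_blks + blks_per_sec - 1) / blks_per_sec * blks_per_sec;
}
```

An extent that fails the first check shares a section with other data, which is why GC could still move part of a swapfile that was not reallocated after pinning.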

Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments

2021-03-08 Thread Jaegeuk Kim

On 03/05, Chao Yu wrote:
> On 2021/3/5 1:50, Jaegeuk Kim wrote:
> > On 03/04, Chao Yu wrote:
> > > On 2021/3/3 2:44, Jaegeuk Kim wrote:
> > > > On 03/02, Jaegeuk Kim wrote:
> > > > > On 03/02, Chao Yu wrote:
> > > > > > On 2021/3/2 13:42, Jaegeuk Kim wrote:
> > > > > > > This is useful when checking conditions during checkpoint=disable 
> > > > > > > in Android.
> > > > > > 
> > > > > > This sysfs entry is readonly, how about putting this at
> > > > > > /sys/fs/f2fs//stat/?
> > > > > 
> > > > > Urg.. "stat" is a bit confused. I'll take a look a better ones.
> > > 
> > > Oh, I mean put it into "stat" directory, not "stat" entry, something like 
> > > this:
> > > 
> > > /sys/fs/f2fs//stat/ovp_segments
> > 
> > I meant that too. Why is it like stat, since it's a geomerty?
> 
> Hmm.. I feel a little bit weired to treat ovp_segments as 'stat' class, one 
> reason
> is ovp_segments is readonly and is matching the readonly attribute of a stat.

I don't think I fully understand what you are suggesting here. I don't want to
add the number of ovp_segments in /stat, since it is not part of the status; I'd
rather put it in / to stay in sync with the other free/dirty segment counts. If
that is hard for you to read, I suggest creating symlinks to organize the
current mess.

> 
> > 
> > > 
> > > > 
> > > > Taking a look at other entries using in Android, I feel that this one 
> > > > can't be
> > > > in stat or whatever other location, since I worry about the consistency 
> > > > with
> > > > similar dirty/free segments. It seems it's not easy to clean up the 
> > > > existing
> > > > ones anymore.
> > > 
> > > Well, actually, the entry number are still increasing continuously, the 
> > > result is
> > > that it becomes more and more slower and harder for me to find target 
> > > entry name
> > > from that directory.
> > > 
> > > IMO, once new readonly entry was added to "" directory, there is no 
> > > chance
> > > to reloacate it due to interface compatibility. So I think this is the 
> > > only
> > > chance to put it to the appropriate place at this time.
> > 
> > I know, but this will diverge those info into different places. I don't have
> > big concern when finding a specific entry with this tho, how about making
> > symlinks to create a dir structure for your easy access? Or, using a script
> > would be alternative way.
> 
> Yes, there should be some alternative ways to help to access f2fs sysfs
> interface, but from a point view of user, I'm not sure he can figure out those
> ways.
> 
> For those fs meta stat, why not adding a single entry to include all info you
> need rather than adding them one by one? e.g.

You can add that in /proc as well, but then the output has to be parsed again
to retrieve specific values.

> 
> /proc/fs/f2fs//super_block
> /proc/fs/f2fs//checkpoint
> /proc/fs/f2fs//nat_table
> /proc/fs/f2fs//sit_table
> ...
> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Jaegeuk Kim 
> > > > > > > ---
> > > > > > > fs/f2fs/sysfs.c | 8 
> > > > > > > 1 file changed, 8 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > > > > > > index e38a7f6921dd..254b6fa17406 100644
> > > > > > > --- a/fs/f2fs/sysfs.c
> > > > > > > +++ b/fs/f2fs/sysfs.c
> > > > > > > @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct 
> > > > > > > f2fs_attr *a,
> > > > > > >   (unsigned long 
> > > > > > > long)(free_segments(sbi)));
> > > > > > > }
> > > > > > > +static ssize_t ovp_segments_show(struct f2fs_attr *a,
> > > > > > > + struct f2fs_sb_info *sbi, char *buf)
> > > > > > > +{
> > > > > > > + return sprintf(buf, "%llu\n",
> > > > > > > + (unsigned long 
> > > > > > > long)(overprovision_segments(sbi)));
> > > > > > > +}

Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blockst

2021-03-08 Thread Jaegeuk Kim

On 03/05, Chao Yu wrote:
> On 2021/3/5 4:20, Jaegeuk Kim wrote:
> > On 02/27, Jaegeuk Kim wrote:
> > > On 02/04, Chao Yu wrote:
> > > > Jaegeuk,
> > > > 
> > > > On 2021/2/2 16:00, Chao Yu wrote:
> > > > > - for (i = 0; i < dic->nr_cpages; i++) {
> > > > > + for (i = 0; i < cc->nr_cpages; i++) {
> > > > >   struct page *page = dic->cpages[i];
> > > > 
> > > > por_fsstress still hang in this line?
> > > 
> > > I'm stuck on testing the patches, since the latest kernel is panicking 
> > > somehow.
> > > Let me update later, once I can test a bit. :(
> > 
> > It seems this works without error.
> > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev=4e6e1364dccba80ed44925870b97fbcf989b96c9
> 
> Ah, good news.
> 
> Thanks for helping to test the patch. :)

Hmm, I hit this again. Let me check again without compress_cache. :(

[159210.201131] ------------[ cut here ]------------
[159210.204241] kernel BUG at fs/f2fs/compress.c:1082!
[159210.207321] invalid opcode:  [#1] SMP PTI
[159210.209407] CPU: 4 PID: 2753477 Comm: kworker/u16:2 Tainted: G   OE 
5.12.0-rc1-custom #1
[159210.212737] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.14.0-2 04/01/2014
[159210.224800] Workqueue: writeback wb_workfn (flush-252:16)
[159210.226851] RIP: 0010:prepare_compress_overwrite+0x4c0/0x760 [f2fs]
[159210.229506] Code: 8b bf 90 0a 00 00 be 40 0d 00 00 e8 4a 92 4f c4 49 89 44 
24 18 48 85 c0 0f 84 85 02 00 00 41 8b 54 24 10 e9 c5 fb ff ff 0f 0b <0f> 0b 41 
8b 44 24 20 85 c0 0f 84 2a ff ff ff 48 8
[159210.236311] RSP: 0018:9fa782177858 EFLAGS: 00010246
[159210.238517] RAX:  RBX:  RCX: 

[159210.240734] RDX: 001c RSI:  RDI: 

[159210.242941] RBP: 9fa7821778f0 R08: 93b9c89cb232 R09: 
0003
[159210.245107] R10: 86873420 R11: 0001 R12: 
9fa782177900
[159210.247319] R13: 93b906dca578 R14: 031c R15: 

[159210.249492] FS:  () GS:93b9fbd0() 
knlGS:
[159210.254724] CS:  0010 DS:  ES:  CR0: 80050033
[159210.258709] CR2: 7f0367d33738 CR3: 00012bc0c004 CR4: 
00370ee0
[159210.261608] DR0:  DR1:  DR2: 

[159210.264614] DR3:  DR6: fffe0ff0 DR7: 
0400
[159210.267476] Call Trace:
[159210.269075]  ? f2fs_compress_write_end+0xa2/0x100 [f2fs]
[159210.271165]  f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs]
[159210.273017]  f2fs_write_cache_pages+0x468/0x8a0 [f2fs]
[159210.274848]  f2fs_write_data_pages+0x2a4/0x2f0 [f2fs]
[159210.276612]  ? from_kgid+0x12/0x20
[159210.277994]  ? f2fs_update_inode+0x3cb/0x510 [f2fs]
[159210.279748]  do_writepages+0x38/0xc0
[159210.281183]  ? f2fs_write_inode+0x11c/0x300 [f2fs]
[159210.282877]  __writeback_single_inode+0x44/0x2a0
[159210.284526]  writeback_sb_inodes+0x223/0x4d0
[159210.286105]  __writeback_inodes_wb+0x56/0xf0
[159210.287740]  wb_writeback+0x1dd/0x290
[159210.289182]  wb_workfn+0x309/0x500
[159210.290553]  process_one_work+0x220/0x3c0
[159210.292048]  worker_thread+0x53/0x420
[159210.293403]  kthread+0x12f/0x150
[159210.294716]  ? process_one_work+0x3c0/0x3c0
[159210.296204]  ? __kthread_bind_mask+0x70/0x70
[159210.297702]  ret_from_fork+0x22/0x30


> 
> Thanks,
> 
> > 
> > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > >   block_t blkaddr;
> > > > >   struct bio_post_read_ctx *ctx;
> > > > > @@ -2201,6 +2207,14 @@ int f2fs_read_multi_pages(struct compress_ctx 
> > > > > *cc, struct bio **bio_ret,
> > > > >   blkaddr = data_blkaddr(dn.inode, dn.node_page,
> > > > >   dn.ofs_in_node + i + 1);
> > > > > + f2fs_wait_on_block_writeback(inode, blkaddr);
> > > > > +
> > > > > + if (f2fs_load_compressed_page(sbi, page, blkaddr)) {
> > > > > + if (atomic_dec_and_test(>remaining_pages))
> > > > > + f2fs_decompress_cluster(dic);
> > > > > + continue;
> > > > > + }
> > > > > +
> > > 
> > > 
> > > ___
> > > Linux-f2fs-devel mailing list
> > > linux-f2fs-de...@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > .
> >

Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blockst

2021-03-04 Thread Jaegeuk Kim

On 02/27, Jaegeuk Kim wrote:
> On 02/04, Chao Yu wrote:
> > Jaegeuk,
> > 
> > On 2021/2/2 16:00, Chao Yu wrote:
> > > - for (i = 0; i < dic->nr_cpages; i++) {
> > > + for (i = 0; i < cc->nr_cpages; i++) {
> > >   struct page *page = dic->cpages[i];
> > 
> > por_fsstress still hang in this line?
> 
> I'm stuck on testing the patches, since the latest kernel is panicking 
> somehow.
> Let me update later, once I can test a bit. :(

It seems this works without error.
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev=4e6e1364dccba80ed44925870b97fbcf989b96c9

> 
> > 
> > Thanks,
> > 
> > >   block_t blkaddr;
> > >   struct bio_post_read_ctx *ctx;
> > > @@ -2201,6 +2207,14 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, 
> > > struct bio **bio_ret,
> > >   blkaddr = data_blkaddr(dn.inode, dn.node_page,
> > >   dn.ofs_in_node + i + 1);
> > > + f2fs_wait_on_block_writeback(inode, blkaddr);
> > > +
> > > + if (f2fs_load_compressed_page(sbi, page, blkaddr)) {
> > > + if (atomic_dec_and_test(>remaining_pages))
> > > + f2fs_decompress_cluster(dic);
> > > + continue;
> > > + }
> > > +
> 
> 

Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments

2021-03-04 Thread Jaegeuk Kim

On 03/04, Chao Yu wrote:
> On 2021/3/3 2:44, Jaegeuk Kim wrote:
> > On 03/02, Jaegeuk Kim wrote:
> > > On 03/02, Chao Yu wrote:
> > > > On 2021/3/2 13:42, Jaegeuk Kim wrote:
> > > > > This is useful when checking conditions during checkpoint=disable in 
> > > > > Android.
> > > > 
> > > > This sysfs entry is readonly, how about putting this at
> > > > /sys/fs/f2fs//stat/?
> > > 
> > > Urg.. "stat" is a bit confused. I'll take a look a better ones.
> 
> Oh, I mean put it into "stat" directory, not "stat" entry, something like 
> this:
> 
> /sys/fs/f2fs//stat/ovp_segments

I meant that too. Why would it belong under stat, since it's geometry?

> 
> > 
> > Taking a look at other entries using in Android, I feel that this one can't 
> > be
> > in stat or whatever other location, since I worry about the consistency with
> > similar dirty/free segments. It seems it's not easy to clean up the existing
> > ones anymore.
> 
> Well, actually, the entry number are still increasing continuously, the 
> result is
> that it becomes more and more slower and harder for me to find target entry 
> name
> from that directory.
> 
> IMO, once new readonly entry was added to "" directory, there is no 
> chance
> to reloacate it due to interface compatibility. So I think this is the only
> chance to put it to the appropriate place at this time.

I know, but this will scatter that information across different places. I don't
have a big concern about finding a specific entry this way, though. How about
making symlinks to create a directory structure for easier access? Alternatively,
a script would work too.

> 
> Thanks,
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Jaegeuk Kim 
> > > > > ---
> > > > >fs/f2fs/sysfs.c | 8 
> > > > >1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > > > > index e38a7f6921dd..254b6fa17406 100644
> > > > > --- a/fs/f2fs/sysfs.c
> > > > > +++ b/fs/f2fs/sysfs.c
> > > > > @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr 
> > > > > *a,
> > > > >   (unsigned long long)(free_segments(sbi)));
> > > > >}
> > > > > +static ssize_t ovp_segments_show(struct f2fs_attr *a,
> > > > > + struct f2fs_sb_info *sbi, char *buf)
> > > > > +{
> > > > > + return sprintf(buf, "%llu\n",
> > > > > + (unsigned long 
> > > > > long)(overprovision_segments(sbi)));
> > > > > +}
> > > > > +
> > > > >static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
> > > > >   struct f2fs_sb_info *sbi, char *buf)
> > > > >{
> > > > > @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, 
> > > > > node_io_flag, node_io_flag);
> > > > >F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, 
> > > > > ckpt_thread_ioprio);
> > > > >F2FS_GENERAL_RO_ATTR(dirty_segments);
> > > > >F2FS_GENERAL_RO_ATTR(free_segments);
> > > > > +F2FS_GENERAL_RO_ATTR(ovp_segments);
> > > > 
> > > > Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs?
> > > 
> > > Yeah, thanks.
> > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > >F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
> > > > >F2FS_GENERAL_RO_ATTR(features);
> > > > >F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
> > > > > 
> > > 
> > > 

Re: f2fs_convert_inline_inode causing rebalance based on random uninitialized value in dn.node_changed

2021-03-03 Thread Jaegeuk Kim

On 03/03, Colin Ian King wrote:
> On 03/03/2021 19:44, Jaegeuk Kim wrote:
> > On 03/02, Colin Ian King wrote:
> >> Hi,
> >>
> >> Static analysis on linux-next detected a potential uninitialized
> >> variable dn.node_changed that does not get set when a call to
> >> f2fs_get_node_page() fails.  This uninitialized value gets used in the
> >> call to f2fs_balance_fs() that may or not may not balances dirty node
> >> and dentry pages depending on the uninitialized state of the variable.
> >>
> >> I believe the issue was introduced by commit:
> >>
> >> commit 2a3407607028f7c780f1c20faa4e922bf631d340
> >> Author: Jaegeuk Kim 
> >> Date:   Tue Dec 22 13:23:35 2015 -0800
> >>
> >> f2fs: call f2fs_balance_fs only when node was changed
> >>
> >>
> >> The analysis is a follows:
> >>
> >> 184 int f2fs_convert_inline_inode(struct inode *inode)
> >> 185 {
> >> 186struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> >>
> >>1. var_decl: Declaring variable dn without initializer.
> >>
> >> 187struct dnode_of_data dn;
> >>
> >>NOTE dn is not initialized here.
> >>
> >> 188struct page *ipage, *page;
> >> 189int err = 0;
> >> 190
> >>
> >>2. Condition !f2fs_has_inline_data(inode), taking false branch.
> >>3. Condition f2fs_hw_is_readonly(sbi), taking false branch.
> >>4. Condition f2fs_readonly(sbi->sb), taking false branch.
> >>
> >> 191if (!f2fs_has_inline_data(inode) ||
> >> 192f2fs_hw_is_readonly(sbi) ||
> >> f2fs_readonly(sbi->sb))
> >> 193return 0;
> >> 194
> >> 195err = dquot_initialize(inode);
> >>
> >>5. Condition err, taking false branch.
> >>
> >> 196if (err)
> >> 197return err;
> >> 198
> >> 199page = f2fs_grab_cache_page(inode->i_mapping, 0, false);
> >>
> >>6. Condition !page, taking false branch.
> >>
> >> 200if (!page)
> >> 201return -ENOMEM;
> >> 202
> >> 203f2fs_lock_op(sbi);
> >> 204
> >> 205ipage = f2fs_get_node_page(sbi, inode->i_ino);
> >>
> >>7. Condition IS_ERR(ipage), taking true branch.
> >>
> >> 206if (IS_ERR(ipage)) {
> >> 207err = PTR_ERR(ipage);
> >>
> >>8. Jumping to label out.
> >>
> >> 208goto out;
> >> 209}
> >> 210
> >>
> >>NOTE: set_new_dnode memset's dn so sets the flag to false, but we
> >> don't get to this memset if IS_ERR(ipage) above is true.
> >>
> >> 211set_new_dnode(, inode, ipage, ipage, 0);
> >> 212
> >> 213if (f2fs_has_inline_data(inode))
> >> 214err = f2fs_convert_inline_page(, page);
> >> 215
> >> 216f2fs_put_dnode();
> >> 217 out:
> >> 218f2fs_unlock_op(sbi);
> >> 219
> >> 220f2fs_put_page(page, 1);
> >> 221
> >>
> >> Uninitialized scalar variable:
> >>
> >>9. uninit_use_in_call: Using uninitialized value dn.node_changed when
> >> calling f2fs_balance_fs.
> >>
> >> 222f2fs_balance_fs(sbi, dn.node_changed);
> >> 223
> >> 224return err;
> >> 225 }
> >>
> >> I think a suitable fix will be to set dn.node_changed to false on in
> >> line 207-208 but I'm concerned if I'm missing something subtle to the
> >> rebalancing if I do this.
> >>
> >> Comments?
> > 
> > Thank you for the report. Yes, it seems that's a right call and we need to
> > check the error to decide calling f2fs_balance_fs() in line 222, since
> > set_new_dnode() is used to set all the fields in dnode_of_data. So, if you
> > don't mind, could you please post a patch?
> 
> Just to clarify, just setting dn.node_changes to false is enough?
> 
> I'm not entirely sure what you meant when you wrote "and we need to
> check the error to decide calling f2fs_balance_fs() in line 222".

I meant:

222 if (!err)
223 f2fs_balance_fs(sbi, dn.node_changed);

Thanks,
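
The control flow under discussion can be sketched in plain C as below. It is only a model of the pattern, not f2fs code: struct node_state, init_state(), balance() and convert() are hypothetical stand-ins for dnode_of_data, set_new_dnode(), f2fs_balance_fs() and f2fs_convert_inline_inode().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct node_state {
	bool node_changed;
};

/* Counts how often the stand-in balance routine runs. */
int balance_calls;

static void balance(bool need)
{
	(void)need;
	balance_calls++;
}

/* Stand-in for the helper that zeroes and fills the whole struct. */
static void init_state(struct node_state *st)
{
	memset(st, 0, sizeof(*st));
}

int convert(bool lookup_fails, bool mark_changed)
{
	struct node_state st;	/* deliberately left uninitialized */
	int err = 0;

	assert(balance_calls >= 0);

	if (lookup_fails) {
		err = -EIO;	/* stand-in for the failed page lookup */
		goto out;	/* st is still uninitialized on this path */
	}

	init_state(&st);
	st.node_changed = mark_changed;
out:
	/* Only read st when the path that initialized it was taken. */
	if (!err)
		balance(st.node_changed);
	return err;
}
```

Guarding the call on the error code means the flag is never read on the early-exit path, so no extra initialization is needed there.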

> 
> Colin
> 
> > 
> > Thanks,
> > 
> >>
> >> Colin
> >>

Re: [PATCH v2] f2fs: expose # of overprivision segments

2021-03-03 Thread Jaegeuk Kim

This is useful when checking conditions during checkpoint=disable in Android.

Signed-off-by: Jaegeuk Kim 
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 5 +
 fs/f2fs/sysfs.c | 9 +
 2 files changed, 14 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
b/Documentation/ABI/testing/sysfs-fs-f2fs
index 9fa5a528cc23..4aa8f38b52d7 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -409,3 +409,8 @@ Description:Give a way to change checkpoint merge 
daemon's io priority.
I/O priority "3". We can select the class between "rt" and "be",
and set the I/O priority within valid range of it. "," delimiter
is necessary in between I/O class and priority number.
+
+What:  /sys/fs/f2fs//ovp_segments
+Date:      March 2021
+Contact:   "Jaegeuk Kim" 
+Description:   Shows the number of overprovision segments.
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index e38a7f6921dd..0c391ab2d8b7 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
(unsigned long long)(free_segments(sbi)));
 }
 
+static ssize_t ovp_segments_show(struct f2fs_attr *a,
+   struct f2fs_sb_info *sbi, char *buf)
+{
+   return sprintf(buf, "%llu\n",
+   (unsigned long long)(overprovision_segments(sbi)));
+}
+
 static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
struct f2fs_sb_info *sbi, char *buf)
 {
@@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, 
node_io_flag);
 F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, 
ckpt_thread_ioprio);
 F2FS_GENERAL_RO_ATTR(dirty_segments);
 F2FS_GENERAL_RO_ATTR(free_segments);
+F2FS_GENERAL_RO_ATTR(ovp_segments);
 F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
 F2FS_GENERAL_RO_ATTR(features);
 F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
@@ -715,6 +723,7 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(ckpt_thread_ioprio),
ATTR_LIST(dirty_segments),
ATTR_LIST(free_segments),
+   ATTR_LIST(ovp_segments),
ATTR_LIST(unusable),
ATTR_LIST(lifetime_write_kbytes),
ATTR_LIST(features),
-- 
2.31.0.rc0.254.gbdcc3b1a9d-goog
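
Since the new attribute prints a single "%llu\n" value, a userspace reader could parse it roughly as follows. This is a sketch under assumptions: parse_sysfs_u64() is a hypothetical helper, and reading the file itself (whose path depends on the device name) is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text of a one-value sysfs attribute such as "1234\n".
 * Returns 0 and stores the value in *out on success, or a negative
 * errno value if the text is not a plain decimal number.
 */
int parse_sysfs_u64(const char *text, unsigned long long *out)
{
	char *end;
	unsigned long long val;

	assert(text != NULL && out != NULL);

	/* Reject signs, whitespace and empty input up front. */
	if (*text < '0' || *text > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(text, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	/* Allow only an optional trailing newline after the digits. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

Keeping one value per file is what makes this parsing trivial, which is the trade-off against a combined /proc entry raised earlier in the thread.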

Re: linux-next: build warning after merge of the f2fs tree

2021-03-03 Thread Jaegeuk Kim

On 03/03, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the f2fs tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> fs/f2fs/sysfs.c:576:25: warning: 'f2fs_attr_ovp_segments' defined but not 
> used [-Wunused-variable]
>   576 | static struct f2fs_attr f2fs_attr_##name = __ATTR(name, 0444, 
> name##_show, NULL)
>   | ^~
> fs/f2fs/sysfs.c:639:1: note: in expansion of macro 'F2FS_GENERAL_RO_ATTR'
>   639 | F2FS_GENERAL_RO_ATTR(ovp_segments);
>   | ^~~~
> 
> Introduced by commit
> 
>   10e0b8ef8715 ("f2fs: expose # of overprivision segments")

Thanks. Should be fixed soon.

> 
> -- 
> Cheers,
> Stephen Rothwell

Re: f2fs_convert_inline_inode causing rebalance based on random uninitialized value in dn.node_changed

2021-03-03 Thread Jaegeuk Kim

On 03/02, Colin Ian King wrote:
> Hi,
> 
> Static analysis on linux-next detected a potential uninitialized
> variable dn.node_changed that does not get set when a call to
> f2fs_get_node_page() fails.  This uninitialized value gets used in the
> call to f2fs_balance_fs() that may or not may not balances dirty node
> and dentry pages depending on the uninitialized state of the variable.
> 
> I believe the issue was introduced by commit:
> 
> commit 2a3407607028f7c780f1c20faa4e922bf631d340
> Author: Jaegeuk Kim 
> Date:   Tue Dec 22 13:23:35 2015 -0800
> 
> f2fs: call f2fs_balance_fs only when node was changed
> 
> 
> The analysis is a follows:
> 
> 184 int f2fs_convert_inline_inode(struct inode *inode)
> 185 {
> 186struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> 
>1. var_decl: Declaring variable dn without initializer.
> 
> 187struct dnode_of_data dn;
> 
>NOTE dn is not initialized here.
> 
> 188struct page *ipage, *page;
> 189int err = 0;
> 190
> 
>2. Condition !f2fs_has_inline_data(inode), taking false branch.
>3. Condition f2fs_hw_is_readonly(sbi), taking false branch.
>4. Condition f2fs_readonly(sbi->sb), taking false branch.
> 
> 191if (!f2fs_has_inline_data(inode) ||
> 192f2fs_hw_is_readonly(sbi) ||
> f2fs_readonly(sbi->sb))
> 193return 0;
> 194
> 195err = dquot_initialize(inode);
> 
>5. Condition err, taking false branch.
> 
> 196if (err)
> 197return err;
> 198
> 199page = f2fs_grab_cache_page(inode->i_mapping, 0, false);
> 
>6. Condition !page, taking false branch.
> 
> 200if (!page)
> 201return -ENOMEM;
> 202
> 203f2fs_lock_op(sbi);
> 204
> 205ipage = f2fs_get_node_page(sbi, inode->i_ino);
> 
>7. Condition IS_ERR(ipage), taking true branch.
> 
> 206if (IS_ERR(ipage)) {
> 207err = PTR_ERR(ipage);
> 
>8. Jumping to label out.
> 
> 208goto out;
> 209}
> 210
> 
>NOTE: set_new_dnode memset's dn so sets the flag to false, but we
> don't get to this memset if IS_ERR(ipage) above is true.
> 
> 211set_new_dnode(, inode, ipage, ipage, 0);
> 212
> 213if (f2fs_has_inline_data(inode))
> 214err = f2fs_convert_inline_page(, page);
> 215
> 216f2fs_put_dnode();
> 217 out:
> 218f2fs_unlock_op(sbi);
> 219
> 220f2fs_put_page(page, 1);
> 221
> 
> Uninitialized scalar variable:
> 
>9. uninit_use_in_call: Using uninitialized value dn.node_changed when
> calling f2fs_balance_fs.
> 
> 222f2fs_balance_fs(sbi, dn.node_changed);
> 223
> 224return err;
> 225 }
> 
> I think a suitable fix will be to set dn.node_changed to false on in
> line 207-208 but I'm concerned if I'm missing something subtle to the
> rebalancing if I do this.
> 
> Comments?

Thank you for the report. Yes, that seems like the right call, and we also need
to check the error before deciding whether to call f2fs_balance_fs() in line 222,
since set_new_dnode() is what sets all the fields in dnode_of_data. So, if you
don't mind, could you please post a patch?

Thanks,

> 
> Colin
>

Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments

2021-03-02 Thread Jaegeuk Kim

On 03/02, Jaegeuk Kim wrote:
> On 03/02, Chao Yu wrote:
> > On 2021/3/2 13:42, Jaegeuk Kim wrote:
> > > This is useful when checking conditions during checkpoint=disable in 
> > > Android.
> > 
> > This sysfs entry is readonly, how about putting this at
> > /sys/fs/f2fs//stat/?
> 
> Urg.. "stat" is a bit confused. I'll take a look a better ones.

Looking at the other entries used in Android, I feel that this one can't go in
stat or any other location, since I worry about consistency with the similar
dirty/free segments entries. It seems it's no longer easy to clean up the
existing ones.

> 
> > 
> > > 
> > > Signed-off-by: Jaegeuk Kim 
> > > ---
> > >   fs/f2fs/sysfs.c | 8 
> > >   1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > > index e38a7f6921dd..254b6fa17406 100644
> > > --- a/fs/f2fs/sysfs.c
> > > +++ b/fs/f2fs/sysfs.c
> > > @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
> > >   (unsigned long long)(free_segments(sbi)));
> > >   }
> > > +static ssize_t ovp_segments_show(struct f2fs_attr *a,
> > > + struct f2fs_sb_info *sbi, char *buf)
> > > +{
> > > + return sprintf(buf, "%llu\n",
> > > + (unsigned long long)(overprovision_segments(sbi)));
> > > +}
> > > +
> > >   static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
> > >   struct f2fs_sb_info *sbi, char *buf)
> > >   {
> > > @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, 
> > > node_io_flag);
> > >   F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, 
> > > ckpt_thread_ioprio);
> > >   F2FS_GENERAL_RO_ATTR(dirty_segments);
> > >   F2FS_GENERAL_RO_ATTR(free_segments);
> > > +F2FS_GENERAL_RO_ATTR(ovp_segments);
> > 
> > Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs?
> 
> Yeah, thanks.
> 
> > 
> > Thanks,
> > 
> > >   F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
> > >   F2FS_GENERAL_RO_ATTR(features);
> > >   F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
> > > 
> 
> 

Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments

2021-03-02 Thread Jaegeuk Kim

On 03/02, Chao Yu wrote:
> On 2021/3/2 13:42, Jaegeuk Kim wrote:
> > This is useful when checking conditions during checkpoint=disable in 
> > Android.
> 
> This sysfs entry is readonly, how about putting this at
> /sys/fs/f2fs//stat/?

Urg.. "stat" is a bit confusing. I'll look for a better one.

> 
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >   fs/f2fs/sysfs.c | 8 
> >   1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > index e38a7f6921dd..254b6fa17406 100644
> > --- a/fs/f2fs/sysfs.c
> > +++ b/fs/f2fs/sysfs.c
> > @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
> > (unsigned long long)(free_segments(sbi)));
> >   }
> > +static ssize_t ovp_segments_show(struct f2fs_attr *a,
> > +   struct f2fs_sb_info *sbi, char *buf)
> > +{
> > +   return sprintf(buf, "%llu\n",
> > +   (unsigned long long)(overprovision_segments(sbi)));
> > +}
> > +
> >   static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
> > struct f2fs_sb_info *sbi, char *buf)
> >   {
> > @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, 
> > node_io_flag);
> >   F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, 
> > ckpt_thread_ioprio);
> >   F2FS_GENERAL_RO_ATTR(dirty_segments);
> >   F2FS_GENERAL_RO_ATTR(free_segments);
> > +F2FS_GENERAL_RO_ATTR(ovp_segments);
> 
> Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs?

Yeah, thanks.

> 
> Thanks,
> 
> >   F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
> >   F2FS_GENERAL_RO_ATTR(features);
> >   F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
> >

[PATCH] f2fs: expose # of overprivision segments

2021-03-02 Thread Jaegeuk Kim

This is useful when checking conditions during checkpoint=disable in Android.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/sysfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index e38a7f6921dd..254b6fa17406 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
(unsigned long long)(free_segments(sbi)));
 }
 
+static ssize_t ovp_segments_show(struct f2fs_attr *a,
+   struct f2fs_sb_info *sbi, char *buf)
+{
+   return sprintf(buf, "%llu\n",
+   (unsigned long long)(overprovision_segments(sbi)));
+}
+
 static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
struct f2fs_sb_info *sbi, char *buf)
 {
@@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, 
node_io_flag);
 F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, 
ckpt_thread_ioprio);
 F2FS_GENERAL_RO_ATTR(dirty_segments);
 F2FS_GENERAL_RO_ATTR(free_segments);
+F2FS_GENERAL_RO_ATTR(ovp_segments);
 F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
 F2FS_GENERAL_RO_ATTR(features);
 F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
-- 
2.30.1.766.gb4fecdf3b7-goog
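
For readers who want to consume a node like this from user space, here is a
minimal sketch of the text format the show handler above produces ("%llu\n")
and of how a reader might parse it back. show_counter() and parse_counter()
are hypothetical helper names used only for illustration; they are not part
of f2fs or of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sysfs show handler: formats a counter
 * as a decimal number followed by a newline, like "%llu\n" above.
 */
static int show_counter(char *buf, size_t size, unsigned long long value)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%llu\n", value);
}

/*
 * Hypothetical reader side: parse the text such a node returns.
 * Returns 0 on success and -1 if the text is not a decimal number.
 */
static int parse_counter(const char *text, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (end == text || errno != 0)
		return -1;
	/* Accept an optional trailing newline, nothing else. */
	if (*end != '\n' && *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```

A real reader would first read the node's contents into a buffer and then
hand that buffer to a parser along these lines.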

Re: [f2fs-dev] [PATCH 3/3] f2fs: check if swapfile is section-alligned

2021-03-02 Thread Jaegeuk Kim

On 03/01, Jaegeuk Kim wrote:
> On 03/01, Chao Yu wrote:
> > Hi Jianan,
> 
> Merged 1/3 and 2/3, so please post v2 on 3/3.

NVM. Found v2.

> 
> Thanks,
> 
> > 
> > On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:
> > > If the swapfile isn't created by pin and fallocate, it cann't be
> > 
> > Typo:
> > 
> > can't
> > 
> > > guaranteed section-aligned, so it may be selected by f2fs gc. When
> > > gc_pin_file_threshold is reached, the address of swapfile may change,
> > > but won't be synchroniz to swap_extent, so swap will write to wrong
> > 
> > synchronized
> > 
> > > address, which will cause data corruption.
> > > 
> > > Signed-off-by: Huang Jianan 
> > > Signed-off-by: Guo Weichao 
> > > ---
> > >   fs/f2fs/data.c | 63 ++
> > >   1 file changed, 63 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > index 4dbc1cafc55d..3e523d6e4643 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -3781,11 +3781,63 @@ int f2fs_migrate_page(struct address_space 
> > > *mapping,
> > >   #endif
> > >   #ifdef CONFIG_SWAP
> > > +static int f2fs_check_file_aligned(struct inode *inode)
> > 
> > f2fs_check_file_alignment() or f2fs_is_file_aligned()?
> > 
> > > +{
> > > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > + block_t main_blkaddr = SM_I(sbi)->main_blkaddr;
> > > + block_t cur_lblock;
> > > + block_t last_lblock;
> > > + block_t pblock;
> > > + unsigned long len;
> > > + unsigned long nr_pblocks;
> > > + unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;
> > 
> > unsigned int blocks_per_sec = BLKS_PER_SEC(sbi);
> > 
> > > + int ret;
> > > +
> > > + cur_lblock = 0;
> > > + last_lblock = bytes_to_blks(inode, i_size_read(inode));
> > > + len = i_size_read(inode);
> > > +
> > > + while (cur_lblock < last_lblock) {
> > > + struct f2fs_map_blocks map;
> > > + pgoff_t next_pgofs;
> > > +
> > > + memset(&map, 0, sizeof(map));
> > > + map.m_lblk = cur_lblock;
> > > + map.m_len = bytes_to_blks(inode, len) - cur_lblock;
> > 
> > map.m_len = last_lblock - cur_lblock;
> > 
> > > + map.m_next_pgofs = &next_pgofs;
> > 
> > map.m_next_pgofs = NULL;
> > map.m_next_extent = NULL;
> > 
> > > + map.m_seg_type = NO_CHECK_TYPE;
> > 
> > map.m_may_create = false;
> > 
> > > +
> > > + ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
> > > +
> > 
> > Unneeded blank line.
> > 
> > > + if (ret)
> > > + goto err_out;
> > > +
> > > + /* hole */
> > > + if (!(map.m_flags & F2FS_MAP_FLAGS))
> > 
> > ret = -ENOENT;
> > 
> > > + goto err_out;
> > > +
> > > + pblock = map.m_pblk;
> > > + nr_pblocks = map.m_len;
> > > +
> > > + if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
> > > + nr_pblocks & (blocks_per_sec - 1))
> > 
> > ret = -EINVAL;
> > 
> > > + goto err_out;
> > > +
> > > + cur_lblock += nr_pblocks;
> > > + }
> > > +
> > > + return 0;
> > > +err_out:
> > > + pr_err("swapon: swapfile isn't section-aligned\n");
> > 
> > We should show above message only after we fail in check condition:
> > 
> > if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
> > nr_pblocks & (blocks_per_sec - 1)) {
> > f2fs_err(sbi, "Swapfile does not align to section");
> > goto err_out;
> > }
> > 
> > And please use f2fs_{err,warn,info..} macro rather than 
> > pr_{err,warn,info..}.
> > 
> > Could you please fix above related issues in check_swap_activate_fast() as 
> > well.
> > 
> > > + return -EINVAL;
> > 
> > return ret;
> > 
> > > +}
> > > +
> > >   static int check_swap_activate_fast(struct swap_info_struct *sis,
> > >   struct file *swap_file, sector_t *span)
> > >   {
> > >   struct address_space *mapping = sw

Re: [f2fs-dev] [PATCH 3/3] f2fs: check if swapfile is section-alligned

2021-03-02 Thread Jaegeuk Kim

On 03/01, Chao Yu wrote:
> Hi Jianan,

Merged 1/3 and 2/3, so please post v2 on 3/3.

Thanks,

> 
> On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:
> > If the swapfile isn't created by pin and fallocate, it cann't be
> 
> Typo:
> 
> can't
> 
> > guaranteed section-aligned, so it may be selected by f2fs gc. When
> > gc_pin_file_threshold is reached, the address of swapfile may change,
> > but won't be synchroniz to swap_extent, so swap will write to wrong
> 
> synchronized
> 
> > address, which will cause data corruption.
> > 
> > Signed-off-by: Huang Jianan 
> > Signed-off-by: Guo Weichao 
> > ---
> >   fs/f2fs/data.c | 63 ++
> >   1 file changed, 63 insertions(+)
> > 
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 4dbc1cafc55d..3e523d6e4643 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -3781,11 +3781,63 @@ int f2fs_migrate_page(struct address_space *mapping,
> >   #endif
> >   #ifdef CONFIG_SWAP
> > +static int f2fs_check_file_aligned(struct inode *inode)
> 
> f2fs_check_file_alignment() or f2fs_is_file_aligned()?
> 
> > +{
> > +   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > +   block_t main_blkaddr = SM_I(sbi)->main_blkaddr;
> > +   block_t cur_lblock;
> > +   block_t last_lblock;
> > +   block_t pblock;
> > +   unsigned long len;
> > +   unsigned long nr_pblocks;
> > +   unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;
> 
> unsigned int blocks_per_sec = BLKS_PER_SEC(sbi);
> 
> > +   int ret;
> > +
> > +   cur_lblock = 0;
> > +   last_lblock = bytes_to_blks(inode, i_size_read(inode));
> > +   len = i_size_read(inode);
> > +
> > +   while (cur_lblock < last_lblock) {
> > +   struct f2fs_map_blocks map;
> > +   pgoff_t next_pgofs;
> > +
> > +   memset(&map, 0, sizeof(map));
> > +   map.m_lblk = cur_lblock;
> > +   map.m_len = bytes_to_blks(inode, len) - cur_lblock;
> 
> map.m_len = last_lblock - cur_lblock;
> 
> > +   map.m_next_pgofs = &next_pgofs;
> 
> map.m_next_pgofs = NULL;
> map.m_next_extent = NULL;
> 
> > +   map.m_seg_type = NO_CHECK_TYPE;
> 
> map.m_may_create = false;
> 
> > +
> > +   ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
> > +
> 
> Unneeded blank line.
> 
> > +   if (ret)
> > +   goto err_out;
> > +
> > +   /* hole */
> > +   if (!(map.m_flags & F2FS_MAP_FLAGS))
> 
> ret = -ENOENT;
> 
> > +   goto err_out;
> > +
> > +   pblock = map.m_pblk;
> > +   nr_pblocks = map.m_len;
> > +
> > +   if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
> > +   nr_pblocks & (blocks_per_sec - 1))
> 
> ret = -EINVAL;
> 
> > +   goto err_out;
> > +
> > +   cur_lblock += nr_pblocks;
> > +   }
> > +
> > +   return 0;
> > +err_out:
> > +   pr_err("swapon: swapfile isn't section-aligned\n");
> 
> We should show above message only after we fail in check condition:
> 
>   if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
>   nr_pblocks & (blocks_per_sec - 1)) {
>   f2fs_err(sbi, "Swapfile does not align to section");
>   goto err_out;
>   }
> 
> And please use f2fs_{err,warn,info..} macro rather than pr_{err,warn,info..}.
> 
> Could you please fix above related issues in check_swap_activate_fast() as 
> well.
> 
> > +   return -EINVAL;
> 
> return ret;
> 
> > +}
> > +
> >   static int check_swap_activate_fast(struct swap_info_struct *sis,
> > struct file *swap_file, sector_t *span)
> >   {
> > struct address_space *mapping = swap_file->f_mapping;
> > struct inode *inode = mapping->host;
> > +   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > sector_t cur_lblock;
> > sector_t last_lblock;
> > sector_t pblock;
> > @@ -3793,6 +3845,7 @@ static int check_swap_activate_fast(struct 
> > swap_info_struct *sis,
> > sector_t highest_pblock = 0;
> > int nr_extents = 0;
> > unsigned long nr_pblocks;
> > +   unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;
> 
> Ditto,
> 
> > u64 len;
> > int ret;
> > @@ -3827,6 +3880,13 @@ static int check_swap_activate_fast(struct 
> > swap_info_struct *sis,
> > pblock = map.m_pblk;
> > nr_pblocks = map.m_len;
> > +   if ((pblock - SM_I(sbi)->main_blkaddr) & (blocks_per_sec - 1) ||
> > +   nr_pblocks & (blocks_per_sec - 1)) {
> > +   pr_err("swapon: swapfile isn't section-aligned\n");
> 
> Ditto,
> 
> > +   ret = -EINVAL;
> > +   goto out;
> > +   }
> > +
> > if (cur_lblock + nr_pblocks >= sis->max)
> > nr_pblocks = sis->max - cur_lblock;
> > @@ -3878,6 +3938,9 @@ static int check_swap_activate(struct 
> > swap_info_struct *sis,
> > if (PAGE_SIZE == F2FS_BLKSIZE)
> > return
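
The alignment test discussed in this review can be sketched in user space as
follows. This is only an illustration of the mask arithmetic, assuming (as the
patch's "& (blocks_per_sec - 1)" expression does) that blocks_per_sec is a
power of two. extent_is_section_aligned() and is_power_of_two() are
hypothetical names, not f2fs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when x is a nonzero power of two; the mask trick below relies on it. */
static bool is_power_of_two(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical analogue of the check in the patch: an extent counts as
 * section-aligned when both its start (relative to the main area) and
 * its length are multiples of blocks_per_sec.
 */
static bool extent_is_section_aligned(uint32_t pblock, uint32_t main_blkaddr,
				      uint32_t nr_pblocks,
				      uint32_t blocks_per_sec)
{
	uint32_t mask;

	assert(is_power_of_two(blocks_per_sec));
	mask = blocks_per_sec - 1;

	return ((pblock - main_blkaddr) & mask) == 0 &&
	       (nr_pblocks & mask) == 0;
}
```

With 512 blocks per section and a main area starting at block 4096, an extent
starting at block 5120 with 512 blocks passes, while one starting at block
4196 does not.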

Re: [PATCH v4] f2fs: compress: add compress_inode to cache compressed blockst

2021-02-27 Thread Jaegeuk Kim

On 02/04, Chao Yu wrote:
> Jaegeuk,
> 
> On 2021/2/2 16:00, Chao Yu wrote:
> > -   for (i = 0; i < dic->nr_cpages; i++) {
> > +   for (i = 0; i < cc->nr_cpages; i++) {
> > struct page *page = dic->cpages[i];
> 
> por_fsstress still hang in this line?

I'm stuck on testing the patches, since the latest kernel is panicking somehow.
Let me update later, once I can test a bit. :(

> 
> Thanks,
> 
> > block_t blkaddr;
> > struct bio_post_read_ctx *ctx;
> > @@ -2201,6 +2207,14 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, 
> > struct bio **bio_ret,
> > blkaddr = data_blkaddr(dn.inode, dn.node_page,
> > dn.ofs_in_node + i + 1);
> > +   f2fs_wait_on_block_writeback(inode, blkaddr);
> > +
> > +   if (f2fs_load_compressed_page(sbi, page, blkaddr)) {
> > +   if (atomic_dec_and_test(&dic->remaining_pages))
> > +   f2fs_decompress_cluster(dic);
> > +   continue;
> > +   }
> > +

Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator

2021-02-27 Thread Jaegeuk Kim

On 02/23, Chao Yu wrote:
> Jaegeuk,
> 
> Could you please help to review this patch? since I doubt that this
> issue can happen in real world... :(

Let me take a look as soon as I have some time. Sorry for the delay.

> 
> Thanks,
> 
> On 2021/2/22 21:43, Chao Yu wrote:
> > Ping,
> > 
> > On 2021/2/20 17:40, Chao Yu wrote:
> > > In cp disabling mode, there could be a condition
> > > - target segment has 128 ckpt valid blocks
> > > - GC migrates 128 valid blocks to other segment (segment is still in
> > > dirty list)
> > > - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks
> > > and 384 vblocks)
> > > - If GC selects target segment via {AT,}SSR allocator, however there is
> > > no free space in target segment.
> > > 
> > > Fixes: 4354994f097d ("f2fs: checkpoint disabling")
> > > Fixes: 093749e296e2 ("f2fs: support age threshold based garbage 
> > > collection")
> > > Signed-off-by: Chao Yu 
> > > ---
> > >fs/f2fs/f2fs.h|  1 +
> > >fs/f2fs/gc.c  | 17 +
> > >fs/f2fs/segment.c | 20 
> > >3 files changed, 34 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index ed7807103c8e..9c753eff0814 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct 
> > > f2fs_sb_info *sbi);
> > >int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
> > >void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
> > >int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool 
> > > for_ra);
> > > +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno);
> > >void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi);
> > >void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi);
> > >void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi);
> > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > > index 86ba8ed0b8a7..a1d8062cdace 100644
> > > --- a/fs/f2fs/gc.c
> > > +++ b/fs/f2fs/gc.c
> > > @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info 
> > > *sbi,
> > >   if (p->gc_mode == GC_AT &&
> > >   get_valid_blocks(sbi, segno, true) == 0)
> > >   return;
> > > -
> > > - if (p->alloc_mode == AT_SSR &&
> > > - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
> > > - return;
> > >   }
> > >   for (i = 0; i < sbi->segs_per_sec; i++)
> > > @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info 
> > > *sbi,
> > >   if (gc_type == BG_GC && test_bit(secno, 
> > > dirty_i->victim_secmap))
> > >   goto next;
> > > + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
> > > + /*
> > > +  * to avoid selecting candidate which has below valid
> > > +  * block distribution:
> > > +  * partial blocks are valid and all left ones are valid
> > > +  * in previous checkpoint.
> > > +  */
> > > + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) {
> > > + if (!segment_has_free_slot(sbi, segno))
> > > + goto next;
> > > + }
> > > + }
> > > +
> > >   if (is_atgc) {
> > >   add_victim_entry(sbi, &p, segno);
> > >   goto next;
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index 2d5a82c4ca15..deaf57e13125 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct 
> > > f2fs_sb_info *sbi,
> > >   seg->next_blkoff++;
> > >}
> > > +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
> > > +{
> > > + struct sit_info *sit = SIT_I(sbi);
> > > + struct seg_entry *se = get_seg_entry(sbi, segno);
> > > + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
> > > + unsigned long *target_map = SIT_I(sbi)->tmp_map;
> > > + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
> > > + unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
> > > + int i, pos;
> > > +
> > > + down_write(&sit->sentry_lock);
> > > + for (i = 0; i < entries; i++)
> > > + target_map[i] = ckpt_map[i] | cur_map[i];
> > > +
> > > + pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0);
> > > + up_write(&sit->sentry_lock);
> > > +
> > > + return pos < sbi->blocks_per_seg;
> > > +}
> > > +
> > >/*
> > > * This function always allocates a used segment(from dirty seglist) 
> > > by SSR
> > > * manner, so it should recover the existing segment information of 
> > > valid blocks
> > > 
> > .
> >
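
The idea behind segment_has_free_slot() in the quoted patch, OR-ing the
checkpointed bitmap with the current bitmap and looking for a zero bit, can be
sketched in user space like this. BLOCKS_PER_SEG, set_block() and
has_free_slot() are illustrative assumptions; the kernel helper uses its own
bit ordering and locking, which this sketch does not model.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_SEG	512	/* assumption: illustrative segment size */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS	((BLOCKS_PER_SEG + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Mark one block as valid in a bitmap. */
static void set_block(unsigned long *map, size_t blk)
{
	map[blk / BITS_PER_WORD] |= 1UL << (blk % BITS_PER_WORD);
}

/*
 * A block is usable only if it is valid in neither the checkpointed map
 * nor the current map, so a free slot exists when the OR of both maps
 * still contains a zero bit.
 */
static bool has_free_slot(const unsigned long *ckpt_map,
			  const unsigned long *cur_map)
{
	size_t i;

	/* The word-wise comparison assumes the map fills whole words. */
	assert(BLOCKS_PER_SEG % BITS_PER_WORD == 0);

	for (i = 0; i < MAP_WORDS; i++) {
		unsigned long merged = ckpt_map[i] | cur_map[i];

		if (merged != ~0UL)
			return true;
	}
	return false;
}
```

This mirrors the scenario in the commit message: 128 checkpoint-valid blocks
plus 384 other valid blocks leave no free slot in a 512-block segment.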

Re: [f2fs-dev] [PATCH][next] f2fs: Replace one-element array with flexible-array member

2021-02-27 Thread Jaegeuk Kim

On 02/25, Chao Yu wrote:
> Hello, Gustavo,
> 
> On 2021/2/25 3:03, Gustavo A. R. Silva wrote:
> > There is a regular need in the kernel to provide a way to declare having
> > a dynamically sized set of trailing elements in a structure. Kernel code
> > should always use “flexible array members”[1] for these cases. The older
> > style of one-element or zero-length arrays should no longer be used[2].
> 
> I proposed a similar cleanup before, and I have no objection to doing this.
> 
> https://lore.kernel.org/patchwork/patch/869440/
> 
> Let's ask for Jaegeuk's opinion.

Merged, thanks.
This looks like a better reason than code readability. :)

> 
> > 
> > Refactor the code according to the use of a flexible-array member in
> > struct f2fs_checkpoint, instead of a one-element array.
> > 
> > Notice that a temporary pointer to void '*tmp_ptr' was used in order to
> > fix the following errors when using a flexible array instead of a one
> > element array in struct f2fs_checkpoint:
> > 
> >CC [M]  fs/f2fs/dir.o
> > In file included from fs/f2fs/dir.c:13:
> > fs/f2fs/f2fs.h: In function ‘__bitmap_ptr’:
> > fs/f2fs/f2fs.h:2227:40: error: invalid use of flexible array member
> >   2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
> >|^
> > fs/f2fs/f2fs.h:2227:49: error: invalid use of flexible array member
> >   2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
> >| ^
> > fs/f2fs/f2fs.h:2238:40: error: invalid use of flexible array member
> >   2238 |   return &ckpt->sit_nat_version_bitmap + offset;
> >|^
> > make[2]: *** [scripts/Makefile.build:287: fs/f2fs/dir.o] Error 1
> > make[1]: *** [scripts/Makefile.build:530: fs/f2fs] Error 2
> > make: *** [Makefile:1819: fs] Error 2
> > 
> > [1] https://en.wikipedia.org/wiki/Flexible_array_member
> > [2] 
> > https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
> > 
> > Link: https://github.com/KSPP/linux/issues/79
> > Build-tested-by: kernel test robot 
> > Link: 
> > https://lore.kernel.org/lkml/603647e4.deefbl4eqljuwaue%25...@intel.com/
> > Signed-off-by: Gustavo A. R. Silva 
> > ---
> >   fs/f2fs/f2fs.h  | 5 +++--
> >   include/linux/f2fs_fs.h | 2 +-
> >   2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index e2d302ae3a46..3f5cb097c30f 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -2215,6 +2215,7 @@ static inline block_t __cp_payload(struct 
> > f2fs_sb_info *sbi)
> >   static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
> >   {
> > struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> > +   void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
> > int offset;
> > if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
> > @@ -2224,7 +2225,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info 
> > *sbi, int flag)
> >  * if large_nat_bitmap feature is enabled, leave checksum
> >  * protection for all nat/sit bitmaps.
> >  */
> > -   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
> > +   return tmp_ptr + offset + sizeof(__le32);
> > }
> > if (__cp_payload(sbi) > 0) {
> > @@ -2235,7 +2236,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info 
> > *sbi, int flag)
> > } else {
> > offset = (flag == NAT_BITMAP) ?
> > le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
> > -   return &ckpt->sit_nat_version_bitmap + offset;
> > +   return tmp_ptr + offset;
> > }
> >   }
> > diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> > index c6cc0a566ef5..5487a80617a3 100644
> > --- a/include/linux/f2fs_fs.h
> > +++ b/include/linux/f2fs_fs.h
> > @@ -168,7 +168,7 @@ struct f2fs_checkpoint {
> > unsigned char alloc_type[MAX_ACTIVE_LOGS];
> > /* SIT and NAT version bitmap */
> > -   unsigned char sit_nat_version_bitmap[1];
> > +   unsigned char sit_nat_version_bitmap[];
> >   } __packed;
> >   #define CP_CHKSUM_OFFSET  4092/* default chksum offset in checkpoint 
> > */
> >
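
For reference, a small user-space sketch of the flexible-array-member pattern
discussed in this thread. struct demo_checkpoint, demo_alloc() and
demo_bitmap_ptr() are hypothetical and much simpler than the real
struct f2fs_checkpoint. The sketch takes a byte pointer from the array itself
(cp->bitmap decays to unsigned char *) instead of doing arithmetic on the
address of the array, which is the expression the compiler rejected above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified layout; not the real struct f2fs_checkpoint. */
struct demo_checkpoint {
	unsigned int bitmap_bytes;
	unsigned char bitmap[];	/* flexible array member, sized at allocation */
};

/* Allocate the header plus bitmap_bytes of trailing storage, zeroed. */
static struct demo_checkpoint *demo_alloc(unsigned int bitmap_bytes)
{
	struct demo_checkpoint *cp;

	cp = calloc(1, sizeof(*cp) + bitmap_bytes);
	if (!cp)
		return NULL;
	cp->bitmap_bytes = bitmap_bytes;
	return cp;
}

/* Return a pointer offset bytes into the trailing bitmap. */
static unsigned char *demo_bitmap_ptr(struct demo_checkpoint *cp,
				      unsigned int offset)
{
	assert(offset <= cp->bitmap_bytes);
	return cp->bitmap + offset;
}
```

The caller owns the allocation and releases it with free().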

[GIT PULL] f2fs update for 5.12-rc1

2021-02-17 Thread Jaegeuk Kim

Hi Linus,

Could you please consider this pull request?

Thanks,

The following changes since commit 76c057c84d286140c6c416c3b4ba832cd1d8984e:

  Merge branch 'parisc-5.11-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux (2021-01-27 
11:06:15 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git 
tags/f2fs-for-5.12-rc1

for you to fetch changes up to 092af2eb180062f5bafe02a75da9856676eb4f89:

  Documentation: f2fs: fix typo s/automaic/automatic (2021-02-16 07:58:35 -0800)


f2fs-for-5.12-rc1

We've added two major features in this round: 1) compression level and
2) checkpoint_merge. 1) Compression level expands the 'compress_algorithm'
mount option to accept a parameter in the format <algorithm>:<level>. This
lets users configure the lz4 and zstd compression levels more precisely, so
that f2fs compression can provide a higher compression ratio.
2) checkpoint_merge creates a kernel daemon that merges concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. Plus,
we can eliminate the sluggishness caused by a slow checkpoint operation when
the checkpoint is done in a process context in a cgroup having a low i/o
budget and cpu shares.
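
As an illustration only, a value of the form <algorithm>:<level> (for example
"zstd:6") could be split in user space as sketched below. This is not the
f2fs parser: struct compress_opt, parse_compress_opt() and the DEMO_MAX_LEVEL
bound are assumptions made for this example, and the real per-algorithm level
limits are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_LEVEL	22	/* assumption: arbitrary bound for this sketch */

struct compress_opt {
	char algorithm[16];
	int level;		/* 0 means "no level given" in this sketch */
};

/*
 * Split "<algorithm>" or "<algorithm>:<level>" into its parts.
 * Returns 0 on success and -EINVAL on malformed input.
 */
static int parse_compress_opt(const char *arg, struct compress_opt *out)
{
	const char *colon;
	size_t name_len;
	char *end;
	long level;

	assert(arg != NULL && out != NULL);

	colon = strchr(arg, ':');
	name_len = colon ? (size_t)(colon - arg) : strlen(arg);
	if (name_len == 0 || name_len >= sizeof(out->algorithm))
		return -EINVAL;

	memcpy(out->algorithm, arg, name_len);
	out->algorithm[name_len] = '\0';
	out->level = 0;

	if (!colon)
		return 0;

	errno = 0;
	level = strtol(colon + 1, &end, 10);
	if (end == colon + 1 || *end != '\0' || errno != 0)
		return -EINVAL;
	if (level < 1 || level > DEMO_MAX_LEVEL)
		return -EINVAL;

	out->level = (int)level;
	return 0;
}
```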

Enhancement:
 - add compress level for lz4 and zstd in mount option
 - checkpoint_merge mount option
 - deprecate f2fs_trace_io

Bug fix:
 - flush data when enabling checkpoint back
 - handle corner cases of mount options
 - missing ACL update and lock for I_LINKABLE flag
 - attach FIEMAP_EXTENT_MERGED in f2fs_fiemap
 - fix potential deadlock in compression flow
 - fix wrong submit_io condition

As usual, we've cleaned up many code flows and fixed minor bugs.


Chao Yu (13):
  f2fs: enhance to update i_mode and acl atomically in f2fs_setattr()
  f2fs: enforce the immutable flag on open files
  f2fs: relocate f2fs_precache_extents()
  f2fs: compress: deny setting unsupported compress algorithm
  f2fs: compress: support compress level
  f2fs: introduce a new per-sb directory in sysfs
  f2fs: fix to tag FIEMAP_EXTENT_MERGED in f2fs_fiemap()
  f2fs: fix out-of-repair __setattr_copy()
  f2fs: trival cleanup in move_data_block()
  f2fs: fix to set/clear I_LINKABLE under i_lock
  f2fs: compress: fix potential deadlock
  f2fs: introduce sb_status sysfs node
  f2fs: relocate inline conversion from mmap() to mkwrite()

Chengguang Xu (1):
  f2fs: fix to use per-inode maxbytes

Daeho Jeong (3):
  f2fs: fix null page reference in redirty_blocks
  f2fs: introduce checkpoint_merge mount option
  f2fs: add ckpt_thread_ioprio sysfs node

Dehe Gu (1):
  f2fs: fix a wrong condition in __submit_bio

Ed Tsai (1):
  Documentation: f2fs: fix typo s/automaic/automatic

Eric Biggers (2):
  f2fs: clean up post-read processing
  libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()

Jack Qiu (1):
  f2fs: remove unused stat_{inc, dec}_atomic_write

Jaegeuk Kim (5):
  f2fs: handle unallocated section and zone on pinned/atgc
  f2fs: deprecate f2fs_trace_io
  f2fs: flush data when enabling checkpoint back
  f2fs: don't grab superblock freeze for flush/ckpt thread
  f2fs: give a warning only for readonly partition

Liu Song (1):
  f2fs: remove unnecessary initialization in xattr.c

Matthew Wilcox (Oracle) (1):
  f2fs: Remove readahead collision detection

Weichao Guo (1):
  f2fs: fix to set inode->i_mode correctly for posix_acl_update_mode

Yi Chen (1):
  f2fs: fix to avoid inconsistent quota data

Zheng Yongjun (1):
  f2fs: Replace expression with offsetof()

 Documentation/ABI/testing/sysfs-fs-f2fs |  32 +++
 Documentation/filesystems/f2fs.rst  |  18 +-
 fs/f2fs/Kconfig |  20 +-
 fs/f2fs/Makefile|   1 -
 fs/f2fs/acl.c   |  23 +-
 fs/f2fs/checkpoint.c| 177 +-
 fs/f2fs/compress.c  | 195 +++
 fs/f2fs/data.c  | 404 
 fs/f2fs/debug.c |  12 +
 fs/f2fs/f2fs.h  | 104 ++--
 fs/f2fs/file.c  |  57 +++--
 fs/f2fs/gc.c|   8 +-
 fs/f2fs/inline.c|   4 +
 fs/f2fs/namei.c |   8 +
 fs/f2fs/node.c  |   4 +-
 fs/f2fs/segment.c   |   7 -
 fs/f2fs/segment.h   |   4 +-
 fs/f2fs/super.c | 198 +---
 fs/f2fs/sysfs.c | 132 ++-
 fs/f2fs/trace.c | 165 -
 fs/f2fs/trace.h |  43 
 fs/f2fs/xatt

[PATCH] f2fs: give a warning only for readonly partition

2021-02-12 Thread Jaegeuk Kim

Let's allow mounting a readonly partition. We're able to recover later once we
have it back as read-write.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/super.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 4aa533cb4340..30d5abef4361 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3933,12 +3933,10 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
 * previous checkpoint was not done by clean system shutdown.
 */
if (f2fs_hw_is_readonly(sbi)) {
-   if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   err = -EROFS;
+   if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
f2fs_err(sbi, "Need to recover fsync data, but 
write access unavailable");
-   goto free_meta;
-   }
-   f2fs_info(sbi, "write access unavailable, skipping 
recovery");
+   else
+   f2fs_info(sbi, "write access unavailable, 
skipping recovery");
goto reset_checkpoint;
}
 
-- 
2.30.0.478.g8a0d178c01-goog

[PATCH] f2fs: don't grab superblock freeze for flush/ckpt thread

2021-02-08 Thread Jaegeuk Kim

They are controlled by f2fs_freeze().

This fixes xfstests/generic/068 which is stuck at

 task:f2fs_ckpt-252:3 state:D stack:0 pid: 5761 ppid: 2 flags:0x4000
 Call Trace:
  __schedule+0x44c/0x8a0
  schedule+0x4f/0xc0
  percpu_rwsem_wait+0xd8/0x140
  ? percpu_down_write+0xf0/0xf0
  __percpu_down_read+0x56/0x70
  issue_checkpoint_thread+0x12c/0x160 [f2fs]
  ? wait_woken+0x80/0x80
  kthread+0x114/0x150
  ? __checkpoint_and_complete_reqs+0x110/0x110 [f2fs]
  ? kthread_park+0x90/0x90
  ret_from_fork+0x22/0x30

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 4 ----
 fs/f2fs/segment.c    | 4 ----
 fs/f2fs/super.c      | 4 ++++
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 579b9c3603cc..174a0819ad96 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1763,13 +1763,9 @@ static int issue_checkpoint_thread(void *data)
if (kthread_should_stop())
return 0;
 
-   sb_start_intwrite(sbi->sb);
-
 if (!llist_empty(&cprc->issue_list))
__checkpoint_and_complete_reqs(sbi);
 
-   sb_end_intwrite(sbi->sb);
-
wait_event_interruptible(*q,
 kthread_should_stop() || !llist_empty(&cprc->issue_list));
goto repeat;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7d34f1cacdee..440634dfaa56 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -607,8 +607,6 @@ static int issue_flush_thread(void *data)
if (kthread_should_stop())
return 0;
 
-   sb_start_intwrite(sbi->sb);
-
 if (!llist_empty(&fcc->issue_list)) {
struct flush_cmd *cmd, *next;
int ret;
@@ -629,8 +627,6 @@ static int issue_flush_thread(void *data)
fcc->dispatch_list = NULL;
}
 
-   sb_end_intwrite(sbi->sb);
-
wait_event_interruptible(*q,
 kthread_should_stop() || !llist_empty(&fcc->issue_list));
goto repeat;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1000d21120ca..4aa533cb4340 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1472,6 +1472,10 @@ static int f2fs_freeze(struct super_block *sb)
/* must be clean, since sync_filesystem() was already called */
if (is_sbi_flag_set(F2FS_SB(sb), SBI_IS_DIRTY))
return -EINVAL;
+
+   /* ensure no checkpoint required */
+   if (!llist_empty(&F2FS_SB(sb)->cprc_info.issue_list))
+   return -EINVAL;
return 0;
 }
 
-- 
2.30.0.478.g8a0d178c01-goog

Re: [f2fs-dev] [PATCH v2] f2fs: rename checkpoint=merge mount option to checkpoint_merge

2021-02-01 Thread Jaegeuk Kim

On 02/02, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> As checkpoint=merge comes in, mount option setting related to checkpoint
> had been mixed up and it became hard to understand. So, I separated
> this option from "checkpoint=" and made another mount option
> "checkpoint_merge" for this.

Thanks, merged to the original patch.

> 
> Signed-off-by: Daeho Jeong 
> ---
> v2: renamed "checkpoint=merge" to "checkpoint_merge"
> ---
>  Documentation/filesystems/f2fs.rst |  6 +++---
>  fs/f2fs/super.c| 26 ++
>  2 files changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst 
> b/Documentation/filesystems/f2fs.rst
> index d0ead45dc706..475994ed8b15 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -247,9 +247,9 @@ checkpoint=%s[:%u[%]]  Set to "disable" to turn off 
> checkpointing. Set to "enabl
>hide up to all remaining free space. The actual space 
> that
>would be unusable can be viewed at 
> /sys/fs/f2fs//unusable
>This space is reclaimed once checkpoint=enable.
> -  Here is another option "merge", which creates a kernel 
> daemon
> -  and makes it to merge concurrent checkpoint requests 
> as much
> -  as possible to eliminate redundant checkpoint issues. 
> Plus,
> +checkpoint_merge  When checkpoint is enabled, this can be used to create 
> a kernel
> +  daemon and make it to merge concurrent checkpoint 
> requests as
> +  much as possible to eliminate redundant checkpoint 
> issues. Plus,
>we can eliminate the sluggish issue caused by slow 
> checkpoint
>operation when the checkpoint is done in a process 
> context in
>a cgroup having low i/o budget and cpu shares. To make 
> this
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 56696f6cfa86..d8603e6c4916 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -145,6 +145,7 @@ enum {
>   Opt_checkpoint_disable_cap_perc,
>   Opt_checkpoint_enable,
>   Opt_checkpoint_merge,
> + Opt_nocheckpoint_merge,
>   Opt_compress_algorithm,
>   Opt_compress_log_size,
>   Opt_compress_extension,
> @@ -215,7 +216,8 @@ static match_table_t f2fs_tokens = {
>   {Opt_checkpoint_disable_cap, "checkpoint=disable:%u"},
>   {Opt_checkpoint_disable_cap_perc, "checkpoint=disable:%u%%"},
>   {Opt_checkpoint_enable, "checkpoint=enable"},
> - {Opt_checkpoint_merge, "checkpoint=merge"},
> + {Opt_checkpoint_merge, "checkpoint_merge"},
> + {Opt_nocheckpoint_merge, "nocheckpoint_merge"},
>   {Opt_compress_algorithm, "compress_algorithm=%s"},
>   {Opt_compress_log_size, "compress_log_size=%u"},
>   {Opt_compress_extension, "compress_extension=%s"},
> @@ -946,6 +948,9 @@ static int parse_options(struct super_block *sb, char 
> *options, bool is_remount)
>   case Opt_checkpoint_merge:
>   set_opt(sbi, MERGE_CHECKPOINT);
>   break;
> + case Opt_nocheckpoint_merge:
> + clear_opt(sbi, MERGE_CHECKPOINT);
> + break;
>  #ifdef CONFIG_F2FS_FS_COMPRESSION
>   case Opt_compress_algorithm:
>   if (!f2fs_sb_has_compression(sbi)) {
> @@ -1142,12 +1147,6 @@ static int parse_options(struct super_block *sb, char 
> *options, bool is_remount)
>   return -EINVAL;
>   }
>  
> - if (test_opt(sbi, DISABLE_CHECKPOINT) &&
> - test_opt(sbi, MERGE_CHECKPOINT)) {
> - f2fs_err(sbi, "checkpoint=merge cannot be used with 
> checkpoint=disable\n");
> - return -EINVAL;
> - }
> -
>   /* Not pass down write hints if the number of active logs is lesser
>* than NR_CURSEG_PERSIST_TYPE.
>*/
> @@ -1782,7 +1781,7 @@ static int f2fs_show_options(struct seq_file *seq, 
> struct dentry *root)
>   seq_printf(seq, ",checkpoint=disable:%u",
>   F2FS_OPTION(sbi).unusable_cap);
>   if (test_opt(sbi, MERGE_CHECKPOINT))
> - seq_puts(seq, ",checkpoint=merge");
> + seq_puts(seq, ",checkpoint_merge");
>   if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_POSIX)
>   seq_printf(seq, ",fsync_mode=%s", "posix");
>   else if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT)
> @@ -1827,6 +1826,7 @@ static void default_options(struct f2fs_sb_info *sbi)
>   sbi->sb->s_flags |= SB_LAZYTIME;
>   set_opt(sbi, FLUSH_MERGE);
>   set_opt(sbi, DISCARD);
> + clear_opt(sbi, MERGE_CHECKPOINT);
>   if (f2fs_sb_has_blkzoned(sbi))
>   F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
>   else
> @@ -2066,9 +2066,8 @@ static int f2fs_remount(struct super_block *sb, int 
> *flags,

Re: [PATCH] f2fs: prevent setting ioprio of thread not in merge mode

2021-02-01 Thread Jaegeuk Kim

Thanks.
Merged into the original patch.

On 02/01, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> Changing the ioprio of the checkpoint thread crashes when the filesystem is
> not mounted with checkpoint=merge. Fix that by setting the thread's ioprio
> only when checkpoint=merge is enabled.
> 
> Signed-off-by: Daeho Jeong 
> ---
>  fs/f2fs/sysfs.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> index 100608bcd517..e38a7f6921dd 100644
> --- a/fs/f2fs/sysfs.c
> +++ b/fs/f2fs/sysfs.c
> @@ -357,8 +357,12 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
>   return -EINVAL;
>  
>   cprc->ckpt_thread_ioprio = IOPRIO_PRIO_VALUE(class, data);
> - ret = set_task_ioprio(cprc->f2fs_issue_ckpt,
> - cprc->ckpt_thread_ioprio);
> + if (test_opt(sbi, MERGE_CHECKPOINT)) {
> + ret = set_task_ioprio(cprc->f2fs_issue_ckpt,
> + cprc->ckpt_thread_ioprio);
> + if (ret)
> + return ret;
> + }
>  
>   return count;
>   }
> -- 
> 2.30.0.365.g02bc693789-goog
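
For context on the value stored in ckpt_thread_ioprio above, here is a minimal
sketch of how an I/O priority word packs a scheduling class and a per-class
level. It assumes the classic layout from the kernel's ioprio header (class in
the bits above a 13-bit shift, level in the low bits). The SKETCH_ and sketch_
names are local stand-ins invented for illustration, not the kernel macros.

```c
#include <assert.h>

/* Local stand-ins; assumed to mirror the layout in the kernel's ioprio header. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13
#define SKETCH_IOPRIO_LEVEL_MASK  ((1u << SKETCH_IOPRIO_CLASS_SHIFT) - 1)
#define SKETCH_IOPRIO_CLASS_BE    2u	/* best-effort class */

/* Pack a class and a level (0 is the highest level, 7 the lowest). */
static unsigned int sketch_ioprio_value(unsigned int cls, unsigned int level)
{
	return (cls << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/* Recover the class from a packed value. */
static unsigned int sketch_ioprio_class(unsigned int value)
{
	return value >> SKETCH_IOPRIO_CLASS_SHIFT;
}

/* Recover the level from a packed value. */
static unsigned int sketch_ioprio_level(unsigned int value)
{
	return value & SKETCH_IOPRIO_LEVEL_MASK;
}

/* Round-trip check for the best-effort, level 3 combination. */
static void sketch_ioprio_self_check(void)
{
	unsigned int v = sketch_ioprio_value(SKETCH_IOPRIO_CLASS_BE, 3);

	assert(sketch_ioprio_class(v) == SKETCH_IOPRIO_CLASS_BE);
	assert(sketch_ioprio_level(v) == 3);
}
```

Under that assumed layout, best-effort level 3 is a single integer that can be
split back into its class and level, which is what a sysfs store handler has to
do before handing the value to set_task_ioprio().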

Re: [f2fs-dev] [PATCH] f2fs: fix checkpoint mount option wrong combination

2021-02-01 Thread Jaegeuk Kim

On 02/01, Daeho Jeong wrote:
> Actually, I think we need to select one among them: disable, enable
> and merge. I realized my previous understanding of that was wrong.
> In the case of "checkpoint=merge,checkpoint=enable", the last option
> will override the ones before it.
> This is how other mount options such as fsync_mode and whint_mode behave.
> So, the answer will be "checkpoint=enable". What do you think?

We need to clarify a bit more. :)

mount checkpoint=disable,checkpoint=merge
remount checkpoint=enable,checkpoint=merge

Then, is it going to enable checkpoint with a thread?
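
One way to reason about these combinations is to model the two flags directly.
The sketch below is a user-space model, assuming the per-option behavior of the
patch quoted further down in this message (each checkpoint= option updates both
DISABLE_CHECKPOINT and MERGE_CHECKPOINT, so the last option wins). The struct
and function names are hypothetical, and the checkpoint=disable:%u variants are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two mount flags discussed in this thread. */
struct ckpt_opts {
	bool disable_checkpoint;
	bool merge_checkpoint;
};

/* Apply one checkpoint= option; returns false for an unknown token. */
static bool ckpt_apply(struct ckpt_opts *o, const char *opt)
{
	if (strcmp(opt, "checkpoint=disable") == 0) {
		o->disable_checkpoint = true;
		o->merge_checkpoint = false;
	} else if (strcmp(opt, "checkpoint=enable") == 0) {
		o->disable_checkpoint = false;
		o->merge_checkpoint = false;
	} else if (strcmp(opt, "checkpoint=merge") == 0) {
		o->disable_checkpoint = false;
		o->merge_checkpoint = true;
	} else {
		return false;
	}
	return true;
}

/* Apply options left to right, starting from the state passed in. */
static bool ckpt_apply_all(struct ckpt_opts *o, const char *const *opts,
			   size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!ckpt_apply(o, opts[i]))
			return false;
	return true;
}

/* Walk through the mount-then-remount sequence from the question above. */
static void ckpt_model_self_check(void)
{
	struct ckpt_opts o = { false, false };
	const char *const mnt[] = { "checkpoint=disable", "checkpoint=merge" };
	const char *const remnt[] = { "checkpoint=enable", "checkpoint=merge" };

	assert(ckpt_apply_all(&o, mnt, 2));
	assert(ckpt_apply_all(&o, remnt, 2));
	assert(!o.disable_checkpoint && o.merge_checkpoint);
}
```

In this model only, both the mount and the remount end with checkpointing
enabled and the merge flag set, because "checkpoint=merge" comes last each time.
Whether that is the intended rule is exactly the open question.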

> 
> 
> 
> On Mon, Feb 1, 2021 at 9:40 PM, Chao Yu wrote:
> >
> > On 2021/2/1 8:06, Daeho Jeong wrote:
> > > From: Daeho Jeong 
> > >
> > > As checkpoint=merge comes in, mount option setting related to
> > > checkpoint had been mixed up. Fixed it.
> > >
> > > Signed-off-by: Daeho Jeong 
> > > ---
> > >   fs/f2fs/super.c | 11 +--
> > >   1 file changed, 5 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > index 56696f6cfa86..8231c888c772 100644
> > > --- a/fs/f2fs/super.c
> > > +++ b/fs/f2fs/super.c
> > > @@ -930,20 +930,25 @@ static int parse_options(struct super_block *sb, 
> > > char *options, bool is_remount)
> > >   return -EINVAL;
> > >   F2FS_OPTION(sbi).unusable_cap_perc = arg;
> > >   set_opt(sbi, DISABLE_CHECKPOINT);
> > > + clear_opt(sbi, MERGE_CHECKPOINT);
> > >   break;
> > >   case Opt_checkpoint_disable_cap:
> > > >   if (args->from && match_int(args, &arg))
> > >   return -EINVAL;
> > >   F2FS_OPTION(sbi).unusable_cap = arg;
> > >   set_opt(sbi, DISABLE_CHECKPOINT);
> > > + clear_opt(sbi, MERGE_CHECKPOINT);
> > >   break;
> > >   case Opt_checkpoint_disable:
> > >   set_opt(sbi, DISABLE_CHECKPOINT);
> > > + clear_opt(sbi, MERGE_CHECKPOINT);
> > >   break;
> > >   case Opt_checkpoint_enable:
> > >   clear_opt(sbi, DISABLE_CHECKPOINT);
> > > + clear_opt(sbi, MERGE_CHECKPOINT);
> >
> > What if: -o checkpoint=merge,checkpoint=enable
> >
> > Can you please explain the rule of merge/disable/enable combination and 
> > their
> > result? e.g.
> > checkpoint=merge,checkpoint=enable
> > checkpoint=enable,checkpoint=merge
> > checkpoint=merge,checkpoint=disable
> > checkpoint=disable,checkpoint=merge
> >
> > If the rule/result is clear, it should be documented.
> >
> > Thanks,
> >
> >
> > >   break;
> > >   case Opt_checkpoint_merge:
> > > + clear_opt(sbi, DISABLE_CHECKPOINT);
> > >   set_opt(sbi, MERGE_CHECKPOINT);
> > >   break;
> > >   #ifdef CONFIG_F2FS_FS_COMPRESSION
> > > @@ -1142,12 +1147,6 @@ static int parse_options(struct super_block *sb, 
> > > char *options, bool is_remount)
> > >   return -EINVAL;
> > >   }
> > >
> > > - if (test_opt(sbi, DISABLE_CHECKPOINT) &&
> > > - test_opt(sbi, MERGE_CHECKPOINT)) {
> > > - f2fs_err(sbi, "checkpoint=merge cannot be used with 
> > > checkpoint=disable\n");
> > > - return -EINVAL;
> > > - }
> > > -
> > >   /* Not pass down write hints if the number of active logs is lesser
> > >* than NR_CURSEG_PERSIST_TYPE.
> > >*/
> > >
> 
> 
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Re: [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-28 Thread Jaegeuk Kim

On 01/20, Chao Yu wrote:
> On 2021/1/20 3:06, Jaegeuk Kim wrote:
> > On 01/15, Chao Yu wrote:
> > > On 2021/1/15 5:53, Jaegeuk Kim wrote:
> > > > On 12/30, Chao Yu wrote:
> > > > > ThreadA   ThreadB
> > > > > - f2fs_ioc_start_atomic_write
> > > > > - write
> > > > > - f2fs_ioc_commit_atomic_write
> > > > >- f2fs_commit_inmem_pages
> > > > >- f2fs_drop_inmem_pages
> > > > >- f2fs_drop_inmem_pages
> > > > > - __revoke_inmem_pages
> > > > >   - f2fs_vm_page_mkwrite
> > > > >- set_page_dirty
> > > > > - tag ATOMIC_WRITTEN_PAGE and 
> > > > > add page
> > > > >   to inmem_pages list
> > > > > - clear_inode_flag(FI_ATOMIC_FILE)
> > > > >   - f2fs_vm_page_mkwrite
> > > > > - set_page_dirty
> > > > >  - f2fs_update_dirty_page
> > > > >   - f2fs_trace_pid
> > > > >- tag inmem page private 
> > > > > to pid
> > > > 
> > > > Hmm, how about removing fs/f2fs/trace.c to make private more complicated
> > > > like this? I think we can get IO traces from tracepoints.
> > > 
> > > > Hmm, actually, there are two issues: one is the trace IO, the other is the
> > > > race (atomic_start, commit, drop vs mkwrite), which can break the isolation
> > > > semantics of the transaction.
> > > 
> > > Or can we avoid atomic file racing with file mmap?
> 
> Otherwise I think we should add i_mmap_sem to avoid the race.
> 
> > 
> > > No, we can't. We may need to find another way to check the race. :)
> 
> Well, any thoughts about this issue?
> 
> Thanks,
> 
> > 
> > > 
> > > - atomic_start- file_mmap
> > >- inode_lock
> > >- if (FI_ATOMIC_FILE) return
> > >   - inode_lock
> > >   - if (FI_MMAP_FILE) return
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > >   - truncate
> > > > >- f2fs_invalidate_page
> > > > >- set page->mapping to NULL
> > > > > then it will cause panic once 
> > > > > we
> > > > > access page->mapping

Are we hitting this, since page was referenced by in-mem list?

> > > > > 
> > > > > > The root cause is that we failed to keep the isolation of atomic write in
> > > > > > the case of commit_atomic_write vs mkwrite; let commit_atomic_write hold
> > > > > > the i_mmap_sem lock to avoid this issue.
> > > > > 
> > > > > Signed-off-by: Chao Yu 
> > > > > ---
> > > > > v2:
> > > > > - use i_mmap_sem to avoid mkwrite racing with below flows:
> > > > >* f2fs_ioc_start_atomic_write
> > > > >* f2fs_drop_inmem_pages
> > > > >* f2fs_commit_inmem_pages
> > > > > 
> > > > >fs/f2fs/file.c| 3 +++
> > > > >fs/f2fs/segment.c | 7 +++
> > > > >2 files changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > index 4e6d4b9120a8..a48ec650d691 100644
> > > > > --- a/fs/f2fs/file.c
> > > > > +++ b/fs/f2fs/file.c
> > > > > @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct 
> > > > > file *filp)
> > > > >   goto out;
> > > > > >   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > > > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > > > >   /*
> > > > >* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > > > > @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct 
> > > > > file *filp)
> > > > >

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-28 Thread Jaegeuk Kim

On 01/28, Chao Yu wrote:
> On 2021/1/22 10:17, Chao Yu wrote:
> > > No, it seems this is not the case.
> > > Oops, could you please help remove all the code below and test again
> > > to check whether it is the buggy code? I suspect there is a
> > > use-after-free bug.
> 
> Any test result? :)

It seems I don't see the errors anymore. Will you post another version?

> 
> Thanks,

[PATCH] f2fs: flush data when enabling checkpoint back

2021-01-26 Thread Jaegeuk Kim

During the checkpoint=disable period, f2fs bypasses all synchronous IOs such as
sync and fsync. So, when enabling it back, we must flush all of them in order
to keep the data persistent. Otherwise, a sudden power-cut right after enabling
checkpoint will cause data loss.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: sta...@vger.kernel.org
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 359cc5a2f8f5..073b51af62c8 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1887,6 +1887,9 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi)
 
 static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
 {
+	/* we should flush all the data to keep data consistency */
+	sync_inodes_sb(sbi->sb);
+
 	down_write(&sbi->gc_lock);
 	f2fs_dirty_to_prefree(sbi);
 
-- 
2.30.0.280.ga3ce27912f-goog

Re: [f2fs-dev] [PATCH v5 1/2] f2fs: introduce checkpoint=merge mount option

2021-01-21 Thread Jaegeuk Kim

On 01/21, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> We've added a new mount option "checkpoint=merge", which creates a
> kernel daemon and makes it to merge concurrent checkpoint requests as
> much as possible to eliminate redundant checkpoint issues. Plus, we
> can eliminate the sluggish issue caused by slow checkpoint operation
> when the checkpoint is done in a process context in a cgroup having
> low i/o budget and cpu shares. To make this do better, we set the
> default i/o priority of the kernel daemon to "3", to give one higher
> priority than other kernel threads. The below verification result
> explains this.
> The basic idea has come from https://opensource.samsung.com.
> 
> [Verification]
> Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
> Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
> Set "strict_guarantees" to "1" in BFQ tunables
> 
> In "fg" cgroup,
> - thread A => trigger 1000 checkpoint operations
>   "for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
>done"
> - thread B => generating async. I/O
>   "fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
>--filename=test_img --name=test"
> 
> In "bg" cgroup,
> - thread C => trigger repeated checkpoint operations
>   "echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
>fsync test_dir2; done"
> 
> We've measured thread A's execution time.
> 
> [ w/o patch ]
> Elapsed Time: Avg. 68 seconds
> [ w/  patch ]
> Elapsed Time: Avg. 48 seconds
> 
> Signed-off-by: Daeho Jeong 
> Signed-off-by: Sungjong Seo 
> ---
> v2:
> - inlined ckpt_req_control into f2fs_sb_info and collected statistics
>   of checkpoint merge operations
> v3:
> - fixed some minor errors and cleaned up f2fs_sync_fs()
> v4:
> - added an explanation to raise the default i/o priority of the
>   checkpoint daemon
> ---
>  Documentation/filesystems/f2fs.rst |  10 ++
>  fs/f2fs/checkpoint.c   | 177 +
>  fs/f2fs/debug.c|  12 ++
>  fs/f2fs/f2fs.h |  27 +
>  fs/f2fs/super.c|  55 +++--
>  5 files changed, 273 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst 
> b/Documentation/filesystems/f2fs.rst
> index dae15c96e659..9624a0be0364 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -247,6 +247,16 @@ checkpoint=%s[:%u[%]] Set to "disable" to turn off 
> checkpointing. Set to "enabl
>hide up to all remaining free space. The actual space 
> that
>would be unusable can be viewed at 
> /sys/fs/f2fs//unusable
>This space is reclaimed once checkpoint=enable.
> +  Here is another option "merge", which creates a kernel 
> daemon
> +  and makes it to merge concurrent checkpoint requests 
> as much
> +  as possible to eliminate redundant checkpoint issues. 
> Plus,
> +  we can eliminate the sluggish issue caused by slow 
> checkpoint
> +  operation when the checkpoint is done in a process 
> context in
> +  a cgroup having low i/o budget and cpu shares. To make 
> this
> +  do better, we set the default i/o priority of the 
> kernel daemon
> +  to "3", to give one higher priority than other kernel 
> threads.
> +  This is the same way to give a I/O priority to the jbd2
> +  journaling thread of ext4 filesystem.
>  compress_algorithm=%s Control compress algorithm, currently f2fs 
> supports "lzo",
>"lz4", "zstd" and "lzo-rle" algorithm.
>  compress_log_size=%u  Support configuring compress cluster size, the size 
> will
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 897edb7c951a..ef6ad3d1957d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "f2fs.h"
>  #include "node.h"
> @@ -20,6 +21,8 @@
>  #include "trace.h"
>  #include 
>  
> +#define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
> +
>  static struct kmem_cache *ino_entry_slab;
>  struct kmem_cache *f2fs_inode_entry_slab;
>  
> @@ -1707,3 +1710,177 @@ void f2fs_destroy_checkpoint_caches(void)
>   kmem_cache_destroy(ino_entry_slab);
>   kmem_cache_destroy(f2fs_inode_entry_slab);
>  }
> +
> +static int __write_checkpoint_sync(struct f2fs_sb_info *sbi)
> +{
> + struct cp_control cpc = { .reason = CP_SYNC, };
> + int err;
> +
> + down_write(&sbi->gc_lock);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> + up_write(&sbi->gc_lock);
> +
> + return err;
> +}
> +
> +static void __checkpoint_and_complete_reqs(struct f2fs_sb_info *sbi)
> +{
> + struct ckpt_req_control *cprc = &sbi->cprc_info;
> + struct ckpt_req *req, *next;
> + struct

Re: [PATCH] scsi: ufs: Fix some problems in task management request implementation

2021-01-20 Thread Jaegeuk Kim

On 01/20, Can Guo wrote:
> Current task management request send/compl implementation is broken, the
> problems and fixes are listed as below:
> 
> Problem: TMR completion timeout. ufshcd_tmc_handler() calls
>  blk_mq_tagset_busy_iter(fn == ufshcd_compl_tm()), but since
>  blk_mq_tagset_busy_iter() only iterates over all reserved tags and
>  started requests, so ufshcd_compl_tm() never gets a chance to run.
> Fix: Call blk_mq_start_request() in __ufshcd_issue_tm_cmd().
> 
> Problem: Race condition in send/compl paths. ufshcd_compl_tm() looks for
>  all 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL and call complete()
>  for each req who has the req->end_io_data set. There can be a race
>  condition btw tmc send/compl, because req->end_io_data is set, in
>  __ufshcd_issue_tm_cmd(), without host lock protection, so it is
>  possible that when ufshcd_compl_tm() checks the req->end_io_data,
>  req->end_io_data is set but the corresponding tag has not been set
>  in the REG_UTP_TASK_REQ_DOOR_BELL. Thus, ufshcd_tmc_handler() may
>  wrongly complete TMRs which have not been sent.
> Fix: Protect req->end_io_data with host lock. And let ufshcd_compl_tm()
>  only handle those tm cmds which have been completed instead of
>  looking for 0 bits in the REG_UTP_TASK_REQ_DOOR_BELL.
> 
> Problem: In __ufshcd_issue_tm_cmd(), it is not right to use hba->nutrs +
>  req->tag as the Task Tag in one TMR UPIU.
> Fix: Directly use req->tag as Task Tag.
> 
> Cc: Jaegeuk Kim 
> Signed-off-by: Can Guo 
> 
> ---
> 
> This change is based on Jaegeuk's change - 
> https://git.kernel.org/mkp/scsi/c/eeb1b55b6e25
> 
> ---
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index fb07e3a..44d09509 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6252,7 +6252,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba 
> *hba)
>  
>  struct ctm_info {
>   struct ufs_hba  *hba;
> - unsigned long   pending;
> + unsigned long   completed;
>   unsigned intncpl;
>  };
>  
> @@ -6261,13 +6261,13 @@ static bool ufshcd_compl_tm(struct request *req, void 
> *priv, bool reserved)
>   struct ctm_info *const ci = priv;
>   struct completion *c;
>  
> - WARN_ON_ONCE(reserved);
> - if (test_bit(req->tag, &ci->pending))
> - return true;
> - ci->ncpl++;
> - c = req->end_io_data;
> - if (c)
> - complete(c);
> + if (test_bit(req->tag, &ci->completed)) {
> + __clear_bit(req->tag, &hba->outstanding_tasks);

Is the below fixed in -next?

__clear_bit(req->tag, &ci->hba->outstanding_tasks);

> + ci->ncpl++;
> + c = req->end_io_data;
> + if (c)
> + complete(c);
> + }
>   return true;
>  }
>  
> @@ -6283,11 +6283,19 @@ static irqreturn_t ufshcd_tmc_handler(struct ufs_hba 
> *hba)
>  {
>   struct request_queue *q = hba->tmf_queue;
>   struct ctm_info ci = {
> - .hba = hba,
> - .pending = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL),
> + .hba = hba,
> + .ncpl = 0,
>   };
> + u32 tm_doorbell;
> + unsigned long completed;
> +
> + tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
> + completed = tm_doorbell ^ hba->outstanding_tasks;
>  
> - blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
> + if (completed) {
> + ci.completed = completed;
> + blk_mq_tagset_busy_iter(q->tag_set, ufshcd_compl_tm, &ci);
> + }
>   return ci.ncpl ? IRQ_HANDLED : IRQ_NONE;
>  }
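
On the completed mask computed in ufshcd_tmc_handler() above, here is a minimal
sketch of the bit arithmetic. It assumes, as the quoted description implies,
that a set doorbell bit means the slot is still owned by the controller and
that outstanding_tasks has a bit set for every slot software has issued. The
function names are hypothetical.

```c
#include <assert.h>

/*
 * Slots that software issued (set in outstanding) but whose doorbell bit
 * the controller has already cleared are the completed ones. XOR gives the
 * same result as outstanding & ~doorbell only while every doorbell bit is
 * also set in outstanding.
 */
static unsigned long sketch_completed_slots(unsigned long doorbell,
					    unsigned long outstanding)
{
	return doorbell ^ outstanding;
}

/* Slots 0, 1 and 3 issued; only slot 1 is still pending in the doorbell. */
static void sketch_completed_self_check(void)
{
	unsigned long outstanding = 0xBul;	/* 0b1011 */
	unsigned long doorbell = 0x2ul;		/* 0b0010 */

	assert(sketch_completed_slots(doorbell, outstanding) == 0x9ul);
}
```

The subset condition in the comment is why the ordering between setting the
outstanding bit and ringing the doorbell matters in the send path.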
>  
> @@ -6405,37 +6413,33 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   DECLARE_COMPLETION_ONSTACK(wait);
>   struct request *req;
>   unsigned long flags;
> - int free_slot, task_tag, err;
> + int task_tag, err;
>  
>   /*
> -  * Get free slot, sleep if slots are unavailable.
> -  * Even though we use wait_event() which sleeps indefinitely,
> -  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> +  * blk_get_request() used here is only to get a free tag.
>*/
>   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
>   if (IS_ERR(req))
>   return PTR_ERR(req);
>  
> - req->end_io_data = &wait;
> - free_slot = req->tag;
> - WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
>   ufshcd_hold(hba, false);
> -
>   spi

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-19 Thread Jaegeuk Kim

On 01/16, Chao Yu wrote:
> On 2021/1/15 22:59, Jaegeuk Kim wrote:
> > On 01/15, Chao Yu wrote:
> > > On 2021/1/14 12:06, Jaegeuk Kim wrote:
> > > > On 01/14, Chao Yu wrote:
> > > > > On 2021/1/13 23:41, Jaegeuk Kim wrote:
> > > > > > [58690.961685] F2FS-fs (vdb) : inject page get in 
> > > > > > f2fs_pagecache_get_page of f2fs_quota_write+0x150/0x1f0 [f2fs]
> > > > > > [58691.071481] F2FS-fs (vdb): Inconsistent error blkaddr:31058, sit 
> > > > > > bitmap:0
> > > > > > [58691.077338] [ cut here ]
> > > > > > [58691.081461] WARNING: CPU: 5 PID: 8308 at 
> > > > > > fs/f2fs/checkpoint.c:151 f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > > > > > [58691.086734] Modules linked in: f2fs(O) quota_v2 quota_tree 
> > > > > > dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev 
> > > > > > intel_rapl_msr intel_rapl_common sb_edac kvm_intel kvm irqbypass 
> > > > > > joydev parport_pc parport input_leds serio_raw mac_hid qemu_fw_cfg 
> > > > > > sch_fq_codel ip_tables x_tables autofs4 btrfs blake2b_generic 
> > > > > > raid10 raid456 async_raid6_recov async_memcpy asy
> > > > > > [58691.120632] CPU: 5 PID: 8308 Comm: kworker/u17:5 Tainted: G  
> > > > > > DO  5.11.0-rc3-custom #1
> > > > > > [58691.125438] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> > > > > > 1996), BIOS 1.14.0-1 04/01/2014
> > > > > > [58691.129625] Workqueue: f2fs_post_read_wq f2fs_post_read_work 
> > > > > > [f2fs]
> > > > > > [58691.133142] RIP: 0010:f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > > > > > [58691.136221] Code: 3c 07 b8 01 00 00 00 d3 e0 21 f8 75 57 83 fa 
> > > > > > 07 75 52 89 f2 31 c9 48 c7 c6 20 6a a7 c0 48 89 df e8 bc d6 03 00 
> > > > > > f0 80 4b 48 04 <0f> 0b 31 c0 e9 5e fe ff ff 48 8b 57 10 8b 42 30 d3 
> > > > > > e0 03 42 48 39
> > > > > > [58691.143142] RSP: 0018:b429047afd40 EFLAGS: 00010206
> > > > > > [58691.145639] RAX:  RBX: 9c3b84041000 RCX: 
> > > > > > 
> > > > > > [58691.148899] RDX:  RSI: 9c3bbbd58940 RDI: 
> > > > > > 9c3bbbd58940
> > > > > > [58691.152130] RBP: b429047afd48 R08: 9c3bbbd58940 R09: 
> > > > > > b429047afaa8
> > > > > > [58691.155266] R10: 001ba090 R11: 0003 R12: 
> > > > > > 7952
> > > > > > [58691.158304] R13: f5cc81266ac0 R14: 00db R15: 
> > > > > > 
> > > > > > [58691.161160] FS:  () 
> > > > > > GS:9c3bbbd4() knlGS:
> > > > > > [58691.164286] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > [58691.166869] CR2: 7f0fee9d3000 CR3: 5ee76001 CR4: 
> > > > > > 00370ee0
> > > > > > [58691.169714] DR0:  DR1:  DR2: 
> > > > > > 
> > > > > > [58691.173102] DR3:  DR6: fffe0ff0 DR7: 
> > > > > > 0400
> > > > > > [58691.176163] Call Trace:
> > > > > > [58691.177948]  f2fs_cache_compressed_page+0x69/0x280 [f2fs]
> > > > > > [58691.180549]  ? newidle_balance+0x253/0x3d0
> > > > > > [58691.183238]  f2fs_end_read_compressed_page+0x5a/0x70 [f2fs]
> > > > > > [58691.188205]  f2fs_post_read_work+0x11d/0x120 [f2fs]
> > > > > > [58691.192489]  process_one_work+0x221/0x3a0
> > > > > > [58691.194482]  worker_thread+0x4d/0x3f0
> > > > > > [58691.198867]  kthread+0x114/0x150
> > > > > > [58691.202243]  ? process_one_work+0x3a0/0x3a0
> > > > > > [58691.205367]  ? kthread_park+0x90/0x90
> > > > > > [58691.208244]  ret_from_fork+0x22/0x30
> > > > > 
> > > > > Below patch fixes two issues, I expect this can fix above warning at 
> > > > > least.
> > > > 
> > > > [106115.591837] general protection fault, probably for non-canonical 
> > > > address 0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
> > > > [106115.595584] CPU: 3 PID: 10109 Comm: fsstress Tainted: G   O

Re: [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-19 Thread Jaegeuk Kim

On 01/15, Chao Yu wrote:
> On 2021/1/15 5:53, Jaegeuk Kim wrote:
> > On 12/30, Chao Yu wrote:
> > > ThreadA   ThreadB
> > > - f2fs_ioc_start_atomic_write
> > > - write
> > > - f2fs_ioc_commit_atomic_write
> > >   - f2fs_commit_inmem_pages
> > >   - f2fs_drop_inmem_pages
> > >   - f2fs_drop_inmem_pages
> > >- __revoke_inmem_pages
> > >   - f2fs_vm_page_mkwrite
> > >- set_page_dirty
> > > - tag ATOMIC_WRITTEN_PAGE and add page
> > >   to inmem_pages list
> > >- clear_inode_flag(FI_ATOMIC_FILE)
> > >   - f2fs_vm_page_mkwrite
> > > - set_page_dirty
> > >  - f2fs_update_dirty_page
> > >   - f2fs_trace_pid
> > >- tag inmem page private to pid
> > 
> > Hmm, how about removing fs/f2fs/trace.c to make private more complicated
> > like this? I think we can get IO traces from tracepoints.
> 
> Hmm, actually, there are two issues: one is the trace IO, the other is the
> race (atomic_start, commit, drop vs mkwrite), which can break the isolation
> semantics of the transaction.
> 
> Or can we avoid atomic file racing with file mmap?

No, we can't. We may need to find another way to check the race. :)

> 
> - atomic_start- file_mmap
>- inode_lock
>- if (FI_ATOMIC_FILE) return
>  - inode_lock
>  - if (FI_MMAP_FILE) return
> 
> Thanks,
> 
> > 
> > >   - truncate
> > >- f2fs_invalidate_page
> > >- set page->mapping to NULL
> > > then it will cause panic once we
> > > access page->mapping
> > > 
> > > > The root cause is that we failed to keep the isolation of atomic write in
> > > > the case of commit_atomic_write vs mkwrite; let commit_atomic_write hold
> > > > the i_mmap_sem lock to avoid this issue.
> > > 
> > > Signed-off-by: Chao Yu 
> > > ---
> > > v2:
> > > - use i_mmap_sem to avoid mkwrite racing with below flows:
> > >   * f2fs_ioc_start_atomic_write
> > >   * f2fs_drop_inmem_pages
> > >   * f2fs_commit_inmem_pages
> > > 
> > >   fs/f2fs/file.c| 3 +++
> > >   fs/f2fs/segment.c | 7 +++
> > >   2 files changed, 10 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 4e6d4b9120a8..a48ec650d691 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   goto out;
> > > >   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > >   /*
> > >* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > > @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > > inode->i_ino, get_dirty_pages(inode));
> > >   ret = filemap_write_and_wait_range(inode->i_mapping, 0, 
> > > LLONG_MAX);
> > >   if (ret) {
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >   goto out;
> > >   }
> > > @@ -2073,6 +2075,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   /* add inode in inmem_list first and set atomic_file */
> > >   set_inode_flag(inode, FI_ATOMIC_FILE);
> > >   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index d8570b0359f5..dab870d9faf6 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -327,6 +327,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > >   s

Re: [f2fs-dev] [PATCH v4 1/2] f2fs: introduce checkpoint=merge mount option

2021-01-19 Thread Jaegeuk Kim

Is there v4 2/2?

On 01/19, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> We've added a new mount option "checkpoint=merge", which creates a
> kernel daemon and makes it to merge concurrent checkpoint requests as
> much as possible to eliminate redundant checkpoint issues. Plus, we
> can eliminate the sluggish issue caused by slow checkpoint operation
> when the checkpoint is done in a process context in a cgroup having
> low i/o budget and cpu shares. To make this do better, we set the
> default i/o priority of the kernel daemon to "3", to give one higher
> priority than other kernel threads. The below verification result
> explains this.
> The basic idea has come from https://opensource.samsung.com.
> 
> [Verification]
> Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
> Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
> Set "strict_guarantees" to "1" in BFQ tunables
> 
> In "fg" cgroup,
> - thread A => trigger 1000 checkpoint operations
>   "for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
>done"
> - thread B => generating async. I/O
>   "fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
>--filename=test_img --name=test"
> 
> In "bg" cgroup,
> - thread C => trigger repeated checkpoint operations
>   "echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
>fsync test_dir2; done"
> 
> We've measured thread A's execution time.
> 
> [ w/o patch ]
> Elapsed Time: Avg. 68 seconds
> [ w/  patch ]
> Elapsed Time: Avg. 48 seconds
> 
> Signed-off-by: Daeho Jeong 
> Signed-off-by: Sungjong Seo 
> ---
> v2:
> - inlined ckpt_req_control into f2fs_sb_info and collected statistics
>   of checkpoint merge operations
> v3:
> - fixed some minor errors and cleaned up f2fs_sync_fs()
> v4:
> - added an explanation to raise the default i/o priority of the
>   checkpoint daemon
> ---
>  Documentation/filesystems/f2fs.rst |  10 ++
>  fs/f2fs/checkpoint.c   | 177 +
>  fs/f2fs/debug.c|  12 ++
>  fs/f2fs/f2fs.h |  27 +
>  fs/f2fs/super.c|  55 +++--
>  5 files changed, 273 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst 
> b/Documentation/filesystems/f2fs.rst
> index dae15c96e659..9624a0be0364 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -247,6 +247,16 @@ checkpoint=%s[:%u[%]] Set to "disable" to turn off 
> checkpointing. Set to "enabl
>hide up to all remaining free space. The actual space 
> that
>would be unusable can be viewed at 
> /sys/fs/f2fs//unusable
>This space is reclaimed once checkpoint=enable.
> +  Here is another option "merge", which creates a kernel 
> daemon
> +  and makes it to merge concurrent checkpoint requests 
> as much
> +  as possible to eliminate redundant checkpoint issues. 
> Plus,
> +  we can eliminate the sluggish issue caused by slow 
> checkpoint
> +  operation when the checkpoint is done in a process 
> context in
> +  a cgroup having low i/o budget and cpu shares. To make 
> this
> +  do better, we set the default i/o priority of the 
> kernel daemon
> +  to "3", to give one higher priority than other kernel 
> threads.
> +  This is the same way to give a I/O priority to the jbd2
> +  journaling thread of ext4 filesystem.
>  compress_algorithm=%s Control compress algorithm, currently f2fs 
> supports "lzo",
>"lz4", "zstd" and "lzo-rle" algorithm.
>  compress_log_size=%u  Support configuring compress cluster size, the size 
> will
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 897edb7c951a..ef6ad3d1957d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "f2fs.h"
>  #include "node.h"
> @@ -20,6 +21,8 @@
>  #include "trace.h"
>  #include 
>  
> +#define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
> +
>  static struct kmem_cache *ino_entry_slab;
>  struct kmem_cache *f2fs_inode_entry_slab;
>  
> @@ -1707,3 +1710,177 @@ void f2fs_destroy_checkpoint_caches(void)
>   kmem_cache_destroy(ino_entry_slab);
>   kmem_cache_destroy(f2fs_inode_entry_slab);
>  }
> +
> +static int __write_checkpoint_sync(struct f2fs_sb_info *sbi)
> +{
> + struct cp_control cpc = { .reason = CP_SYNC, };
> + int err;
> +
> + down_write(&sbi->gc_lock);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> + up_write(&sbi->gc_lock);
> +
> + return err;
> +}
> +
> +static void __checkpoint_and_complete_reqs(struct f2fs_sb_info *sbi)
> +{
> + struct ckpt_req_control *cprc = &sbi->cprc_info;
> + struct ckpt_req *req,

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-15 Thread Jaegeuk Kim

On 01/15, Chao Yu wrote:
> On 2021/1/14 12:06, Jaegeuk Kim wrote:
> > On 01/14, Chao Yu wrote:
> > > On 2021/1/13 23:41, Jaegeuk Kim wrote:
> > > > [58690.961685] F2FS-fs (vdb) : inject page get in 
> > > > f2fs_pagecache_get_page of f2fs_quota_write+0x150/0x1f0 [f2fs]
> > > > [58691.071481] F2FS-fs (vdb): Inconsistent error blkaddr:31058, sit 
> > > > bitmap:0
> > > > [58691.077338] [ cut here ]
> > > > [58691.081461] WARNING: CPU: 5 PID: 8308 at fs/f2fs/checkpoint.c:151 
> > > > f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > > > [58691.086734] Modules linked in: f2fs(O) quota_v2 quota_tree 
> > > > dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev intel_rapl_msr 
> > > > intel_rapl_common sb_edac kvm_intel kvm irqbypass joydev parport_pc 
> > > > parport input_leds serio_raw mac_hid qemu_fw_cfg sch_fq_codel ip_tables 
> > > > x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov 
> > > > async_memcpy asy
> > > > [58691.120632] CPU: 5 PID: 8308 Comm: kworker/u17:5 Tainted: G  D   
> > > >  O  5.11.0-rc3-custom #1
> > > > [58691.125438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS 1.14.0-1 04/01/2014
> > > > [58691.129625] Workqueue: f2fs_post_read_wq f2fs_post_read_work [f2fs]
> > > > [58691.133142] RIP: 0010:f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > > > [58691.136221] Code: 3c 07 b8 01 00 00 00 d3 e0 21 f8 75 57 83 fa 07 75 
> > > > 52 89 f2 31 c9 48 c7 c6 20 6a a7 c0 48 89 df e8 bc d6 03 00 f0 80 4b 48 
> > > > 04 <0f> 0b 31 c0 e9 5e fe ff ff 48 8b 57 10 8b 42 30 d3 e0 03 42 48 39
> > > > [58691.143142] RSP: 0018:b429047afd40 EFLAGS: 00010206
> > > > [58691.145639] RAX:  RBX: 9c3b84041000 RCX: 
> > > > 
> > > > [58691.148899] RDX:  RSI: 9c3bbbd58940 RDI: 
> > > > 9c3bbbd58940
> > > > [58691.152130] RBP: b429047afd48 R08: 9c3bbbd58940 R09: 
> > > > b429047afaa8
> > > > [58691.155266] R10: 001ba090 R11: 0003 R12: 
> > > > 7952
> > > > [58691.158304] R13: f5cc81266ac0 R14: 00db R15: 
> > > > 
> > > > [58691.161160] FS:  () GS:9c3bbbd4() 
> > > > knlGS:
> > > > [58691.164286] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [58691.166869] CR2: 7f0fee9d3000 CR3: 5ee76001 CR4: 
> > > > 00370ee0
> > > > [58691.169714] DR0:  DR1:  DR2: 
> > > > 
> > > > [58691.173102] DR3:  DR6: fffe0ff0 DR7: 
> > > > 0400
> > > > [58691.176163] Call Trace:
> > > > [58691.177948]  f2fs_cache_compressed_page+0x69/0x280 [f2fs]
> > > > [58691.180549]  ? newidle_balance+0x253/0x3d0
> > > > [58691.183238]  f2fs_end_read_compressed_page+0x5a/0x70 [f2fs]
> > > > [58691.188205]  f2fs_post_read_work+0x11d/0x120 [f2fs]
> > > > [58691.192489]  process_one_work+0x221/0x3a0
> > > > [58691.194482]  worker_thread+0x4d/0x3f0
> > > > [58691.198867]  kthread+0x114/0x150
> > > > [58691.202243]  ? process_one_work+0x3a0/0x3a0
> > > > [58691.205367]  ? kthread_park+0x90/0x90
> > > > [58691.208244]  ret_from_fork+0x22/0x30
> > > 
> > > Below patch fixes two issues, I expect this can fix above warning at 
> > > least.
> > 
> > [106115.591837] general protection fault, probably for non-canonical 
> > address 0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
> > [106115.595584] CPU: 3 PID: 10109 Comm: fsstress Tainted: G   O 
> >  5.11.0-rc3-custom #1
> > [106115.601087] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.14.0-1 04/01/2014
> > [106115.601087] RIP: 0010:f2fs_read_multi_pages+0x415/0xa70 [f2fs]
> 
> Jaegeuk,
> 
> Could you please help to run:
> 
> gdb f2fs.ko
> (gdb) l *(f2fs_read_multi_pages+0x415)
> 
> to see where we hit the panic.

It's fs/f2fs/data.c:2203

2199 goto out_put_dnode;
2200 }
2201
2202 for (i = 0; i < dic->nr_cpages; i++) {
2203 struct page *page = dic->cpages[i];
2204 block_t blkaddr;
2205 struct bio_post_read_ctx *ctx;
2206
2207 blkaddr = data_blkaddr(dn.inode, dn.node_page,
2
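
The faulting address in the report above (0x6b6b6b6b6b6b6b73) fits a load from dic->cpages[i] where the cpages pointer itself holds slab free-poison. The arithmetic can be sketched as below. This is only an illustration under two assumptions that the log does not state: that slab poisoning is enabled in this debug build, and that the load indexes 8-byte pointers as base + index * 8. The helper names repeat_byte and element_addr are hypothetical; 0x6b is the kernel's POISON_FREE byte.

```c
#include <assert.h>
#include <stdint.h>

/* Free-poison byte used by the slab allocator (POISON_FREE in
 * include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: build a 64-bit word from eight copies of one byte,
 * which is what a pointer field looks like inside a poisoned, freed object. */
static uint64_t repeat_byte(uint64_t byte)
{
	assert(byte <= 0xff);
	return byte * 0x0101010101010101ULL;
}

/* Hypothetical helper: address touched when loading element `index` from an
 * array of 8-byte pointers that starts at `base`. */
static uint64_t element_addr(uint64_t base, uint64_t index)
{
	return base + index * 8;
}
```

With these helpers, element_addr(repeat_byte(POISON_FREE_BYTE), 1) evaluates to 0x6b6b6b6b6b6b6b73, the address in the fault report. That would be consistent with dic having been freed before the loop reached i == 1, but it is an inference from the log, not a confirmed diagnosis.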

[PATCH] f2fs: deprecate f2fs_trace_io

2021-01-14 Thread Jaegeuk Kim

This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly,
resulting in more buggy cases.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Kconfig  |  10 ---
 fs/f2fs/Makefile |   1 -
 fs/f2fs/checkpoint.c |   3 -
 fs/f2fs/data.c   |   4 --
 fs/f2fs/file.c   |   2 -
 fs/f2fs/node.c   |   2 -
 fs/f2fs/segment.c|   3 -
 fs/f2fs/super.c  |   6 --
 fs/f2fs/trace.c  | 165 ---
 fs/f2fs/trace.h  |  43 ---
 10 files changed, 239 deletions(-)
 delete mode 100644 fs/f2fs/trace.c
 delete mode 100644 fs/f2fs/trace.h

diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 63c1fc1a0e3b..62e638a49bbf 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -76,16 +76,6 @@ config F2FS_CHECK_FS
 
  If you want to improve the performance, say N.
 
-config F2FS_IO_TRACE
-   bool "F2FS IO tracer"
-   depends on F2FS_FS
-   depends on FUNCTION_TRACER
-   help
- F2FS IO trace is based on a function trace, which gathers process
- information and block IO patterns in the filesystem level.
-
- If unsure, say N.
-
 config F2FS_FAULT_INJECTION
bool "F2FS fault injection facility"
depends on F2FS_FS
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index ee7316b42f69..e5295746208b 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -7,6 +7,5 @@ f2fs-y  += shrinker.o extent_cache.o sysfs.o
 f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
 f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
-f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
 f2fs-$(CONFIG_FS_VERITY) += verity.o
 f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 897edb7c951a..8c79ba0566b1 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -17,7 +17,6 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
-#include "trace.h"
 #include <trace/events/f2fs.h>
 
 static struct kmem_cache *ino_entry_slab;
@@ -443,7 +442,6 @@ static int f2fs_set_meta_page_dirty(struct page *page)
__set_page_dirty_nobuffers(page);
inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_META);
f2fs_set_page_private(page, 0);
-   f2fs_trace_pid(page);
return 1;
}
return 0;
@@ -1017,7 +1015,6 @@ void f2fs_update_dirty_page(struct inode *inode, struct 
page *page)
spin_unlock(&sbi->inode_lock[type]);
 
f2fs_set_page_private(page, 0);
-   f2fs_trace_pid(page);
 }
 
 void f2fs_remove_dirty_inode(struct inode *inode)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9a063d8a63d..38476d0d3916 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -25,7 +25,6 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
-#include "trace.h"
 #include <trace/events/f2fs.h>
 
 #define NUM_PREALLOC_POST_READ_CTXS 128
@@ -679,7 +678,6 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
return -EFSCORRUPTED;
 
trace_f2fs_submit_page_bio(page, fio);
-   f2fs_trace_ios(fio, 0);
 
/* Allocate a new bio */
bio = __bio_alloc(fio, 1);
@@ -884,7 +882,6 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
return -EFSCORRUPTED;
 
trace_f2fs_submit_page_bio(page, fio);
-   f2fs_trace_ios(fio, 0);
 
if (bio && !page_is_mergeable(fio->sbi, bio, *fio->last_block,
fio->new_blkaddr))
@@ -981,7 +978,6 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
wbc_account_cgroup_owner(fio->io_wbc, bio_page, PAGE_SIZE);
 
io->last_block_in_bio = fio->new_blkaddr;
-   f2fs_trace_ios(fio, 0);
 
trace_f2fs_submit_page_write(fio->page, fio);
 skip:
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e768371c6575..7db27c81d034 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -29,7 +29,6 @@
 #include "xattr.h"
 #include "acl.h"
 #include "gc.h"
-#include "trace.h"
 #include <trace/events/f2fs.h>
 #include <uapi/linux/f2fs.h>
 
@@ -369,7 +368,6 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
start, loff_t end,
f2fs_update_time(sbi, REQ_TIME);
 out:
trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret);
-   f2fs_trace_ios(NULL, 1);
return ret;
 }
 
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 5e3fabacefb5..a8a0fb890e8d 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -17,7 +17,6 @@
 #include "node.h"
 #include "segment.h"
 #include "xattr.h"
-#include "trace.h"
 #include <trace/events/f2fs.h>
 
 #define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock)
@@ -2089,7 +2088,6 @@ static int f2fs_set_node_page_dirty(struct page *page)
__set_page_dirty_nobuffers(page);
inc_page_count(F2FS_P_SB(page), F2FS_D

Re: [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-14 Thread Jaegeuk Kim

On 12/30, Chao Yu wrote:
> ThreadA                                 ThreadB
> - f2fs_ioc_start_atomic_write
> - write
> - f2fs_ioc_commit_atomic_write
>  - f2fs_commit_inmem_pages
>  - f2fs_drop_inmem_pages
>  - f2fs_drop_inmem_pages
>   - __revoke_inmem_pages
>                                         - f2fs_vm_page_mkwrite
>                                          - set_page_dirty
>                                           - tag ATOMIC_WRITTEN_PAGE and add page
>                                             to inmem_pages list
>   - clear_inode_flag(FI_ATOMIC_FILE)
>                                         - f2fs_vm_page_mkwrite
>                                          - set_page_dirty
>                                           - f2fs_update_dirty_page
>                                            - f2fs_trace_pid
>                                             - tag inmem page private to pid

Hmm, how about removing fs/f2fs/trace.c, which makes page private more
complicated like this? I think we can get IO traces from tracepoints.

>                                         - truncate
>                                          - f2fs_invalidate_page
>                                          - set page->mapping to NULL
>                                            then it will cause panic once we
>                                            access page->mapping
> 
> The root cause is that we failed to keep atomic writes isolated in the case
> of commit_atomic_write vs. mkwrite; let commit_atomic_write hold the
> i_mmap_sem lock to avoid this issue.
> 
> Signed-off-by: Chao Yu 
> ---
> v2:
> - use i_mmap_sem to avoid mkwrite racing with below flows:
>  * f2fs_ioc_start_atomic_write
>  * f2fs_drop_inmem_pages
>  * f2fs_commit_inmem_pages
> 
>  fs/f2fs/file.c| 3 +++
>  fs/f2fs/segment.c | 7 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 4e6d4b9120a8..a48ec650d691 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   goto out;
>  
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   /*
>* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
> inode->i_ino, get_dirty_pages(inode));
>   ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
>   if (ret) {
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   goto out;
>   }
> @@ -2073,6 +2075,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   /* add inode in inmem_list first and set atomic_file */
>   set_inode_flag(inode, FI_ATOMIC_FILE);
>   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  
>   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index d8570b0359f5..dab870d9faf6 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -327,6 +327,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>  
> + down_write(&F2FS_I(inode)->i_mmap_sem);
> +
>   while (!list_empty(&fi->inmem_pages)) {
>   mutex_lock(&fi->inmem_lock);
>   __revoke_inmem_pages(inode, &fi->inmem_pages,
> @@ -344,6 +346,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   sbi->atomic_files--;
>   }
>   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>  }
>  
>  void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
> @@ -467,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   f2fs_balance_fs(sbi, true);
>  
>   down_write(&fi->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   f2fs_lock_op(sbi);
>   set_inode_flag(inode, FI_ATOMIC_COMMIT);
> @@ -478,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   clear_inode_flag(inode, FI_ATOMIC_COMMIT);
>  
>   f2fs_unlock_op(sbi);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&fi->i_gc_rwsem[WRITE]);
>  
>   return err;
> -- 
> 2.29.2

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-13 Thread Jaegeuk Kim

On 01/14, Chao Yu wrote:
> On 2021/1/13 23:41, Jaegeuk Kim wrote:
> > [58690.961685] F2FS-fs (vdb) : inject page get in f2fs_pagecache_get_page 
> > of f2fs_quota_write+0x150/0x1f0 [f2fs]
> > [58691.071481] F2FS-fs (vdb): Inconsistent error blkaddr:31058, sit bitmap:0
> > [58691.077338] [ cut here ]
> > [58691.081461] WARNING: CPU: 5 PID: 8308 at fs/f2fs/checkpoint.c:151 
> > f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > [58691.086734] Modules linked in: f2fs(O) quota_v2 quota_tree dm_multipath 
> > scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev intel_rapl_msr 
> > intel_rapl_common sb_edac kvm_intel kvm irqbypass joydev parport_pc parport 
> > input_leds serio_raw mac_hid qemu_fw_cfg sch_fq_codel ip_tables x_tables 
> > autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy 
> > asy
> > [58691.120632] CPU: 5 PID: 8308 Comm: kworker/u17:5 Tainted: G  DO  
> > 5.11.0-rc3-custom #1
> > [58691.125438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.14.0-1 04/01/2014
> > [58691.129625] Workqueue: f2fs_post_read_wq f2fs_post_read_work [f2fs]
> > [58691.133142] RIP: 0010:f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > [58691.136221] Code: 3c 07 b8 01 00 00 00 d3 e0 21 f8 75 57 83 fa 07 75 52 
> > 89 f2 31 c9 48 c7 c6 20 6a a7 c0 48 89 df e8 bc d6 03 00 f0 80 4b 48 04 
> > <0f> 0b 31 c0 e9 5e fe ff ff 48 8b 57 10 8b 42 30 d3 e0 03 42 48 39
> > [58691.143142] RSP: 0018:b429047afd40 EFLAGS: 00010206
> > [58691.145639] RAX:  RBX: 9c3b84041000 RCX: 
> > 
> > [58691.148899] RDX:  RSI: 9c3bbbd58940 RDI: 
> > 9c3bbbd58940
> > [58691.152130] RBP: b429047afd48 R08: 9c3bbbd58940 R09: 
> > b429047afaa8
> > [58691.155266] R10: 001ba090 R11: 0003 R12: 
> > 7952
> > [58691.158304] R13: f5cc81266ac0 R14: 00db R15: 
> > 
> > [58691.161160] FS:  () GS:9c3bbbd4() 
> > knlGS:
> > [58691.164286] CS:  0010 DS:  ES:  CR0: 80050033
> > [58691.166869] CR2: 7f0fee9d3000 CR3: 5ee76001 CR4: 
> > 00370ee0
> > [58691.169714] DR0:  DR1:  DR2: 
> > 
> > [58691.173102] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [58691.176163] Call Trace:
> > [58691.177948]  f2fs_cache_compressed_page+0x69/0x280 [f2fs]
> > [58691.180549]  ? newidle_balance+0x253/0x3d0
> > [58691.183238]  f2fs_end_read_compressed_page+0x5a/0x70 [f2fs]
> > [58691.188205]  f2fs_post_read_work+0x11d/0x120 [f2fs]
> > [58691.192489]  process_one_work+0x221/0x3a0
> > [58691.194482]  worker_thread+0x4d/0x3f0
> > [58691.198867]  kthread+0x114/0x150
> > [58691.202243]  ? process_one_work+0x3a0/0x3a0
> > [58691.205367]  ? kthread_park+0x90/0x90
> > [58691.208244]  ret_from_fork+0x22/0x30
> 
> Below patch fixes two issues, I expect this can fix above warning at least.

[106115.591837] general protection fault, probably for non-canonical address 
0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
[106115.595584] CPU: 3 PID: 10109 Comm: fsstress Tainted: G   O  
5.11.0-rc3-custom #1
[106115.601087] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.14.0-1 04/01/2014
[106115.601087] RIP: 0010:f2fs_read_multi_pages+0x415/0xa70 [f2fs]
[106115.601087] Code: ff ff ff 45 31 ff f7 d0 25 00 00 08 00 89 45 80 48 8b 45 
a0 48 83 c0 6c 48 89 85 78 ff ff ff 48 8b 7d a0 49 63 c7 48 8b 57 30 <48> 8b 1c 
c2 8b 45 c4 8d 50 01 48 8b 45 b8 48 2b 05 c6 55 92 dc 48
[106115.601087] RSP: 0018:c0a4822f7710 EFLAGS: 00010206
[106115.620978] RAX: 0001 RBX: e801820034c0 RCX: 
0020
[106115.620978] RDX: 6b6b6b6b6b6b6b6b RSI: c09487af RDI: 
9bc1d87c4200
[106115.627351] RBP: c0a4822f77c0 R08:  R09: 

[106115.627351] R10: 9bc1d87c4200 R11: 0001 R12: 
00105343
[106115.627351] R13: 9bc2d2184000 R14:  R15: 
0001
[106115.635587] FS:  7f188e909b80() GS:9bc2fbcc() 
knlGS:
[106115.635587] CS:  0010 DS:  ES:  CR0: 80050033
[106115.635587] CR2: 56446d88b358 CR3: 534b4002 CR4: 
00370ee0
[106115.635587] DR0:  DR1:  DR2: 

[106115.635587] DR3:  DR6: fffe0ff0 DR7: 
0400
[106115.635587] Call Trace:
[106115.635587]  f2fs_mpage_readpages+0x4e4/0xac0 [f2fs]
[106115.635587]  f2fs_readahead+0x47/0x90 [f2fs]
[106115.635587]  read_pages+0x8e/

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-13 Thread Jaegeuk Kim

On 01/13, Chao Yu wrote:
> On 2021/1/13 6:36, Jaegeuk Kim wrote:
> > On 01/12, Chao Yu wrote:
> > > On 2021/1/12 10:04, Jaegeuk Kim wrote:
> > > > On 01/12, Chao Yu wrote:
> > > > > On 2021/1/11 19:45, Chao Yu wrote:
> > > > > > On 2021/1/11 18:31, Chao Yu wrote:
> > > > > > > On 2021/1/11 17:48, Jaegeuk Kim wrote:
> > > > > > > > Hi Chao,
> > > > > > > > 
> > > > > > > > After quick test of fsstress w/ fault injection, it gave wrong 
> > > > > > > > block address
> > > > > > > > errors. Could you please run the test a bit?
> > > > > > > 
> > > > > > > Jaegeuk,
> > > > > > > 
> > > > > > > Oh, I've covered with fstest cases and there is no such error 
> > > > > > > message, let me
> > > > > > > try fault injection + SPO case soon.
> > > > > > 
> > > > > > Till now, I haven't see any problem... will let the test run for 
> > > > > > longer time in
> > > > > > this night.
> > > > > > 
> > > > > > Could you share me detailed error message you encounter?
> > > > > 
> > > > > Still, I don't see wrong block address error...
> > > > > 
> > > > > Did the error occur from below path:
> > > > > 
> > > > > - f2fs_end_read_compressed_page
> > > > >- f2fs_cache_compressed_page
> > > > > - f2fs_is_valid_blkaddr
> > > > 
> > > > [58690.176668] general protection fault, probably for non-canonical 
> > > > address 0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
> > > > [58690.180563] CPU: 0 PID: 29371 Comm: fsstress Tainted: G   O  
> > > > 5.11.0-rc3-custom #1
> > > > [58690.186466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS 1.14.0-1 04/01/2014
> > > > [58690.189352] RIP: 0010:f2fs_read_multi_pages+0x413/0xa70 [f2fs]
> > > > [58690.193366] Code: ad 54 ff ff ff 4c 8b ad 68 ff ff ff 25 00 00 08 00 
> > > > 89 85 78 ff ff ff 49 8d 47 6c 48 89 85 70 ff ff ff 48 63 45 a0 49 8b 57 
> > > > 30 <4c> 8b 34 c2 8b 45 c4 8d 50 01 48 8b 45 b8 48 2b 05 98 56 40 c8 48
> > > > [58690.212479] RSP: 0018:b429022dfa60 EFLAGS: 00010206
> > > > [58690.218410] RAX: 0001 RBX: 78af RCX: 
> > > > 0020
> > > > [58690.222473] RDX: 6b6b6b6b6b6b6b6b RSI: c0a6872f RDI: 
> > > > 0246
> > > > [58690.227349] RBP: b429022dfb10 R08:  R09: 
> > > > 
> > > > [58690.234425] R10: 9c3af1f78200 R11: 0001 R12: 
> > > > 
> > > > [58690.238503] R13: 9c3b84041000 R14: f5cc8166f5c0 R15: 
> > > > 9c3af1f78200
> > > > [58690.242455] FS:  7f0fee9d4b80() GS:9c3bbbc0() 
> > > > knlGS:
> > > > [58690.246401] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [58690.250471] CR2: 563b839c1000 CR3: 2cb0e004 CR4: 
> > > > 00370ef0
> > > > [58690.250471] DR0:  DR1:  DR2: 
> > > > 
> > > > [58690.258758] DR3:  DR6: fffe0ff0 DR7: 
> > > > 0400
> > > > [58690.262464] Call Trace:
> > > > [58690.262464]  prepare_compress_overwrite+0x380/0x510 [f2fs]
> > > > [58690.266489]  ? xas_load+0x9/0x80
> > > > [58690.270452]  f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs]
> > > > [58690.274466]  f2fs_write_begin+0x81e/0x1120 [f2fs]
> > > > [58690.277213]  generic_perform_write+0xc2/0x1c0
> > > > [58690.278698]  __generic_file_write_iter+0x167/0x1d0
> > > > [58690.286472]  f2fs_file_write_iter+0x39e/0x590 [f2fs]
> > > > [58690.290398]  new_sync_write+0x117/0x1b0
> > > > [58690.290461]  vfs_write+0x185/0x250
> > > > [58690.295197]  ksys_write+0x67/0xe0
> > > > [58690.298173]  __x64_sys_write+0x1a/0x20
> > > > [58690.298437]  do_syscall_64+0x38/0x90
> > > > [58690.298437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > 
> > > > [58690.961685] F2FS-fs (vdb) : inject page get in 
> > > > f2fs_pagecache_get_page of f2fs_quota_write+0x150/0x1f0 [f2fs]
> >

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-12 Thread Jaegeuk Kim

On 01/12, Chao Yu wrote:
> On 2021/1/12 10:04, Jaegeuk Kim wrote:
> > On 01/12, Chao Yu wrote:
> > > On 2021/1/11 19:45, Chao Yu wrote:
> > > > On 2021/1/11 18:31, Chao Yu wrote:
> > > > > On 2021/1/11 17:48, Jaegeuk Kim wrote:
> > > > > > Hi Chao,
> > > > > > 
> > > > > > After quick test of fsstress w/ fault injection, it gave wrong 
> > > > > > block address
> > > > > > errors. Could you please run the test a bit?
> > > > > 
> > > > > Jaegeuk,
> > > > > 
> > > > > Oh, I've covered with fstest cases and there is no such error 
> > > > > message, let me
> > > > > try fault injection + SPO case soon.
> > > > 
> > > > Till now, I haven't see any problem... will let the test run for longer 
> > > > time in
> > > > this night.
> > > > 
> > > > Could you share me detailed error message you encounter?
> > > 
> > > Still, I don't see wrong block address error...
> > > 
> > > Did the error occur from below path:
> > > 
> > > - f2fs_end_read_compressed_page
> > >   - f2fs_cache_compressed_page
> > >- f2fs_is_valid_blkaddr
> > 
> > [58690.176668] general protection fault, probably for non-canonical address 
> > 0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
> > [58690.180563] CPU: 0 PID: 29371 Comm: fsstress Tainted: G   O  
> > 5.11.0-rc3-custom #1
> > [58690.186466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > 1.14.0-1 04/01/2014
> > [58690.189352] RIP: 0010:f2fs_read_multi_pages+0x413/0xa70 [f2fs]
> > [58690.193366] Code: ad 54 ff ff ff 4c 8b ad 68 ff ff ff 25 00 00 08 00 89 
> > 85 78 ff ff ff 49 8d 47 6c 48 89 85 70 ff ff ff 48 63 45 a0 49 8b 57 30 
> > <4c> 8b 34 c2 8b 45 c4 8d 50 01 48 8b 45 b8 48 2b 05 98 56 40 c8 48
> > [58690.212479] RSP: 0018:b429022dfa60 EFLAGS: 00010206
> > [58690.218410] RAX: 0001 RBX: 78af RCX: 
> > 0020
> > [58690.222473] RDX: 6b6b6b6b6b6b6b6b RSI: c0a6872f RDI: 
> > 0246
> > [58690.227349] RBP: b429022dfb10 R08:  R09: 
> > 
> > [58690.234425] R10: 9c3af1f78200 R11: 0001 R12: 
> > 
> > [58690.238503] R13: 9c3b84041000 R14: f5cc8166f5c0 R15: 
> > 9c3af1f78200
> > [58690.242455] FS:  7f0fee9d4b80() GS:9c3bbbc0() 
> > knlGS:
> > [58690.246401] CS:  0010 DS:  ES:  CR0: 80050033
> > [58690.250471] CR2: 563b839c1000 CR3: 2cb0e004 CR4: 
> > 00370ef0
> > [58690.250471] DR0:  DR1:  DR2: 
> > 
> > [58690.258758] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [58690.262464] Call Trace:
> > [58690.262464]  prepare_compress_overwrite+0x380/0x510 [f2fs]
> > [58690.266489]  ? xas_load+0x9/0x80
> > [58690.270452]  f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs]
> > [58690.274466]  f2fs_write_begin+0x81e/0x1120 [f2fs]
> > [58690.277213]  generic_perform_write+0xc2/0x1c0
> > [58690.278698]  __generic_file_write_iter+0x167/0x1d0
> > [58690.286472]  f2fs_file_write_iter+0x39e/0x590 [f2fs]
> > [58690.290398]  new_sync_write+0x117/0x1b0
> > [58690.290461]  vfs_write+0x185/0x250
> > [58690.295197]  ksys_write+0x67/0xe0
> > [58690.298173]  __x64_sys_write+0x1a/0x20
> > [58690.298437]  do_syscall_64+0x38/0x90
> > [58690.298437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > [58690.961685] F2FS-fs (vdb) : inject page get in f2fs_pagecache_get_page 
> > of f2fs_quota_write+0x150/0x1f0 [f2fs]
> > [58691.071481] F2FS-fs (vdb): Inconsistent error blkaddr:31058, sit bitmap:0
> > [58691.077338] [ cut here ]
> > [58691.081461] WARNING: CPU: 5 PID: 8308 at fs/f2fs/checkpoint.c:151 
> > f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
> > [58691.086734] Modules linked in: f2fs(O) quota_v2 quota_tree dm_multipath 
> > scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev intel_rapl_msr 
> > intel_rapl_common sb_edac kvm_intel kvm irqbypass joydev parport_pc parport 
> > input_leds serio_raw mac_hid qemu_fw_cfg sch_fq_codel ip_tables x_tables 
> > autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy 
> > asy
> > [58691.120632] CPU: 5 PID: 8308 Comm: kworker/u17:5 Tainted: G  DO  
> > 5.11.0-rc3-custom #1
> > [58691.125438] Hardw

Re: [f2fs-dev] [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-12 Thread Jaegeuk Kim

On 01/12, Chao Yu wrote:
> On 2021/1/12 0:32, Jaegeuk Kim wrote:
> > On 01/06, Jaegeuk Kim wrote:
> > > On 01/06, Jaegeuk Kim wrote:
> > > > Hi Chao,
> > > > 
> > > > With a quick test, this patch causes down_write failure resulting in 
> > > > blocking
> > > > process. I didn't dig in the bug so, please check the code again. :P
> > > 
> > > nvm. I can see it works now.
> > 
> > Hmm, this gives a huge perf regression when running sqlite. :(
> > We may need to check the lock coverage. Thoughts?
> 
> I added i_mmap_sem lock only, so it can cause atomic_{start,commit,finish}
> race with mmap and truncation operations additionally.
> 
> I'd like to know what's your sqlite testcase?

Nothing special. Just generating multiple sqlite transactions to the same db.

> 
> Thanks,
> 
> > 
> > > 
> > > > 
> > > > On 12/30, Chao Yu wrote:
> > > > > > ThreadA                                 ThreadB
> > > > > > - f2fs_ioc_start_atomic_write
> > > > > > - write
> > > > > > - f2fs_ioc_commit_atomic_write
> > > > > >  - f2fs_commit_inmem_pages
> > > > > >  - f2fs_drop_inmem_pages
> > > > > >  - f2fs_drop_inmem_pages
> > > > > >   - __revoke_inmem_pages
> > > > > >                                         - f2fs_vm_page_mkwrite
> > > > > >                                          - set_page_dirty
> > > > > >                                           - tag ATOMIC_WRITTEN_PAGE and add page
> > > > > >                                             to inmem_pages list
> > > > > >   - clear_inode_flag(FI_ATOMIC_FILE)
> > > > > >                                         - f2fs_vm_page_mkwrite
> > > > > >                                          - set_page_dirty
> > > > > >                                           - f2fs_update_dirty_page
> > > > > >                                            - f2fs_trace_pid
> > > > > >                                             - tag inmem page private to pid
> > > > > >                                         - truncate
> > > > > >                                          - f2fs_invalidate_page
> > > > > >                                          - set page->mapping to NULL
> > > > > >                                            then it will cause panic once we
> > > > > >                                            access page->mapping
> > > > > 
> > > > > > The root cause is that we failed to keep atomic writes isolated in the
> > > > > > case of commit_atomic_write vs. mkwrite; let commit_atomic_write hold
> > > > > > the i_mmap_sem lock to avoid this issue.
> > > > > 
> > > > > Signed-off-by: Chao Yu 
> > > > > ---
> > > > > v2:
> > > > > - use i_mmap_sem to avoid mkwrite racing with below flows:
> > > > >   * f2fs_ioc_start_atomic_write
> > > > >   * f2fs_drop_inmem_pages
> > > > >   * f2fs_commit_inmem_pages
> > > > > 
> > > > >   fs/f2fs/file.c| 3 +++
> > > > >   fs/f2fs/segment.c | 7 +++
> > > > >   2 files changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > index 4e6d4b9120a8..a48ec650d691 100644
> > > > > --- a/fs/f2fs/file.c
> > > > > +++ b/fs/f2fs/file.c
> > > > > @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct 
> > > > > file *filp)
> > > > >   goto out;
> > > > > >   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > > > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > > > >   /*
> > > > >* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > > > > @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct 
> > > > > file *filp)
> > > > > inode->i_ino, get_dirty_pages(inode));
> > > > >   ret = filemap_write_and_wait_range(inode->i_mapping, 0, 
> > > > > LLONG_MAX);
> > > > >   if (ret) {
> > > > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRIT

Re: [f2fs-dev] [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-11 Thread Jaegeuk Kim

On 01/12, Chao Yu wrote:
> On 2021/1/11 19:45, Chao Yu wrote:
> > On 2021/1/11 18:31, Chao Yu wrote:
> > > On 2021/1/11 17:48, Jaegeuk Kim wrote:
> > > > Hi Chao,
> > > > 
> > > > After quick test of fsstress w/ fault injection, it gave wrong block 
> > > > address
> > > > errors. Could you please run the test a bit?
> > > 
> > > Jaegeuk,
> > > 
> > > Oh, I've covered with fstest cases and there is no such error message, 
> > > let me
> > > try fault injection + SPO case soon.
> > 
> > Till now, I haven't see any problem... will let the test run for longer 
> > time in
> > this night.
> > 
> > Could you share me detailed error message you encounter?
> 
> Still, I don't see wrong block address error...
> 
> Did the error occur from below path:
> 
> - f2fs_end_read_compressed_page
>  - f2fs_cache_compressed_page
>   - f2fs_is_valid_blkaddr

[58690.176668] general protection fault, probably for non-canonical address 
0x6b6b6b6b6b6b6b73:  [#1] SMP PTI
[58690.180563] CPU: 0 PID: 29371 Comm: fsstress Tainted: G   O  
5.11.0-rc3-custom #1
[58690.186466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.14.0-1 04/01/2014
[58690.189352] RIP: 0010:f2fs_read_multi_pages+0x413/0xa70 [f2fs]
[58690.193366] Code: ad 54 ff ff ff 4c 8b ad 68 ff ff ff 25 00 00 08 00 89 85 
78 ff ff ff 49 8d 47 6c 48 89 85 70 ff ff ff 48 63 45 a0 49 8b 57 30 <4c> 8b 34 
c2 8b 45 c4 8d 50 01 48 8b 45 b8 48 2b 05 98 56 40 c8 48
[58690.212479] RSP: 0018:b429022dfa60 EFLAGS: 00010206
[58690.218410] RAX: 0001 RBX: 78af RCX: 0020
[58690.222473] RDX: 6b6b6b6b6b6b6b6b RSI: c0a6872f RDI: 0246
[58690.227349] RBP: b429022dfb10 R08:  R09: 
[58690.234425] R10: 9c3af1f78200 R11: 0001 R12: 
[58690.238503] R13: 9c3b84041000 R14: f5cc8166f5c0 R15: 9c3af1f78200
[58690.242455] FS:  7f0fee9d4b80() GS:9c3bbbc0() 
knlGS:
[58690.246401] CS:  0010 DS:  ES:  CR0: 80050033
[58690.250471] CR2: 563b839c1000 CR3: 2cb0e004 CR4: 00370ef0
[58690.250471] DR0:  DR1:  DR2: 
[58690.258758] DR3:  DR6: fffe0ff0 DR7: 0400
[58690.262464] Call Trace:
[58690.262464]  prepare_compress_overwrite+0x380/0x510 [f2fs]
[58690.266489]  ? xas_load+0x9/0x80
[58690.270452]  f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs]
[58690.274466]  f2fs_write_begin+0x81e/0x1120 [f2fs]
[58690.277213]  generic_perform_write+0xc2/0x1c0
[58690.278698]  __generic_file_write_iter+0x167/0x1d0
[58690.286472]  f2fs_file_write_iter+0x39e/0x590 [f2fs]
[58690.290398]  new_sync_write+0x117/0x1b0
[58690.290461]  vfs_write+0x185/0x250
[58690.295197]  ksys_write+0x67/0xe0
[58690.298173]  __x64_sys_write+0x1a/0x20
[58690.298437]  do_syscall_64+0x38/0x90
[58690.298437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[58690.961685] F2FS-fs (vdb) : inject page get in f2fs_pagecache_get_page of 
f2fs_quota_write+0x150/0x1f0 [f2fs]
[58691.071481] F2FS-fs (vdb): Inconsistent error blkaddr:31058, sit bitmap:0
[58691.077338] [ cut here ]
[58691.081461] WARNING: CPU: 5 PID: 8308 at fs/f2fs/checkpoint.c:151 
f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
[58691.086734] Modules linked in: f2fs(O) quota_v2 quota_tree dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev intel_rapl_msr intel_rapl_common 
sb_edac kvm_intel kvm irqbypass joydev parport_pc parport input_leds serio_raw 
mac_hid qemu_fw_cfg sch_fq_codel ip_tables x_tables autofs4 btrfs 
blake2b_generic raid10 raid456 async_raid6_recov async_memcpy asy
[58691.120632] CPU: 5 PID: 8308 Comm: kworker/u17:5 Tainted: G  DO  
5.11.0-rc3-custom #1
[58691.125438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.14.0-1 04/01/2014
[58691.129625] Workqueue: f2fs_post_read_wq f2fs_post_read_work [f2fs]
[58691.133142] RIP: 0010:f2fs_is_valid_blkaddr+0x1e9/0x280 [f2fs]
[58691.136221] Code: 3c 07 b8 01 00 00 00 d3 e0 21 f8 75 57 83 fa 07 75 52 89 
f2 31 c9 48 c7 c6 20 6a a7 c0 48 89 df e8 bc d6 03 00 f0 80 4b 48 04 <0f> 0b 31 
c0 e9 5e fe ff ff 48 8b 57 10 8b 42 30 d3 e0 03 42 48 39
[58691.143142] RSP: 0018:b429047afd40 EFLAGS: 00010206
[58691.145639] RAX:  RBX: 9c3b84041000 RCX: 
[58691.148899] RDX:  RSI: 9c3bbbd58940 RDI: 9c3bbbd58940
[58691.152130] RBP: b429047afd48 R08: 9c3bbbd58940 R09: b429047afaa8
[58691.155266] R10: 001ba090 R11: 0003 R12: 7952
[58691.158304] R13: f5cc81266ac0 R14: 00db R15: 
[58691.161160] FS:  () GS:9c3bbbd4() 
knlGS

Re: [f2fs-dev] [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-11 Thread Jaegeuk Kim

On 01/06, Jaegeuk Kim wrote:
> On 01/06, Jaegeuk Kim wrote:
> > Hi Chao,
> > 
> > With a quick test, this patch causes down_write failure resulting in 
> > blocking
> > process. I didn't dig in the bug so, please check the code again. :P
> 
> nvm. I can see it works now.

Hmm, this gives a huge perf regression when running sqlite. :(
We may need to check the lock coverage. Thoughts?

> 
> > 
> > On 12/30, Chao Yu wrote:
> > > ThreadA                                 ThreadB
> > > - f2fs_ioc_start_atomic_write
> > > - write
> > > - f2fs_ioc_commit_atomic_write
> > >  - f2fs_commit_inmem_pages
> > >  - f2fs_drop_inmem_pages
> > >  - f2fs_drop_inmem_pages
> > >   - __revoke_inmem_pages
> > >                                         - f2fs_vm_page_mkwrite
> > >                                          - set_page_dirty
> > >                                           - tag ATOMIC_WRITTEN_PAGE and add page
> > >                                             to inmem_pages list
> > >   - clear_inode_flag(FI_ATOMIC_FILE)
> > >                                         - f2fs_vm_page_mkwrite
> > >                                          - set_page_dirty
> > >                                           - f2fs_update_dirty_page
> > >                                            - f2fs_trace_pid
> > >                                             - tag inmem page private to pid
> > >                                         - truncate
> > >                                          - f2fs_invalidate_page
> > >                                          - set page->mapping to NULL
> > >                                            then it will cause panic once we
> > >                                            access page->mapping
> > > 
> > > The root cause is that we failed to keep atomic writes isolated in the case
> > > of commit_atomic_write vs. mkwrite; let commit_atomic_write hold the
> > > i_mmap_sem lock to avoid this issue.
> > > 
> > > Signed-off-by: Chao Yu 
> > > ---
> > > v2:
> > > - use i_mmap_sem to avoid mkwrite racing with below flows:
> > >  * f2fs_ioc_start_atomic_write
> > >  * f2fs_drop_inmem_pages
> > >  * f2fs_commit_inmem_pages
> > > 
> > >  fs/f2fs/file.c| 3 +++
> > >  fs/f2fs/segment.c | 7 +++
> > >  2 files changed, 10 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 4e6d4b9120a8..a48ec650d691 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   goto out;
> > >  
> > >   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > >  
> > >   /*
> > >* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > > @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > > inode->i_ino, get_dirty_pages(inode));
> > >   ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
> > >   if (ret) {
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >   goto out;
> > >   }
> > > @@ -2073,6 +2075,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > > *filp)
> > >   /* add inode in inmem_list first and set atomic_file */
> > >   set_inode_flag(inode, FI_ATOMIC_FILE);
> > >   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> > > > + up_write(&F2FS_I(inode)->i_mmap_sem);
> > > >   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > >  
> > >   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index d8570b0359f5..dab870d9faf6 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -327,6 +327,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > >   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > >   struct f2fs_inode_info *fi = F2FS_I(inode);
> > >  
> > > > + down_write(&F2FS_I(inode)->i_mmap_sem);
> > > +
> > > >   while (!list_empty(&fi->inmem_pages)) {
> > > >   mutex_lock(&fi->inmem_lock);
> > > >   __revoke_inmem_pages(inode, &fi->inmem_pages,
> > > @@ -344,6 +346,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > >   sbi->

[PATCH v3] scsi: ufs: WB is only available on LUN #0 to #7

2021-01-11 Thread Jaegeuk Kim

From: Jaegeuk Kim 

A kernel stack violation occurs when reading unit_descriptor/wb_buf_alloc_units
from the RPMB LUN. The reason is that the unit descriptor length differs per LU.

The length of a normal LU's descriptor is 45, while that of the RPMB LU is 35.

int ufshcd_read_desc_param(struct ufs_hba *hba, ...)
{
param_offset=41;
param_size=4;
buff_len=45;
...
buff_len=35 by rpmb LU;

if (is_kmalloc) {
/* Make sure we don't copy more data than available */
if (param_offset + param_size > buff_len)
param_size = buff_len - param_offset;
--> param_size = 250;
memcpy(param_read_buf, &desc_buf[param_offset], param_size);
--> memcpy(param_read_buf, desc_buf+41, 250);

[  141.868974][ T9174] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: wb_buf_alloc_units_show+0x11c/0x11c
}
}

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufs-sysfs.c | 3 ++-
 drivers/scsi/ufs/ufs.h   | 6 --
 drivers/scsi/ufs/ufshcd.c| 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 08e72b7eef6a..50e90416262b 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -792,7 +792,8 @@ static ssize_t _pname##_show(struct device *dev,
\
struct scsi_device *sdev = to_scsi_device(dev); \
struct ufs_hba *hba = shost_priv(sdev->host);   \
u8 lun = ufshcd_scsi_to_upiu_lun(sdev->lun);\
-   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun))   \
+   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun,\
+   _duname##_DESC_PARAM##_puname)) \
return -EINVAL; \
return ufs_sysfs_read_desc_param(hba, QUERY_DESC_IDN_##_duname, \
lun, _duname##_DESC_PARAM##_puname, buf, _size);\
diff --git a/drivers/scsi/ufs/ufs.h b/drivers/scsi/ufs/ufs.h
index 14dfda735adf..580aa56965d0 100644
--- a/drivers/scsi/ufs/ufs.h
+++ b/drivers/scsi/ufs/ufs.h
@@ -552,13 +552,15 @@ struct ufs_dev_info {
  * @return: true if the lun has a matching unit descriptor, false otherwise
  */
 static inline bool ufs_is_valid_unit_desc_lun(struct ufs_dev_info *dev_info,
-   u8 lun)
+   u8 lun, u8 param_offset)
 {
if (!dev_info || !dev_info->max_lu_supported) {
pr_err("Max General LU supported by UFS isn't initialized\n");
return false;
}
-
+   /* WB is available only for the logical unit from 0 to 7 */
+   if (param_offset == UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS)
+   return lun < UFS_UPIU_MAX_WB_LUN_ID;
return lun == UFS_UPIU_RPMB_WLUN || (lun < dev_info->max_lu_supported);
 }
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 2a715f13fe1d..48cbd4f294dd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3425,7 +3425,7 @@ static inline int ufshcd_read_unit_desc_param(struct ufs_hba *hba,
 * Unit descriptors are only available for general purpose LUs (LUN id
 * from 0 to 7) and RPMB Well known LU.
 */
-   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun))
+   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun, param_offset))
return -EOPNOTSUPP;
 
return ufshcd_read_desc_param(hba, QUERY_DESC_IDN_UNIT, lun,
-- 
2.30.0.284.gd98b1dd5eaa7-goog
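
For readers following the thread, the LUN check introduced in this version can be sketched as a small stand-alone C function. This is only an illustration of the logic in the hunk above, not the driver code: the struct dev_info_sketch type and the function name are hypothetical, and the three constant values are assumptions that should be checked against ufs.h in the tree being patched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; verify against ufs.h. */
#define UFS_UPIU_RPMB_WLUN                 0xC4
#define UFS_UPIU_MAX_WB_LUN_ID             8
#define UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS 0x29 /* 41, the param_offset in the log */

/* Hypothetical stand-in for struct ufs_dev_info. */
struct dev_info_sketch {
	uint8_t max_lu_supported;
};

/* Returns true if the LUN has a unit descriptor carrying this parameter. */
static bool valid_unit_desc_lun(const struct dev_info_sketch *info,
				uint8_t lun, uint8_t param_offset)
{
	if (!info || !info->max_lu_supported)
		return false;

	/* WB parameters exist only in the unit descriptors of LUN 0..7. */
	if (param_offset == UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS)
		return lun < UFS_UPIU_MAX_WB_LUN_ID;

	return lun == UFS_UPIU_RPMB_WLUN || lun < info->max_lu_supported;
}
```

With these assumed values, a WB read on the RPMB well-known LU is rejected before any descriptor is fetched, while other parameters on that LU remain readable.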

Re: [PATCH v2] scsi: ufs: WB is not allowed in RPMB_LUN

2021-01-11 Thread Jaegeuk Kim

On 01/11, Avri Altman wrote:
> >  static inline bool ufs_is_valid_unit_desc_lun(struct ufs_dev_info 
> > *dev_info,
> > -   u8 lun)
> > +   u8 lun, u8 param_offset)
> >  {
> > if (!dev_info || !dev_info->max_lu_supported) {
> > > pr_err("Max General LU supported by UFS isn't initialized\n");
> > return false;
> > }
> > -
> > +   /* WB is not allowed in RPMB_WLUN */
> /* wb is only allowed to either a sha*/
> > +   if (param_offset == UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS)
> > +   return lun < dev_info->max_lu_supported;
> I think here you should use UFS_UPIU_MAX_WB_LUN_ID and not 
> dev_info->max_lu_supported.

Ok, sending v3.

> 
> Thanks,
> Avri

Re: [f2fs-dev] [PATCH 1/2] f2fs: introduce checkpoint=merge mount option

2021-01-11 Thread Jaegeuk Kim

On 01/11, Daeho Jeong wrote:
> From: Daeho Jeong 
> 
> We've added a new mount option "checkpoint=merge", which creates a
> kernel daemon and makes it merge concurrent checkpoint requests as
> much as possible to eliminate redundant checkpoint issues. Plus, we
> can eliminate the sluggish issue caused by slow checkpoint operation
> when the checkpoint is done in a process context in a cgroup having
> low i/o budget and cpu shares. The verification result below
> explains this.
> The basic idea has come from https://opensource.samsung.com.
> 
> [Verification]
> Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
> Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
> 
> In "fg" cgroup,
> - thread A => trigger 1000 checkpoint operations
>   "for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
>done"
> - thread B => generating async. I/O
>   "fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
>--filename=test_img --name=test"
> 
> In "bg" cgroup,
> - thread C => trigger repeated checkpoint operations
>   "echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
>fsync test_dir2; done"
> 
> We've measured thread A's execution time.
> 
> [ w/o patch ]
> Elapsed Time: Avg. 68 seconds
> [ w/  patch ]
> Elapsed Time: Avg. 48 seconds
> 
> Signed-off-by: Daeho Jeong 
> Signed-off-by: Sungjong Seo 
> ---
>  Documentation/filesystems/f2fs.rst |   6 +
>  fs/f2fs/checkpoint.c   | 176 +
>  fs/f2fs/debug.c|   6 +
>  fs/f2fs/f2fs.h |  24 
>  fs/f2fs/super.c|  53 -
>  5 files changed, 261 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst 
> b/Documentation/filesystems/f2fs.rst
> index dae15c96e659..bccc021bf31a 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -247,6 +247,12 @@ checkpoint=%s[:%u[%]] Set to "disable" to turn off 
> checkpointing. Set to "enabl
>hide up to all remaining free space. The actual space 
> that
>would be unusable can be viewed at 
> /sys/fs/f2fs//unusable
>This space is reclaimed once checkpoint=enable.
> +  Here is another option "merge", which creates a kernel 
> daemon
> +  and makes it to merge concurrent checkpoint requests 
> as much
> +  as possible to eliminate redundant checkpoint issues. 
> Plus,
> +  we can eliminate the sluggish issue caused by slow 
> checkpoint
> +  operation when the checkpoint is done in a process 
> context in
> +  a cgroup having low i/o budget and cpu shares.
>  compress_algorithm=%s Control compress algorithm, currently f2fs 
> supports "lzo",
>"lz4", "zstd" and "lzo-rle" algorithm.
>  compress_log_size=%u  Support configuring compress cluster size, the size 
> will
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 897edb7c951a..11288f435dbe 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "f2fs.h"
>  #include "node.h"
> @@ -20,6 +21,8 @@
>  #include "trace.h"
>  #include 
>  
> +#define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
> +
>  static struct kmem_cache *ino_entry_slab;
>  struct kmem_cache *f2fs_inode_entry_slab;
>  
> @@ -1707,3 +1710,176 @@ void f2fs_destroy_checkpoint_caches(void)
>   kmem_cache_destroy(ino_entry_slab);
>   kmem_cache_destroy(f2fs_inode_entry_slab);
>  }
> +
> +static int __write_checkpoint_sync(struct f2fs_sb_info *sbi)
> +{
> + struct cp_control cpc = { .reason = CP_SYNC, };
> + int err;
> +
> + down_write(&sbi->gc_lock);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> + up_write(&sbi->gc_lock);
> +
> + return err;
> +}
> +
> +static void __checkpoint_and_complete_reqs(struct f2fs_sb_info *sbi)
> +{
> + struct ckpt_req_control *cprc = sbi->cprc_info;
> + struct ckpt_req *req, *next;
> + struct llist_node *dispatch_list;
> + int ret;
> +
> + dispatch_list = llist_del_all(&cprc->issue_list);
> + if (!dispatch_list)
> + return;
> + dispatch_list = llist_reverse_order(dispatch_list);
> +
> + ret = __write_checkpoint_sync(sbi);
> + atomic_inc(&cprc->issued_ckpt);
> +
> + llist_for_each_entry_safe(req, next, dispatch_list, llnode) {
> + atomic_dec(&cprc->queued_ckpt);
> + atomic_inc(&cprc->total_ckpt);
> + req->complete_time = jiffies;
> + req->ret = ret;
> + complete(&req->wait);
> + }
> +}
> +
> +static int issue_checkpoint_thread(void *data)
> +{
> + struct f2fs_sb_info *sbi = data;
> + struct ckpt_req_control *cprc = sbi->cprc_info;
> + wait_queue_head_t *q = &cprc->ckpt_wait_queue;
> +repeat:
> + if
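
The batching idea in the quoted patch (drain every queued request, write one checkpoint, then complete all waiters with the same result) can be modelled in user space roughly as follows. This is a single-threaded sketch under stated assumptions: all names ending in _sketch or _stub are hypothetical, the lock-free llist and the wait/complete machinery are replaced by a plain singly linked list and a done flag, and the list-order reversal done by the patch is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ckpt_req. */
struct ckpt_req_sketch {
	struct ckpt_req_sketch *next;
	int ret;	/* result of the shared checkpoint */
	int done;	/* stands in for complete(&req->wait) */
};

/* Hypothetical stand-in for struct ckpt_req_control. */
struct ckpt_ctl_sketch {
	struct ckpt_req_sketch *issue_list;	/* newest request first */
	int issued_ckpt;			/* checkpoints actually written */
	int total_ckpt;				/* requests completed */
};

/* Queue one checkpoint request at the head of the list. */
static void queue_req(struct ckpt_ctl_sketch *c, struct ckpt_req_sketch *r)
{
	r->ret = 0;
	r->done = 0;
	r->next = c->issue_list;
	c->issue_list = r;
}

/* Placeholder for the real checkpoint write; always succeeds here. */
static int write_checkpoint_stub(void)
{
	return 0;
}

/* Take the whole batch, write once, complete every request in it. */
static void checkpoint_and_complete(struct ckpt_ctl_sketch *c)
{
	struct ckpt_req_sketch *req = c->issue_list;
	int ret;

	if (!req)
		return;
	c->issue_list = NULL;

	ret = write_checkpoint_stub();
	c->issued_ckpt++;

	while (req) {
		struct ckpt_req_sketch *next = req->next;

		req->ret = ret;
		req->done = 1;
		c->total_ckpt++;
		req = next;
	}
}
```

In this model, three requests queued before one dispatch cost a single checkpoint write, which is the saving the verification numbers in the commit message are attributed to.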

Re: [PATCH v3 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-11 Thread Jaegeuk Kim

Hi Chao,

After a quick test of fsstress w/ fault injection, it gave wrong block address
errors. Could you please run the test a bit?

Thanks,

On 01/07, Chao Yu wrote:
> Support using the address space of an inner inode to cache compressed blocks,
> in order to improve the cache hit ratio of random reads.
> 
> Signed-off-by: Chao Yu 
> ---
> v3:
> - rebase to last dev branch.
> - add blkaddr sanity check in f2fs_cache_compressed_page()
>  Documentation/filesystems/f2fs.rst |   3 +
>  fs/f2fs/compress.c | 171 -
>  fs/f2fs/data.c |  19 +++-
>  fs/f2fs/debug.c|  13 +++
>  fs/f2fs/f2fs.h |  39 ++-
>  fs/f2fs/gc.c   |   1 +
>  fs/f2fs/inode.c|  21 +++-
>  fs/f2fs/segment.c  |   6 +-
>  fs/f2fs/super.c|  19 +++-
>  include/linux/f2fs_fs.h|   1 +
>  10 files changed, 282 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.rst 
> b/Documentation/filesystems/f2fs.rst
> index 5eff4009e77e..cd1e5b826ba3 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -273,6 +273,9 @@ compress_mode=%s   Control file compression mode. This 
> supports "fs" and "user"
>choosing the target file and the timing. The user can 
> do manual
>compression/decompression on the compression enabled 
> files using
>ioctls.
> +compress_cacheSupport to use address space of a filesystem 
> managed inode to
> +  cache compressed block, in order to improve cache hit 
> ratio of
> +  random read.
>  inlinecrypt   When possible, encrypt/decrypt the contents of 
> encrypted
>files using the blk-crypto framework rather than
>filesystem-layer encryption. This allows the use of
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index 1696f9183ff5..cb16b0437bd4 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -12,9 +12,11 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "f2fs.h"
>  #include "node.h"
> +#include "segment.h"
>  #include 
>  
>  static struct kmem_cache *cic_entry_slab;
> @@ -756,7 +758,7 @@ static int f2fs_compress_pages(struct compress_ctx *cc)
>   return ret;
>  }
>  
> -static void f2fs_decompress_cluster(struct decompress_io_ctx *dic)
> +void f2fs_decompress_cluster(struct decompress_io_ctx *dic)
>  {
>   struct f2fs_sb_info *sbi = F2FS_I_SB(dic->inode);
>   struct f2fs_inode_info *fi = F2FS_I(dic->inode);
> @@ -855,7 +857,8 @@ static void f2fs_decompress_cluster(struct 
> decompress_io_ctx *dic)
>   * page being waited on in the cluster, and if so, it decompresses the 
> cluster
>   * (or in the case of a failure, cleans up without actually decompressing).
>   */
> -void f2fs_end_read_compressed_page(struct page *page, bool failed)
> +void f2fs_end_read_compressed_page(struct page *page, bool failed,
> + block_t blkaddr)
>  {
>   struct decompress_io_ctx *dic =
>   (struct decompress_io_ctx *)page_private(page);
> @@ -865,6 +868,9 @@ void f2fs_end_read_compressed_page(struct page *page, 
> bool failed)
>  
>   if (failed)
>   WRITE_ONCE(dic->failed, true);
> + else if (blkaddr)
> + f2fs_cache_compressed_page(sbi, page,
> + dic->inode->i_ino, blkaddr);
>  
>   if (atomic_dec_and_test(&dic->remaining_pages))
>   f2fs_decompress_cluster(dic);
> @@ -1702,6 +1708,167 @@ void f2fs_put_page_dic(struct page *page)
>   f2fs_put_dic(dic);
>  }
>  
> +const struct address_space_operations f2fs_compress_aops = {
> + .releasepage = f2fs_release_page,
> + .invalidatepage = f2fs_invalidate_page,
> +};
> +
> +struct address_space *COMPRESS_MAPPING(struct f2fs_sb_info *sbi)
> +{
> + return sbi->compress_inode->i_mapping;
> +}
> +
> +void f2fs_invalidate_compress_page(struct f2fs_sb_info *sbi, block_t blkaddr)
> +{
> + if (!sbi->compress_inode)
> + return;
> + invalidate_mapping_pages(COMPRESS_MAPPING(sbi), blkaddr, blkaddr);
> +}
> +
> +void f2fs_cache_compressed_page(struct f2fs_sb_info *sbi, struct page *page,
> + nid_t ino, block_t blkaddr)
> +{
> + struct page *cpage;
> + int ret;
> + struct sysinfo si;
> + unsigned long free_ram, avail_ram;
> +
> + if (!test_opt(sbi, COMPRESS_CACHE))
> + return;
> +
> + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC_ENHANCE))
> + return;
> +
> + si_meminfo(&si);
> + free_ram = si.freeram;
> + avail_ram = si.totalram - si.totalhigh;
> +
> + /* free memory is lower than watermark, deny caching compress page */
> + if (free_ram <=

[PATCH v2] scsi: ufs: WB is not allowed in RPMB_LUN

2021-01-11 Thread Jaegeuk Kim

From: Jaegeuk Kim 

A kernel stack violation occurs when reading unit_descriptor/wb_buf_alloc_units
from the RPMB LUN. The reason is that the unit descriptor length differs per LU.

The length of a normal LU's descriptor is 45, while that of the RPMB LU is 35.

int ufshcd_read_desc_param(struct ufs_hba *hba, ...)
{
param_offset=41;
param_size=4;
buff_len=45;
...
buff_len=35 by rpmb LU;

if (is_kmalloc) {
/* Make sure we don't copy more data than available */
if (param_offset + param_size > buff_len)
param_size = buff_len - param_offset;
--> param_size = 250;
memcpy(param_read_buf, &desc_buf[param_offset], param_size);
--> memcpy(param_read_buf, desc_buf+41, 250);

[  141.868974][ T9174] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: wb_buf_alloc_units_show+0x11c/0x11c
}
}

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufs-sysfs.c | 3 ++-
 drivers/scsi/ufs/ufs.h   | 6 --
 drivers/scsi/ufs/ufshcd.c| 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 08e72b7eef6a..50e90416262b 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -792,7 +792,8 @@ static ssize_t _pname##_show(struct device *dev,
\
struct scsi_device *sdev = to_scsi_device(dev); \
struct ufs_hba *hba = shost_priv(sdev->host);   \
u8 lun = ufshcd_scsi_to_upiu_lun(sdev->lun);\
-   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun))   \
+   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun,\
+   _duname##_DESC_PARAM##_puname)) \
return -EINVAL; \
return ufs_sysfs_read_desc_param(hba, QUERY_DESC_IDN_##_duname, \
lun, _duname##_DESC_PARAM##_puname, buf, _size);\
diff --git a/drivers/scsi/ufs/ufs.h b/drivers/scsi/ufs/ufs.h
index 14dfda735adf..7a0069c83900 100644
--- a/drivers/scsi/ufs/ufs.h
+++ b/drivers/scsi/ufs/ufs.h
@@ -552,13 +552,15 @@ struct ufs_dev_info {
  * @return: true if the lun has a matching unit descriptor, false otherwise
  */
 static inline bool ufs_is_valid_unit_desc_lun(struct ufs_dev_info *dev_info,
-   u8 lun)
+   u8 lun, u8 param_offset)
 {
if (!dev_info || !dev_info->max_lu_supported) {
pr_err("Max General LU supported by UFS isn't initialized\n");
return false;
}
-
+   /* WB is not allowed in RPMB_WLUN */
+   if (param_offset == UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS)
+   return lun < dev_info->max_lu_supported;
return lun == UFS_UPIU_RPMB_WLUN || (lun < dev_info->max_lu_supported);
 }
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 2a715f13fe1d..48cbd4f294dd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3425,7 +3425,7 @@ static inline int ufshcd_read_unit_desc_param(struct ufs_hba *hba,
 * Unit descriptors are only available for general purpose LUs (LUN id
 * from 0 to 7) and RPMB Well known LU.
 */
-   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun))
+   if (!ufs_is_valid_unit_desc_lun(&hba->dev_info, lun, param_offset))
return -EOPNOTSUPP;
 
return ufshcd_read_desc_param(hba, QUERY_DESC_IDN_UNIT, lun,
-- 
2.30.0.284.gd98b1dd5eaa7-goog

Re: [PATCH] scsi: ufs: should not override buffer lengh

2021-01-11 Thread Jaegeuk Kim

On 01/11, Can Guo wrote:
> On 2021-01-11 16:15, Avri Altman wrote:
> > > 
> > > Sorry, typo corrected.
> > > 
> > > Hi Jaegeuk,
> > > 
> > > I think the problem is that func ufshcd_read_desc_param() is not
> > > expecting one to access unsupported descriptors on the RPMB LU.
> > Correct.
> > This is about wb introducing a new constraint: wb buffer is only
> > allowed in lu 0..7.
> > And this is why, IMHO, the fix should be in ufs_is_valid_unit_desc_lun,
> > To include param offset, as it is only called in contingency of
> > ufshcd_read_desc_param.
> > 
> > Thanks,
> > Avri
> 
> Yeah... Fixing it in ufs-sysfs.c also works. Anyways, the math in
> ufshcd_read_desc_param is already complex. Let's fix it somewhere
> close to the source/initiator.

Thank you, Can and Avri.
I think fixing the lun check makes sense. Let me post v2. :)

> 
> Thanks,
> Can Guo.

[PATCH] scsi: ufs: should not override buffer lengh

2021-01-10 Thread Jaegeuk Kim

From: Jaegeuk Kim 

A kernel stack violation occurs when reading unit_descriptor/wb_buf_alloc_units
from the RPMB LUN. The reason is that the unit descriptor length differs per LU.

The length of a normal LU's descriptor is 45, while that of the RPMB LU is 35.

int ufshcd_read_desc_param(struct ufs_hba *hba, ...)
{
param_offset=41;
param_size=4;
buff_len=45;
...
buff_len=35 by rpmb LU;

if (is_kmalloc) {
/* Make sure we don't copy more data than available */
if (param_offset + param_size > buff_len)
param_size = buff_len - param_offset;
--> param_size = 250;
memcpy(param_read_buf, &desc_buf[param_offset], param_size);
--> memcpy(param_read_buf, desc_buf+41, 250);

[  141.868974][ T9174] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: wb_buf_alloc_units_show+0x11c/0x11c
}
}

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 2a715f13fe1d..722697b5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3293,8 +3293,12 @@ int ufshcd_read_desc_param(struct ufs_hba *hba,
 
if (is_kmalloc) {
/* Make sure we don't copy more data than available */
-   if (param_offset + param_size > buff_len)
-   param_size = buff_len - param_offset;
+   if (param_offset + param_size > buff_len) {
+   if (buff_len > param_offset)
+   param_size = buff_len - param_offset;
+   else
+   param_size = 0;
+   }
memcpy(param_read_buf, &desc_buf[param_offset], param_size);
}
 out:
-- 
2.30.0.284.gd98b1dd5eaa7-goog
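
The arithmetic behind the "param_size = 250" line in the commit message can be reproduced outside the kernel. The sketch below assumes, as the numbers in the message suggest, that param_offset and param_size are 8-bit unsigned values and buff_len is an int; the two function names are hypothetical and only isolate the clamp before and after this patch.

```c
#include <stdint.h>

/*
 * Clamp as it was before the patch: when buff_len < param_offset the
 * difference is negative and wraps when stored into an 8-bit value.
 */
static uint8_t clamp_unchecked(uint8_t param_offset, uint8_t param_size,
			       int buff_len)
{
	if (param_offset + param_size > buff_len)
		param_size = (uint8_t)(buff_len - param_offset);
	return param_size;
}

/* Clamp as proposed in the hunk above: never copy past the descriptor. */
static uint8_t clamp_checked(uint8_t param_offset, uint8_t param_size,
			     int buff_len)
{
	if (param_offset + param_size > buff_len) {
		if (buff_len > param_offset)
			param_size = (uint8_t)(buff_len - param_offset);
		else
			param_size = 0;
	}
	return param_size;
}
```

With param_offset = 41, param_size = 4 and buff_len = 35, the unchecked version yields 35 - 41 = -6, which becomes 250 as an 8-bit unsigned value, whereas the checked version yields 0.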

Re: [PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-07 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 16:46, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 16:07, Jaegeuk Kim wrote:
> > > > On 01/07, Can Guo wrote:
> > > > > On 2021-01-07 15:47, Jaegeuk Kim wrote:
> > > > > > From: Jaegeuk Kim 
> > > > > >
> > > > > > This fixes a warning caused by wrong reserve tag usage in
> > > > > > __ufshcd_issue_tm_cmd.
> > > > > >
> > > > > > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 
> > > > > > blk_get_request+0x68/0x70
> > > > > > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > > > > > blk_mq_get_tag+0x438/0x46c
> > > > > >
> > > > > > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > > > > > aborting
> > > > > > outstanding commands by waiting a bit for IO completion like this.
> > > > > >
> > > > > > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > > > > >
> > > > > > Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> > > > > > allocate and free TMFs")
> > > > > > Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err
> > > > > > handler")
> > > > >
> > > > > Hi Jaegeuk,
> > > > >
> > > > > Sorry, what is wrong with commit 2355b66ed20c? Clearing pending I/O
> > > > > reqs is a general procedure for handling all non-fatal errors.
> > > >
> > > > Without waiting IOs, I hit the below timeout all the time from
> > > > LINERESET, which
> > > > causes UFS stuck permanently, as mentioned in the description.
> > > >
> > > > "__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out"
> > > 
> > > In that case, ufshcd_try_to_abort_task(), the caller of
> > > __ufshcd_issue_tm_cmd(),
> > > should return -ETIMEOUT, then err_handler would jump to do a full
> > > reset,
> > > then bail.
> > > I am not sure what gets UFS stuck permanently. Could you please
> > > share the
> > > callstack
> > > if possible? I really want to know what is happening. Thanks.
> > 
> > I can't share all the log tho, it entered full reset. While printing out
> > whole registers, the device was hard reset. Thanks,
> 
> Hi Jaegeuk,
> 
> Entering full reset is expected in this case, which is why I am saying
> line-reset handling logic should not be penalized. I think we need to
> find out what caused the hard reset but not just adding a delay before
> clearing pending reqs, because let's say 3 sec expires and you hit the
> same tm req timeout (maybe with a lower possibility), you may still end
> up same at the hard reset. You don't need to share all the log, just the
> last call stacks before hard reset. Is it a QCOM's platform used in your
> case? Can you check the log/dump if NoC error happened?

Hi Can,

I figured out it is caused by verbose kernel logs printed in terminal.
I posted v5, so could you please review it?

Thanks,

> 
> Thanks.
> Can Guo.
> 
> > 
> > > 
> > > Regards,
> > > Can Guo.
> > > 
> > > >
> > > > >
> > > > > Thanks,
> > > > > Can Guo.
> > > > >
> > > > > > Signed-off-by: Jaegeuk Kim 
> > > > > > ---
> > > > > >  drivers/scsi/ufs/ufshcd.c | 35 +++
> > > > > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > > > index e6e7bdf99cd7..340dd5e515dd 100644
> > > > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > > > @@ -44,6 +44,9 @@
> > > > > >  /* Query request timeout */
> > > > > >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > > > > >
> > > > > > +/* LINERESET TIME OUT */
> > > > > > +#define LINERESET_IO_TIMEOUT_MS(3) /* 30 
> > > > > > sec */
> > > > > > +
> > > > > >  /* Task management command timeout */
> > > > > >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > > > > >
> > > > > > @@

[PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens

2021-01-07 Thread Jaegeuk Kim

From: Jaegeuk Kim 

When a non-fatal error like line-reset happens, ufshcd_err_handler() starts to
abort tasks by ufshcd_try_to_abort_task(). When it tries to issue a tm request,
we hit two warnings.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

After fixing the above warnings, I hit another tm_cmd timeout, which may be
caused by unstable controller state.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Then, ufshcd_err_handler() enters full reset, and the kernel gets stuck. It
turned out that ufshcd_print_trs() printed too many messages to the console,
which requires CPU locks. As with hba->silence_err_logs, we need to avoid
overly verbose messages. The messages actually came from
ufshcd_transfer_rsp_status() when requeuing commands. That is not an error
case, so let's fix it.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..2a715f13fe1d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
break;
} /* end of switch */
 
-   if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
+   if ((host_byte(result) != DID_OK) &&
+   (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
return result;
 }
@@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog
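
The first hunk of this patch changes when the transfer requests are dumped. A stand-alone sketch of that decision is shown below. The function names are hypothetical, and the host-byte layout (bits 16..23 of the result) and the values of DID_OK and DID_REQUEUE are assumptions taken from the SCSI midlayer headers rather than from this thread.

```c
#include <stdbool.h>

/* Assumed host byte codes; verify against the SCSI headers. */
#define DID_OK      0x00
#define DID_REQUEUE 0x0d

/* Assumed layout: the host byte occupies bits 16..23 of the result. */
static int host_byte_sketch(int result)
{
	return (result >> 16) & 0xff;
}

/*
 * Dump the request only for real errors: a requeued command is not an
 * error, and nothing is printed while error logs are silenced.
 */
static bool should_print_trs(int result, bool silence_err_logs)
{
	int hb = host_byte_sketch(result);

	return hb != DID_OK && hb != DID_REQUEUE && !silence_err_logs;
}
```

Under these assumptions a requeued command no longer triggers the register dump, which is the console flood the commit message describes.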

[PATCH v5 0/2] Two UFS fixes

2021-01-07 Thread Jaegeuk Kim

Change log from v4:
 - remove RESERVE tag for tm command
 - remove waiting IOs and let full reset handle it
 - avoid verbose error log which causes cpu lock up

[PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-07 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

To avoid this, ufshcd_clear_ua_wluns() can be called per recovery flow,
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
Reviewed-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
if (ret)
goto out;
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Initialize devfreq after UFS device is detected */
if (ufshcd_is_clkscaling_supported(hba)) {
memcpy(&hba->clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
pm_runtime_put_sync(hba->dev);
ufshcd_exit_clk_scaling(hba);
ufshcd_hba_exit(hba);
-   } else {
-   ufshcd_clear_ua_wluns(hba);
}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog
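
The ordering problem described in the commit message above can be modelled in a few lines of userspace C. This is only a sketch of one reading of the description: struct host, clear_ua(), reset_and_restore() and err_handler() are hypothetical names, and where the real driver would wait forever on a blocked queue, the model returns -EBUSY so that it terminates.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the host state relevant to the livelock. */
struct host {
	bool blocked;    /* true between block_requests and unblock_requests */
	int ua_cleared;  /* number of successful unit-attention clears */
};

/* Models a step that must send a request through the queue. While the
 * queue is blocked the request cannot make progress; the model reports
 * -EBUSY instead of hanging. */
static int clear_ua(struct host *h)
{
	if (h->blocked)
		return -EBUSY;
	h->ua_cleared++;
	return 0;
}

/* clear_inside selects the old behaviour (clear during the reset). */
static int reset_and_restore(struct host *h, bool clear_inside)
{
	return clear_inside ? clear_ua(h) : 0;
}

/* Error handler: block, reset, unblock, and in the new ordering clear
 * the unit attention only after requests are unblocked again. */
static int err_handler(struct host *h, bool clear_inside)
{
	int err;

	h->blocked = true;
	err = reset_and_restore(h, clear_inside);
	h->blocked = false;

	if (!err && !clear_inside)
		err = clear_ua(h);
	return err;
}
```

With clear_inside set, the clear is attempted while requests are still blocked and cannot succeed. With it unset, the clear runs after the unblock, which is the ordering the patch moves to for the error handler path.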

Re: [PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-07 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 16:07, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 15:47, Jaegeuk Kim wrote:
> > > > From: Jaegeuk Kim 
> > > >
> > > > This fixes a warning caused by wrong reserve tag usage in
> > > > __ufshcd_issue_tm_cmd.
> > > >
> > > > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > > > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > > > blk_mq_get_tag+0x438/0x46c
> > > >
> > > > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > > > aborting
> > > > outstanding commands by waiting a bit for IO completion like this.
> > > >
> > > > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > > >
> > > > Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> > > > allocate and free TMFs")
> > > > Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err
> > > > handler")
> > > 
> > > Hi Jaegeuk,
> > > 
> > > Sorry, what is wrong with commit 2355b66ed20c? Clearing pending I/O
> > > reqs is a general procedure for handling all non-fatal errors.
> > 
> > Without waiting IOs, I hit the below timeout all the time from
> > LINERESET, which
> > causes UFS stuck permanently, as mentioned in the description.
> > 
> > "__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out"
> 
> In that case, ufshcd_try_to_abort_task(), the caller of
> __ufshcd_issue_tm_cmd(),
> should return -ETIMEOUT, then err_handler would jump to do a full reset,
> then bail.
> I am not sure what gets UFS stuck permanently. Could you please share the
> callstack
> if possible? I really want to know what is happening. Thanks.

I can't share all the log tho, it entered full reset. While printing out
whole registers, the device was hard reset. Thanks,

> 
> Regards,
> Can Guo.
> 
> > 
> > > 
> > > Thanks,
> > > Can Guo.
> > > 
> > > > Signed-off-by: Jaegeuk Kim 
> > > > ---
> > > >  drivers/scsi/ufs/ufshcd.c | 35 +++++++++++++++++++++++++++++++----
> > > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index e6e7bdf99cd7..340dd5e515dd 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -44,6 +44,9 @@
> > > >  /* Query request timeout */
> > > >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > > >
> > > > +/* LINERESET TIME OUT */
> > > > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > > > +
> > > >  /* Task management command timeout */
> > > >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > > >
> > > > @@ -5826,6 +5829,7 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > int err = 0, pmc_err;
> > > > int tag;
> > > > bool needs_reset = false, needs_restore = false;
> > > > +   ktime_t start;
> > > >
> > > > hba = container_of(work, struct ufs_hba, eh_work);
> > > >
> > > > @@ -5911,6 +5915,22 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > }
> > > >
> > > > hba->silence_err_logs = true;
> > > > +
> > > > +   /* Wait for IO completion for non-fatal errors to avoid 
> > > > aborting IOs
> > > > */
> > > > +   start = ktime_get();
> > > > +   while (hba->outstanding_reqs) {
> > > > +   ufshcd_complete_requests(hba);
> > > > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > > > +   schedule();
> > > > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > > > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > > > +   
> > > > LINERESET_IO_TIMEOUT_MS) {
> > > > +   dev_err(hba->dev, "%s: timeout, 
> > > > outstanding=0x%lx\n",
> > > > +   __func__, 
> > > >

Re: [PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-07 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 15:47, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim 
> > 
> > This fixes a warning caused by wrong reserve tag usage in
> > __ufshcd_issue_tm_cmd.
> > 
> > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > blk_mq_get_tag+0x438/0x46c
> > 
> > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > aborting
> > outstanding commands by waiting a bit for IO completion like this.
> > 
> > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > 
> > Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> > allocate and free TMFs")
> > Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err
> > handler")
> 
> Hi Jaegeuk,
> 
> Sorry, what is wrong with commit 2355b66ed20c? Clearing pending I/O
> reqs is a general procedure for handling all non-fatal errors.

Without waiting IOs, I hit the below timeout all the time from LINERESET, which
causes UFS stuck permanently, as mentioned in the description.

"__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out"

> 
> Thanks,
> Can Guo.
> 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  drivers/scsi/ufs/ufshcd.c | 35 +++++++++++++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index e6e7bdf99cd7..340dd5e515dd 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -44,6 +44,9 @@
> >  /* Query request timeout */
> >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > 
> > +/* LINERESET TIME OUT */
> > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > +
> >  /* Task management command timeout */
> >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > 
> > @@ -5826,6 +5829,7 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > int err = 0, pmc_err;
> > int tag;
> > bool needs_reset = false, needs_restore = false;
> > +   ktime_t start;
> > 
> > hba = container_of(work, struct ufs_hba, eh_work);
> > 
> > @@ -5911,6 +5915,22 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > }
> > 
> > hba->silence_err_logs = true;
> > +
> > +   /* Wait for IO completion for non-fatal errors to avoid aborting IOs
> > */
> > +   start = ktime_get();
> > +   while (hba->outstanding_reqs) {
> > +   ufshcd_complete_requests(hba);
> > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +   schedule();
> > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > +   LINERESET_IO_TIMEOUT_MS) {
> > +   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
> > +   __func__, hba->outstanding_reqs);
> > +   break;
> > +   }
> > +   }
> > +
> > /* release lock as clear command might sleep */
> > spin_unlock_irqrestore(hba->host->host_lock, flags);
> > /* Clear pending transfer requests */
> > @@ -6302,9 +6322,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > *__hba)
> > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > }
> > 
> > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > -   __func__, intr_status);
> > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > +   !ufshcd_eh_in_progress(hba)) {
> > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> > 0x%08x)\n",
> > +   __func__,
> > +   intr_status,
> > +   hba->ufs_stats.last_intr_status,
> > +   enabled_intr_status);
> > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > }
> > 
> > @@ -6348,7 +6372,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > *hba,
> >  * Even though we use wait_event() which sleeps indefinitely,
> >  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> >  */
> > -   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> > +   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> > +   if (IS_ERR(req))
> > +   return PTR_ERR(req);
> > +
> > req->end_io_data = &wait;
> > free_slot = req->tag;
> > WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);

[PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in __ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid to send tm_cmd before aborting
outstanding commands by waiting a bit for IO completion like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err handler")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..340dd5e515dd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
 
+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */
 
@@ -5826,6 +5829,7 @@ static void ufshcd_err_handler(struct work_struct *work)
int err = 0, pmc_err;
int tag;
bool needs_reset = false, needs_restore = false;
+   ktime_t start;
 
hba = container_of(work, struct ufs_hba, eh_work);
 
@@ -5911,6 +5915,22 @@ static void ufshcd_err_handler(struct work_struct *work)
}
 
hba->silence_err_logs = true;
+
+   /* Wait for IO completion for non-fatal errors to avoid aborting IOs */
+   start = ktime_get();
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
/* release lock as clear command might sleep */
spin_unlock_irqrestore(hba->host->host_lock, flags);
/* Clear pending transfer requests */
@@ -6302,9 +6322,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6372,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog
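
The wait loop added to ufshcd_err_handler() in the patch above follows a common shape: drain what can be drained, yield, and give up once a deadline passes. A minimal userspace sketch of that shape is given below. It assumes a POSIX system, and now_ms(), wait_for_idle() and the two callbacks are hypothetical names for this illustration. It does not model the host lock that the driver drops and retakes around schedule().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds, standing in for ktime_get(). */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical completion hook: retires finished requests by clearing
 * their bits in the outstanding bitmap. */
typedef void (*complete_fn)(unsigned long *outstanding);

/* Polls until no request is outstanding or timeout_ms has elapsed.
 * Returns 0 when the loop ends with the bitmap empty, -ETIMEDOUT when
 * the deadline passes first. */
static int wait_for_idle(unsigned long *outstanding, complete_fn complete,
			 int64_t timeout_ms)
{
	int64_t start = now_ms();

	while (*outstanding) {
		complete(outstanding);
		sched_yield(); /* let other threads make progress */
		if (now_ms() - start > timeout_ms)
			return -ETIMEDOUT;
	}
	return 0;
}

/* Example hook that completes one request per call. */
static void complete_one(unsigned long *outstanding)
{
	*outstanding &= *outstanding - 1; /* clear the lowest set bit */
}

/* Example hook for a device that never completes anything. */
static void complete_none(unsigned long *outstanding)
{
	(void)outstanding;
}
```

As in the patch, the timeout only bounds the wait. Whatever is still outstanding afterwards is left for the caller to abort or reset.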

[PATCH v4 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
     ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
if (ret)
goto out;
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Initialize devfreq after UFS device is detected */
if (ufshcd_is_clkscaling_supported(hba)) {
memcpy(&hba->clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
pm_runtime_put_sync(hba->dev);
ufshcd_exit_clk_scaling(hba);
ufshcd_hba_exit(hba);
-   } else {
-   ufshcd_clear_ua_wluns(hba);
}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog

[PATCH v4 0/2] UFS bug fixes

2021-01-06 Thread Jaegeuk Kim

Change log from v3:
 - move ufshcd_clear_ua_wluns() after ufshcd_scsi_add_wlus()
 - remove BLK_MQ_REQ_RESERVED for tm tag
 - move IO wait to cover all the non-fatal errors

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 14:57, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > > > When gate_work/ungate_work gets an error during hibern8_enter or exit,
> > > > >  ufshcd_err_handler()
> > > > >    ufshcd_scsi_block_requests()
> > > > >    ufshcd_reset_and_restore()
> > > > >      ufshcd_clear_ua_wluns() -> stuck
> > > > >    ufshcd_scsi_unblock_requests()
> > > >
> > > > In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery
> > > > flows
> > > > such as suspend/resume, link_recovery, and error_handler.
> > > >
> > > > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd
> > > > resets")
> > > > Signed-off-by: Jaegeuk Kim 
> > > > ---
> > > > >  drivers/scsi/ufs/ufshcd.c | 15 ++++++++++-----
> > > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index bedb822a40a3..1678cec08b51 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
> > > > if (ret)
> > > > dev_err(hba->dev, "%s: link recovery failed, err %d",
> > > > __func__, ret);
> > > > +   else
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > 
> > > Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?
> > 
> > May I ask the reason? We'll call it after ufshcd_add_lus() later tho.
> > 
> 
> I think the code will be more readable - we do all the LU related
> stuffs in one func, just nit-picking though. I found this because
> I am planning to move the devfreq init codes out of ufshcd_add_lus()
> due to it is inappropriate to init devfreq in there by its naming,
> but it might be a good place for ufshcd_clear_ua_wluns().

Ok, that looks good to me. Thanks.

> 
> Thanks,
> Can Guo.
> 
> > > 
> > > Thanks,
> > > Can Guo.
> > > 
> > > >
> > > > return ret;
> > > >  }
> > > > @@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > ufshcd_scsi_unblock_requests(hba);
> > > > ufshcd_err_handling_unprepare(hba);
> > > > > up(&hba->eh_sem);
> > > > +
> > > > +   if (!err && needs_reset)
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -6940,14 +6945,11 @@ static int
> > > > ufshcd_host_reset_and_restore(struct ufs_hba *hba)
> > > > ufshcd_set_clk_freq(hba, true);
> > > >
> > > > err = ufshcd_hba_enable(hba);
> > > > -   if (err)
> > > > -   goto out;
> > > >
> > > > /* Establish the link again and restore the device */
> > > > -   err = ufshcd_probe_hba(hba, false);
> > > > if (!err)
> > > > -   ufshcd_clear_ua_wluns(hba);
> > > > -out:
> > > > +   err = ufshcd_probe_hba(hba, false);
> > > > +
> > > > if (err)
> > > > dev_err(hba->dev, "%s: Host init failed %d\n", 
> > > > __func__, err);
> > > > ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
> > > > @@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
> > > > enum ufs_pm_op pm_op)
> > > > ufshcd_resume_clkscaling(hba);
> > > > hba->clk_gating.is_suspended = false;
> > > > hba->dev_info.b_rpm_dev_flush_capable = false;
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > > ufshcd_release(hba);
> > > >  out:
> > > > if (hba->dev_info.b_rpm_dev_flush_capable) {
> > > > @@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba,
> > > > enum ufs_pm_op pm_op)
> > > > > cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
> > > > }
> > > >
> > > > +   ufshcd_clear_ua_wluns(hba);
> > > > +
> > > > /* Schedule clock gating in case of no access to UFS device yet 
> > > > */
> > > > ufshcd_release(hba);

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 14:51, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > > > From: Jaegeuk Kim 
> > > >
> > > > This fixes a warning caused by wrong reserve tag usage in
> > > > __ufshcd_issue_tm_cmd.
> > > >
> > > > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > > > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > > > blk_mq_get_tag+0x438/0x46c
> > > >
> > > > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > > > aborting
> > > > outstanding commands by waiting a bit for IO completion like this.
> > > >
> > > > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > > >
> > > 
> > > Would you mind add a Fixes tag?
> > 
> > Ok.
> > 
> > > 
> > > > Signed-off-by: Jaegeuk Kim 
> > > > ---
> > > > >  drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++----
> > > >  1 file changed, 32 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index 1678cec08b51..47fc8da3cbf9 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -44,6 +44,9 @@
> > > >  /* Query request timeout */
> > > >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > > >
> > > > +/* LINERESET TIME OUT */
> > > > > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > > > +
> > > >  /* Task management command timeout */
> > > >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > > >
> > > > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > >  * check if power mode restore is needed.
> > > >  */
> > > > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > > > +   ktime_t start = ktime_get();
> > > > +
> > > > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > > > if (!hba->saved_uic_err)
> > > > hba->saved_err &= ~UIC_ERROR;
> > > > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > > > *work)
> > > > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > > > needs_restore = true;
> > > > spin_lock_irqsave(hba->host->host_lock, flags);
> > > > +   /* Wait for IO completion to avoid aborting IOs */
> > > > +   while (hba->outstanding_reqs) {
> > > > +   ufshcd_complete_requests(hba);
> > > > +   spin_unlock_irqrestore(hba->host->host_lock, 
> > > > flags);
> > > > +   schedule();
> > > > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > > > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > > > +   
> > > > LINERESET_IO_TIMEOUT_MS) {
> > > > +   dev_err(hba->dev, "%s: timeout, 
> > > > outstanding=0x%lx\n",
> > > > +   __func__, 
> > > > hba->outstanding_reqs);
> > > > +   break;
> > > > +   }
> > > > +   }
> > > > +
> > > > if (!hba->saved_err && !needs_restore)
> > > > goto skip_err_handling;
> > > > }
> > > > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > > > *__hba)
> > > > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > > > }
> > > >
> > > > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > > > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > > > -   __func__, intr_status);
> > > > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > > &g

Re: [PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > When gate_work/ungate_work gets an error during hibern8_enter or exit,
> > >  ufshcd_err_handler()
> > >    ufshcd_scsi_block_requests()
> > >    ufshcd_reset_and_restore()
> > >      ufshcd_clear_ua_wluns() -> stuck
> > >    ufshcd_scsi_unblock_requests()
> > 
> > In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery
> > flows
> > such as suspend/resume, link_recovery, and error_handler.
> > 
> > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd
> > resets")
> > Signed-off-by: Jaegeuk Kim 
> > ---
> > >  drivers/scsi/ufs/ufshcd.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index bedb822a40a3..1678cec08b51 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
> > if (ret)
> > dev_err(hba->dev, "%s: link recovery failed, err %d",
> > __func__, ret);
> > +   else
> > +   ufshcd_clear_ua_wluns(hba);
> 
> Can we put it right after ufshcd_scsi_add_wlus() in ufshcd_add_lus()?

May I ask the reason? We'll call it after ufshcd_add_lus() later tho.

> 
> Thanks,
> Can Guo.
> 
> > 
> > return ret;
> >  }
> > @@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > ufshcd_scsi_unblock_requests(hba);
> > ufshcd_err_handling_unprepare(hba);
> > > up(&hba->eh_sem);
> > +
> > +   if (!err && needs_reset)
> > +   ufshcd_clear_ua_wluns(hba);
> >  }
> > 
> >  /**
> > @@ -6940,14 +6945,11 @@ static int
> > ufshcd_host_reset_and_restore(struct ufs_hba *hba)
> > ufshcd_set_clk_freq(hba, true);
> > 
> > err = ufshcd_hba_enable(hba);
> > -   if (err)
> > -   goto out;
> > 
> > /* Establish the link again and restore the device */
> > -   err = ufshcd_probe_hba(hba, false);
> > if (!err)
> > -   ufshcd_clear_ua_wluns(hba);
> > -out:
> > +   err = ufshcd_probe_hba(hba, false);
> > +
> > if (err)
> > dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
> > ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
> > @@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
> > enum ufs_pm_op pm_op)
> > ufshcd_resume_clkscaling(hba);
> > hba->clk_gating.is_suspended = false;
> > hba->dev_info.b_rpm_dev_flush_capable = false;
> > +   ufshcd_clear_ua_wluns(hba);
> > ufshcd_release(hba);
> >  out:
> > if (hba->dev_info.b_rpm_dev_flush_capable) {
> > @@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba,
> > enum ufs_pm_op pm_op)
> > > cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
> > }
> > 
> > +   ufshcd_clear_ua_wluns(hba);
> > +
> > /* Schedule clock gating in case of no access to UFS device yet */
> > ufshcd_release(hba);

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim 
> > 
> > This fixes a warning caused by wrong reserve tag usage in
> > __ufshcd_issue_tm_cmd.
> > 
> > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > blk_mq_get_tag+0x438/0x46c
> > 
> > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > aborting
> > outstanding commands by waiting a bit for IO completion like this.
> > 
> > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > 
> 
> Would you mind add a Fixes tag?

Ok.

> 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> > >  drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 1678cec08b51..47fc8da3cbf9 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -44,6 +44,9 @@
> >  /* Query request timeout */
> >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > 
> > +/* LINERESET TIME OUT */
> > > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > +
> >  /* Task management command timeout */
> >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > 
> > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> >  * check if power mode restore is needed.
> >  */
> > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > +   ktime_t start = ktime_get();
> > +
> > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > if (!hba->saved_uic_err)
> > hba->saved_err &= ~UIC_ERROR;
> > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > needs_restore = true;
> > spin_lock_irqsave(hba->host->host_lock, flags);
> > +   /* Wait for IO completion to avoid aborting IOs */
> > +   while (hba->outstanding_reqs) {
> > +   ufshcd_complete_requests(hba);
> > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +   schedule();
> > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > +   LINERESET_IO_TIMEOUT_MS) {
> > +   dev_err(hba->dev, "%s: timeout, 
> > outstanding=0x%lx\n",
> > +   __func__, hba->outstanding_reqs);
> > +   break;
> > +   }
> > +   }
> > +
> > if (!hba->saved_err && !needs_restore)
> > goto skip_err_handling;
> > }
> > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > *__hba)
> > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > }
> > 
> > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > -   __func__, intr_status);
> > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > +   !ufshcd_eh_in_progress(hba)) {
> > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> > 0x%08x)\n",
> > +   __func__,
> > +   intr_status,
> > +   hba->ufs_stats.last_intr_status,
> > +   enabled_intr_status);
> > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > }
> > 
> > @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > *hba,
> >  * Even though we use wait_event() which sleeps indefinitely,
> >  * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
> >  */
> > -   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> > +   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
> > +   BLK_MQ_REQ_NOWAIT);
> 
> Sorry that I didn't pay much attention to

Re: [PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

On 01/07, Can Guo wrote:
> Hi Jaegeuk,
> 
> On 2021-01-07 05:41, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim 
> > 
> > This fixes a warning caused by wrong reserve tag usage in
> > __ufshcd_issue_tm_cmd.
> > 
> > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
> > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > blk_mq_get_tag+0x438/0x46c
> > 
> > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > aborting
> > outstanding commands by waiting a bit for IO completion like this.
> > 
> > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> > >  drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index 1678cec08b51..47fc8da3cbf9 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -44,6 +44,9 @@
> >  /* Query request timeout */
> >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > 
> > +/* LINERESET TIME OUT */
> > > +#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
> > +
> >  /* Task management command timeout */
> >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > 
> > @@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> >  * check if power mode restore is needed.
> >  */
> > if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
> > +   ktime_t start = ktime_get();
> 
> I don't see the connection btw line-reset and following tmf cmd.
> My point is that line-reset is not the only non-fatal error which
> leads us to the following tmf cmd. So the wait should be outside
> of this check - just put it right before clearing outstanding reqs.

Ok. Let me move it in v4.

> 
> Thanks,
> Can Guo.
> 
> > +
> > hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
> > if (!hba->saved_uic_err)
> > hba->saved_err &= ~UIC_ERROR;
> > @@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct
> > *work)
> > if (ufshcd_is_pwr_mode_restore_needed(hba))
> > needs_restore = true;
> > spin_lock_irqsave(hba->host->host_lock, flags);
> > +   /* Wait for IO completion to avoid aborting IOs */
> > +   while (hba->outstanding_reqs) {
> > +   ufshcd_complete_requests(hba);
> > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +   schedule();
> > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > +   LINERESET_IO_TIMEOUT_MS) {
> > +   dev_err(hba->dev, "%s: timeout, 
> > outstanding=0x%lx\n",
> > +   __func__, hba->outstanding_reqs);
> > +   break;
> > +   }
> > +   }
> > +
> > if (!hba->saved_err && !needs_restore)
> > goto skip_err_handling;
> > }
> > @@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void
> > *__hba)
> > intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
> > }
> > 
> > -   if (enabled_intr_status && retval == IRQ_NONE) {
> > -   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> > -   __func__, intr_status);
> > +   if (enabled_intr_status && retval == IRQ_NONE &&
> > +   !ufshcd_eh_in_progress(hba)) {
> > +   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
> > 0x%08x)\n",
> > +   __func__,
> > +   intr_status,
> > +   hba->ufs_stats.last_intr_status,
> > +   enabled_intr_status);
> > ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
> > }
> > 
> > @@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
> > *hba,
> >  * Even though we use wait_event() which sleeps indefinitely,
> >  * the maximum wait time is bounded by %TM

Re: [f2fs-dev] [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-06 Thread Jaegeuk Kim

On 01/06, Jaegeuk Kim wrote:
> Hi Chao,
> 
> In a quick test, this patch causes a down_write failure that leaves the
> process blocked. I didn't dig into the bug, so please check the code again. :P

nvm. I can see it works now.

> 
> On 12/30, Chao Yu wrote:
> > ThreadA                                 ThreadB
> > - f2fs_ioc_start_atomic_write
> > - write
> > - f2fs_ioc_commit_atomic_write
> >  - f2fs_commit_inmem_pages
> >  - f2fs_drop_inmem_pages
> >  - f2fs_drop_inmem_pages
> >   - __revoke_inmem_pages
> >                                         - f2fs_vm_page_mkwrite
> >                                          - set_page_dirty
> >                                           - tag ATOMIC_WRITTEN_PAGE and add page
> >                                             to inmem_pages list
> >   - clear_inode_flag(FI_ATOMIC_FILE)
> >                                         - f2fs_vm_page_mkwrite
> >                                           - set_page_dirty
> >                                            - f2fs_update_dirty_page
> >                                             - f2fs_trace_pid
> >                                              - tag inmem page private to pid
> >                                         - truncate
> >                                          - f2fs_invalidate_page
> >                                          - set page->mapping to NULL
> >                                           then it will cause panic once we
> >                                           access page->mapping
> > 
> > The root cause is that we failed to keep atomic writes isolated in the
> > commit_atomic_write vs. mkwrite case; let commit_atomic_write hold the
> > i_mmap_sem lock to avoid this issue.
> > 
> > Signed-off-by: Chao Yu 
> > ---
> > v2:
> > - use i_mmap_sem to avoid mkwrite racing with below flows:
> >  * f2fs_ioc_start_atomic_write
> >  * f2fs_drop_inmem_pages
> >  * f2fs_commit_inmem_pages
> > 
> >  fs/f2fs/file.c| 3 +++
> >  fs/f2fs/segment.c | 7 +++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 4e6d4b9120a8..a48ec650d691 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > *filp)
> > goto out;
> >  
> > down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > +   down_write(&F2FS_I(inode)->i_mmap_sem);
> >  
> > /*
> >  * Should wait end_io to count F2FS_WB_CP_DATA correctly by
> > @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > *filp)
> >   inode->i_ino, get_dirty_pages(inode));
> > ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
> > if (ret) {
> > +   up_write(&F2FS_I(inode)->i_mmap_sem);
> > up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> > goto out;
> > }
> > @@ -2073,6 +2075,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> > *filp)
> > /* add inode in inmem_list first and set atomic_file */
> > set_inode_flag(inode, FI_ATOMIC_FILE);
> > clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> > +   up_write(&F2FS_I(inode)->i_mmap_sem);
> > up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> >  
> > f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index d8570b0359f5..dab870d9faf6 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -327,6 +327,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > struct f2fs_inode_info *fi = F2FS_I(inode);
> >  
> > +   down_write(&F2FS_I(inode)->i_mmap_sem);
> > +
> > while (!list_empty(&fi->inmem_pages)) {
> > mutex_lock(&fi->inmem_lock);
> > __revoke_inmem_pages(inode, &fi->inmem_pages,
> > @@ -344,6 +346,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
> > sbi->atomic_files--;
> > }
> > spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> > +
> > +   up_write(&F2FS_I(inode)->i_mmap_sem);
> >  }
> >  
> >  void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
> > @@ -467,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)
> > f2fs_balance_fs(sbi, true);
> >  
> > down_write(&fi->i_gc_rwsem[WRITE]);
> > +   down_write(&F2FS_I(inode)->i_mmap_sem);
> >  
> > f2fs_lock_op(sbi);
> > set_inode_flag(inode, FI_ATOMIC_COMMIT);
> > @@ -478,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
> > clear_inode_flag(inode, FI_ATOMIC_COMMIT);
> >  
> > f2fs_unlock_op(sbi);
> > +
> > +   up_write(&F2FS_I(inode)->i_mmap_sem);
> > up_write(&fi->i_gc_rwsem[WRITE]);
> >  
> > return err;
> > -- 
> > 2.29.2
> 
> 

Re: [PATCH v2] f2fs: fix to keep isolation of atomic write

2021-01-06 Thread Jaegeuk Kim

Hi Chao,

In a quick test, this patch causes a down_write failure that leaves the
process blocked. I didn't dig into the bug, so please check the code again. :P

On 12/30, Chao Yu wrote:
> ThreadA                                 ThreadB
> - f2fs_ioc_start_atomic_write
> - write
> - f2fs_ioc_commit_atomic_write
>  - f2fs_commit_inmem_pages
>  - f2fs_drop_inmem_pages
>  - f2fs_drop_inmem_pages
>   - __revoke_inmem_pages
>                                         - f2fs_vm_page_mkwrite
>                                          - set_page_dirty
>                                           - tag ATOMIC_WRITTEN_PAGE and add page
>                                             to inmem_pages list
>   - clear_inode_flag(FI_ATOMIC_FILE)
>                                         - f2fs_vm_page_mkwrite
>                                           - set_page_dirty
>                                            - f2fs_update_dirty_page
>                                             - f2fs_trace_pid
>                                              - tag inmem page private to pid
>                                         - truncate
>                                          - f2fs_invalidate_page
>                                          - set page->mapping to NULL
>                                           then it will cause panic once we
>                                           access page->mapping
> 
> The root cause is that we failed to keep atomic writes isolated in the
> commit_atomic_write vs. mkwrite case; let commit_atomic_write hold the
> i_mmap_sem lock to avoid this issue.
> 
> Signed-off-by: Chao Yu 
> ---
> v2:
> - use i_mmap_sem to avoid mkwrite racing with below flows:
>  * f2fs_ioc_start_atomic_write
>  * f2fs_drop_inmem_pages
>  * f2fs_commit_inmem_pages
> 
>  fs/f2fs/file.c| 3 +++
>  fs/f2fs/segment.c | 7 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 4e6d4b9120a8..a48ec650d691 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2050,6 +2050,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   goto out;
>  
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   /*
>* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> @@ -2060,6 +2061,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
> inode->i_ino, get_dirty_pages(inode));
>   ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
>   if (ret) {
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   goto out;
>   }
> @@ -2073,6 +2075,7 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   /* add inode in inmem_list first and set atomic_file */
>   set_inode_flag(inode, FI_ATOMIC_FILE);
>   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  
>   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index d8570b0359f5..dab870d9faf6 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -327,6 +327,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>  
> + down_write(&F2FS_I(inode)->i_mmap_sem);
> +
>   while (!list_empty(&fi->inmem_pages)) {
>   mutex_lock(&fi->inmem_lock);
>   __revoke_inmem_pages(inode, &fi->inmem_pages,
> @@ -344,6 +346,8 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   sbi->atomic_files--;
>   }
>   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>  }
>  
>  void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
> @@ -467,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   f2fs_balance_fs(sbi, true);
>  
>   down_write(&fi->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   f2fs_lock_op(sbi);
>   set_inode_flag(inode, FI_ATOMIC_COMMIT);
> @@ -478,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   clear_inode_flag(inode, FI_ATOMIC_COMMIT);
>  
>   f2fs_unlock_op(sbi);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&fi->i_gc_rwsem[WRITE]);
>  
>   return err;
> -- 
> 2.29.2
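
The locking rule the patch above introduces can be sketched in userspace. This is only an illustration under stated assumptions, not f2fs code: struct fake_inode and every function below are hypothetical names, plain pthread mutexes stand in for the kernel rw_semaphores, and the mkwrite path is assumed to take i_mmap_sem before it checks the atomic-file flag. The point shown is that the start and commit paths take the outer lock first and the inner lock second, release them in reverse order, and that mkwrite can no longer tag a page while a commit is clearing the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the inode state the patch protects. */
struct fake_inode {
	pthread_mutex_t gc_lock;	/* plays the role of i_gc_rwsem[WRITE] */
	pthread_mutex_t mmap_lock;	/* plays the role of i_mmap_sem */
	bool atomic_file;		/* plays the role of FI_ATOMIC_FILE */
	int inmem_pages;		/* number of pages on the inmem list */
};

static inline void fake_inode_init(struct fake_inode *in)
{
	pthread_mutex_init(&in->gc_lock, NULL);
	pthread_mutex_init(&in->mmap_lock, NULL);
	in->atomic_file = false;
	in->inmem_pages = 0;
}

/* Outer lock first, inner lock second; release in reverse order. */
static inline void start_atomic_write(struct fake_inode *in)
{
	pthread_mutex_lock(&in->gc_lock);
	pthread_mutex_lock(&in->mmap_lock);
	in->atomic_file = true;
	pthread_mutex_unlock(&in->mmap_lock);
	pthread_mutex_unlock(&in->gc_lock);
}

/* Commit drains the list and clears the flag while mkwrite is shut out. */
static inline void commit_atomic_write(struct fake_inode *in)
{
	pthread_mutex_lock(&in->gc_lock);
	pthread_mutex_lock(&in->mmap_lock);
	assert(in->inmem_pages >= 0);
	in->inmem_pages = 0;
	in->atomic_file = false;
	pthread_mutex_unlock(&in->mmap_lock);
	pthread_mutex_unlock(&in->gc_lock);
}

/* Returns true if the page was tagged and added to the inmem list. */
static inline bool page_mkwrite(struct fake_inode *in)
{
	bool tagged = false;

	pthread_mutex_lock(&in->mmap_lock);
	if (in->atomic_file) {
		in->inmem_pages++;
		tagged = true;
	}
	pthread_mutex_unlock(&in->mmap_lock);
	return tagged;
}
```

Because the flag check and the list update happen under the same lock that the commit path holds, a page cannot be tagged between __revoke_inmem_pages and clear_inode_flag in this model.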

[PATCH v3 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in __ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
outstanding commands by waiting a bit for IO completion like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1678cec08b51..47fc8da3cbf9 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
 
+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */
 
@@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct *work)
 * check if power mode restore is needed.
 */
if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
+   ktime_t start = ktime_get();
+
hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
if (!hba->saved_uic_err)
hba->saved_err &= ~UIC_ERROR;
@@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct *work)
if (ufshcd_is_pwr_mode_restore_needed(hba))
needs_restore = true;
spin_lock_irqsave(hba->host->host_lock, flags);
+   /* Wait for IO completion to avoid aborting IOs */
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=0x%lx\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
if (!hba->saved_err && !needs_restore)
goto skip_err_handling;
}
@@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
+   BLK_MQ_REQ_NOWAIT);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
@@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
hba->tmf_tag_set = (struct blk_mq_tag_set) {
.nr_hw_queues   = 1,
+   .reserved_tags  = 1,
.queue_depth= hba->nutmrs,
.ops= &ufshcd_tmf_ops,
.flags  = BLK_MQ_F_NO_SCHED,
-- 
2.29.2.729.g45daf8777d-goog
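
The bounded wait added to ufshcd_err_handler() in the patch above can be sketched as a userspace analogue. This is a hedged illustration, not the driver code: struct fake_hba, now_ms() and wait_for_outstanding() are hypothetical names, a pthread mutex stands in for host_lock, and sched_yield() stands in for schedule(). It shows the same shape as the loop in the diff: drop the lock while yielding, retake it, and give up once the deadline has passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the two pieces of struct ufs_hba the loop uses. */
struct fake_hba {
	pthread_mutex_t lock;		/* plays the role of host_lock */
	unsigned long outstanding_reqs;	/* one bit per in-flight request */
};

/* Monotonic clock in milliseconds, so wall-clock jumps cannot end the wait. */
static inline int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Called with hba->lock held and returns with it held.
 * Returns 0 once no request is outstanding, or -ETIMEDOUT if requests
 * are still pending after more than timeout_ms milliseconds.
 */
static inline int wait_for_outstanding(struct fake_hba *hba, int64_t timeout_ms)
{
	int64_t start = now_ms();

	assert(timeout_ms >= 0);
	while (hba->outstanding_reqs) {
		/* Let completions run without holding the lock. */
		pthread_mutex_unlock(&hba->lock);
		sched_yield();
		pthread_mutex_lock(&hba->lock);
		if (now_ms() - start > timeout_ms)
			return -ETIMEDOUT;
	}
	return 0;
}
```

In this sketch nothing clears outstanding_reqs on its own; in the driver that job belongs to ufshcd_complete_requests() and the interrupt path, which is why the lock has to be released on every iteration.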

[PATCH v3 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..1678cec08b51 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog

[PATCH v3 0/2] Two UFS bug fixes

2021-01-06 Thread Jaegeuk Kim

Change log from v2:
 - fix build warning

Re: [f2fs-dev] [PATCH RESEND v2 1/5] f2fs: compress: add compress_inode to cache compressed blocks

2021-01-06 Thread Jaegeuk Kim

On 12/09, Jaegeuk Kim wrote:
> On 12/10, Chao Yu wrote:
> > Hi Daeho, Jaegeuk
> > 
> > I found one missing place in this patch which should adapt
> > "compress vs verity race bugfix"
> > 
> > Could you please check and apply below diff?
> 
> Applied.

Hi Chao,

Could you please rebase this patch on top of Eric's cleanup?

Thanks,

> 
> > 
> > From 61a9812944ac2f6f64fb458d5ef8b662c007bc50 Mon Sep 17 00:00:00 2001
> > From: Chao Yu 
> > Date: Thu, 10 Dec 2020 09:52:42 +0800
> > Subject: [PATCH] fix
> > 
> > Signed-off-by: Chao Yu 
> > ---
> >  fs/f2fs/data.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 6787a7a03e86..894c5680db4a 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -2271,11 +2271,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, 
> > struct bio **bio_ret,
> > f2fs_load_compressed_page(sbi, page, blkaddr);
> > if (PageUptodate(page)) {
> > > if (!atomic_dec_return(&dic->pending_pages)) {
> > -   bool verity =
> > -   f2fs_need_verity(inode, start_idx);
> > -
> > -   f2fs_do_decompress_pages(dic, verity);
> > -   if (verity) {
> > +   f2fs_do_decompress_pages(dic, for_verity);
> > +   if (for_verity) {
> > f2fs_verify_pages(dic->rpages,
> > dic->cluster_size);
> > f2fs_free_dic(dic);
> > -- 
> > 2.29.2
> > 
> > Thanks,
> > 
> > On 2020/12/9 16:43, Chao Yu wrote:
> > > Support to use address space of inner inode to cache compressed block,
> > > in order to improve cache hit ratio of random read.
> > > 
> > > Signed-off-by: Chao Yu 
> > > ---
> > >   Documentation/filesystems/f2fs.rst |   3 +
> > >   fs/f2fs/compress.c | 198 +++--
> > >   fs/f2fs/data.c |  29 -
> > >   fs/f2fs/debug.c|  13 ++
> > >   fs/f2fs/f2fs.h |  34 -
> > >   fs/f2fs/gc.c   |   1 +
> > >   fs/f2fs/inode.c|  21 ++-
> > >   fs/f2fs/segment.c  |   6 +-
> > >   fs/f2fs/super.c|  19 ++-
> > >   include/linux/f2fs_fs.h|   1 +
> > >   10 files changed, 305 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/Documentation/filesystems/f2fs.rst 
> > > b/Documentation/filesystems/f2fs.rst
> > > index dae15c96e659..5fa45fd8e4af 100644
> > > --- a/Documentation/filesystems/f2fs.rst
> > > +++ b/Documentation/filesystems/f2fs.rst
> > > @@ -268,6 +268,9 @@ compress_mode=%s   Control file compression mode. 
> > > This supports "fs" and "user"
> > >choosing the target file and the timing. The 
> > > user can do manual
> > >compression/decompression on the compression 
> > > enabled files using
> > >ioctls.
> > > +compress_cacheSupport to use address space of a filesystem 
> > > managed inode to
> > > +  cache compressed block, in order to improve cache hit 
> > > ratio of
> > > +  random read.
> > >   inlinecrypt  When possible, encrypt/decrypt the contents of 
> > > encrypted
> > >files using the blk-crypto framework rather 
> > > than
> > >filesystem-layer encryption. This allows the 
> > > use of
> > > diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> > > index 4bcbacfe3325..446dd41a7bad 100644
> > > --- a/fs/f2fs/compress.c
> > > +++ b/fs/f2fs/compress.c
> > > @@ -12,9 +12,11 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include "f2fs.h"
> > >   #include "node.h"
> > > +#include "segment.h"
> > >   #include 
> > >   static struct kmem_cache *cic_entry_slab;
> > > @@ -721,25 +723,14 @@ static int f2fs_compress_pages(struct compress_ctx 
> > > *cc)

[PATCH 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-06 Thread Jaegeuk Kim

From: Jaegeuk Kim 

This fixes a warning caused by wrong reserve tag usage in __ufshcd_issue_tm_cmd.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

And, in ufshcd_err_handler(), we can avoid sending tm_cmd before aborting
outstanding commands by waiting a bit for IO completion like this.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1678cec08b51..377da8e98d9b 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -44,6 +44,9 @@
 /* Query request timeout */
 #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
 
+/* LINERESET TIME OUT */
+#define LINERESET_IO_TIMEOUT_MS (30000) /* 30 sec */
+
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT 100 /* msecs */
 
@@ -5899,6 +5902,8 @@ static void ufshcd_err_handler(struct work_struct *work)
 * check if power mode restore is needed.
 */
if (hba->saved_uic_err & UFSHCD_UIC_PA_GENERIC_ERROR) {
+   ktime_t start = ktime_get();
+
hba->saved_uic_err &= ~UFSHCD_UIC_PA_GENERIC_ERROR;
if (!hba->saved_uic_err)
hba->saved_err &= ~UIC_ERROR;
@@ -5906,6 +5911,20 @@ static void ufshcd_err_handler(struct work_struct *work)
if (ufshcd_is_pwr_mode_restore_needed(hba))
needs_restore = true;
spin_lock_irqsave(hba->host->host_lock, flags);
+   /* Wait for IO completion to avoid aborting IOs */
+   while (hba->outstanding_reqs) {
+   ufshcd_complete_requests(hba);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   schedule();
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
+   LINERESET_IO_TIMEOUT_MS) {
+   dev_err(hba->dev, "%s: timeout, outstanding=%x\n",
+   __func__, hba->outstanding_reqs);
+   break;
+   }
+   }
+
if (!hba->saved_err && !needs_restore)
goto skip_err_handling;
}
@@ -6302,9 +6321,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6371,11 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED |
+   BLK_MQ_REQ_NOWAIT);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
@@ -9355,6 +9382,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
hba->tmf_tag_set = (struct blk_mq_tag_set) {
.nr_hw_queues   = 1,
+   .reserved_tags  = 1,
.queue_depth= hba->nutmrs,
.ops= &ufshcd_tmf_ops,
.flags  = BLK_MQ_F_NO_SCHED,
-- 
2.29.2.729.g45daf8777d-goog

[PATCH 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-06 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..1678cec08b51 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog

Re: [PATCH v2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-05 Thread Jaegeuk Kim

On 01/05, Martin K. Petersen wrote:
> 
> Jaegeuk,
> 
> > When gate_work/ungate_work gets an error during hibern8_enter or exit,
> >  ufshcd_err_handler()
> >ufshcd_scsi_block_requests()
> >ufshcd_reset_and_restore()
> >  ufshcd_clear_ua_wluns() -> stuck
> >ufshcd_scsi_unblock_requests()
> >
> > In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery 
> > flows
> > such as suspend/resume, link_recovery, and error_handler.
> >
> > Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
> > Signed-off-by: Jaegeuk Kim 
> 
> Please resubmit instead of replying to an existing patch. Both b4 and
> patchwork get confused.

Ok, I posted a new one. Thanks,

> 
> Thanks!
> 
> -- 
> Martin K. Petersen    Oracle Linux Engineering

[PATCH] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-05 Thread Jaegeuk Kim

When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e221add25a7e..29a62552f6f1 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3963,6 +3963,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -5968,6 +5970,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(&hba->eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6908,14 +6913,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -8745,6 +8747,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8855,6 +8858,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog

[PATCH] f2fs: handle unallocated section and zone on pinned/atgc

2020-12-23 Thread Jaegeuk Kim

If we have a large section/zone, an unallocated segment (-1) corrupts the
derived section and zone numbers.

E.g.,

  - Pinned file:   -1 119304647 119304647
  - ATGC   data:   -1 119304647 119304647

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index e81eb0748e2a..229814b4f4a6 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -101,11 +101,11 @@ static inline void sanity_check_seg_type(struct f2fs_sb_info *sbi,
 #define BLKS_PER_SEC(sbi)  \
((sbi)->segs_per_sec * (sbi)->blocks_per_seg)
 #define GET_SEC_FROM_SEG(sbi, segno)   \
-   ((segno) / (sbi)->segs_per_sec)
+   (((segno) == -1) ? -1: (segno) / (sbi)->segs_per_sec)
 #define GET_SEG_FROM_SEC(sbi, secno)   \
((secno) * (sbi)->segs_per_sec)
 #define GET_ZONE_FROM_SEC(sbi, secno)  \
-   ((secno) / (sbi)->secs_per_zone)
+   (((secno) == -1) ? -1: (secno) / (sbi)->secs_per_zone)
 #define GET_ZONE_FROM_SEG(sbi, segno)  \
GET_ZONE_FROM_SEC(sbi, GET_SEC_FROM_SEG(sbi, segno))
 
-- 
2.29.2.729.g45daf8777d-goog
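
The effect of the guard in the patch above can be reproduced with a small standalone sketch. The names struct sb_geometry and the *_guarded/*_unguarded helpers are hypothetical; only the arithmetic mirrors the macros. The geometry is an assumption too: 36 segments per section and 1 section per zone were picked only because dividing the 32-bit value of -1 by 36 gives 119304647, the number shown in the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for the two f2fs_sb_info fields the macros read. */
struct sb_geometry {
	unsigned int segs_per_sec;
	unsigned int secs_per_zone;
};

/* "No segment allocated": all bits set, which prints as -1. */
#define NULL_SEGNO ((unsigned int)~0)

/* Pre-patch behaviour: plain division, so the sentinel is divided as well. */
static inline unsigned int sec_from_seg_unguarded(const struct sb_geometry *g,
						  unsigned int segno)
{
	assert(g->segs_per_sec != 0);
	return segno / g->segs_per_sec;
}

/* Patched behaviour: the sentinel is passed through unchanged. */
static inline unsigned int sec_from_seg_guarded(const struct sb_geometry *g,
						unsigned int segno)
{
	assert(g->segs_per_sec != 0);
	return segno == NULL_SEGNO ? NULL_SEGNO : segno / g->segs_per_sec;
}

/* Same guard one level up, from section number to zone number. */
static inline unsigned int zone_from_sec_guarded(const struct sb_geometry *g,
						 unsigned int secno)
{
	assert(g->secs_per_zone != 0);
	return secno == NULL_SEGNO ? NULL_SEGNO : secno / g->secs_per_zone;
}
```

With that assumed geometry the unguarded helper maps an unallocated segment to section 119304647, while the guarded helpers keep it at -1 for both the section and the zone.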
